From: Marcio Barbosa Date: Mon, 21 Aug 2017 18:21:54 +0000 (-0400) Subject: ubik: avoid DISK_Begin on sites that didn't vote for sync X-Git-Tag: openafs-devel-1_9_0~672 X-Git-Url: https://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=68ec78950a6e39dc1bf15012d4b889728086d0b7 ubik: avoid DISK_Begin on sites that didn't vote for sync As already described on 7c708506, SDISK_Begin fails on remotes if lastYesState is not set. To fix this problem, 7c708506 does not allow write transactions until we know that lastYesState is set on at least quorum (ubik_syncSiteAdvertised == 1). In other words, if enough sites received a beacon packet informing that a sync-site was elected, write transactions will be allowed. This means that ubik_syncSiteAdvertised can be true while lastYesState is not set in a few sites. Consider the following scenario in a cell with frequent write transactions: Site A => Sync-site (up) Site B => Remote 1 (up) Site C => Remote 2 (down - unreachable) Since A and B are up, we have quorum. After the second wave of beacons, ubik_syncSiteAdvertised will be true and write transactions will be allowed. At some point, C is not unreachable anymore. Site A sends a copy of its database to C, but C did not vote for A yet (lastYesState == 0). A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. Since C is reachable, A will mark this remote site as up. The sync-site will send its database to C, but C did not vote for A yet. A new write transaction is initialized and, since lastYesState is not set on C, DISK_Begin fails on this remote site and C is marked as down. In a cell with frequent write transactions, this cycle will repeat forever. As a result, the sync-site will be constantly sending its database to C and quorum will be operating with less sites, increasing the chances of re-elections. To fix this problem, do not call DISK_Begin on remotes that did not vote for the sync-site yet. Change-Id: I27f5122a089064e7b83beba3533261d8a4e31c64 Reviewed-on: https://gerrit.openafs.org/12715 Tested-by: BuildBot Reviewed-by: Mark Vitale Reviewed-by: Benjamin Kaduk --- diff --git a/src/ubik/ubik.c b/src/ubik/ubik.c index 7eda47a..bc04f99 100644 --- a/src/ubik/ubik.c +++ b/src/ubik/ubik.c @@ -108,6 +108,7 @@ static void *securityRock = NULL; struct version_data version_globals; #define CStampVersion 1 /* meaning set ts->version */ +#define CCheckSyncAdvertised 2 /* check if the remote knows we are the sync-site */ static_inline struct rx_connection * Quorum_StartIO(struct ubik_trans *atrans, struct ubik_server *as) @@ -181,7 +182,10 @@ ContactQuorum_iterate(struct ubik_trans *atrans, int aflags, struct ubik_server if (!(*ts)) return 1; UBIK_BEACON_LOCK; - if (!(*ts)->up || !(*ts)->currentDB) { + if (!(*ts)->up || !(*ts)->currentDB || + /* do not call DISK_Begin until we know that lastYesState is set on the + * remote in question; otherwise, DISK_Begin will fail. */ + ((aflags & CCheckSyncAdvertised) && !((*ts)->beaconSinceDown && (*ts)->lastVote))) { UBIK_BEACON_UNLOCK; (*ts)->currentDB = 0; /* db is no longer current; we just missed an update */ return 0; /* not up-to-date, don't bother. NULL conn will tell caller not to use */ @@ -678,7 +682,7 @@ BeginTrans(struct ubik_dbase *dbase, afs_int32 transMode, if (transMode == UBIK_WRITETRANS) { /* next try to start transaction on appropriate number of machines */ - code = ContactQuorum_NoArguments(DISK_Begin, tt, 0); + code = ContactQuorum_NoArguments(DISK_Begin, tt, CCheckSyncAdvertised); if (code) { /* we must abort the operation */ udisk_abort(tt);