ubik: Log urecovery_CheckTid-aborted txes
author Andrew Deason <adeason@sinenomine.net>
Wed, 11 Sep 2019 21:42:47 +0000 (16:42 -0500)
committer Benjamin Kaduk <kaduk@mit.edu>
Fri, 13 Mar 2020 15:36:55 +0000 (11:36 -0400)
Log when urecovery_CheckTid aborts/ends a running remote transaction.
This is usually a rare event, occurring when some ubik sites get
"stuck" or confused about the state of the quorum. Logging some
details when this happens can be useful when investigating issues
post-mortem, or just to see why a transaction failed.

Change-Id: If0a7cd134aaac3722fe7214a1d8f0efab550ad11
Reviewed-on: https://gerrit.openafs.org/13862
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Marcio Brito Barbosa <mbarbosa@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
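
To make the logged fields easier to read, here is a minimal standalone sketch of the abort condition and the message format used in this change. It is not the OpenAFS code: `struct tid_sketch`, `should_abort`, and `format_abort_msg` are hypothetical names for illustration, and `snprintf` stands in for `ViceLog`, so any prefix the real logger may add is not shown.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct ubik_tid; only the two fields used here. */
struct tid_sketch {
    int epoch;
    int counter;
};

/*
 * Returns 1 if the running remote transaction (cur) would be aborted for
 * the incoming tid (atid), mirroring the condition visible in the diff:
 * different epoch, a newer counter, or abortalways set.
 */
static int
should_abort(const struct tid_sketch *cur, const struct tid_sketch *atid,
             int abortalways)
{
    return atid->epoch != cur->epoch
        || atid->counter > cur->counter
        || abortalways;
}

/*
 * Formats the message text with the same format string as the patch.
 * Returns the snprintf() result.
 */
static int
format_abort_msg(char *buf, size_t len, const struct tid_sketch *cur,
                 const struct tid_sketch *atid, int abortalways, int endit)
{
    return snprintf(buf, len,
                    "urecovery_CheckTid: Aborting/ending bad remote "
                    "transaction. (tx %d.%d, atid %d.%d, abortalways %d, "
                    "endit %d)\n",
                    cur->epoch, cur->counter, atid->epoch, atid->counter,
                    abortalways, endit);
}
```

In the message, `tx` is the tid of the transaction being aborted, `atid` is the tid it was checked against, and `endit` records whether `udisk_end` was called, which the patch skips when the transaction's `locktype` is `LOCKWAIT`.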

src/ubik/recovery.c

index 5e42b54..99b9fd8 100644
@@ -159,8 +159,20 @@ urecovery_CheckTid(struct ubik_tid *atid, int abortalways)
        if (atid->epoch != ubik_currentTrans->tid.epoch
            || atid->counter > ubik_currentTrans->tid.counter || abortalways) {
            /* don't match, abort it */
+           int endit = 0;
            /* If the thread is not waiting for lock - ok to end it */
            if (ubik_currentTrans->locktype != LOCKWAIT) {
+               endit = 1;
+           }
+
+           ViceLog(0, ("urecovery_CheckTid: Aborting/ending bad remote "
+                       "transaction. (tx %d.%d, atid %d.%d, abortalways %d, "
+                       "endit %d)\n",
+                       ubik_currentTrans->tid.epoch,
+                       ubik_currentTrans->tid.counter,
+                       atid->epoch, atid->counter,
+                       abortalways, endit));
+           if (endit) {
                udisk_end(ubik_currentTrans);
            }
            ubik_currentTrans = (struct ubik_trans *)0;