ubik: Don't leak UBIK_VERSION_LOCK if setlabel fails
authorSimon Wilkinson <sxw@your-file-system.com>
Mon, 25 Aug 2014 15:15:26 +0000 (16:15 +0100)
committerJeffrey Altman <jaltman@your-file-system.com>
Thu, 4 Sep 2014 12:52:01 +0000 (08:52 -0400)
If a call to the setlabel() physical IO function fails, don't
leak the UBIK_VERSION_LOCK.

This is the possible cause of a vlserver deadlock, which had
approximately 4800 threads blocked. Analysis of backtrace of all
of these threads showed that all blocked threads were waiting in
ubik.c:555 (blocked on DBHOLD) with the exception of:

One in beacon.c:388 (blocked on UBIK_VERSION_LOCK)
One in recovery.c:503 (blocked on DBHOLD)
One in ubik.c:125 (blocked on DBHOLD)
One in ubik.c:585 (blocked on UBIK_VERSION_LOCK)

The last of these is the critical one, because it already holds
the lock that DBHOLD waits on - so despite the vast majority of
threads being blocked in DBHOLD, it's actually UBIK_VERSION_LOCK
that we're waiting on.

There is no sign of a thread which is still active which currently
holds UBIK_VERSION_LOCK.

Change-Id: Ie6093409e9375d50fa69733908b5ce99586e1b1d
Reviewed-on: http://gerrit.openafs.org/11426
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Perry Ruiter <pruiter@sinenomine.net>
Reviewed-by: Chas Williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Reviewed-by: Jeffrey Altman <jaltman@your-file-system.com>

src/ubik/disk.c

index acf0077..b0da8dc 100644 (file)
@@ -881,8 +881,11 @@ udisk_commit(struct ubik_trans *atrans)
            newversion.counter = 1;
 
            code = (*dbase->setlabel) (dbase, 0, &newversion);
-           if (code)
-               return (code);
+           if (code) {
+               UBIK_VERSION_UNLOCK;
+               return code;
+           }
+
            version_globals.ubik_epochTime = newversion.epoch;
            dbase->version = newversion;
            UBIK_VERSION_UNLOCK;