Commit
bc56f5cc97a982ee29219e6f258b372dbfe1a020 ("volser: Delete
timed-out temporary volumes") introduced new logic to GCTrans().
Unfortunately, part of this logic temporarily drops VTRANS_LOCK in order
to call VPurgeVolume(). While this lock is dropped, other volser_trans
may be added or deleted from the allTrans list. Therefore, GCTrans
should not trust the next pointer (nt = tt->next) which was obtained
before the lock was dropped.
One symptom observed in the field was a segfault while examining
tt->volume. Neither tt nor volume were valid any longer, since tt had
been set from a stale nt at the top of the loop.
To repair, improve, and clarify this logic:
- Refactor so nt is assigned correctly and as late as possible.
- Add comments to explain the placement of the assigns to future
maintainers.
Change-Id: Ibd3a504bddd3622730aa349576341e20f2f27836
Reviewed-on: https://gerrit.openafs.org/13286
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
VTRANS_LOCK;
for (tt = allTrans; tt; tt = nt) {
- nt = tt->next; /* remember in case we zap it */
if (tt->time + OLDTRANSWARN < now) {
Log("trans %u on volume %" AFS_VOLID_FMT " %s than %d seconds\n", tt->tid,
afs_printable_VolumeId_lu(tt->volid),
VTRANS_UNLOCK;
VPurgeVolume(&error, tt->volume);
+ /*
+ * While the lock was dropped, tt->next may have changed.
+ * Therefore, defer reading tt->next until _after_ we regain the lock.
+ */
VTRANS_LOCK;
}
-
- DeleteTrans(tt, 0); /* drops refCount or deletes it */
+ nt = tt->next;
+ /*
+ * DeleteTrans() will decrement tt->refCount; if it falls to 0, it will
+ * also delete tt itself. Therefore, we must read tt->next _before_
+ * calling DeleteTrans().
+ */
+ DeleteTrans(tt, 0);
GCDeletes++;
+ continue;
}
+ /*
+ * This path never dropped VTRANS_LOCK or modified the allTrans list.
+ * Therefore, no special care is required to determine the next trans in the chain.
+ */
+ nt = tt->next;
}
VTRANS_UNLOCK;
return 0;