From: Andrew Deason
Date: Fri, 26 Jul 2019 20:28:44 +0000 (-0500)
Subject: afs: Let afs_ShakeLooseVCaches run longer
X-Git-Tag: openafs-devel-1_9_0~36
X-Git-Url: http://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=cd65475e95e25c8e7071e099a682bdcc03d2cce1

afs: Let afs_ShakeLooseVCaches run longer

Currently, when afs_ShakeLooseVCaches runs osi_TryEvictVCache, we
check if osi_TryEvictVCache slept (i.e. dropped afs_xvcache/GLOCK). If
we sleep over 100 times, then we stop trying to evict vcaches and
return.

If we have recently accessed a lot of AFS files, this limitation can
severely reduce our ability to keep our number of vcaches limited to a
reasonable size. For example:

Say a Linux client runs a process that quickly accesses 1 million
files (a simple 'find' command) and then does nothing else. A few
minutes later, afs_ShakeLooseVCaches is run, but since all of the
newly accessed vcaches have dentries attached to them, we will sleep
on each one in order to try to prune the attached dentries. This means
that afs_ShakeLooseVCaches will evict 100 vcaches, and then return,
leaving us with still almost 1 million vcaches. This will happen
repeatedly until afs_ShakeLooseVCaches finally works its way through
all of the vcaches (which takes quite a while, if we only clear 100 at
once), or the dentries get pruned by other means (such as, if Linux
evicts them due to memory pressure).

The limit of 100 sleeps was originally added in commit 29277d96
(newvcache-dont-spin-20060128), but the current effect of it was
largely introduced in commit 9be76c0d (Refactor afs_NewVCache). It
exists to ensure that afs_ShakeLooseVCaches doesn't take forever to
run, but the limit of 100 sleeps may seem quite low, especially if
those 100 sleeps run very quickly.

To avoid the situation described above, instead of limiting
afs_ShakeLooseVCaches based on a fixed number of sleeps, limit it
based on how long we've been running, and set an arbitrary limit of
roughly 3 seconds.
Only check how long we've been running after 100 sleeps like before,
so we're not constantly checking the time while running.

Log a new warning if we exit afs_ShakeLooseVCaches prematurely if
we've been running for too long, to help indicate what is going on.

Change-Id: I65729ace748e8507cc0d5c26dec39e74d7bff5d2
Reviewed-on: https://gerrit.openafs.org/14254
Reviewed-by: Cheyenne Wills
Tested-by: BuildBot
Reviewed-by: Benjamin Kaduk
---

diff --git a/src/afs/afs_vcache.c b/src/afs/afs_vcache.c
index 160b011..407e5a6 100644
--- a/src/afs/afs_vcache.c
+++ b/src/afs/afs_vcache.c
@@ -799,12 +799,16 @@ afs_VCacheStressed(void)
 int
 afs_ShakeLooseVCaches(afs_int32 anumber)
 {
+    /* Try not to run for more than about 3 seconds */
+    static const int DEADLINE = 3;
+
     afs_int32 i, loop;
     int evicted;
     struct vcache *tvc;
     struct afs_q *tq, *uq;
     int fv_slept, defersleep = 0;
     int limit;
+    afs_uint32 start = osi_Time();
 
     loop = 0;
 
@@ -833,8 +837,33 @@ afs_ShakeLooseVCaches(afs_int32 anumber)
             }
 
             if (fv_slept) {
-                if (loop++ > 100)
-                    break;
+                if (loop++ > 100) {
+                    afs_uint32 now = osi_Time();
+                    loop = 0;
+                    if (now < start) {
+                        start = now;
+                    }
+                    if (now - start >= DEADLINE) {
+                        static afs_uint32 last_warned;
+                        /* Warn about this at most every VCACHE_STRESS_LOGINTERVAL secs */
+                        if (now < last_warned ||
+                            now - last_warned > VCACHE_STRESS_LOGINTERVAL) {
+                            last_warned = now;
+                            afs_warn("afs: Warning: it took us a long time (around "
+                                     "%d seconds) to try to trim our stat cache "
+                                     "down to a reasonable size. This may indicate "
+                                     "someone is accessing an excessive number of "
+                                     "files, or something is wrong with the AFS "
+                                     "cache.\n",
+                                     now - start);
+                            afs_warn("afs: Consider raising the afsd -stat parameter "
+                                     "(current setting: %d, current vcount: %d), or "
+                                     "figure out what is accessing so many files.\n",
+                                     afs_cacheStats, afs_vcount);
+                        }
+                        break;
+                    }
+                }
                 if (!evicted) {
                     /*
                      * This vcache was busy and we slept while trying to evict it.