From: Mark Vitale Date: Tue, 17 Sep 2019 19:14:44 +0000 (-0400) Subject: viced: consistently enforce host thread quota for ICBS(3) X-Git-Tag: openafs-devel-1_9_0~207 X-Git-Url: https://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=aefc4c4f46e13f59b4cbe043e1a2a6f4ed99e076 viced: consistently enforce host thread quota for ICBS(3) From time to time, the fileserver may issue potentially long-running RXAFSCB_* RPCs back to a host (client). If these are holding h_Lock_r (host->lock) while running, they may cause other service threads for the same host (client) to block. In order to prevent a given host from tying up too many service threads in this way, the fileserver enforces a quota limiting how many threads can be waiting for h_Lock_r on a particular host while waiting for one of the following RPCs to complete: - RXAFSCB_TellMeABoutYourself (TMAY) - RXAFSCB_WhoAreYou - RXAFSCB_ProbeUuid - RXAFSCB_InitCallBackState (ICBS) - RXAFSCB_InitCallBackState3 (ICBS3) Note: Although some of these RPCs are relatively lightweight, they may still experience network delays. This quota is enforced by calling h_threadquota() in h_Lookup_r and h_GetHost_r. The quota check is enabled for a given host by turning on host->hostFlags HWHO_INPROGRESS for the duration of the RXAFSCB_* RPC. The quota check is only needed, and should only be enabled, when the RPC is issued while h_Lock_r is held. However, there are a few paths to ICBS(3) where h_Lock_r is held but HWHO_INPROGRESS is not set. A delay in those paths may allow a host to consume an unlimited number of fileserver threads. One such path observed in a field report was SRXAFS_FetchStatus -> CallPreamble -> BreakDelayedCallBacks_r -> RXAFSCB_ICBS3. Instead, enable host thread quotas for all remaining unregulated ICBS(3) RPCs. Change-Id: I70b96055ff80d8650bdbaec0302b7d18a8f22d56 Reviewed-on: https://gerrit.openafs.org/13873 Reviewed-by: Benjamin Kaduk Tested-by: BuildBot --- diff --git a/src/viced/callback.c b/src/viced/callback.c index f685398..980d7e5 100644 --- a/src/viced/callback.c +++ b/src/viced/callback.c @@ -1035,6 +1035,7 @@ BreakDelayedCallBacks_r(struct host *host) cbstuff.nbreakers++; if (!(host->z.hostFlags & RESETDONE) && !(host->z.hostFlags & HOSTDELETED)) { host->z.hostFlags &= ~ALTADDR; /* alternate addresses are invalid */ + host->z.hostFlags |= HWHO_INPROGRESS; /* ICBS(3) invokes thread quota */ cb_conn = host->z.callback_rxcon; rx_GetConnection(cb_conn); if (host->z.interface) { @@ -1049,6 +1050,7 @@ BreakDelayedCallBacks_r(struct host *host) cb_conn = NULL; H_LOCK; host->z.hostFlags |= ALTADDR; /* alternate addresses are valid */ + host->z.hostFlags &= ~HWHO_INPROGRESS; if (code) { if (ShowProblems) { ViceLog(0, @@ -1642,6 +1644,7 @@ ClearHostCallbacks_r(struct host *hp, int locked) } else if (!(hp->z.hostFlags & HOSTDELETED)) { /* host is up, try a call */ hp->z.hostFlags &= ~ALTADDR; /* alternate addresses are invalid */ + hp->z.hostFlags |= HWHO_INPROGRESS; /* enable host thread quota enforcement */ cb_conn = hp->z.callback_rxcon; rx_GetConnection(hp->z.callback_rxcon); if (hp->z.interface) { @@ -1656,6 +1659,7 @@ ClearHostCallbacks_r(struct host *hp, int locked) cb_conn = NULL; H_LOCK; hp->z.hostFlags |= ALTADDR; /* alternate addresses are valid */ + hp->z.hostFlags &= ~HWHO_INPROGRESS; if (code) { /* failed, mark host down and need reset */ hp->z.hostFlags |= VENUSDOWN;