From 304d758983b499dc568d6ca57b6e92df24b69de8 Mon Sep 17 00:00:00 2001
From: Benjamin Kaduk
Date: Sat, 7 Oct 2017 22:42:38 -0500
Subject: [PATCH 1/1] Standardize rx_event usage

Go over all consumers of the rx event framework and normalize its usage
according to the following principles:

rxevent_Post() is used to create an event, and it returns an event
handle (with a reference on the event structure) that can be used to
cancel the event before its timeout fires.  (There is also an
additional reference on the event held by the global event tree.)  In
all(*) usage within the tree, that event handle is stored within either
an rx_connection or an rx_call.  Reads/writes to the member variable
that holds the event handle require either the conn_data_lock or call
lock, respectively -- that means that in most cases, callers of
rxevent_Post() and rxevent_Cancel() will be holding one of those
aforementioned locks.

The event handlers themselves will need to modify the call/connection
object according to the nature of the event, which requires holding
those same locks, and also a guarantee that the call/connection is
still a live object and has not been deallocated!

Whether or not rxevent_Cancel() succeeds in cancelling the event before
it fires, whenever passed a non-NULL event structure it will NULL out
the supplied pointer and drop a reference on the event structure.  This
is the correct behavior, since the caller has asked to cancel the event
and has no further use for the event handle or its reference on the
event structure.  The caller of rxevent_Cancel() must check its return
value to know whether or not the event was cancelled before its handler
was able to run.

The interaction between the call/connection lock and the lock
protecting the red/black tree of pending events opens up a somewhat
problematic race window.
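The Post/Cancel reference contract described above can be sketched with a
toy, single-threaded model.  The mock_* names below are invented for
illustration and deliberately ignore locking and the event tree itself;
only the reference-count behavior mirrors the contract stated here:

```c
/* Minimal single-threaded mock of the rxevent refcounting contract:
 * Post() hands back a handle holding one reference, with a second
 * reference held by the (here imaginary) event tree; Cancel() always
 * NULLs the handle and drops its reference, and additionally drops the
 * tree's reference when it wins the race and actually cancels.
 * These are illustrative stand-ins, not the real rx library code. */
#include <assert.h>
#include <stdlib.h>

struct rxevent {
    int refcnt;     /* one ref for the caller's handle, one for the tree */
    int pending;    /* nonzero while still in the (mock) event tree */
};

static struct rxevent *
mock_rxevent_Post(void)
{
    struct rxevent *ev = calloc(1, sizeof(*ev));
    ev->refcnt = 2;
    ev->pending = 1;
    return ev;
}

static void
mock_rxevent_Put(struct rxevent **evp)
{
    struct rxevent *ev = *evp;
    *evp = NULL;
    if (--ev->refcnt == 0)
        free(ev);
}

/* Returns nonzero only if the event had not yet fired and was removed
 * from the tree (dropping the tree's reference); in all cases the
 * caller's handle is NULLed and its reference released. */
static int
mock_rxevent_Cancel(struct rxevent **evp)
{
    struct rxevent *ev = *evp;
    int cancelled = 0;

    if (ev == NULL)
        return 0;
    if (ev->pending) {
        ev->pending = 0;
        ev->refcnt--;           /* drop the tree's reference */
        cancelled = 1;
    }
    mock_rxevent_Put(evp);      /* drop the handle's reference */
    return cancelled;
}
```

A caller that holds a side reference (a CALL_HOLD, say) would drop it only
when this mock Cancel returns nonzero, mirroring the pattern the diff below
installs everywhere.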
Because the application thread is expected to hold the call/connection
lock around rxevent_Cancel() (to protect the write to the field in the
call/connection structure that holds an event handle), and
rxevent_Cancel() must take the lock protecting the red/black tree of
events, this establishes a lock order with the call/connection lock
taken before the eventTree lock.  This is in conflict with the event
handler thread, which must take the eventTree lock first, in order to
select an event to run (and thus know what additional lock would need
to be taken, by virtue of what handler function is to be run).  The
conflict is easy to resolve in the standard way: a local pointer to the
event is obtained while the event is removed from the red/black tree
under the eventTree lock, and then the eventTree lock can be dropped
and the event run via the local variable referring to it.

The race window occurs when the caller of rxevent_Cancel() holds the
call/connection lock, and rxevent_Cancel() obtains the eventTree lock
just after the event handler thread drops it in order to run the event.
The event handler function begins to execute, and immediately blocks
trying to obtain the call/connection lock.  Now that rxevent_Cancel()
has the eventTree lock, it can proceed to search the tree, fail to find
the indicated event in the tree, clear out the event pointer from the
call/connection data structure, drop its caller's reference to the
event structure, and return failure (the event was not cancelled).
Only then does the caller of rxevent_Cancel() drop the call/connection
lock and allow the event handler to make progress.

This race is not necessarily problematic if appropriate care is taken,
but in the previous code such was not the case.  In particular, it is a
common idiom for the firing event to call rxevent_Put() on itself, to
release the handle stored in the call/connection that could have been
used to cancel the event before it fired.
Failing to do so would result in a memory leak of event structures;
however, rxevent_Put() does not check for a NULL argument, so a
segfault (NULL dereference) was observed in the test suite when the
race occurred and the event handler tried to rxevent_Put() the
reference that had already been released by the unsuccessful
rxevent_Cancel() call.  Upon inspection, many (but not all) of the uses
in rx.c were susceptible to a similar race condition and crash.

The test suite also papers over a related issue, in that the event
handler in the test suite always knows that the data structure
containing the event handle will remain live, since it is a global
array that is allocated for the entire scope of the test.  In rx.c,
events are associated with calls and connections that have a finite
lifetime, so we need to take care to ensure that the call/connection
pointer stored in the event remains valid for the duration of the
event's lifecycle.  In particular, even an attempt to take the
call/connection lock to check whether the corresponding event field is
NULL is fraught with risk, as it could crash if the lock (and
containing call/connection) has already been destroyed!

There are several potential ways to ensure the liveness of the
associated call/connection while the event handler runs, most notably
to take care in the call/connection destruction path to ensure that all
associated events are either successfully cancelled or run to
completion before tearing down the call/connection structure, or to
give the pending event its own reference on the associated
call/connection.  Here, we opt for the latter, acknowledging that this
may result in the event handler thread doing the full call/connection
teardown and delaying the firing of subsequent events.  This is deemed
acceptable, as pending events are for intentionally delayed tasks, and
some extra delay is probably acceptable.
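The chosen scheme -- the pending event holds its own reference on the
call/connection, and the handler drops it only after all other accesses --
can be illustrated with a toy refcounted connection.  The mock_* names are
invented for this sketch; only the ordering mirrors the real code: the
handler may reschedule (taking a new reference) and drops the fired
event's reference last, so final teardown can happen inside the handler:

```c
/* Toy refcounted connection demonstrating "the pending event holds a
 * reference on its connection".  Illustrative mocks, not the rx API. */
#include <assert.h>

struct mock_conn {
    int refCount;
    int destroyed;          /* set when the last reference is dropped */
    int keepalives_sent;
};

static void
mock_get_conn(struct mock_conn *conn)   /* cf. rx_GetConnection() */
{
    conn->refCount++;
}

static void
mock_put_conn(struct mock_conn *conn)   /* cf. putConnection() */
{
    if (--conn->refCount == 0)
        conn->destroyed = 1;            /* stand-in for full teardown */
}

/* Scheduling a keepalive gives the pending event its own reference. */
static void
mock_schedule_keepalive(struct mock_conn *conn)
{
    mock_get_conn(conn);
}

/* Handler: do the work, reschedule only if someone else still holds the
 * connection, and drop the fired event's reference last of all -- so if
 * this was the final reference, the teardown happens right here. */
static void
mock_keepalive_handler(struct mock_conn *conn)
{
    conn->keepalives_sent++;
    if (conn->refCount > 1)
        mock_schedule_keepalive(conn);
    mock_put_conn(conn);
}
```

This also shows why the commit accepts some extra latency: the thread
running the handler may end up doing the whole teardown itself.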
(The various keepalive events and the challenge event could affect the
user experience and/or security properties if significantly delayed,
but I do not believe that this change admits completely unbounded delay
in the event handler thread, so the practical risk seems minimal.)

Accordingly, this commit attempts to ensure that:

* Each event holds a formal reference on its associated
  call/connection.
* The appropriate lock is held for all accesses to event pointers in
  call/connection structures.
* Each event handler (after taking the appropriate lock) checks whether
  it raced with rxevent_Cancel() and only drops the call/connection's
  reference to the event if the race did not occur.
* Each event handler drops its reference to the associated
  call/connection *after* doing any actions that might access/modify
  the call/connection.
* The per-event reference on the associated call/connection is dropped
  by the thread that removes the event from the red/black tree.  That
  is, by the event handler function if the event runs, or by the caller
  of rxevent_Cancel() when the cancellation succeeds.
* No non-NULL event handles remain in a call/connection being
  destroyed, which would indicate a refcounting error.

(*) There is an additional event used in practice, to reap old
connections, but it is effectively a background task that reschedules
itself periodically, with no handle to the event retained so as to be
able to cancel it.  As such, it is unaffected by the concerns raised
here.

While here, standardize on the rx_GetConnection() function for
incrementing the reference count on a connection object, instead of
inlining the corresponding mutex lock/unlock and variable access.

Also enable refcount checking unconditionally on unix, as this is a
rather invasive change late in the 1.8.0 release process and we want to
get as much sanity-checking coverage as possible.
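The handler-side race check (the third bullet above) can be replayed
deterministically in miniature.  All names below are illustrative mocks
rather than the rx API; the point is that a losing rxevent_Cancel() has
already NULLed the object's handle and released that reference, so the
firing handler must compare the handle against itself before Put()ing it:

```c
/* Deterministic replay of the race: rxevent_Cancel() runs after the
 * handler thread has popped the event from the tree (holding it via a
 * local pointer) but before the handler body runs.  Mocks only. */
#include <assert.h>
#include <stdlib.h>

struct ev {
    int refcnt;
};

struct obj {
    struct ev *eventHandle;     /* what a cancel would be handed */
};

static void
put_ev(struct ev **evp)
{
    struct ev *ev = *evp;
    *evp = NULL;
    if (--ev->refcnt == 0)
        free(ev);
}

/* The losing cancel: the event is no longer in the tree, so this only
 * clears the object's handle and drops the handle's reference, returning
 * 0 because the handler will still run. */
static int
cancel_after_fire(struct obj *o)
{
    if (o->eventHandle == NULL)
        return 0;
    put_ev(&o->eventHandle);
    return 0;
}

/* The handler: Put the object's handle only if it still refers to the
 * firing event; otherwise the unsuccessful cancel already released it.
 * Skipping this check is exactly the double-Put crash described above. */
static void
handler(struct obj *o, struct ev *event)
{
    if (event == o->eventHandle)
        put_ev(&o->eventHandle);
    put_ev(&event);             /* drop the tree's ref, held locally */
}
```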
Change-Id: I27bcb932ec200ff20364fb1b83ea811221f9871c
Reviewed-on: https://gerrit.openafs.org/12756
Reviewed-by: Mark Vitale
Reviewed-by: Michael Meffie
Tested-by: BuildBot
Reviewed-by: Benjamin Kaduk
---
 src/rx/Makefile.in  |   2 +-
 src/rx/rx.c         | 221 +++++++++++++++++++++++++++++-----------------------
 src/rx/rx_globals.h |   4 -
 tests/rx/event-t.c  |  30 +++++--
 4 files changed, 150 insertions(+), 107 deletions(-)

diff --git a/src/rx/Makefile.in b/src/rx/Makefile.in
index abb54d4..4c750e0 100644
--- a/src/rx/Makefile.in
+++ b/src/rx/Makefile.in
@@ -10,7 +10,7 @@ include @TOP_OBJDIR@/src/config/Makefile.config
 include @TOP_OBJDIR@/src/config/Makefile.lwp
 include @TOP_OBJDIR@/src/config/Makefile.lwptool
 
-MODULE_CFLAGS=$(RXDEBUG)
+MODULE_CFLAGS=$(RXDEBUG) -DRX_REFCOUNT_CHECK
 
 LT_objs = xdr.lo xdr_array.lo xdr_rx.lo xdr_mem.lo xdr_len.lo xdr_afsuuid.lo \
 	xdr_int32.lo xdr_int64.lo xdr_update.lo xdr_refernce.lo \
diff --git a/src/rx/rx.c b/src/rx/rx.c
index 97faa43..7043986 100644
--- a/src/rx/rx.c
+++ b/src/rx/rx.c
@@ -690,10 +690,8 @@ rxi_rto_startTimer(struct rx_call *call, int lastPacket, int istack)
 static_inline void
 rxi_rto_cancel(struct rx_call *call)
 {
-    if (call->resendEvent != NULL) {
-        rxevent_Cancel(&call->resendEvent);
+    if (rxevent_Cancel(&call->resendEvent))
         CALL_RELE(call, RX_CALL_REFCOUNT_RESEND);
-    }
 }
 
 /*!
@@ -781,15 +779,15 @@ rxi_PostDelayedAckEvent(struct rx_call *call, struct clock *offset)
     when = now;
     clock_Add(&when, offset);
 
-    if (call->delayedAckEvent && clock_Gt(&call->delayedAckTime, &when)) {
-        /* The event we're cancelling already has a reference, so we don't
-         * need a new one */
-        rxevent_Cancel(&call->delayedAckEvent);
+    if (clock_Gt(&call->delayedAckTime, &when) &&
+        rxevent_Cancel(&call->delayedAckEvent)) {
+        /* We successfully cancelled an event too far in the future to install
+         * our new one; we can reuse the reference on the call.
+         */
         call->delayedAckEvent =
             rxevent_Post(&when, &now, rxi_SendDelayedAck, call, NULL, 0);
         call->delayedAckTime = when;
-    } else if (!call->delayedAckEvent) {
+    } else if (call->delayedAckEvent == NULL) {
         CALL_HOLD(call, RX_CALL_REFCOUNT_DELAY);
         call->delayedAckEvent =
             rxevent_Post(&when, &now, rxi_SendDelayedAck,
@@ -801,10 +799,9 @@ rxi_PostDelayedAckEvent(struct rx_call *call, struct clock *offset)
 void
 rxi_CancelDelayedAckEvent(struct rx_call *call)
 {
-    if (call->delayedAckEvent) {
-        rxevent_Cancel(&call->delayedAckEvent);
+    /* Only drop the ref if we cancelled it before it could run. */
+    if (rxevent_Cancel(&call->delayedAckEvent))
         CALL_RELE(call, RX_CALL_REFCOUNT_DELAY);
-    }
 }
 
 /* called with unincremented nRequestsRunning to see if it is OK to start
@@ -1212,7 +1209,6 @@ rxi_DestroyConnectionNoLock(struct rx_connection *conn)
 {
     struct rx_connection **conn_ptr;
     int havecalls = 0;
-    struct rx_packet *packet;
     int i;
     SPLVAR;
 
@@ -1224,6 +1220,9 @@ rxi_DestroyConnectionNoLock(struct rx_connection *conn)
     if (conn->refCount > 0)
         conn->refCount--;
     else {
+#ifdef RX_REFCOUNT_CHECK
+        osi_Assert(conn->refCount == 0);
+#endif
         if (rx_stats_active) {
             MUTEX_ENTER(&rx_stats_mutex);
             rxi_lowConnRefCount++;
@@ -1299,21 +1298,6 @@ rxi_DestroyConnectionNoLock(struct rx_connection *conn)
         return;
     }
 
-    if (conn->natKeepAliveEvent) {
-        rxi_NatKeepAliveOff(conn);
-    }
-
-    if (conn->delayedAbortEvent) {
-        rxevent_Cancel(&conn->delayedAbortEvent);
-        packet = rxi_AllocPacket(RX_PACKET_CLASS_SPECIAL);
-        if (packet) {
-            MUTEX_ENTER(&conn->conn_data_lock);
-            rxi_SendConnectionAbort(conn, packet, 0, 1);
-            MUTEX_EXIT(&conn->conn_data_lock);
-            rxi_FreePacket(packet);
-        }
-    }
-
     /* Remove from connection hash table before proceeding */
     conn_ptr =
         &rx_connHashTable[CONN_HASH
@@ -1331,10 +1315,13 @@ rxi_DestroyConnectionNoLock(struct rx_connection *conn)
         rxLastConn = 0;
 
     /* Make sure the connection is completely reset before deleting it. */
-    /* get rid of pending events that could zap us later */
-    rxevent_Cancel(&conn->challengeEvent);
-    rxevent_Cancel(&conn->checkReachEvent);
-    rxevent_Cancel(&conn->natKeepAliveEvent);
+    /*
+     * Pending events hold a refcount, so we can't get here if they are
+     * non-NULL. */
+    osi_Assert(conn->challengeEvent == NULL);
+    osi_Assert(conn->delayedAbortEvent == NULL);
+    osi_Assert(conn->natKeepAliveEvent == NULL);
+    osi_Assert(conn->checkReachEvent == NULL);
 
     /* Add the connection to the list of destroyed connections that
      * need to be cleaned up. This is necessary to avoid deadlocks
@@ -3658,6 +3645,14 @@ rxi_ConnClearAttachWait(struct rx_connection *conn)
     }
 }
 
+/*
+ * Event handler function for connection-specific events for checking
+ * reachability.  Also called directly from main code with |event| == NULL
+ * in order to trigger the initial reachability check.
+ *
+ * When |event| == NULL, must be called with the connection data lock held,
+ * but returns with the lock unlocked.
+ */
 static void
 rxi_CheckReachEvent(struct rxevent *event, void *arg1, void *arg2, int dummy)
 {
@@ -3667,15 +3662,12 @@ rxi_CheckReachEvent(struct rxevent *event, void *arg1, void *arg2, int dummy)
     struct clock when, now;
     int i, waiting;
 
-    MUTEX_ENTER(&conn->conn_data_lock);
+    if (event != NULL)
+        MUTEX_ENTER(&conn->conn_data_lock);
 
-    if (event)
+    if (event != NULL && event == conn->checkReachEvent)
         rxevent_Put(&conn->checkReachEvent);
 
     waiting = conn->flags & RX_CONN_ATTACHWAIT;
-    if (event) {
-        putConnection(conn);
-    }
     MUTEX_EXIT(&conn->conn_data_lock);
 
     if (waiting) {
@@ -3707,9 +3699,7 @@ rxi_CheckReachEvent(struct rxevent *event, void *arg1, void *arg2, int dummy)
                 when.sec += RX_CHECKREACH_TIMEOUT;
                 MUTEX_ENTER(&conn->conn_data_lock);
                 if (!conn->checkReachEvent) {
-                    MUTEX_ENTER(&rx_refcnt_mutex);
-                    conn->refCount++;
-                    MUTEX_EXIT(&rx_refcnt_mutex);
+                    rx_GetConnection(conn);
                     conn->checkReachEvent =
                         rxevent_Post(&when, &now, rxi_CheckReachEvent, conn,
                                      NULL, 0);
@@ -3717,6 +3707,9 @@ rxi_CheckReachEvent(struct rxevent *event, void *arg1, void *arg2, int dummy)
             MUTEX_EXIT(&conn->conn_data_lock);
         }
     }
+    /* If fired as an event handler, drop our refcount on the connection. */
+    if (event != NULL)
+        putConnection(conn);
 }
 
 static int
@@ -3742,9 +3735,12 @@ rxi_CheckConnReach(struct rx_connection *conn, struct rx_call *call)
         return 1;
     }
     conn->flags |= RX_CONN_ATTACHWAIT;
-    MUTEX_EXIT(&conn->conn_data_lock);
-    if (!conn->checkReachEvent)
+    if (conn->checkReachEvent == NULL) {
+        /* rxi_CheckReachEvent(NULL, ...) will drop the lock. */
         rxi_CheckReachEvent(NULL, conn, call, 0);
+    } else {
+        MUTEX_EXIT(&conn->conn_data_lock);
+    }
 
     return 1;
 }
@@ -4686,6 +4682,7 @@ rxi_SendConnectionAbortLater(struct rx_connection *conn, int msec)
     clock_GetTime(&now);
     when = now;
     clock_Addmsec(&when, msec);
+    rx_GetConnection(conn);
     conn->delayedAbortEvent =
         rxevent_Post(&when, &now, rxi_SendDelayedConnAbort, conn, NULL, 0);
 }
@@ -4903,6 +4900,11 @@ rxi_AckAll(struct rx_call *call)
     call->flags |= RX_CALL_ACKALL_SENT;
 }
 
+/*
+ * Event handler for per-call delayed acks.
+ * Also called synchronously, with |event| == NULL, to send a "delayed" ack
+ * immediately.
+ */
 static void
 rxi_SendDelayedAck(struct rxevent *event, void *arg1, void *unused1,
                    int unused2)
@@ -4913,7 +4915,6 @@ rxi_SendDelayedAck(struct rxevent *event, void *arg1, void *unused1,
         MUTEX_ENTER(&call->lock);
         if (event == call->delayedAckEvent)
             rxevent_Put(&call->delayedAckEvent);
-        CALL_RELE(call, RX_CALL_REFCOUNT_DELAY);
     }
     (void)rxi_SendAck(call, 0, 0, RX_ACK_DELAY, 0);
     if (event)
@@ -4923,6 +4924,9 @@ rxi_SendDelayedAck(struct rxevent *event, void *arg1, void *unused1,
         rxevent_Put(&call->delayedAckEvent);
     (void)rxi_SendAck(call, 0, 0, RX_ACK_DELAY, 0);
 #endif /* RX_ENABLE_LOCKS */
+    /* Release the call reference for the event that fired. */
+    if (event)
+        CALL_RELE(call, RX_CALL_REFCOUNT_DELAY);
 }
 
 #ifdef RX_ENABLE_LOCKS
@@ -5090,10 +5094,8 @@ rxi_SendCallAbort(struct rx_call *call, struct rx_packet *packet,
 static void
 rxi_CancelDelayedAbortEvent(struct rx_call *call)
 {
-    if (call->delayedAbortEvent) {
-        rxevent_Cancel(&call->delayedAbortEvent);
+    if (rxevent_Cancel(&call->delayedAbortEvent))
         CALL_RELE(call, RX_CALL_REFCOUNT_ABORT);
-    }
 }
 
 /* Send an abort packet for the specified connection.  Packet is an
@@ -5121,7 +5123,8 @@ rxi_SendConnectionAbort(struct rx_connection *conn,
     if (force || rxi_connAbortThreshhold == 0
         || conn->abortCount < rxi_connAbortThreshhold) {
 
-        rxevent_Cancel(&conn->delayedAbortEvent);
+        if (rxevent_Cancel(&conn->delayedAbortEvent))
+            putConnection(conn);
         error = htonl(conn->error);
         conn->abortCount++;
         MUTEX_EXIT(&conn->conn_data_lock);
@@ -5151,10 +5154,11 @@ rxi_ConnectionError(struct rx_connection *conn,
         dpf(("rxi_ConnectionError conn %"AFS_PTR_FMT" error %d\n", conn, error));
 
         MUTEX_ENTER(&conn->conn_data_lock);
-        rxevent_Cancel(&conn->challengeEvent);
-        rxevent_Cancel(&conn->natKeepAliveEvent);
-        if (conn->checkReachEvent) {
-            rxevent_Cancel(&conn->checkReachEvent);
+        if (rxevent_Cancel(&conn->challengeEvent))
+            putConnection(conn);
+        if (rxevent_Cancel(&conn->natKeepAliveEvent))
+            putConnection(conn);
+        if (rxevent_Cancel(&conn->checkReachEvent)) {
            conn->flags &= ~(RX_CONN_ATTACHWAIT|RX_CONN_NAT_PING);
            putConnection(conn);
        }
@@ -5930,10 +5934,8 @@ rxi_Resend(struct rxevent *event, void *arg0, void *arg1, int istack)
     /* Make sure that the event pointer is removed from the call
      * structure, since there is no longer a per-call retransmission
      * event pending. */
-    if (event == call->resendEvent) {
-        CALL_RELE(call, RX_CALL_REFCOUNT_RESEND);
+    if (event == call->resendEvent)
         rxevent_Put(&call->resendEvent);
-    }
 
     rxi_CheckPeerDead(call);
 
@@ -5986,6 +5988,7 @@ rxi_Resend(struct rxevent *event, void *arg0, void *arg1, int istack)
     rxi_Start(call, istack);
 
 out:
+    CALL_RELE(call, RX_CALL_REFCOUNT_RESEND);
     MUTEX_EXIT(&call->lock);
 }
 
@@ -6347,6 +6350,7 @@ rxi_NatKeepAliveEvent(struct rxevent *event, void *arg1,
     struct sockaddr_in taddr;
     char *tp;
     char a[1] = { 0 };
+    int resched = 0;
     struct iovec tmpiov[2];
     osi_socket socket =
         (conn->type ==
@@ -6379,20 +6383,28 @@ rxi_NatKeepAliveEvent(struct rxevent *event, void *arg1,
     osi_NetSend(socket, &taddr, tmpiov, 1, 1 + sizeof(struct rx_header), 1);
 
     MUTEX_ENTER(&conn->conn_data_lock);
+    /* We ran, so the handle is no longer needed to try to cancel ourselves. */
+    if (event == conn->natKeepAliveEvent)
+        rxevent_Put(&conn->natKeepAliveEvent);
     MUTEX_ENTER(&rx_refcnt_mutex);
     /* Only reschedule ourselves if the connection would not be destroyed */
-    if (conn->refCount <= 1) {
-        rxevent_Put(&conn->natKeepAliveEvent);
-        MUTEX_EXIT(&rx_refcnt_mutex);
-        MUTEX_EXIT(&conn->conn_data_lock);
-        rx_DestroyConnection(conn); /* drop the reference for this */
-    } else {
-        conn->refCount--; /* drop the reference for this */
-        MUTEX_EXIT(&rx_refcnt_mutex);
-        rxevent_Put(&conn->natKeepAliveEvent);
-        rxi_ScheduleNatKeepAliveEvent(conn);
-        MUTEX_EXIT(&conn->conn_data_lock);
+    if (conn->refCount > 1)
+        resched = 1;
+    if (conn->refCount <= 0) {
+#ifdef RX_REFCOUNT_CHECK
+        osi_Assert(conn->refCount == 0);
+#endif
+        if (rx_stats_active) {
+            MUTEX_ENTER(&rx_stats_mutex);
+            rxi_lowConnRefCount++;
+            MUTEX_EXIT(&rx_stats_mutex);
+        }
     }
+    MUTEX_EXIT(&rx_refcnt_mutex);
+    if (resched)
+        rxi_ScheduleNatKeepAliveEvent(conn);
+    MUTEX_EXIT(&conn->conn_data_lock);
+    putConnection(conn);
 }
 
 static void
@@ -6403,9 +6415,7 @@ rxi_ScheduleNatKeepAliveEvent(struct rx_connection *conn)
         clock_GetTime(&now);
         when = now;
         when.sec += conn->secondsUntilNatPing;
-        MUTEX_ENTER(&rx_refcnt_mutex);
-        conn->refCount++; /* hold a reference for this */
-        MUTEX_EXIT(&rx_refcnt_mutex);
+        rx_GetConnection(conn);
         conn->natKeepAliveEvent =
             rxevent_Post(&when, &now, rxi_NatKeepAliveEvent, conn, NULL, 0);
     }
@@ -6439,7 +6449,6 @@ rxi_KeepAliveEvent(struct rxevent *event, void *arg1, void *dummy,
     struct rx_connection *conn;
     afs_uint32 now;
 
-    CALL_RELE(call, RX_CALL_REFCOUNT_ALIVE);
     MUTEX_ENTER(&call->lock);
 
     if (event == call->keepAliveEvent)
@@ -6455,6 +6464,7 @@ rxi_KeepAliveEvent(struct rxevent *event, void *arg1, void *dummy,
     /* Don't try to keep alive dallying calls */
     if (call->state == RX_STATE_DALLY) {
         MUTEX_EXIT(&call->lock);
+        CALL_RELE(call, RX_CALL_REFCOUNT_ALIVE);
         return;
     }
 
@@ -6467,6 +6477,7 @@ rxi_KeepAliveEvent(struct rxevent *event, void *arg1, void *dummy,
     }
     rxi_ScheduleKeepAliveEvent(call);
     MUTEX_EXIT(&call->lock);
+    CALL_RELE(call, RX_CALL_REFCOUNT_ALIVE);
 }
 
 /* Does what's on the nameplate. */
@@ -6476,22 +6487,17 @@ rxi_GrowMTUEvent(struct rxevent *event, void *arg1, void *dummy, int dummy2)
     struct rx_call *call = arg1;
     struct rx_connection *conn;
 
-    CALL_RELE(call, RX_CALL_REFCOUNT_MTU);
     MUTEX_ENTER(&call->lock);
 
     if (event == call->growMTUEvent)
         rxevent_Put(&call->growMTUEvent);
 
-    if (rxi_CheckCall(call, 0)) {
-        MUTEX_EXIT(&call->lock);
-        return;
-    }
+    if (rxi_CheckCall(call, 0))
+        goto out;
 
     /* Don't bother with dallying calls */
-    if (call->state == RX_STATE_DALLY) {
-        MUTEX_EXIT(&call->lock);
-        return;
-    }
+    if (call->state == RX_STATE_DALLY)
+        goto out;
 
     conn = call->conn;
 
@@ -6504,7 +6510,9 @@ rxi_GrowMTUEvent(struct rxevent *event, void *arg1, void *dummy, int dummy2)
         conn->idleDeadTime)
         (void)rxi_SendAck(call, NULL, 0, RX_ACK_MTU, 0);
     rxi_ScheduleGrowMTUEvent(call, 0);
+out:
     MUTEX_EXIT(&call->lock);
+    CALL_RELE(call, RX_CALL_REFCOUNT_MTU);
 }
 
 static void
@@ -6523,10 +6531,8 @@ rxi_ScheduleKeepAliveEvent(struct rx_call *call)
 static void
 rxi_CancelKeepAliveEvent(struct rx_call *call)
 {
-    if (call->keepAliveEvent) {
-        rxevent_Cancel(&call->keepAliveEvent);
+    if (rxevent_Cancel(&call->keepAliveEvent))
         CALL_RELE(call, RX_CALL_REFCOUNT_ALIVE);
-    }
 }
 
 static void
@@ -6555,10 +6561,8 @@ rxi_ScheduleGrowMTUEvent(struct rx_call *call, int secs)
 static void
 rxi_CancelGrowMTUEvent(struct rx_call *call)
 {
-    if (call->growMTUEvent) {
-        rxevent_Cancel(&call->growMTUEvent);
+    if (rxevent_Cancel(&call->growMTUEvent))
         CALL_RELE(call, RX_CALL_REFCOUNT_MTU);
-    }
 }
 
 /*
@@ -6608,7 +6612,8 @@ rxi_SendDelayedConnAbort(struct rxevent *event, void *arg1, void *unused,
     struct rx_packet *packet;
 
     MUTEX_ENTER(&conn->conn_data_lock);
-    rxevent_Put(&conn->delayedAbortEvent);
+    if (event == conn->delayedAbortEvent)
+        rxevent_Put(&conn->delayedAbortEvent);
     error = htonl(conn->error);
     conn->abortCount++;
     MUTEX_EXIT(&conn->conn_data_lock);
@@ -6620,6 +6625,7 @@ rxi_SendDelayedConnAbort(struct rxevent *event, void *arg1, void *unused,
                             sizeof(error), 0);
         rxi_FreePacket(packet);
     }
+    putConnection(conn);
 }
 
 /* This routine is called to send call abort messages
@@ -6634,7 +6640,8 @@ rxi_SendDelayedCallAbort(struct rxevent *event, void *arg1, void *dummy,
     struct rx_packet *packet;
 
     MUTEX_ENTER(&call->lock);
-    rxevent_Put(&call->delayedAbortEvent);
+    if (event == call->delayedAbortEvent)
+        rxevent_Put(&call->delayedAbortEvent);
     error = htonl(call->error);
     call->abortCount++;
     packet = rxi_AllocPacket(RX_PACKET_CLASS_SPECIAL);
@@ -6648,25 +6655,35 @@ rxi_SendDelayedCallAbort(struct rxevent *event, void *arg1, void *dummy,
     CALL_RELE(call, RX_CALL_REFCOUNT_ABORT);
 }
 
-/* This routine is called periodically (every RX_AUTH_REQUEST_TIMEOUT
+/*
+ * This routine is called periodically (every RX_AUTH_REQUEST_TIMEOUT
  * seconds) to ask the client to authenticate itself.  The routine
  * issues a challenge to the client, which is obtained from the
- * security object associated with the connection */
+ * security object associated with the connection.
+ *
+ * This routine is both an event handler and a function called directly;
+ * when called directly the passed |event| is NULL and the
+ * conn->conn_data_lock must not be held.
+ */
 static void
 rxi_ChallengeEvent(struct rxevent *event,
                    void *arg0, void *arg1, int tries)
 {
     struct rx_connection *conn = arg0;
 
-    if (event)
+    MUTEX_ENTER(&conn->conn_data_lock);
+    if (event != NULL && event == conn->challengeEvent)
         rxevent_Put(&conn->challengeEvent);
+    MUTEX_EXIT(&conn->conn_data_lock);
 
     /* If there are no active calls it is not worth re-issuing the
      * challenge.  If the client issues another call on this connection
      * the challenge can be requested at that time.
      */
-    if (!rxi_HasActiveCalls(conn))
+    if (!rxi_HasActiveCalls(conn)) {
+        putConnection(conn);
         return;
+    }
 
     if (RXS_CheckAuthentication(conn->securityObject, conn) != 0) {
         struct rx_packet *packet;
@@ -6692,6 +6709,7 @@ rxi_ChallengeEvent(struct rxevent *event,
             }
         }
         MUTEX_EXIT(&conn->conn_call_lock);
+        putConnection(conn);
         return;
     }
 
@@ -6707,22 +6725,33 @@ rxi_ChallengeEvent(struct rxevent *event,
             clock_GetTime(&now);
             when = now;
             when.sec += RX_CHALLENGE_TIMEOUT;
-            conn->challengeEvent =
-                rxevent_Post(&when, &now, rxi_ChallengeEvent, conn, 0,
-                             (tries - 1));
+            MUTEX_ENTER(&conn->conn_data_lock);
+            /* Only reschedule ourselves if not already pending. */
+            if (conn->challengeEvent == NULL) {
+                rx_GetConnection(conn);
+                conn->challengeEvent =
+                    rxevent_Post(&when, &now, rxi_ChallengeEvent, conn, 0,
+                                 (tries - 1));
+            }
+            MUTEX_EXIT(&conn->conn_data_lock);
         }
+        putConnection(conn);
     }
 }
 
 /* Call this routine to start requesting the client to authenticate
  * itself.  This will continue until authentication is established,
  * the call times out, or an invalid response is returned.  The
  * security object associated with the connection is asked to create
- * the challenge at this time.  N.B. rxi_ChallengeOff is a macro,
- * defined earlier. */
+ * the challenge at this time. */
 static void
 rxi_ChallengeOn(struct rx_connection *conn)
 {
-    if (!conn->challengeEvent) {
+    int start = 0;
+    MUTEX_ENTER(&conn->conn_data_lock);
+    if (!conn->challengeEvent)
+        start = 1;
+    MUTEX_EXIT(&conn->conn_data_lock);
+    if (start) {
         RXS_CreateChallenge(conn->securityObject, conn);
         rxi_ChallengeEvent(NULL, conn, 0, RX_CHALLENGE_MAXTRIES);
     };
diff --git a/src/rx/rx_globals.h b/src/rx/rx_globals.h
index 66dff99..caf0d3b 100644
--- a/src/rx/rx_globals.h
+++ b/src/rx/rx_globals.h
@@ -539,10 +539,6 @@ EXT afs_kmutex_t rx_connHashTable_lock;
 #define PEER_HASH(host, port)  ((host ^ port) % rx_hashTableSize)
 
 /* Forward definitions of internal procedures */
-#define rxi_ChallengeOff(conn) \
-        rxevent_Cancel(&(conn)->challengeEvent)
-#define rxi_NatKeepAliveOff(conn) \
-        rxevent_Cancel(&(conn)->natKeepAliveEvent)
 #define rxi_AllocSecurityObject() rxi_Alloc(sizeof(struct rx_securityClass))
 #define rxi_FreeSecurityObject(obj) rxi_Free(obj, sizeof(struct rx_securityClass))
 
diff --git a/tests/rx/event-t.c b/tests/rx/event-t.c
index c990246..bc07fd3 100644
--- a/tests/rx/event-t.c
+++ b/tests/rx/event-t.c
@@ -44,8 +44,29 @@ eventSub(struct rxevent *event, void *arg, void *arg1, int arg2)
 {
     struct testEvent *evrecord = arg;
 
+    /*
+     * The eventListMutex protects the contents of fields in the global
+     * 'events' array, including reading/writing evrecord->event.
+     * However, in this test code, we have an additional guarantee that
+     * the events array will remain allocated for the duration of the test,
+     * and as such that it is safe to dereference |evrecord| at all.  In real
+     * application code where the passed args are pointers to allocated data
+     * structures with finite lifetime, the programmer must ensure that the
+     * firing event can safely access these fields (i.e., that the object
+     * lifetime does not permit the object to be destroyed while an event
+     * pointing to it is outstanding or in progress).  The simplest way to
+     * do this (for reference counted objects) is to have the pending event
+     * hold a reference on the pointed-to object.  This reference should be
+     * dropped at the end of the event handler or if the event is
+     * (successfully!) cancelled before it fires.  Other strategies are also
+     * possible, such as deferring object destruction until after all pending
+     * events have run or gotten cancelled, noting that the calling code must
+     * take care to allow the event handler to obtain any needed locks and
+     * avoid deadlock.
+     */
     pthread_mutex_lock(&eventListMutex);
-    rxevent_Put(&evrecord->event);
+    if (evrecord->event != NULL)
+        rxevent_Put(&evrecord->event);
     evrecord->event = NULL;
     evrecord->fired = 1;
     pthread_mutex_unlock(&eventListMutex);
@@ -116,8 +137,7 @@ main(void)
     /* Test for a problem when there is only a single event in the tree */
     event = rxevent_Post(&now, &now, reportSub, NULL, NULL, 0);
     ok(event != NULL, "Created a single event");
-    rxevent_Cancel(&event);
-    ok(1, "Cancelled a single event");
+    ok(rxevent_Cancel(&event), "Cancelled a single event");
     rxevent_RaiseEvents(&now);
     ok(1, "RaiseEvents happened without error");
 
@@ -151,10 +171,8 @@ main(void)
         if (random() % 4 == 0) {
             int victim = random() % counter;
 
-            if (events[victim].event != NULL) {
-                rxevent_Cancel(&events[victim].event);
+            if (rxevent_Cancel(&events[victim].event))
                 events[victim].cancelled = 1;
-            }
         }
         pthread_mutex_unlock(&eventListMutex);
     }
-- 
1.9.4