git.openafs.org Git - openafs.git/log

volser: take RO volume offline during convertROtoRW

The vos convertROtoRW command converts a RO volume into a RW volume.
Unfortunately, the RO volume is not checked out from the fileserver
during this process. As a result, accesses to the volume being converted
can leave volume objects in an inconsistent state.

Moreover, consider the following scenario:

1. Create a volume on host_b and add replicas on host_a and host_b.

$ vos create host_b a vol_1
$ vos addsite host_b a vol_1
$ vos addiste host_a a vol_1

2. Mount the volume:

$ fs mkmount /afs/.mycell/vol_1 vol_1
$ vos release vol_1
$ vos release root.cell

3. Shutdown dafs on host_b:

$ bos shutdown host_b dafs

4. Remove RO reference to host_b from the vldb:

$ vos remsite host_b a vol_1

5. Attach the RO copy by touching it:

$ fs flushall
$ ls /afs/mycell/vol_1

6. Convert RO copy to RW:

$ vos convertROtoRW host_a a vol_1

Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving
the volume object for the RO copy set as VOL_STATE_ATTACHED (on success,
this volume should be set as VOL_STATE_DELETED).

7. Add replica on host_a:

$ vos addsite host_a a vol_1

8. Wait until the "inUse" flag of the RO entry is cleared (or force this
to happen by attaching multiple volumes).

9. Release the volume:

$ vos release vol_1

Failed to start transaction on volume 536870922
Volume not attached, does not exist, or not on line
Error in vos release command.
Volume not attached, does not exist, or not on line

Notice that this happens because we cannot mark an attached volume as
destroyed (FSYNC_com_VolDone).

To avoid the problem mentioned above and to prevent accesses to the
volume being converted, take the RO volume offline before converting it
to RW.

Change-Id: Ifd342e1f420dc42e5da49242a7aa70db7d97a884
Reviewed-on: https://gerrit.openafs.org/14340
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

vos: Cleanup indentation whitespace

Fix the indentation whitespace in vos.c, and remove double blank
lines. No functional change.

Change-Id: I97587779d6d2c131b5eac98bbee49efae73fafe9
Reviewed-on: https://gerrit.openafs.org/14007
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

vos: Return true when GetServerAndPart finds a site

Change the GetServerAndPart() function to return true when a volume site
in the vldb entry is found. Do not change the output arguments unless
the site is found. Also, add a function comment header and fix some
comment typos in this function.

Change-Id: I10b43054b1bf9e6757ccdc95cb4559ab8b6dc013
Reviewed-on: https://gerrit.openafs.org/14006
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>

vos: Add missing -partition requires -server checks

The `vos remove` command was missing a check for the -server option when
the -partition option is given. This command requires the -server option
when the -partition is given, as documented in the man page.

The `vos syncvldb` command performed the check for the -server option
when the -partition option is given, but in the wrong location.

As documented, the `vos unlockvldb` command permits the -partition
option without a -server option, in which case all of the volumes listed
in the VLDB with sites on the specified partition are unlocked.
However, this command incorrectly issued an RPC to a volume server at
address 0.0.0.0 when only the partition is given.

Change-Id: I6b878678e28b34250e63d2d082747f6fd416972d
Reviewed-on: https://gerrit.openafs.org/14005
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>

vos: avoid double release of a volume lock

To update a volume entry in the VLDB, vos commands typically lock the
volume entry via VL_SetLock, then call VL_UpdateEntryN, then release the
lock via VL_ReleaseLock.  However, some vos commands exploit the
optional lock release flags of VL_UpdateEntryN to combine the update and
unlock operations into a single RPC.  This approach requires extra care
to ensure that VL_ReleaseLock is issued for a failed VL_UpdateEntryN,
but NOT for a successful VL_UpdateEntryN.

Unfortunately, the following commands have success paths that fall
through to the error path, resulting in a double release of the volume
lock:
- vos convertROtoRW
- vos release

A second VL_ReleaseLock of a volume entry that has already been unlocked
via VL_UpdateEntryN is essentially a harmless no-op (other than negating
any benefit of exploiting the VL_UpdateEntryN lock flags).  However, if
there is a race with another volume operation on the same volume, it is
possible for this bug to release the volume lock of a different volume
operation.

This problem has been present in 'vos release' since OpenAFS 1.0.  This
problem has been present in 'vos convertROtoRW' since the command's
introduction in commit 8af8241e94284522feb77d75aee8ea3deb73f3cc
vol-ro-to-rw-tool-20030314.

Properly maintain state to avoid unlocking a volume (with
VL_ReleaseLock) that has already been unlocked (via VL_UpdateEntryN).

Thanks to Andrew Deason for discovering the issue and suggesting the
fix.

Change-Id: I757b4619b9431d1ca980f755349806993add14a5
Reviewed-on: https://gerrit.openafs.org/14426
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

volser: document 'vos restore -readonly' restriction

Commit 0c03f8607e15 vos-command-enhancements-20011008 introduced the
'vos restore' -readonly option, which allows the restored volume to be
RO instead of the default RW.  The commit message documents the
following restriction:

- ... This option causes the restored volume to be an RO volume.  It is
  not permitted to restore an RO volume when the associated RW volume
  already exists.  While it is possible to restore an RW volume where an
  RO volume exists, caution should be used to avoid doing this with VLDB
  entries created by 'vos restore -readonly', since such entries have
  their ROVOL and RWVOL ID's set to the same thing.

Document this restriction in the 'vos restore' man page, and in a code
comment.

No functional change is incurred by this commit.

Change-Id: I34f6c5434b82da538a38a9d219207b33dcf62b17
Reviewed-on: https://gerrit.openafs.org/14348
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

volser: improve error checking for 'vos restore'

UV_RestoreVolume2 calls VLDB_GetEntryByName to obtain information for
sanity checking, but only checks for a VL_NOENT error code; other codes
are thus ignored, which may lead to confusing results.

Add an additional error check for 'vos restore' (and other callers of
UV_RestoreVolume2) to stop and issue an error message if a non-VL_NOENT
error code is received from VLDB_GetEntryByName.

Change-Id: Idf41965fdd84fa282a3397215ec393ae10f72018
Reviewed-on: https://gerrit.openafs.org/14347
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

volser: fix 'cant' typos

Correctly spell "can't" in a log message and a comment.

Change-Id: I9d5c667d9c5ea3c5b726f958431c497353433239
Reviewed-on: https://gerrit.openafs.org/14346
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: avoid panic in DNew when afs_WriteDCache fails

afs_WriteDCache may fail for an IO error, or if interrupted (EINTR).
Unfortunately, DNew will panic in this case, crashing the entire
machine.

In order to avoid an outage in this case, don't panic. Instead, reflect
the error back to the caller of DNew.

While here, add Doxygen comments to DNew.

Change-Id: I27a8f89bab979c5691dded70e8b9eacbe8aff4fd
Reviewed-on: https://gerrit.openafs.org/13804
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

afs: remove redundant assignment

DRelease has two assignments for tp = entry->buffer; remove the second
(redundant) one.

Introduced with 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly
state buffer locations for data'.

No functional change should be incurred by this commit.

Change-Id: If4a17862f451973075fa3fa267b5139046d97ede
Reviewed-on: https://gerrit.openafs.org/13802
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: check DNew return code

Commit 0284e65f97861e888d95576f22a93cd681813c39 'dir: Explicitly state
buffer locations for data' changed DNew and DRead to return a return
code.  However, the callers of DNew were not modified to check the new
return code.  (This commit applied only to the implementations dealing
with AFS directories, in afs/afs_buffer.c and dir/dir.c.  The ubik
implmentations of DNew and DRead, dealing with ubik databases, were not
modified.)

Modify all (non-ubik) callers of DNew to check the return code.  In
addition, modify code as needed so return codes are properly propagated
to the callers.

While here, add Doxygen comments for AddPage and FindBlobs.

Change-Id: Iabde6499745dd351f3fcda73c9f52c440a36490e
Reviewed-on: https://gerrit.openafs.org/13801
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Remove unused xdr types

Numerous types and constants are defined in our various RPC-L files
that are never used or referenced by anything. Remove them.

Change-Id: I0b03be1ce0e186a88f80d2f3f7a66a1e25965ff3
Reviewed-on: https://gerrit.openafs.org/14404
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

volser: apply static keyword to VolPartitionInfo definition

The function declaration was already marked as static; mark the
definition as well for consistency (and consistency with the other
helpers in this file).

Change-Id: I642db1d27efd34ab2a09f7299791c19d07b1f923
Reviewed-on: https://gerrit.openafs.org/13321
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: check afs_dir_Create return code in afs_dir_MakeDir

afs_dir_MakeDir() ignores the return code from afs_dir_Create() for the
'.' and '..' ("dot" and "dotdot") directories. This has been the case
from the earliest implementation (MakeDir() calling Create()) in the
original IBM import.

Instead, check the return codes to prevent the possibility of creating
malformed directories.

Change-Id: I60179488429dfa9afe60c4862c5e42b41f1e0048
Reviewed-on: https://gerrit.openafs.org/13800
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

ptserver: rename NameToID and IDToName helpers

These helper function names alias the names of public RPCs and can
cause confusion when grepping the code. Rename them in a different
style to provide greater hamming distance between the various
functions involved in handling these RPCs.

Change-Id: I0e2c7997bc145888affdac28716293ff820756c7
Reviewed-on: https://gerrit.openafs.org/13320
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

dir: check afs_dir_MakeDir return code in DirSalvage

Since the original IBM import, DirSalvage() has ignored the return code
from afs_dir_MakeDir() (f.k.a. MakeDir). This has been safe because, as
the comment states, afs_dir_MakeDir returns no (non-zero) error code.

In preparation for a future commit, add a check for the return from
afs_dir_MakeDir and remove the comment.

Change-Id: Ibb259a7aaeeb21ef70a7794143a0dadb2a75725d
Reviewed-on: https://gerrit.openafs.org/13799
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: distinguish logical and physical errors on reads

The directory package (src/dir) salvage routines DirOK and DirSalvage
check a global variable 'DErrno' to distinguish logical errors (e.g.
short read) from physical errors (e.g. EIO). However, since the
original IBM import, this logic has not worked correctly because there
is no longer any code that sets the value of DErrno - its value is
always zero.

Instead, modify all implementations of ReallyRead to optionally return
the errno for low-level IO errors.

Also, create a new userspace-only variant - DReadWithErrno() - of the
src/dir/buffer.c version of DRead (the version called by DirOK and
DirSalvage, and the only caller of ReallyRead) to return the ReallyRead
errno upon request.

Also create an analogous variant of afs_dir_GetBlobs,
afs_dir_GetBlobsWithErrno().

Finally, convert DirOK and DirSalvage to use the new variants and
replace DErrno with equivalent logic. Remove all other references to
DErrno.

Change-Id: I3de182ce49c1682572142da594af5dc2c00ede74
Reviewed-on: https://gerrit.openafs.org/13798
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Log pid with disk cache read errors

Log the current pid (and procname) when we complain about an error
when reading from CacheItems in afs_UFSGetDSlot. These errors can
result in confusing situations, so it can be helpful to know at least
what process saw the error.

Our logic for logging this information is getting a bit large, so also
move this to a new function, LogCacheError.

Change-Id: I3427e736458784df0d516f4182684605e930e128
Reviewed-on: https://gerrit.openafs.org/14416
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

roken: use strtok_r from roken

Windows standard library doesn't provide strtok_r. Use the strtok_r
that is provided from roken.

Change-Id: I1bccb9a306c9dd1963f044127fb5dfe4da5728cc
Reviewed-on: https://gerrit.openafs.org/13891
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

Import of code from heimdal

This commit updates the code imported from heimdal to
5dfaa0d10b8320293e85387778adcdd043dfc1fe (git2svn-syncpoint-master-311-g5dfaa0d10)

New files are:
roken/strtok_r.c

Change-Id: I27042f614c7d6ce9a95a80d01474e8bf401e4760
Reviewed-on: https://gerrit.openafs.org/13890
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

roken: add strtok_r to the imported file list

Import the strtok_r function which is needed by audit for parsing
command line options.

Change-Id: I8412c5a663dc3315c4146665edb72d9a6b8df5be
Reviewed-on: https://gerrit.openafs.org/13889
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

Detect realloc failure

While reviewing other commits, a call to realloc() was discovered that
would leak memory on failure (by virtue of always assigning the realloc()
return value to the pointer holding the input address, even when the
return value is NULL). Check for failure and return early in that case
(giving an incomplete list of events).

Change-Id: Ic6e889f1d990bd289812ce4bf8e9cd4ebce488ec
Reviewed-on: https://gerrit.openafs.org/13313
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

ptserver: move IDToName, NameToID to ptprocs.c and make static

These two helpers are only used in implementing server-side RPC handlers,
and having to track the codeflow across files is unhelpful. Move them
into the file where they're used, make them static, and remove the
prototypes from ptrototypes.h (which is not an installed header, so
there is no API/ABI breakage).

Change-Id: I236d17865a296933f41aaee206535d341c3a955d
Reviewed-on: https://gerrit.openafs.org/13319
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

Assign explicit opcodes to butc RPCs

This should prevent inadvertent reassignment if additional RPCs
are introduced in the future.

Change-Id: I5645ca478d2ecef9962f4bde04ab8f9895dd9497
Reviewed-on: https://gerrit.openafs.org/13317
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

vlserver: Return VL_DBBAD on unhash failure

If we try to delete a vlentry, and the vlentry cannot be found on one
of its hash chains, we cannot unhash the vlentry properly and the
operation fails with VL_NOENT. This results in the following error
messages to the user:

        $ vos delentry 123456
        Could not delete entry for volume 123456
        You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
        VLDB: no such entry
        Deleted 0 VLDB entries

This is confusing, because VL_NOENT can also occur if the user
specifies a volume that does actually not exist. This situation is
indicative of database corruption, usually because of a ubik
transaction that was only half-applied, or because of other ubik bugs
in the past.

The situation can only really be fixed by repairing the database, so
return VL_DBBAD in this case instead, to more clearly indicate that
something is wrong with the database, and not a problem with the
arguments the caller provided.

Change-Id: I6fc275c3ad05c108778f36687227b0a927cca5da
Reviewed-on: https://gerrit.openafs.org/13384
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

vlserver: Add VL_DBBAD error code

The VL_ error table currently doesn't have an error code to indicate
that an operation cannot succeed because the database is corrupted.
There are a few error codes for specific cases of errors that are
probably the result of corruption (like VL_IDALREADYHASHED, or
VL_EMPTY), but these are only for specific cases and indicate rather
low-level internal problems.

There are some instances where the real problem preventing an
operation from succeeding is that the database is just corrupt or
inconsistent in some way, and the administrator must repair the
database before it can succeed. And we currently don't have any way of
indicating that situation via an error code.

So, introduce the VL_DBBAD code, to indicate this situation. Error
codes already exist in other tables for similar situations, such as
PRDBBAD, and KADATABASEINCONSISTENT.

This commit does not use the new error code anywhere; we just
introduce it into the VL_ error table, so comerr-using applications
will be able to interpret it.

Note that the VL_DBBAD error code has been recognized by the AFS
Assigned Numbers Registry as recorded in the ticket history of
<https://rt.central.org/rt/Ticket/Display.html?id=134817>

Change-Id: I8fea356a4e0db907ec8418efe6ef35d547be0a63
Reviewed-on: https://gerrit.openafs.org/13383
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

ptserver: move allocation out of put_prentries() into listEntries()

put_prentries() is a helper function for listEntries(), but the contract
between the two is rather odd -- put_prentries() is expected to notice
when the backing store has not yet been allocated and silently allocate
it, even though there is only the single caller and the allocation could
be done in the caller.

Move the allocation to the caller and adjust the "buffer is full"
logic accordingly, and normalize the initialization of the output
array to just use calloc() instead of individual memset()s when
populating each entry.

Change-Id: Icf84e3b60eae81a1570b12d7adbf006a24a104f3
Reviewed-on: https://gerrit.openafs.org/13315
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

volser: Avoid calling osi_audit before audit init

volmain.c calls osi_audit before the audit facility is fully
initialized.

Commit 16d67791 (auditlogs-for-everyone-20050702) introduced the
-auditlog parameter; it appears that it didn't remove the call
to osi_audit (right after osi_audit_init) that was called before command
line argument processing. This resulted in calling the audit facility
before it was fulling initialized with the -auditlog and
-audit-interface parameters. The 16d67791 commit replicated the
osi_audit call after command line processing.

Change-Id: Ia0c0054a2fb11892b5b30c0f0838a4d6bbdf9bbb
Reviewed-on: https://gerrit.openafs.org/13772
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

audit: Always call pthread_once in osi_audit_init

Currently, we skip the pthread_once call in osi_audit_init if
audit_lock_initialized is set. But this is somewhat pointless, since
pthread_once will effectively do this check itself, and better (it
will wait if osi_audit_init is actively running in another thread).

So just get rid of audit_lock_initialized, and replace the other
assert for audit_lock_initialized with another plain pthread_once
call.

Change-Id: I466c8ec2d1516edecaae23d4354892e7e3a88918
Reviewed-on: https://gerrit.openafs.org/14403
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

remove unused src/butc/common.h

Change-Id: Ie25a9ca4f715c841a7f7fa130176cfbdc5ef18e7
Reviewed-on: https://gerrit.openafs.org/13322
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

butc: consistently spell taskId parameter

All but one RPC used the capitalization "taskId"; adjust the long
straggler for consistent style.

Change-Id: I996d96a4fc67af7f745bf67041c90390073ca9ea
Reviewed-on: https://gerrit.openafs.org/13318
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Remove commented-out butc RPC definitions

These functions have been commented out since the original IBM
import, and un-commenting them in their current location would
be an ABI break (by causing opcodes to be reassigned for subsequent
RPCs). Since they are just noise in the interface description
file, remove them.

Change-Id: I7e8cd2e7dfa4469e39e26a0437059c108f3ef218
Reviewed-on: https://gerrit.openafs.org/13316
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

butc: Initialize RPC outputs at top of function

RPC handlers are a little bit special in that their output parameters
are discarded on error and an Rx abort is sent instead of the usual
response fields. Nonetheless, it is good code hygeine to adhere to
the practices we use for the rest of the functions in the tree:
initialize output variables before the first return.

Change-Id: I6c2e25b04ccb6277bd28e398121723b92fe42b04
Reviewed-on: https://gerrit.openafs.org/13314
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

vlserver: Warn when we cannot unhash deleted entry

If we are trying to delete an entry from the vldb, we fail with
VL_NOENT if we cannot find the given entry on one of its hash chains.
This is indicative of corruption in the vldb (since we have an entry
not on a hash chain), but we don't really indicate this clearly. There
are no log messages, and the user running 'vos' only sees an error
like this:

    $ vos delentry 123456
    Could not delete entry for volume 123456
    You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
    VLDB: no such entry
    Deleted 0 VLDB entries

Which is the exact same error message if the user tries to delete a
volume that does not actually exist.

We currently do not have an error code that clearly says that the
database appears corrupted and needs to be fixed, but we can at least
log an error in VLLog for this case, to give the administrator a
chance at fixing the situation. So, log a message in this situation.

Change-Id: I4f0ee8749a90441e1f8d779890293dc5d1d9dbee
Reviewed-on: https://gerrit.openafs.org/13382
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bos: do not assume fs just if dafs bnode is stopped

If dafs is configured but stopped, 'bos salvage <fs> <vicep>
-forceDAFS' will fail with:

  bos: failed to get instance info for 'fs' (no such entity)
  bos: shutting down 'fs'.
  bos: can't stop 'fs' (no such entity)

This is due to incomplete logic in IsDAFS, introduced with commit
e46f10a0a0a930f318833a8a86b10c19744160c1 'bos: Do not assume DAFS just
if DAFS bnode exists'

Add logic to IsDAFS to work correctly when dafs is configured but
stopped.

Change-Id: I50f8209180536d25e68c0ad6fb826202d8f27ce7
Reviewed-on: https://gerrit.openafs.org/14382
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bozo: defer audit open until log dir is created and current

On a new OpenAFS install where the log directory has not yet been
created. 'bosserver -auditlog /usr/afs/logs/<auditlog>' (absolute path)
fails with ENOENT because the log directory doesn't exist yet.

Furthermore, 'bosserver -auditlog <auditlog>' (relative path) succeeds,
but the audit file is created in the current working directory when
bosserver was started, not in the expected log directory (Transarc
/usr/afs/logs).

Both problems have been present since bosserver audit log support was
introduced by commit 16d67791dce45e5d4ee9b854c796492ffcde2113
'auditlogs-for-everyone-20050702'.

Reorder the bosserver initialization steps to ensure that the log
directory has been created and is the current working directory, before
creating and opening the audit log.

Change-Id: I1dc3c136edd12c5425ef0b7a3212a18d4c3036f7
Reviewed-on: https://gerrit.openafs.org/14381
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bozo: Properly detect presence of -auditlog

cmd_OptionAsString returns non-zero if the given option _isn't_ given
(CMD_MISSING), so we need to call osi_audit_file only when
cmd_OptionAsString returns 0. Since commit
f6cdf71 (bozo: Use libcmd for command line options), this causes
bosserver to complain on startup if no -auditlog was given:

$ bosserver
Warning: auditlog (null) not writable, ignored.

To fix this, skip calling osi_audit_file if -auditlog was not given.

While we're changing this anyway, change our processing of our
audit-related options to more closely match what other daemons do,
like ptserver or viced, so it's easier to see if we're doing the right
thing. That is, just call cmd_OptionAsString() without a conditional,
and just test if auditFileName is non-NULL later on, after options
processing.

Change-Id: I563c7efd02cb5210c32c0cc7f5a03683db792e98
Reviewed-on: https://gerrit.openafs.org/14402
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: prevent double release of global lock afs_xvcb

afs_GetServer calls ReleaseWriteLock(&afs_xvcb) twice within a few
lines.  The second one is spurious.

Commits b18653de7ae90491c2e75f4a98410581655d776c 'xserver lock order
violation' and f2bf60ed4f1323cd6f74f2f01114f7e4f714db53 'xvcb lock order
violation' were written by the same author at the same time and
apparently were victims of a bad merge.

Discovered during a lock audit project as a panic during afsd startup:

  assertion failed: (&afs_xvcb)->excl_locked == WRITE_LOCK, file:
  /home/mvitale/src/sna-openafs/src/afs/afs_server.c, line: 2089

afs_GetServer is called frequently by many threads and so this bug could
easily have released another thread's write lock on afs_xvcb.

Remove the spurious second release.

Change-Id: I495f4775e18ed37cfbccd03b5f25594586864b83
Reviewed-on: https://gerrit.openafs.org/14411
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

ubik: Introduce IndexOf()

To make the ubik_Call* functions cleaner, consolidate code that finds
the index of the connection associated with a host into a new function.

No functional change should be incurred by this commit.

Change-Id: I320d7a41221cb533e8d077c412f872152ac43b75
Reviewed-on: https://gerrit.openafs.org/14060
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Handle osi_NewVnode failures

Currently, code inside afs_vcache.c assumes that osi_NewVnode always
returns non-NULL, which means that osi_NewVnode must panic if it
cannot create a new vnode.

All of the callers of afs_GetVCache, afs_NewVCache, etc, already
handle getting a NULL return, though (after all, the given fid may not
exist or be inaccessible due to network errors, etc). So, just
propagate NULL returns from osi_NewVnode up to our callers, to avoid
panics in these situations.

Modify osi_NewVnode on many arches to return an error on allocation
failure, instead of panic'ing.

Change-Id: Ib578b1747590bdf65327d4674e0849811ed999eb
Reviewed-on: https://gerrit.openafs.org/13701
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

stats: incorrect clock square algorithm

Since the original IBM code import, OpenAFS has an algorithm for
squaring clock values, implemented identically in three different
places.  This algorithm does not account correctly for microsecs
overflow into seconds, resulting in incorrect "sum-of-squares" values
for queue and execution time in several OpenAFS performance utilities.

Specifically, this code:

        t1.tv_usec += (2 * t2.tv_sec * t2.tv_usec) % 1000000                   \
                      + (t2.tv_usec / 1000)*(t2.tv_usec / 1000)                \
                      + 2 * (t2.tv_usec / 1000) * (t2.tv_usec % 1000) / 1000   \
                      + (((t2.tv_usec % 1000) > 707) ? 1 : 0);                 \

Can allow for the tv_usec field to be increased by a theoretical max
of around:

        t1.tv_usec += 999998                                                   \
                      + 999*999                                                \
                      + 2 * 999 * 999 / 1000                                   \
                      + 1;                                                     \

Or:

        t1.tv_usec += 1999996;                                                 \

If t1.tv_usec is already 999999, after this calculation its value
could be as high as 2999995. So just checking once if t1.tv_usec is
over 1000000 is not sufficient, since the resulting value (1999995) is
still over 1000000.

Correct all implementations by repeatedly checking if tv_usec is over
1000000 after the above calculation:

macro                   affected utility
=====================   ============================
afs_stats_SquareAddTo   xstat_cm_test
fs_stats_SquareAddTo    xstat_fs_test
clock_AddSq             rxstat_get_process and _peer

Change-Id: I3145d592ba6bc1556729eac657f43d476c99eede
Reviewed-on: https://gerrit.openafs.org/14376
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

rxstats: correctly report vlserver VL_* RPC stats

Since the original IBM code import, rxstat_get_process and
rxstat_get_peer have reported vlserver VL_* RPC stats as for the
"volserver interface".

Correct this to read "vlserver interface".

Change-Id: Ie65fd41150bed8180ad8792c21a67012084459ab
Reviewed-on: https://gerrit.openafs.org/14375
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

rxstats: correctly distinguish client and server stats

Commit d3eaa39da3693bba708fa2fa951568009e929550 'rx: Make the rx_call
structure private' inadvertently caused all rxstats (aka rpcstats) to be
recorded as client stats by hardcoding the value for isServer to 1.

Therefore, when peer or process rxstats are enabled for a OpenAFS
component, the rxstat_get_process and rxstat_get_peer utilities will
erroneously report both client and server stats as "accessed as a client".

This is particularly problematic for ubik VOTE_* and DISK_* RPC stats,
for which a given ubik server may be both client and server over time.
In this case, both client and server stats are conflated into the same
"accessed as a client" counters.

Instead, properly pass the value of isServer from
rx_RecordCallStatistics through to rxi_IncrementTimeAndCount.

Note to maintainers:
This bug is only in master and all 1.8.x releases; no 1.6.x releases are
affected.

Note:
Confusingly, isServer=1 indicates client stats and isServer=0 indicates
server stats. However, this is a quirk of the original implementation
and wire format of the RXSTATS_* RPCs and cannot be changed. isServer
is actually shorthand for "remote is server"; thus all RPC client stubs
record their rxstats with isServer == 1, and all RPC server stubs record
their rxstats with isServer == 0.

Change-Id: I2420f807e2c18ddfb9de7093a487825fa2d0a68e
Reviewed-on: https://gerrit.openafs.org/14374
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

volser: Close dirp on error in ConvertROtoRW

Currently, if SAFSVolConvertROtoRWvolume cannot create a new transaction
for the volume to be converted, it returns without closing the directory
stream opened by it. To prevent this leak, go through a new 'goto done'
destructor if NewTrans fails.

Change-Id: Ie0580e7739ae667f1cd2f9cabb8aaf5e15d3f2dd
Reviewed-on: https://gerrit.openafs.org/14342
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bozo: Log each dir and file with bad access rights

The bosserver directory and file access check stops after finding one
directory or file with incorrect permissions or owner. A log message is
written for this first one found, but more than one directory or file
may have incorrect access rights.

Instead check all of them so the bosserver logs a warning message for
each incorrect director or file permission found. This should make it
easier to fix all of the file permission problems at once.

Change-Id: Ia3f14800ce036aa390929109a286cf21828e8a35
Reviewed-on: https://gerrit.openafs.org/14330
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bozo: Add KeyFileExt and rxkad.keytab to access rights check

When the KeyFileExt and rxkad.keytab were added to OpenAFS, they were
not added to the bosserver's access rights check. Add these files to the
bosserver access checks, with the same access rights needed for the
original KeyFile.

Also, add the full path for KeyFileExt to the dirpath package (not just
the filename), which was not done when the KeyFileExt was introduced.
This is needed to perform the access checks.

Change-Id: I8c9028e846fad9f15823baeb7cc15a8f80ed5c1c
Reviewed-on: https://gerrit.openafs.org/14329
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: remove vestigial externs for afs_xvcache

These have not been needed since src/afs/afs_prototypes.h gained 'extern
afs_rwlock_t afs_xvcache' with commit
8f2df21ffe59e9aa66219bf24656775b584c122d
"pull-prototypes-to-head-20020821"

Remove the vestigial extern references.

Change-Id: Id6aceff0d5df1f1bed210a3fbf2951c62f35ddbb
Reviewed-on: https://gerrit.openafs.org/14406
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: remove vestigial externs for afs_xcbhash

Commit 64cc7f0ca7a44bb214396c829268a541ab286c69 "afs: Create
afs_StaleVCache" consolidated many references to afs_xcbhash into a new
function afs_StaleVCache. However, this left many references to 'extern
afs_wrlock_t afs_xcbhash' that are no longer needed.

But actually, many of these have not been needed since
src/afs/afs_prototypes.h gained 'extern afs_rwlock_t afs_xcbhash' with
commit 8f2df21ffe59e9aa66219bf24656775b584c122d
"pull-prototypes-to-head-20020821"

Remove the vestigial extern references.

No functional change is incurred by this commit.

Change-Id: Ie6cfb6d90c52951795378d3b42e041567d207305
Reviewed-on: https://gerrit.openafs.org/14405
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

xstat: prevent CPU loop when -period 0

Historically xstat_cm_test and xstat_fs_test have supported option
'-period <mm>' to specify continuous operaiton for a length of time.  If
'-period 0' was specified, both programs exited immediately.

Beginning with commits 2c1a7e47336c8f8d14dd6c65d53925a9e0e87c66 'xstat:
add xstat_*_Wait functions' and 6b67cac432043a43d7cdfa6af972ab54412aff94
'convert xstat and friends to pthreads', xstat_cm_test and xstat_fs_test
now support -period 0 to run "forever".  This support is implemented in
xstat_cm_Wait and xstat_fs_Wait, respectively.  Although the "wait
forever" logic was added to allow consolidation of similar code in
afsmonitor, it also changed how xstat_cm_test and xstat_fs_test behave
for '-period 0'.

Unfortunately, there is a bug in this support, at least when running on
pthreads.  After the initial 24 minute timer expires, the while (1) will
repeatedly run select with a timeout that is now 0.  This causes the
while loop to consume 100% of the CPU on which this thread is
dispatched.

Instead, modify the wait-forever logic to specify NULL for the select()
timeout value.  Also update the man page to document that '-period 0'
means forever.

Change-Id: I25d0d5be0eedb8bf3de495785b9b03a3e3d45221
Reviewed-on: https://gerrit.openafs.org/14366
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Return to userspace after AFS_NEW_BKG reqs

Currently, for AFS_NEW_BKG, background daemons run in the context of a
normal user process (afsd), in order to return to run
userspace-handled background ops. For non-AFS_NEW_BKG when
AFS_DAEMONOP_ENV is defined, background daemons run as kernel threads
instead, and have no corresponding userspace process.

On LINUX, whether or not we run as a kernel thread has some odd
side-effects: at least one example of this is how open file handles
(struct file) are treated when closed. When the last reference to a
struct file is closed, the final free is deferred to an asynchronous
call to be executed "later", in order to avoid issues with lock
inversion. For kernel threads, "later" means the work is schedule on
the global system work queue (schedule_work()), but for userspace
processes, it is scheduled on the task work queue (task_work_add()),
which is run around when the thread returns to userspace. For
background daemons, we never return from the relevant syscall until we
get a userspace background request (or the client is shutting down),

Commit ca472e66 (LINUX: Turn on AFS_NEW_BKG) changed LINUX to use
AFS_NEW_BKG background daemons, so background requests now run as a
normal userspace process, and last-reference file closes are deferred.
Since we may never return to userspace, this means that our file
handles (used for accessing the disk cache) may never get freed,
leading to an unbounded number of file handles remaining open.

This can be seen by seeing the first value in /proc/sys/fs/file-nr
growing without bound (possibly slowly), as accessing /afs causes
background requests. Eventually the number of open files can exceed
the /proc/sys/fs/file-max limit, causing further file opens to fail,
causing various other problems and potentially panics.

To avoid this issue, define a new userspace background op, called
AFS_USPC_NOOP, which gets returned to the afsd background daemon
process. When afsd sees this, it just does nothing and calls the
AFSOP_BKG_HANDLER syscall again, to go into the background daemon loop
again. In afs_BackgroundDaemon, we return the AFS_USPC_NOOP op
whenever there are no pending background requests, or if we've run 100
background requests in a row without a break. This causes us to return
to userspace periodically, preventing any such task-queued work from
building up indefinitely.

Do this for all platforms (currently just LINUX and DARWIN), in order
to simplify the code, and possibly avoid other similar issues, since
staying inside a syscall for so long while doing real work is arguably
weird.

Add a documentation comment block for afs_BackgroundDaemon while we're
here.

Thanks to mvitale@sinenomine.net for discovering the file leak.

Change-Id: I1953d73b2142d4128b064f8c5f95a5858d7df131
Reviewed-on: https://gerrit.openafs.org/13984
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

ubik: Remove unused sampleName

The RPC-L type sampleName and related constant UMAXNAMELEN are not
referenced by anything, and have been unused since OpenAFS 1.0. Remove
the unused definitions.

Change-Id: I21a11d9db9ed80547de8685623fb09f9a86934f1
Reviewed-on: https://gerrit.openafs.org/14386
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

dir: Set srcdir correctly in src/dir/test

srcdir is a magic variable that needs to be set to @srcdir@, not some
relative path like ../../.. (which will usually be somewhere in the
objdir, not srcdir). Set it correctly in here.

Without this, objdir builds can fail with:

make[4]: Entering directory '...obj/src/dir/test'
make[4]: *** No rule to make target 'dtest.o', needed by 'dtest'. Stop.

Which happens because the automatic rule for dtest.o can't be
constructed, since we cannot find dtest.c automatically because srcdir
isn't set properly.

This has been broken since commit 37b4195d (dtest-20021111), but was
not noticeable until commit 192a2ff4 (dir: make dtest buildable
again), since that caused dtest to actually get built.

Also set LIBS correctly in here, using the conventional ${TOP_LIBDIR},
since ${srcdir} no longer points to "../../..".

Change-Id: I539e01a4397c558dc0eda492834b3f9913f71634
Reviewed-on: https://gerrit.openafs.org/14384
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bozo: Use libcmd for command line options

Update bosserver to use libcmd for command line parsing.

Change-Id: Iaa55dc33b72983a48089a7b359260916bea2d1e7
Reviewed-on: https://gerrit.openafs.org/13845
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

afs: refactor directory checking in DRead

Commit d566c1cf874d15ca02020894ff0af62c4e39e7bb
'dread-do-validation-20041012' modified directory checking (in the
afs_buffer.c implementation of DRead()) to use size information passed
to DRead, rather than obtained from the cache via afs_CFileOpen.

Because this directory checking does not require any information from
the cache buffers or the cache partition, we can make the check right
away, before searching the cache buffers or calling afs_newslot.

To clarify and simplify, move the directory sanity checking logic to the
beginning of DRead. Remove the afs_newslot cleanup logic which is no
longer needed.

While here, add Doxygen comments for DRead.

Change-Id: I8cea4e885ece64e760271c8194c126250f87104e
Reviewed-on: https://gerrit.openafs.org/13803
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: check afs_dir_MakeDir return code in dtest

The dtest test program ignores the return from afs_dir_Makedir.

Fix this so errors may be identified in testing.

While here, also improve the diagnostic message for afs_dir_Create
failures, to make it consistent with the new diagnostic message for
afs_dir_MakeDir failures.

Change-Id: Ib882947e01c864344f17faad8a646b2487793f29
Reviewed-on: https://gerrit.openafs.org/13797
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: dtest should flush on error when creating directories

The dtest -f subcommand (CRTest()) exits immediately if there is an
error while adding files. This may create an empty, incomplete, or
corrupt directory object on disk because we neglected to call DFlush
before exiting.

Always call DFlush from CRTest() whether it fails or succeeds.

Change-Id: Ia7b4ad00ea6f4f9f788cd75ae726bdadb60ee9c3
Reviewed-on: https://gerrit.openafs.org/13796
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: correct fid type for dtest

The dtest utility has had its fid[] arrays defined as 'long' since the
initial IBM import. Commit 0a98548832472152304410e41306adcc5b91f6a2
'dir: Make test utility build again' converted some - but not all - the
fid arrays to afs_int32.

Allow dtest to operate correctly by converting the rest of the fid
arrays to afs_int32.

Change-Id: I2ebe36272e02cf860577153ab94f3591e1d707e8
Reviewed-on: https://gerrit.openafs.org/13795
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: make dtest buildable again

Commit 7fe4125fe3435092b75ed29b884d8d3c2d1a2cad 'dir/vol: Die() really
does' overlooked src/dir/test/dtest.c, breaking its build.

Fix the signature of Die() and the makefile so dtest can be built.
In addition, change the Makefile so it is always built.

Change-Id: I18129acbfdaa770987c7f0b8055ff593f776e518
Reviewed-on: https://gerrit.openafs.org/13794
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

dir: remove unused test files

Makefile rules for physio.c and test-salvage.c have been commented out
since the original IBM code import, and were removed in commit
37b4195d603630498664fa0975ea5d5c82f9aa4f 'dtest-20021111' to fix dtest.
However, that commit neglected to remove the source files and other
references to them in Makefile.in

Finish the job by removing the files and references to them.

No functional change is incurred by this commit.

Change-Id: I57527be99cd28a481a86b659d1eb3227af9f1c99
Reviewed-on: https://gerrit.openafs.org/14052
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

vol: de-orbit test programs

The updateDirInode and listVicepx utilities are obsolete; they no longer
build, are severely bitrotted, and have been largely replaced by
volscan.

While here, also remove other objects that have not been built by default
since before the original IBM import:
- ILIST ilist.exe
- NAMEI_PROGS nicreate, nincdec, nino, nilist

Remove all of them from the tree.

Change-Id: I8f68ec425cce5e84bcc5f41d598eec23102109de
Reviewed-on: https://gerrit.openafs.org/13793
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

Make OpenAFS 1.9.0

Update version strings for the first 1.9.x development release.

Change-Id: I0d0e204ffe8d64d7c0f794f313c0f24ccea12783
Reviewed-on: https://gerrit.openafs.org/14362
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Import NEWS from OpenAFS 1.8.6

Stay up to date with the stable branch at least until the initial
version of the new release series.

Change-Id: Iefcd9cc039399cd4cbbcc0474c2cabffa7780305
Reviewed-on: https://gerrit.openafs.org/14344
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>

Update 1.9.0 NEWS for recent changes

Add some entries for the commits that landed since the previous update.

Change-Id: I74820ee5a07c3fb539f233b2bd0c30aab262ba74
Reviewed-on: https://gerrit.openafs.org/14343
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

DARWIN: disable kextutil check for versions requiring notarization

Our kextutil signing check will fail for releases that require
notarization (Mojave 10.14.5 and up, Catalina 10.15 all versions),
because we aren't notarized yet at the time of the check.

Instead, disable the check for those releases.

Change-Id: Iec1b74d18ae02cdd031ed3194ffb9900aa8a1b55
Reviewed-on: https://gerrit.openafs.org/14222
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

dumpscan: Don't call cb_dirent twice

This fixes a bug where p->cb_dirent is called twice, if
it exists.

Change-Id: I7a7a6abf522b62eb310d003a61b3bbcdcda9e850
Reviewed-on: https://gerrit.openafs.org/14308
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Revert "vos: take RO volume offline during convertROtoRW"

This reverts commit 32d35db64061e4102281c235cf693341f9de9271. While that
commit did fix the mentioned problem, depending on "vos" to set the
volume to be converted as "out of service" is not ideal. Instead, this
volume should be set as offline by the SAFSVolConvertROtoRWvolume RPC,
executed on the volume server.

The proper fix for this problem will be introduced by another commit.

Change-Id: I0ce5ba793fe3c07e535225191b74eeb402ab5bfd
Reviewed-on: https://gerrit.openafs.org/14339
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

build: Add rpm target

Add a top-level makefile target to build RPMs for Red Hat distributions
from the currently checked out commit. The resulting rpms are placed in
the packages/rpmbuild/RPMS/<arch> directory.

The rpm target is intended to be a convenience for testing changes to
the rpm packaging or generating packages for local testing.

Change-Id: Id951eb2b03629be59f6258e89e8356fe1fde1ff5
Reviewed-on: https://gerrit.openafs.org/14114
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

makesrpm: Support custom version strings

The makesrpm.pl script generates a source RPM by creating a temporary
rpmbuild workspace, populating the SOURCES and SPECS directories in that
workspace, running rpmbuild to build the source RPM, and finally copying
the resulting source RPM out of the temporary workspace.

The name of the source RPM file created by rpmbuild depends on the
package version and release strings. Unfortunately, the format of the
source RPM file name changed around OpenAFS 1.6.0, so makesrpm.pl has
special logic to find the version string and extra code depending on the
detected OpenAFS version.

Instead of trying to predict the name of the resulting source RPM file
from the OpenAFS version string, and having different logic for old
versions of OpenAFS, use a filename glob to find resulting source RPM
file name in the temporary rpmbuild workspace.

Remove the major, minor, and patch level variables, which were only used
to guess the name of the resulting source RPM file name.

Convert '-' characters to '_' in the package version and package
release, since the '-' character is reserved by rpm as a field
separator.

While here, add the --dir option to specify the path of the generated
source RPM, and change the 'srpm' makefile target to use the new --dir
option, instead of changing the current directory before running
makesrpm.pl. Also, add a dependency on the 'dist' makefile target,
since the the source and document tarballs are required to build the
source RPM.

Add pod documentation and add the --help (-h) option to print a brief
help message, and add the --man option to print the full man page.

With this change, we can build a source RPM even when the .version file
in the src.tar.bz file has a custom format or was created from a
checkout of the master branch or other non-release reference.

Change-Id: I7320afe6ac1f77d4dd38fcc194d41678fde5c950
Reviewed-on: https://gerrit.openafs.org/14116
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Correct our contributor's code of conduct

There are no races. Racism does exist though.

Change-Id: I0a4cde55a5f470649eb99c5d7f30c9cec86d9baa
Reviewed-on: https://gerrit.openafs.org/14320
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

UKERNEL: Build linktest with COMMON_CFLAGS

Currently, 'linktest' in libuafs is built with a weird custom rule
that specifies several various CFLAGS and LDFLAGS, etc. One
side-effect of this is that linktest is built without specifying -O,
even if optimization is otherwise enabled.

Normally nobody would care about the optimization of linktest, since
it's never supposed to be run, but this can cause an error when
building with -D_FORTIFY_SOURCE=1 on some systems (such as RHEL7):

    In file included from /usr/include/sys/types.h:25:0,
                     from /.../src/config/afsconfig.h:1485,
                     from /.../src/libuafs/linktest.c:15:
    /usr/include/features.h:330:4: error: #warning _FORTIFY_SOURCE requires compiling with optimization (-O) [-Werror=cpp]
     #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
        ^
    cc1: all warnings being treated as errors
    make[3]: *** [linktest] Error 1

For now, to fix this just include $(COMMON_CFLAGS) in the flags we
give for linktest, so $(OPTMZ) also gets pulled in, and building
linktest gets a little closer to a normal compilation step.

Change-Id: I3362dcfe8407825ab88854ae59da4188ed16be9d
Reviewed-on: https://gerrit.openafs.org/14324
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

ptserver: Remove duplicate ubik_SetLock in listSuperGroups

It looks like a call to ubik_SetLock(.. LOCKREAD) was left in
place in listSuperGroups after locking was moved to ReadPreamble
in commit a6d64d70 (ptserver: Refactor per-call ubik initialisation)
When compiled with 'supergroups', and once contacted by
"pts mem -expandgroups ..", ptserver will therefore abort() with
Ubik: Internal Error: attempted to take lock twice
This patch removes the superfluous ubik_SetLock.

FIXES 135147

Change-Id: I8779710a6d68e4126fc482123b576690d86e4225
Reviewed-on: https://gerrit.openafs.org/14338
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

INSTALL: document the minimum Linux kernel level

The change associated with gerrit #14300 removed support for older
Linux kernels (2.6.10 and earlier).

The commit 'Import of code from autoconf-archive' (d8205bbb4) introduced
a check for Autoconf 2.64.  Autoconf 2.64 was released in 2009.

The commit 'regen.sh: Use libtoolize -i, and .gitignore generated
build-tools' (a7cc505d3) introduced a dependency on libtool's  '-i'
option.  Libtool supported the '-i' option with libtool 1.9b in 2004.

Update the INSTALL instructions to document a minimum Linux kernel
level and the minimum levels for autoconf and libtool.

Notes: RHEL4 (EOL in 2017) had a 2.6.9 kernel and RHEL5 has a 2.6.18
kernel. RHEL5 has libtool 1.5.22 and autoconf 2.59, RHEL6 has libtool
2.2.6 and autoconf 2.63, and RHEL7 has libtool 2.4.2 and autoconf 2.69.

Change-Id: I235eeffa4adb152e05aab7aca839700816e62c83
Reviewed-on: https://gerrit.openafs.org/14305
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Avoid NatPing event on all connection

Inside release_conns_user_server, connection vector is traversed and after
destroying a connection new eligible connection is found on which NatPing
event will be set. Ideally there should be only one connection on which
NatPing should be set but in current code while traversing all connection
of server a NatPing event is set on all connections to that server. In
cases where we have large number of connection to a server this can lead
to huge number of “RX_PACKET_TYPE_VERSION” packets sent to a server.
Since this happen during Garbage collection of user structs, to simulate
this issue below steps were tried

  - had one script which “cd” to a volume mount and then script sleeps for
    large time.
  - Ran one infinite while loop where above script was called using PAG
    based tokens (As new connection will be created for each PAG)
  - Instrumented the code, so that we hit above code segment where NatPing
    event is set. Mainly reduced NOTOKTIMEOUT to 60 sec.

To fix this issue set NatPing on one connection and once it is set break
from “for” loop traversing the server connection.

Change-Id: Ia38cec0403fde76cdd59aa664bd261481e2edee6
Reviewed-on: https://gerrit.openafs.org/14312
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>

vos: avoid 'half-locked' volume after interrupted 'vos rename'

Reported symptoms:

If a 'vos rename' is interrupted after it has locked the volume and
replaced the VLDB entry, but before it has unlocked the volume, the
volume will remain locked.  However, the locked volume will NOT be
listed as locked in any vos commands that display locked status (see
below for details).

Background:

Most vos write operations lock the VLDB volume entry before proceeding,
then release the volume lock when finished.  This is accomplished via
VL_SetLock and VL_ReleaseLock, respectively.

VL_SetLock always sets these members in the VLDB volume entry:
- flags is modified to set the required VLOP_* code bit as specified
- LockAFSid is set to 0 (never implemented)
- LockTimestamp is set to the current time

VL_ReleaseLock always sets them as follows:
- flags is cleared of any VLOP_* code bit
- LockAFSid is set to 0 (never implemented)
- LockTimestamp is set to 0

VL_ReplaceEntry(N) may also optionally clear each of these members:
- flags operation bits may be explicitly cleared via LOCKREL_OPCODE
- LockAFSid may be explicitly cleared via LOCKREL_AFSID
- LockTimestamp may be explicitly cleared via LOCKREL_TIMESTAMP

When all 3 options are specified, VL_ReplaceEntry also does the
functional equivalent of a VL_ReleaseLock.  Most vos operations use this
method.  However, when no lock release options are specified on
VL_ReplaceEntry(N), the VLDB entry is simply replaced with the supplied
entry.  This includes whatever flags values are specified in the
supplied entry; therefore, this amounts to an additional, implicit way
to set or modify the flags.

Root cause:

'vos rename' (UV_RenameVolume) is the only vos operation that does all
of the following things:
- accepts a replacement volume entry that was obtained before VL_SetLock
  (and thus does NOT have any lock flags set)
- issues VL_SetLock (which sets the lock flag in the VLDB)
- issues VL_ReplaceEntry(N) with the original unlocked entry, and with
  no lock release options (thus with explicit intent to leave the lock
  flag unchanged, but inadvertently doing an implicit clear of the lock
  flag in the VLDB)
- (performs some additional volserver work)
- issues VL_ReleaseLock to release the volume lock

Therefore, if 'vos rename' is cancelled or killed before reaching the
final VL_ReleaseLock step, the VLDB entry is left with the lock flags
cleared but the LockTimestamp still set.  As we will see below, this
'half-locked' state produces confusing results from other vos commands.

Detection of locked state:

The 'vos lock' command (and all other vos commands that issue
VL_SetLock) use the lock timestamp to determine if a volume is locked.

However, several other vos commands ('vos listvldb <vol>', 'vos examine
<vol>', 'vos listvldb -locked') use the VLDB entry's lock flags (not the
lock timestamp) to determine if the volume is locked.  Therefore, if the
lock flags have been cleared but the lock timestamp is still set, these
commands fail to detect that the volume is still locked.  Yet an
administrator's 'vos lock <volume>' will still fail with:

  Could not lock VLDB entry for volume <volume>
  VLDB: vldb entry is already locked

This is the external manifestation of the 'half-locked' state.

Workaround and fix:

This scenario has a simple workaround: 'vos unlock <volume>'.  However,
to avoid this confusing outcome in the first place, modify the 'vos
rename' logic so that the lock flags are no longer inadvertently
cleared.  Now, if the 'vos rename' is interrupted before the volume is
unlocked, it will still appear locked in normal vos command output.

Change-Id: I6cc16d20c4487de4e9a866c6f0c89d950efd2f7d
Reviewed-on: https://gerrit.openafs.org/14157
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

rxgen: remove dead code hndle_param_tail

Since the original IBM code import, hndle_param_tail has been dead code.
It was later ifdef'd out in commit 8f2df21ffe59
'pull-prototypes-to-head-20020821'

Remove the dead code from the tree.

No functional change is incurred by this commit.

Change-Id: I29128eecc93a5871f5bb9369c3983baf5b537beb
Reviewed-on: https://gerrit.openafs.org/14322
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

bos: suppress unnecessary warn if -noauth

Commit d008089a7 (Add interface to select client security objects)
consolidated the code that selects the client security objects into a
set of new interfaces. Before this commit, the "bos: running
unauthenticated" message, which warns the user when an unauthenticated
connection is established, used to be suppressed if the -noauth flag was
specified.

Similarly to commit b3c16324e (ubik: Make ugen_ClientInit honor
noAuthFlag), recover the original behavior avoiding warn messages about
unauthenticated connections if the -noauth flag is provided.

Change-Id: Iaf0ac6bd91ea160256823512f060afc94b5926bf
Reviewed-on: https://gerrit.openafs.org/14306
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

vlserver: fix missing read-only entries from ListAttributesN2

The ListAttributesN2() RPC can fail to list read-only entries under
certain circumstances. This RPC is used by the `vos listvldb` command to
retrieve vldb entries (unless the -name option is given). The `vos
listvldb` command fails to list volume entries when run with the
'-server' option for volumes that have read-only replicas, but have not
been released.

Consider the following example volume:

    $ vos create fs1.example.com a test
    $ vos addsite fs1.example.com a test
    $ vos addsite fs2.example.com a test
    $ vos listvldb
    ...
    test
        RWrite: 536870921
        number of sites -> 3
           server fs1.example.com partition /vicepa RW Site
           server fs1.example.com partition /vicepa RO Site  -- Not released
           server fs2.example.com partition /vicepa RO Site  -- Not released

`vos listvldb` fails to find the volume when the search is limited to
server 'fs2':

    $ vos listvldb -server fs2.example.com
    VLDB entries for server fs2.example.com
    Total entries: 0

Instead of the expected results:

    $ vos listvldb -server fs2.example.com
    test
        RWrite: 536870921
        number of sites -> 3
           server fs1.example.com partition /vicepa RW Site
           server fs1.example.com partition /vicepa RO Site  -- Not released
           server fs2.example.com partition /vicepa RO Site  -- Not released

This situation makes it difficult to remove old server addresses from
the vldb.  In this situation, 'vos remaddrs' and 'vos changeaddr
-remove' commands will complain the server addresses are still in use by
volume entries, however running 'vos listvldb -server' will not show
which volumes entries are in use.

The entries are not listed for unreleased volumes because the
ListAttributesN2() RPC is currently checking the volume VLF_ROEXISTS
flag, instead of the server site flags (serverFlags) to determine when
the entry is a read-only site. The volume VLF_ROEXISTS flag is set when
a volume is released.

To fix this, make ListAttributesN2 check for the VLSF_ROVOL site flag,
instead of the VLF_ROEXISTS entry flag.

Change-Id: Ib636fbe016d1d2f5b117624d9930dba83ebcef8a
Reviewed-on: https://gerrit.openafs.org/14154
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

LINUX 5.9: Remove HAVE_UNLOCKED_IOCTL/COMPAT_IOCTL

Linux-5.9-rc1 commit 'fs: remove the HAVE_UNLOCKED_IOCTL and
HAVE_COMPAT_IOCTL defines' (4e24566a) removed the two referenced macros
from the kernel.

The support for unlocked_ioctl and compat_ioctl were introduced in
Linux 2.6.11.

Remove references to HAVE_UNLOCKED_IOCTL and HAVE_COMPAT_IOCTL using
the assumption that they were always defined.

Notes:

With this change, building against kernels 2.6.10 and older will fail.
RHEL4 (EOL in March 2017) used a 2.6.9 kernel. RHEL5 uses a 2.6.18
kernel.

In linux-2.6.33-rc1 the commit messages for "staging: comedi:
Remove check for HAVE_UNLOCKED_IOCTL" (00a1855c) and "Staging: comedi:
remove check for HAVE_COMPAT_IOCTL" (5d7ae225) both state that all new
kernels have support for unlocked_ioctl/compat_ioctl so the checks can
be removed along with removing support for older kernels.

Change-Id: Idd2716f3573ea455f8a5e1535bca584af0787717
Reviewed-on: https://gerrit.openafs.org/14300
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>

vos: avoid CreateVolume when restoring over an existing volume

Currently, the UV_RestoreVolume2 function always attempts to create a
new volume, even when doing a incremental restore over an existing
volume.  When the volume already exists, the volume creation operation
fails on the volume server with a VVOLEXISTS error. The client will then
attempt to obtain a transaction on the existing volume. If a transaction
is obtained, the incremental restore operation will proceed. If a full
restore is being done, the existing volume is removed and a new empty
volume is created.

Unfortunately, the failed volume creation is logged to by the volume
server, and so litters the log file with:

    Volser: CreateVolume: Unable to create the volume; aborted, error code 104

To avoid polluting the volume server log with these messages, reverse
the logic in UV_RestoreVolume2. Assume the volume already exists and try
to get the transaction first when doing an incremental restore. Create a
new volume if the transaction cannot be obtained because the volume is
not present.  When doing a full restore, remove the existing volume, if
one exists, and then create a new empty volume.

Change-Id: I8bdc13130d12c81cd2cd18a9484852708cac64d7
Reviewed-on: https://gerrit.openafs.org/14208
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Tested-by: Marcio Brito Barbosa <marciobritobarbosa@gmail.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

tests: Accommodate c-tap-harness 4.7

The SOURCE and BUILD environment variables have been changed to
C_TAP_SOURCE and C_TAP_BUILD in the new version of c-tap-harness. The
runtests command syntax has changed as well.

Convert all of the old SOURCE and BUILD environment variables to the new
C_TAP_SOURCE and C_TAP_BUILD names.

Add the required -l command line option to specify the test list.

Add the new runtests -v option to run the tests in verbose mode to make
it easier to see which tests failed.

Change-Id: I209a6dc13d6cd1507519234fce1564fc4641e70b
Reviewed-on: https://gerrit.openafs.org/14295
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

Import of code from c-tap-harness

This commit updates the code imported from c-tap-harness to
abdb66561ffd4d2f238fdb06f448ccf09d80c059 (release/4.7)

Upstream changes are:

Daniel Collins (1):
      Add is_blob() test function.

Daniel Kahn Gillmor (1):
      LICENSE: use https for all URLs

Daria Brashear (1):
      Add verbose mode environment variable to runtests

Julien ÉLIE (2):
      Document -v in usage and comments of runtests
      Avoid realloc of zero length in tests/runtests.c

Marc Dionne (1):
      Add test_cleanup_register_with_data

Russ Allbery (115):
      clang --analyze cleanups for runtests
      Modernize POD tests
      Update README to my current layout
      Explicitly note that test programs must be executable
      Fix comment typo in tests/runtests.c
      Switch to a copyright-format 1.0 LICENSE file
      Flush harness output after each line
      Show the test count as ? when the plan is deferred
      More correctly backspace over test counts when aborting
      Refactor test list handling
      Allow passing tests on the runtests command line
      Don't allow command-line arguments if a list was given
      Search for tests under the name given as well
      Release 2.0
      Fix backward incompatibility when searching for tests
      Document decision to ignore TAP version directives
      Release 2.1
      Document different runtests behavior in bail handling
      Change exit status of bail to 255
      Release 2.2
      Add a new test_cleanup_register C API
      Add warn_unused_result attributes
      Add portability for warn_unsed_result attributes to tap/macros.h
      Minor coding style fix (spacing) in runtests.c
      Split the runtests usage string for ISO C90 string limits
      Include stddef.h
      Diagnose failure to register the exit handler
      Use diag internally in the basic C TAP library
      Some additional comments about cleanup functions
      Move repetitive printing code in the C TAP library to a macro
      Set a flag when bailing for more correct cleanup
      Change my email address to eagle@eyrie.org
      Release 2.3
      Add diag_file_add and diag_file_remove functions
      Don't die for unknown files passed to diag_file_remove
      Release 2.4
      Update comment about AIX and WCOREDUMP
      Don't test for NULL before calling free
      Be more careful about file descriptors in child processes
      Run cleanup functions in non-primary processes as well
      Release 3.0
      Update collective package copyright notices at start of LICENSE
      Check integer overflows on memory allocation, fix string creation
      Switch POD spelling test to use Lancaster consensus variable
      Add new bnrealloc API for brealloc with checked multiplication
      Rename nrealloc to reallocarray
      Return the test status from test functions
      Fix the overflow check for breallocarray
      Fix the overflow check for xreallocarray in runtests
      Restructure test result reallocation in runtests
      Change diag and sysdiag to always return true
      Release 3.1
      Fix typos in basic.c and basic.h
      Fix usage message when running runtests with no arguments
      Update introductory runtests comments for current syntax
      Add the -l flag to suggested runtests invocation in README
      Support comments and blank lines in test lists
      Release 3.2
      Update licensing information
      Various improvements to verbose support
      Compile warning-free with Clang, check Autoconf macros
      Release 3.3
      Remove unnecessary assert.h include in tap/basic.c
      Fix some additional -v documentation issues
      Rebalance usage to avoid too-long strings
      Fix segfault in runtests with empty test list
      Release 3.4
      Document running autogen if starting from Git
      Rename autogen to bootstrap
      Support and prefer C_TAP_SOURCE and C_TAP_BUILD
      Fix comment typo in tests/runtests.c
      Add missing va_end to is_double
      Release 4.0
      Fix all non-https www.eyrie.org URLs
      Add is_bool C test function
      Add DocKnot metadata and a Markdown README file
      Update documentation for new DocKnot standards
      Release 4.1
      Use more defaults from DocKnot templates
      Fix new fall-through warning in GCC 7
      Use compiler warnings from rra-c-util, fix issues
      Merge pull request #4 from solemnwarning/master
      Coding style fixes and NEWS for is_blob
      Re-enable -Wunknown-pragmas for GCC
      Avoid zero-length realloc allocations in breallocarray
      Update copyright date on tests/runtests.c
      Release 4.2
      Add SPDX-License-Identifier headers to source files
      Add and run new check-cppcheck target
      Fix instructions for running one test
      Identify values as left and right
      Fix is_string comparisons with NULL pointers
      Add support for running tests under valgrind
      Replace putc with fprintf
      Update shared files from rra-c-util
      Release 4.3
      Update NEWS date for 4.3 release
      Collapse some copyright dates
      NEWS and coding style for test_cleanup_register_with_data
      Remove unused variables caught by Clang scan-build
      Update to rra-c-util 8.0
      Fix error checking in bstrndup
      Release 4.4
      Add support for C++
      Document that C TAP Harness can be built as C++
      Release 4.5
      Regenerate README files
      Reformat using clang-format 10
      Update to rra-c-util 8.1
      Release 4.6
      Fix spelling errors caught by codespell
      Protect the test suite against C_TAP_VERBOSE
      Switch to GitHub Actions for CI
      Add NEWS entry for GCC 10 warning fixes
      Release 4.7

Change-Id: I5a78215bf99b53bd848f0fa6bb9092deab38f24e
Reviewed-on: https://gerrit.openafs.org/14294
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Always define our own osi_timeval32_t

Since OpenAFS 1.0, osi_GetTime has taken a timeval-like pointer, which
contains 32-bit fields (the actual type has been called either
osi_timeval_t or osi_timeval32_t over time). For platforms that have a
native timeval-like type with 32-bit fields, we just define
osi_timeval32_t to that type, and elsewhere we define our own struct
to be osi_timeval32_t. For platforms that use the native timeval, we
can then define osi_GetTime() to just be, e.g., microtime().

This approach is difficult to maintain, though, because we must keep
track of whether 'struct timeval' contains 32-bit fields on each
platform, which can depend on many factors. It's easy to make mistakes
(the current tree already contains mistakes), and there's not much
benefit.

To avoid all of this, just always define osi_timeval32_t to be our own
struct with afs_int32 fields, and provide definitions for osi_GetTime
that convert from the native time struct to our osi_timeval32_t. This
does mean that for some platforms we do an unnecessary type
conversion, but this is a small price to pay for more straightforward
and maintainable code.

To be a little more sure that our types are correct, change
osi_GetTime to be defined as an inline function instead of a macro.

At the same time, do a similar conversion for the KERNEL
implementation of the rx clock_GetTime function. Get rid of
platform-specific mess, and do a straightforward type conversion
between osi_timeval32_t and struct clock in an inline function.

Change-Id: I18819acb556a2a7f1b6da6994db9783c48108934
Reviewed-on: https://gerrit.openafs.org/14238
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

afs: Move osi_GetTime out of param.h

Most platforms currently #define osi_GetTime in their param.h. This is
really redundant, since the definition of osi_GetTime almost never
changes for a given platform, so we end up with many copies of the
same osi_GetTime definition for a given platform.

Move osi_GetTime out of param.h for these platforms, and define it in
osi_machdep.h instead, which is where most platform-specific
definitions go.

For DFBSD, we don't have an osi_machdep.h at all yet, so create a new
one to contain the osi_GetTime definition. Currently we don't build
libafs at all on DFBSD, but do this anyway so we don't lose the
existing osi_GetTime definition.

For NBSD, we were providing (conflicting!) definitions for osi_GetTime
in param.h and in osi_machdep.h. Just remove the definitions in
param.h, since those should have been getting overridden by the
osi_machdep.h definition.

Change-Id: I7097d9fe2fcd38c06ecc275e8fe3a2c69c9d0436
Reviewed-on: https://gerrit.openafs.org/14237
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Avoid using logical OR when setting f_fsid

Building with clang-10 produces the warning/error message
    warning: converting the result of '<<' to a boolean always evaluates
    to true [-Wtautological-constant-compare]
for the expression
    abp->f_fsid = (AFS_VFSMAGIC << 16) || AFS_VFSFSID;

The message is because a logical OR '||' is used instead of a bitwise
OR '|'.  The result of this expression will always set the f_fsid
member to a 1 and not the intended value of AFS_VFSMAGIC combined with
AFS_VFSFSID.

Update the expression to use a bitwise OR instead of the logical OR.

Note: This will change value stored in the f_fsid that is returned from
statfs.

Using a logical OR has existed since OpenAFS 1.0 for hpux/solaris and in
UKERNEL since OpenAFS 1.5 with the commit 'UKERNEL: add uafs_statvfs'
b822971a.

Change-Id: I3e85ba48058ac68e3e3ac7f277623f660187926c
Reviewed-on: https://gerrit.openafs.org/14292
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Set AFS_VFSFSID to a numerical value

Currently when UKERNEL is defined, AFS_VFSFSID is always set to
AFS_MOUNT_AFS, which is a string for many platforms for UKERNEL.

Update src/afs/afs.h to insure that the define for AFS_VFSFSID is a
numeric value when building UKERNEL.

Clean up the preprocessor indentation in src/afs/afs.h in the area
around the AFS_VFSFSID defines.

Thanks to adeason@sinenomine.net for pointing out a much easier solution
for resolving this problem.

Change-Id: I618fc4c89029a6cca2ca6f530b8f65399299a9d1
Reviewed-on: https://gerrit.openafs.org/14279
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

clang-10: ignore fallthrough warning in generated code

Clang-10 will not recognize '/* fall through */' as an indicator to
turn off the fallthrough warning due to the lack of a 'break' in a case
statement.

Code generated by flex uses the '/* fall through */' comments to turn
off compiler warnings for fallthroughs in case statements.

For code generated by flex, ignore the implicit-fallthrough via pragma
or disable the warning via a compile time flag.

Add new env variable "CFLAGS_NOIMPLICIT_FALLTHROUGH" to selectively
disable the compile check in Makefiles when checking is enabled.

Change-Id: I4c054defda03daa2aeb645ae2271dfa0cb54925f
Reviewed-on: https://gerrit.openafs.org/14275
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

clang-10: use AFS_FALLTHROUGH for case fallthrough

Clang-10 will not recognize '/* fallthrough */' as an indicator to
turn off the fallthrough diagnostic due to the lack of a 'break' in a
case statement.  Clang-10 requires the '__attribute__((fallthrough))'
statement to disable the diagnostic.

In addition clang-10 is finding additional locations where fall throughs
occur.

Determine if the compiler supports '__attribute__((fallthrough))' to
disable the implicit fallthrough diagnostic.

Define a new macro 'AFS_FALLTHROUGH' that will disable the fallthrough
diagnostic. Set it as a wrapper for the Linux kernel's 'fallthrough'
macro if available, otherwise set it as a wrapper macro for
'__attribute__((fallthrough))' if the compiler supports it.

Update CODING to document the use of AFS_FALLTHROUGH when needing to
fallthrough between case statements.

Replace the '/* fallthrough */' comments with AFS_FALLTHROUGH, and add
AFS_FALLTHROUGH as needed.

Replace some fallthroughs with a break (or goto) if the flow was was
just to a break (or goto).

e.g.   case x:                 case x:
           somestmt;               somestmt;
                                   break;
       case y:                 case y:
           break;                  break;

Correct a mis-indented brace '}' in src/WINNT/afsd/smb3.c

Note, the clang maintainers have rejected the use of comments as a flag
to turn off the fall through warnings.

Change-Id: Ia5da10fc14fc1874baca035a3cf471e618e0d5f5
Reviewed-on: https://gerrit.openafs.org/14274
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

redhat: Add make to the dkms-openafs pre-requirements

If `make` is not installed before dkms-openafs, the OpenAFS kernel
module is not built during the dkms-openafs package installation.

The failure happens in the "checking if linux kernel module build works"
configure step, which invokes `make` to check the linux buildsystem.
configure fails when `make` is not available, and gives the unhelpful
suggestion (in this case) of configuring with --disable-kernel module.

Running the configure.log in the dkms build directory shows:

    configure:7739: checking if linux kernel module build works
    make -C /lib/modules/4.18.0-193.6.3.el8_2.x86_64/build M=/var/lib/dkms/openafs/...
    ./configure: line 7771: make: command not found
    configure: failed using Makefile:

Avoid this build failure by adding `make` to the list of dkms-openafs
package pre-requirements.

Change-Id: I98b3508341eea1df4fa7b6f43e88add1bda9ee2c
Reviewed-on: https://gerrit.openafs.org/14266
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

vol: Blank opts in VOptDefaults

Instead of needing to set every single field in the 'opts' structure
individually, blank the whole thing to make sure the entire struct is
initialized. Remove the now-redundant lines that initialize various
items to 0.

Change-Id: I799cdb55becd66a8f3d6ec2f81338843038d0abd
Reviewed-on: https://gerrit.openafs.org/14280
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Kailas Zadbuke <kailashsz@in.ibm.com>
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

volser: Don't NUL-pad failed pread()s in dumps

Currently, the volserver SAFSVolDump RPC and the 'voldump' utility
handle short reads from pread() for vnode payloads by padding the
missing data with NUL bytes. That is, if we request 4k of data for our
pread() call, and we only get back 1k of data, we'll write 1k of data
to the volume dump stream followed by 3k of NUL bytes, and log
messages like this:

    1 Volser: DumpFile: Error reading inode 1234 for vnode 5678
    1 Volser: DumpFile: Null padding file: 3072 bytes at offset 40960

This can happen if we hit EOF on the underlying file sooner than
expected, or if the OS just responds with fewer bytes than requested
for any reason.

The same code path tries to do the same NUL-padding if pread() returns
an error (for example, EIO), padding the entire e.g. 4k block with
NULs. However, in this case, the "padding" code often doesn't work as
intended, because we compare 'n' (set to -1) with 'howMany' (set to 4k
in this example), like so:

    if (n < howMany)

Here, 'n' is signed (ssize_t), and 'howMany' is unsigned (size_t), and
so compilers will promote 'n' to the unsigned type, causing this
conditional to fail when n is -1. As a result, all of the relevant log
messages are skipped, and the data in the dumpstream gets corrupted
(we skip a block of data, and our 'howFar' offset goes back by 1). So
this can result in rare silent data corruption in volume dumps, which
can occur during volume releases, moves, etc.

To fix all of this, remove this bizarre NUL-padding behavior in the
volserver. Instead:

- For actual errors from pread(), return an error, like we do for I/O
  errors in most other code paths.

- For short reads, just write out the amount of data we actually read,
  and keep going.

- For premature EOF, treat it like a pread() error, but log a slightly
  different message.

For the 'voldump' utility, the padding behavior can make sense if a
user is trying to recover volume data offline in a disaster recovery
scenario. So for voldump, add a new switch (-pad-errors) to enable the
padding behavior, but change the default behavior to bail out on
errors.

Change-Id: Ibd6e76c5ea0dea95e3354d9b34536296f81b4f67
Reviewed-on: https://gerrit.openafs.org/14255
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

butc: fix int to float conversion warning

Building with clang-10 results in 2 warnings/errors associated with
with trying to convert 0x7fffffff to a floating point value.

    tcmain.c:240:18: error: implicit conversion from 'int' to 'float'
                 changes value from 2147483647 to 2147483648 [-Werror,
                 -Wimplicit-int-float-conversion]
    if ((total > 0x7fffffff) || (total < 0))    /* Don't go over 2G */

and the same conversion warning on the statement on the following line:
    total = 0x7fffffff;

Use floating point and decimal constants instead of the hex constants.

For the test, use 2147483648.0 which is cleanly represented by a float.
Change the comparison in the test from '>' to '>='.

If the total value exceeds 2G, just assign the max value directly to the
return variable.

Change-Id: I79b2afa006496a756bd7b50976050c24827aa027
Reviewed-on: https://gerrit.openafs.org/14277
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

autoconf: fix detection for fallthrough attribute

Due to bug <https://savannah.gnu.org/patch/?9949>,
ax_gcc_func_attribute.m4 fails to properly detect __attribute__((fallthrough))
in clang. Until this is fixed in autoconf-archive upstream, fix our
local copy of ax_gcc_func_attribute.m4, so we can detect
__attribute__((fallthrough)) to make --enable-checking work with clang.

Change-Id: I80a4557384f8e1438344e48bfe722e20c8773882
Reviewed-on: https://gerrit.openafs.org/14273
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

cf: Make local copy of ax_gcc_func_attribute.m4

Make a local copy of ax_gcc_func_attribute from autoconf-archive. This
is needed in order to fix a bug in the detection of the fallthrough
attribute.

Remove ax_gcc_func_attribute.m4 from src/external/autoconf-archive/m4.
Update LICENSE file to point to the local copy in src/cf.

Change-Id: I6c4244d2cd4edab4262c1820435c00419d85303b
Reviewed-on: https://gerrit.openafs.org/14272
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

rx: prevent leakage of non-cached rx_connections (pthread)

The rxi_connectionCache (AFS_PTHREAD_ENV only) allows applications to
reuse rx_connection structs.  Cached rx_connections are obtained via
rx_GetCachedConnection and released via rx_ReleaseCachedConnection.
This feature is used most heavily by libadmin and kauth, but there are
other users in the tree as well.

For instance, ubikclient routines ubik_ClientInit and ubik_ClientDestroy
call rx_ReleaseCachedConnections (if AFS_PTHREAD_ENV) when disposing of
their rx_connections.  Unfortunately, in many cases these rx_connections
were obtained via rx_NewConnection, _not_ from the cache via
rx_GetCachedConnection.  In those cases, rx_ReleaseCachedConnection will
not find the rx_connection in the rxi_connectionCache, and thus it
returns without doing anything.

Therefore, when ubik_ClientInit is passed an existing ubik_client (for
re-initialization) that contains rx_connections NOT allocated via
rx_GetCachedConnection, those connections are not destroyed, but will be
silently leaked.  Similarly, ubik_ClientDestroy will leak its
rx_connections when it frees the ubik_client struct.

For example, the fileserver host package calls ubik_ClientInit (via
hpr_Initialize) and ubik_ClientDestroy (via hpr_End) to manage
connections to the ptserver.  However, these connections were obtained
via rx_NewConnection, not rx_GetCachedConnection.  If the fileserver has
a failed call to the ptserver that sets prfail=1, the next RPC scheduled
for that client (in CallPreamble) will refresh the thread's ubik_client
(viced_uclient_key) by calling hprEnd -> ubik_ClientDestroy ->
rx_ReleaseCachedConnection.  The "released" connections will be leaked.

This problem exists in all versions of OpenAFS going back to IBM 1.0.
Starting with 1.8.x, many components that were formerly LWP-only are now
pthreaded and thus susceptible to this leak.

It seems difficult and error-prone to identify all possible code paths
that may pass a non-cached rx_connection to rx_ReleaseCachedConnection,
and convert them to obtain connections via rx_GetCachedConnection.

Instead, prevent all existing and future leaks by modifying the connection
cache to:
- flag all rx_connections it allocates
- correctly release any rx_connection it is passed, whether they came
  from the cache or not.

Change-Id: Ibe164ccd30a8ddd799438c28fd6e1d8a0a9040dd
Reviewed-on: https://gerrit.openafs.org/13042
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>

rx: fix out-of-range value for RX_CONN_NAT_PING

Commit 496fb87372555f6acddd4fd88b03c94c85f48511 ("rx: avoid nat ping
until connection is attached") introduced functionality to defer turning
on NAT ping for server connections until after reachability had been
established for the client.

Unfortunately, this feature could never work correctly because it
assigned an out-of-range flag value of 256 (0x100) for the u_char flags
field. Instead of calling this out as an error, both gcc and Solaris cc
elide this flag so that it is never set in
rx_SetConnSecondsUntilNatPing(), Furthermore, the test in
rxi_ConnClearAttachWait() will always fail; therefore
rxi_ScheduleNatKeepAliveEvent is never called after attach wait has
ended.

Fortunately, this bug is currently moot - not actually exposed in
OpenAFS. (It was discovered by inspection). This is because there are
currently no rx_connection objects in the tree that have both NAT ping
and checkReach (rx_SetCheckReach) enabled. I also searched git history
and found no time when this bug could ever have been exposed. This does
raise the question of why the original commit was needed; but instead of
reverting the original commit, this commit attempts to fix it.

To prevent problems if NAT ping and checkReach are ever both enabled for
an rx_connection, enlarge the rx_connection flags member so that the
RX_CONN_NAT_PING value is no longer out of range.

Change-Id: Ib667ece632f66fa5c63a76398acb3153fed6f9c3
Reviewed-on: https://gerrit.openafs.org/13041
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

auth: Avoid cellconfig.c stdio renaming

Since commit 35777145 (solaris-fopen-sucks-20060916), cellconfig.c has
redirected fopen, fclose, and fgets to local functions on
non-64bit-sparc Solaris, in order to work around that platform's stdio
limitations.

Commit 7c431f7571 (auth: retire writeconfig.c) moved the contents of
writeconfig.c into cellconfig.c. The previous writeconfig.c contained
some calls to stdio, including calling fprintf() on a pointer returned
by fopen() in that file.

Because fopen() was redirected to our local version, this means that
afsconf_SetExtendedCellInfo() calls fopen() to get an
afsconf_iobuffer*, and passes that pointer to the real system
fprintf() later on (instead of a native FILE*). The compiler does warn
about this, but this only happens on Solaris, where --enable-checking
is not implemented, so the build never fails.

To avoid this, remove the #defines for fopen, fgets, and fclose.
Instead, change all of the old cellconfig.c callers to explicitly call
afsconf_fopen, afsconf_fgets, and afsconf_fclose. On the affected
Solaris platforms, we keep our local definitions, and for other
platforms, we just make those functions call their system stdio
equivalents. For the code that was pulled in from writeconfig.c,
callers will just call the system fopen, fprintf, and fclose.

We still keep our local afsconf_FILE* definition on all platforms, so
the compiler will still do typechecking for our local afsconf_f*
functions on all platforms. So now if we make a mistake, it should be
a mistake on all platforms, so platforms with --enable-checking should
flag the error.

Change-Id: I4064d7f5ee82d5acab04a33b01c0603564a391e8
Reviewed-on: https://gerrit.openafs.org/14214
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Let afs_ShakeLooseVCaches run longer

Currently, when afs_ShakeLooseVCaches runs osi_TryEvictVCache, we
check if osi_TryEvictVCache slept (i.e. dropped afs_xvcache/GLOCK). If
we sleep over 100 times, then we stop trying to evict vcaches and
return.

If we have recently accessed a lot of AFS files, this limitation can
severely reduce our ability to keep our number of vcaches limited to a
reasonable size. For example:

Say a Linux client runs a process that quickly accesses 1 million
files (a simple 'find' command) and then does nothing else. A few
minutes later, afs_ShakeLooseVCaches is run, but since all of the
newly accessed vcaches have dentries attached to them, we will sleep
on each one in order to try to prune the attached dentries. This means
that afs_ShakeLooseVCaches will evict 100 vcaches, and then return,
leaving us with still almost 1 million vcaches. This will happen
repeatedly until afs_ShakeLooseVCaches finally works its way through
all of the vcaches (which takes quite a while, if we only clear 100 at
once), or the dentries get pruned by other means (such as, if Linux
evicts them due to memory pressure).

The limit of 100 sleeps was originally added in commit 29277d96
(newvcache-dont-spin-20060128), but the current effect of it was
largely introduced in commit 9be76c0d (Refactor afs_NewVCache). It
exists to ensure that afs_ShakeLooseVCaches doesn't take forever to
run, but the limit of 100 sleeps may seem quite low, especially if
those 100 sleeps run very quickly.

To avoid the situation described above, instead of limiting
afs_ShakeLooseVCaches based on a fixed number of sleeps, limit it
based on how long we've been running, and set an arbitrary limit of
roughly 3 seconds. Only check how long we've been running after 100
sleeps like before, so we're not constantly checking the time while
running.

Log a new warning if we exit afs_ShakeLooseVCaches prematurely if
we've been running for too long, to help indicate what is going on.

Change-Id: I65729ace748e8507cc0d5c26dec39e74d7bff5d2
Reviewed-on: https://gerrit.openafs.org/14254
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Skip bulkstat if stat cache looks full

Currently, afs_lookup() will try to prefetch dir entries for normal
dirs via bulkstat whenever multiple pids are reading that dir.
However, if we already have a lot of vcaches, ShakeLooseVCaches may be
struggling to limit the vcaches we already have. Entering
afs_DoBulkStat can make this worse, since we grab afs_xvcache
repeatedly, we may kick out other vcaches, and we'll possibly create
30 new vcaches that may not even be used before they're evicted.

To try to avoid this, skip running afs_DoBulkStat if it looks like the
stat cache is really full.

Change-Id: I1634530170a189f32cb962dd7df28f88bc758b71
Reviewed-on: https://gerrit.openafs.org/13256
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

afs: Log warning when we detect too many vcaches

Currently, afs_ShakeLooseVCaches has a kind of warning that is logged
when we fail to free up any vcaches. This information can be useful to
know, since it may be a sign that users are trying to access way more
files than our configured vcache limit, hindering performance as we
constantly try to evict and re-create vcaches for files.

However, the current warning is not clear at all to non-expert users,
and it can only occur for non-dynamic vcaches (which is uncommon these
days).

To improve this, try to make a general determination if it looks like
the stat cache is "stressed", and log a message if so after
afs_ShakeLooseVCaches runs (for all platforms, regardless of dynamic
vcaches). Also try to make the message a little more user-friendly,
and only log it (at most) once per 4 hours.

Determining whether the stat cache looks stressed or not is difficult
and arguably subjective (especially for dynamic vcaches). This commit
draws a few arbitrary lines in the sand to make the decision, so at
least something will be logged in the cases where users are constantly
accessing way more files than our configured vcache limit.

Change-Id: I022478dc8abb7fdef24ccc06d477b349cca759ac
Reviewed-on: https://gerrit.openafs.org/13255
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>

viced: propagate return from CleanupTimedOutCallBacks_r

The fileserver's FiveMinuteCheckLWP periodically calls
CleanupTimedOutCallBacks, and logs an informational messages if the
return code indicates that any callbacks were discarded.

However, since the original IBM code import, CleanupTimedOutCallBacks
has 1) ignored the return value from CleanupTimedOutCallBacks_r and 2)
unconditionally returned 0. This makes the informational message
essentially dead code.

Instead, check the code from CleanupTimedOutCallBacks_r and pass it back
to the caller.

Change-Id: I631831c398e43431b79f4a3a0c6f01307ac0c05e
Reviewed-on: https://gerrit.openafs.org/14256
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>