openafs.git
2 years ago  Make OpenAFS 1.8.8pre1  64/14264/4  openafs-stable-1_8_8pre1
Stephan Wiesand [Tue, 30 Jun 2020 20:53:33 +0000]
Make OpenAFS 1.8.8pre1

Update version strings for the first 1.8.8 prerelease.

Change-Id: Ia7468e6ae5ec93a81e13dda55842ec57135c2a03
Reviewed-on: https://gerrit.openafs.org/14264
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  Update NEWS for 1.8.8pre1  40/14540/5
Stephan Wiesand [Fri, 19 Feb 2021 20:51:15 +0000]
Update NEWS for 1.8.8pre1

Release notes for the first 1.8.8 prerelease.

Change-Id: I04762b28b3cc5528f31c2b5d8f1d7f906e46f62f
Reviewed-on: https://gerrit.openafs.org/14540
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afscp: Link against opr/roken/hcrypto  27/14627/2
Andrew Deason [Thu, 27 May 2021 17:02:01 +0000]
afscp: Link against opr/roken/hcrypto

Link afscp against libopr, libroken, and libafshcrypto, so afscp can
be built again.

Reviewed-on: https://gerrit.openafs.org/13656
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4eeed830fa31b7b8b5487ba619acbc8d30642aaa)

Change-Id: I73264df34743dcb6a8f6232267892ee602a76053
Reviewed-on: https://gerrit.openafs.org/14627
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: free the Buffers array correctly during shutdown  16/14616/2
Mark Vitale [Fri, 29 Jan 2016 06:30:47 +0000]
afs: free the Buffers array correctly during shutdown

DInit() allocates 'Buffers' with afs_max_buffers = 4*nbuffers
worth of buffer structs to allow for run-time expansion.

But shutdown_bufferpackage() frees 'Buffers' as if it held only
nbuffers worth of buffer structs.

Correct the size of Buffers passed to afs_osi_Free().
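A minimal user-space sketch of the mismatch, assuming a size-checked
allocator: like afs_osi_Free(), kernel allocators such as Solaris
kmem_free() take the allocation size, so freeing with nbuffers instead
of afs_max_buffers hands back the wrong size. osi_alloc(), osi_free()
and the placeholder struct buffer are hypothetical stand-ins, not the
OpenAFS definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator that, like afs_osi_Free(), is told the size on
 * free; it remembers the size of the one outstanding allocation. */
static size_t outstanding_size;

static void *osi_alloc(size_t size)
{
    outstanding_size = size;
    return calloc(1, size);
}

/* Returns 1 if the caller freed with the size that was allocated. */
static int osi_free(void *p, size_t size)
{
    int ok = (size == outstanding_size);
    free(p);
    outstanding_size = 0;
    return ok;
}

struct buffer { char data[64]; };   /* placeholder, not the real struct */

static int nbuffers = 50;
static int afs_max_buffers;
static struct buffer *Buffers;

static void buffers_init(void)
{
    /* Reserve room for run-time growth, as described for DInit(). */
    afs_max_buffers = 4 * nbuffers;
    Buffers = osi_alloc(afs_max_buffers * sizeof(struct buffer));
}

/* Buggy shape: frees as if only nbuffers structs were allocated. */
static int buffers_shutdown_buggy(void)
{
    return osi_free(Buffers, nbuffers * sizeof(struct buffer));
}

/* Fixed shape: frees with the same size that was allocated. */
static int buffers_shutdown_fixed(void)
{
    return osi_free(Buffers, afs_max_buffers * sizeof(struct buffer));
}
```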

Discovered during Solaris shutdown testing with kmem_flags=0x0f.
This bug is not Solaris-specific, but it may be symptomless on other
platforms.

Introduced by commit e7c966354c428a5a5929a3db6a829ee71c8ba2fc 'Flexible
client buffer growth'; at first this only affected cold shutdowns (afsd
-shutdown).

After commit 2336164d1bf63980419d3a870f908f1f384fdfc0 'afs: Actually
free resources during warm shutdown', this bug also affects warm
shutdowns (the default when /afs is unmounted).

Reviewed-on: https://gerrit.openafs.org/12183
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit d1c944ec167b8845d703a6b6a8d9492751056b98)

Change-Id: I612b33a3788d2c9a0c39c5cb22a3473f8e1c01e1
Reviewed-on: https://gerrit.openafs.org/14616
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: Actually free resources during warm shutdown  15/14615/2
Andrew Deason [Sun, 21 Jul 2019 22:02:34 +0000]
afs: Actually free resources during warm shutdown

Currently, the shutdown_*() code paths for several subsystems only
free the memory for that subsystem for "cold" shutdowns, and not for
"warm" shutdowns. This means the memory gets leaked during a "warm"
shutdown, since we never free these resources anywhere else.
Specifically, this happens in shutdown_bufferpackage, shutdown_AFS,
and shutdown_osinet.

To avoid these leaks for warm shutdowns, move the afs_cold_shutdown
check so that the relevant items are freed in either code path.
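A hypothetical sketch of the reshuffle, not the actual shutdown_*()
code: the memory is freed unconditionally, and the afs_cold_shutdown
check only guards state that should be reset on a cold shutdown.
subsys_table, subsys_counter and the function names are made up for
illustration.

```c
#include <assert.h>
#include <stdlib.h>

static int afs_cold_shutdown;
static int *subsys_table;       /* memory owned by some subsystem */
static int subsys_counter;      /* state only reset on cold shutdown */

static void subsys_init(void)
{
    subsys_table = malloc(16 * sizeof(*subsys_table));
    subsys_counter = 42;
}

/* Before: the free only ran on cold shutdowns, leaking on warm ones. */
static void shutdown_subsys_before(void)
{
    if (afs_cold_shutdown) {
        free(subsys_table);
        subsys_table = NULL;
        subsys_counter = 0;
    }
}

/* After: memory is always freed; only cold-only work stays guarded. */
static void shutdown_subsys_after(void)
{
    free(subsys_table);
    subsys_table = NULL;
    if (afs_cold_shutdown) {
        subsys_counter = 0;
    }
}
```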

Reviewed-on: https://gerrit.openafs.org/13716
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 2336164d1bf63980419d3a870f908f1f384fdfc0)

Change-Id: I1d2360ea777b7a7488e599b6e707c98295d8fbdd
Reviewed-on: https://gerrit.openafs.org/14615
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  vol: ensure ih package defaults are set for salvage  14/14614/2
Mark Vitale [Thu, 30 Jul 2020 20:42:19 +0000]
vol: ensure ih package defaults are set for salvage

Like most OpenAFS components that work with ihandles, salvager relies on
implicit invocation of ih_PkgDefaults via the one-shot in the first call
to IH_INIT.

Unfortunately, there is at least one reachable code path in the
salvager that asserts (panics) because vol_io_params has not yet been
initialized. This happens when salvaging a volume group that has no
link table; the salvager then panics while attempting to create a new
link table:

SalvageFileSys -> SalvageFileSys1 -> DoSalvageVolumeGroup ->
CreateLinkTable -> IH_CREATE -> namei_icreate -> icreate ->
namei_SetLinkCount -> FDH_SYNC -> ih_fdsync -> osi_Assert(0)

This path was discovered while testing the non-dafs salvager, but it has
also been observed in the field with the DAFS salvageserver.  It is
possible that there are additional undiscovered paths where
vol_io_params is required but uninitialized.

Add an implicit ih_PkgDefaults call to icreate to avoid triggering the
assert via the code path above.
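A simplified model of the failure and the fix, assuming (for
illustration only) that uninitialized parameters look like an unknown
sync mode to the fsync path: without defaults, the switch falls
through to an assertion, much like ih_fdsync() reaching osi_Assert(0).
The names sync_behavior, pkg_defaults(), fdsync() and icreate() and the
chosen default are hypothetical stand-ins, not the real vol package API.

```c
#include <assert.h>

/* Hypothetical sync modes; 0 means "never initialized". */
enum sync_mode { SYNC_UNSET = 0, SYNC_ALWAYS, SYNC_ONCLOSE, SYNC_NEVER };

static enum sync_mode sync_behavior;   /* zero until defaults are applied */
static int defaults_set;

/* Stand-in for ih_PkgDefaults(): fill in defaults once. */
static void pkg_defaults(void)
{
    if (defaults_set)
        return;
    sync_behavior = SYNC_ONCLOSE;       /* made-up default */
    defaults_set = 1;
}

/* Stand-in for ih_fdsync(): an unknown mode trips the assertion. */
static int fdsync(void)
{
    switch (sync_behavior) {
    case SYNC_ALWAYS:
        return 1;                       /* would fsync() now */
    case SYNC_ONCLOSE:
    case SYNC_NEVER:
        return 0;                       /* nothing to do right now */
    default:
        assert(!"I/O parameters were never initialized");
        return -1;
    }
}

/* Stand-in for icreate(): apply defaults before anything needs them. */
static int icreate(void)
{
    pkg_defaults();                     /* the implicit call the fix adds */
    /* ... create the inode and set its link count ... */
    return fdsync();
}
```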

Reviewed-on: https://gerrit.openafs.org/14378
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9d65bcf8833a826a74fc433777599380cd5b03b9)

Change-Id: I8c7fb5acbaf2d84b290ce95e11a7baf0421b920d
Reviewed-on: https://gerrit.openafs.org/14614
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  vol: move ih_PkgDefaultsSet check inside ih_PkgDefaults  13/14613/2
Mark Vitale [Fri, 9 Oct 2020 20:28:15 +0000]
vol: move ih_PkgDefaultsSet check inside ih_PkgDefaults

Instead of repeating the oneshot check in each caller of ih_PkgDefaults,
move the oneshot check into ih_PkgDefaults itself.

While here, ensure that ih_PkgDefaults does its work under IH_LOCK.

Finally, make ih_PkgDefaultsSet a local static variable (no longer
exported).
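A sketch of the resulting one-shot pattern with POSIX threads, where
pkg_lock plays the role of IH_LOCK and pkg_defaults_set is file-local.
All names are hypothetical; the defaults_applied counter exists only so
the sketch can show the body running exactly once.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t pkg_lock = PTHREAD_MUTEX_INITIALIZER;
static int pkg_defaults_set;    /* file-local one-shot flag */
static int defaults_applied;    /* how many times the body ran */

static void pkg_defaults(void)
{
    pthread_mutex_lock(&pkg_lock);
    if (!pkg_defaults_set) {
        /* ... fill in the package defaults here ... */
        defaults_applied++;
        pkg_defaults_set = 1;
    }
    pthread_mutex_unlock(&pkg_lock);
}

static void *caller(void *unused)
{
    (void)unused;
    for (int i = 0; i < 1000; i++)
        pkg_defaults();         /* no oneshot check needed at call sites */
    return NULL;
}

/* Start up to 8 concurrent callers; report how often the body ran. */
static int run_callers(int n)
{
    pthread_t tids[8];
    int started = 0;

    if (n > 8)
        n = 8;
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[started], NULL, caller, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return defaults_applied;
}
```

Testing the flag under the lock is what keeps two threads from both
seeing it unset; pthread_once() would be an alternative in user space.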

Reviewed-on: https://gerrit.openafs.org/14383
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 896524963ca1f1d8144a668dafefc8ce88ad440b)

Change-Id: I179640df6e0a5fd6b9a97b57cfde6377213e1d14
Reviewed-on: https://gerrit.openafs.org/14613
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afsd: remove unused variable afs_shutdown  12/14612/2
Mark Vitale [Thu, 1 Oct 2020 21:13:51 +0000]
afsd: remove unused variable afs_shutdown

Since the original IBM code import, the variable afs_shutdown has been
set but never read.

Remove it from the code base.

No functional change is incurred by this commit.

Reviewed-on: https://gerrit.openafs.org/14380
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0761bb7c58260985fbbfcf477d597da3c5d64fc5)

Change-Id: I35d657fb93b9e6f611d91d5a374899249cec5b88
Reviewed-on: https://gerrit.openafs.org/14612
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: remove duplicate declaration for afs_shutdown()  11/14611/2
Mark Vitale [Fri, 29 Jan 2016 04:38:59 +0000]
afs: remove duplicate declaration for afs_shutdown()

Somehow there were two.  Now there is but one.

Reviewed-on: https://gerrit.openafs.org/12181
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 3e2fe677a2d3b9f76644175f3a59d392872b87f3)

Change-Id: I19a2f0ebe2c79fd32800cc388e488fa11fd8f0ce
Reviewed-on: https://gerrit.openafs.org/14611
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: afsd -shutdown sets afs_cold_shutdown too soon  10/14610/2
Mark Vitale [Thu, 28 Jan 2016 15:01:13 +0000]
afs: afsd -shutdown sets afs_cold_shutdown too soon

'afsd -shutdown' always invokes syscall(AFSOP_SHUTDOWN)
with parm2 set to 1 to indicate a "cold" shutdown.
(There are no other callers of AFSOP_SHUTDOWN.)

AFSOP_SHUTDOWN sets the global variable afs_cold_shutdown
based on the value of parm2. Then it checks whether AFS
is still mounted; if so, it returns early with EACCES.
However, the global afs_cold_shutdown remains set.

Therefore, the next successful 'umount' after a "failed"
'afsd -shutdown' will always trigger a "cold" shutdown.
This is contrary to the intent of the current implementation,
which is to perform a "warm" shutdown upon 'umount' on
most platforms. (Exceptions: AIX, OBSD, and NBSD always
specify a cold shutdown upon 'umount'.)

This bug would never be noticed on the "cold" exception
platforms, but on the "warm" platforms the inconsistency
of seeing an unexpected "COLD" shutdown may be confusing
and surprising.

Make shutdown operation more self-consistent by modifying
AFSOP_SHUTDOWN to defer setting of afs_cold_shutdown until
after the mount test.
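A hypothetical model of the ordering change; afs_mounted and the
afsop_shutdown_*() functions are stand-ins for the syscall handler, not
its real code. The buggy version latches afs_cold_shutdown before the
EACCES check, so a failed 'afsd -shutdown' leaves it set for the next
umount.

```c
#include <assert.h>
#include <errno.h>

static int afs_cold_shutdown;
static int afs_mounted = 1;

/* Before: the flag is latched even when we bail out with EACCES. */
static int afsop_shutdown_before(int parm2)
{
    afs_cold_shutdown = (parm2 != 0);
    if (afs_mounted)
        return EACCES;
    /* ... perform the shutdown ... */
    return 0;
}

/* After: the flag is set only once the shutdown will proceed. */
static int afsop_shutdown_after(int parm2)
{
    if (afs_mounted)
        return EACCES;
    afs_cold_shutdown = (parm2 != 0);
    /* ... perform the shutdown ... */
    return 0;
}
```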

Reviewed-on: https://gerrit.openafs.org/12180
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c72ec68bcea646aa3e0395ad103afb2ee9ba9cde)

Change-Id: I7b40728bcb56c9bb0d86912f140fed315e93bf64
Reviewed-on: https://gerrit.openafs.org/14610
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  macos: delegate sock_* calls to bkg daemons  00/14600/3
Marcio Barbosa [Fri, 9 Apr 2021 15:14:52 +0000]
macos: delegate sock_* calls to bkg daemons

As part of Apple's ongoing effort to modernize macOS and improve its
security and reliability, the deprecation of kernel extensions was
officially announced at WWDC19. According to this announcement, kernel
programming interfaces (KPIs) will be deprecated as alternatives become
available, and future OS releases will no longer load kernel extensions
that use deprecated KPIs by default.

Unfortunately, the following KPIs, extensively used by rx, are included
in the list of deprecated KPIs as of macOS 10.15:

- sock_receivembuf
- sock_close
- sock_send
- sock_socket
- sock_setsockopt
- sock_bind

To work around this problem, delegate calls to the functions listed
above to background (bkg) daemons forked by afsd. Note that the ifaddr_*
and ifnet_* functions are also deprecated. Fortunately, these calls can
be avoided by enabling AFS_USERSPACE_IP_ADDR.
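The delegation idea can be sketched in user space with POSIX threads:
the requester hands an operation to a daemon thread and blocks until
the result comes back. This only illustrates the request/response
shape, assuming one request in flight at a time; struct bkg_req,
delegate() and the REQ_* opcodes are hypothetical, and the sketch opens
an AF_UNIX datagram socket only so that it runs without network access.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical opcodes for calls the requester cannot make itself. */
enum req_op { REQ_NONE, REQ_SOCKET, REQ_CLOSE, REQ_EXIT };

/* Single-slot request channel shared by requester and daemon. */
struct bkg_req {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    enum req_op op;     /* pending operation, REQ_NONE when idle */
    int arg;
    int result;
    int done;           /* daemon finished the pending operation */
    int busy;           /* a requester owns the slot */
};

static struct bkg_req chan = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .cv = PTHREAD_COND_INITIALIZER,
    .op = REQ_NONE,
};

/* Daemon side: wait for a request, perform the call, post the result.
 * The lock is held across the call purely for brevity. */
static void *bkg_daemon(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&chan.lock);
    for (;;) {
        while (chan.op == REQ_NONE)
            pthread_cond_wait(&chan.cv, &chan.lock);
        enum req_op op = chan.op;
        if (op == REQ_SOCKET)
            chan.result = socket(AF_UNIX, SOCK_DGRAM, 0);
        else if (op == REQ_CLOSE)
            chan.result = close(chan.arg);
        else
            chan.result = 0;
        chan.op = REQ_NONE;
        chan.done = 1;
        pthread_cond_broadcast(&chan.cv);
        if (op == REQ_EXIT)
            break;
    }
    pthread_mutex_unlock(&chan.lock);
    return NULL;
}

/* Requester side: hand the call to the daemon and block for the answer. */
static int delegate(enum req_op op, int arg)
{
    int result;

    pthread_mutex_lock(&chan.lock);
    while (chan.busy)                   /* one request in flight at a time */
        pthread_cond_wait(&chan.cv, &chan.lock);
    chan.busy = 1;
    chan.op = op;
    chan.arg = arg;
    chan.done = 0;
    pthread_cond_broadcast(&chan.cv);
    while (!chan.done)
        pthread_cond_wait(&chan.cv, &chan.lock);
    result = chan.result;
    chan.busy = 0;
    pthread_cond_broadcast(&chan.cv);
    pthread_mutex_unlock(&chan.lock);
    return result;
}
```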

Thanks to Andrew Deason for his assistance (ideas, suggestions,
documentation, etc).

Reviewed-on: https://gerrit.openafs.org/14431
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 70e5c4f6a45854ae3a4241568769279747a8b76f)

Conflicts:
src/config/afs_args.h

Change-Id: I4370c0aa3978f208c763ed43c3cc5567ee74e730
Reviewed-on: https://gerrit.openafs.org/14600
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: Add AFS_USPC_SHUTDOWN bkg request  05/14605/2
Andrew Deason [Tue, 14 Aug 2018 20:53:20 +0000]
afs: Add AFS_USPC_SHUTDOWN bkg request

When AFS_NEW_BKG was added, the kernel module told the relevant afsd
process that it was time to shut down by returning -2. This works on
DARWIN, but it is difficult to make it work on all platforms, because
platforms handle error codes from our pioctls and other AFS syscalls
differently.

Specifically, on LINUX, negative error codes are assumed to be
negative errno codes, and so returning -2 from the syscall handler
means we return -1 to userspace, with errno set to 2 (ENOENT).

Getting this to work consistently across platforms is probably more
trouble than it's worth, so instead of relying on specific return codes
from the syscall, add a new background daemon operation called
AFS_USPC_SHUTDOWN, which simply tells the background daemon to exit.
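A sketch of both halves with made-up opcode values:
linux_syscall_return() models how Linux turns a negative handler return
into -1 with errno set (so -2 arrives as ENOENT), and the daemon loop
exits on an explicit shutdown request instead of inspecting return
codes. USPC_NOOP, USPC_WORK, USPC_SHUTDOWN, next_req() and the numeric
values are assumptions, not the OpenAFS definitions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical background request opcodes. */
enum uspc_op { USPC_NOOP = 0, USPC_WORK = 1, USPC_SHUTDOWN = 2 };

/* Linux convention: a negative handler return becomes -1 plus errno. */
static int linux_syscall_return(int handler_ret)
{
    if (handler_ret < 0) {
        errno = -handler_ret;
        return -1;
    }
    return handler_ret;
}

/* Stand-in for the kernel handing out background requests. */
static const enum uspc_op *queue;

static enum uspc_op next_req(void)
{
    return *queue++;
}

/* Daemon loop: leave on an explicit shutdown request, not a magic
 * return code. Returns the number of work items processed. */
static int bkg_daemon_loop(void)
{
    int handled = 0;

    for (;;) {
        enum uspc_op op = next_req();
        if (op == USPC_SHUTDOWN)
            break;
        if (op == USPC_WORK)
            handled++;
    }
    return handled;
}
```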

Reviewed-on: https://gerrit.openafs.org/13281
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 397199a1992d74d8b7e693a2d76df836f7a70080)

Change-Id: Ib809a27476f49baef70dcbcc749eed95a4de8d2f
Reviewed-on: https://gerrit.openafs.org/14605
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  macos: packaging support for MacOS X 11.0  98/14598/2
Marcio Barbosa [Thu, 28 Jan 2021 23:49:25 +0000]
macos: packaging support for MacOS X 11.0

This commit introduces the new set of changes / files required to
successfully create the dmg installer on OS X 11.0 "Big Sur".

Reviewed-on: https://gerrit.openafs.org/14430
Reviewed-by: Yadavendra Yadav <yadayada@in.ibm.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 40c6f3aa5576d7e1ac23caff1ae4ffd69e74dc44)

Change-Id: I55bab1631c41fdb636fa84359f2d76d4bfc2b6a1
Reviewed-on: https://gerrit.openafs.org/14598
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  macos: add support for MacOS 11.0  97/14597/2
Marcio Barbosa [Thu, 28 Jan 2021 22:45:10 +0000]
macos: add support for MacOS 11.0

This commit introduces the new set of changes / files required to
successfully build the OpenAFS source code on OS X 11.0 "Big Sur".

While here, refactor the code that checks if the "-Xlinker -kext"
system-specific linker option is needed.

Reviewed-on: https://gerrit.openafs.org/14429
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit acc955bc17e1a1e10f634e7017c1323954f07b31)

Change-Id: Ie5b791c7444612c617eeb3b16e1165510fe9f251
Reviewed-on: https://gerrit.openafs.org/14597
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  rx: Indent ifdef maze in rx_kcommon.c  99/14599/2
Andrew Deason [Sat, 21 Dec 2019 04:09:35 +0000]
rx: Indent ifdef maze in rx_kcommon.c

Reviewed-on: https://gerrit.openafs.org/13997
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 08c769967ca12f1ac99c736789f1925763d8a115)

Change-Id: I75d5ec5c9f75f79817adec3f259e546e79fc3629
Reviewed-on: https://gerrit.openafs.org/14599
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  afs: Indent ifdef maze in afs_server.c  04/14604/2
Andrew Deason [Sat, 21 Dec 2019 03:51:18 +0000]
afs: Indent ifdef maze in afs_server.c

Reviewed-on: https://gerrit.openafs.org/13996
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 2a8db42664cc450c2db097fe19472fe7876203df)

Change-Id: Iff5bb059ea7005c4f174401b9daa45f1ae6d092d
Reviewed-on: https://gerrit.openafs.org/14604
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  bozo: Fix the test for bosserver '-cores none'  89/14589/2
Cheyenne Wills [Thu, 18 Mar 2021 14:28:22 +0000]
bozo: Fix the test for bosserver '-cores none'

The check for the '-cores none' parameter is incorrect, resulting in
the parameter being treated as a directory path.

Update the string comparison.
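The message does not spell out the faulty comparison, so the sketch
below does not reproduce it; it only shows one correct way to tell
'-cores none' apart from a directory path, testing strcmp() against 0
explicitly. parse_cores() and the cores_mode values are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcomes of parsing a "-cores" argument. */
enum cores_mode { CORES_DEFAULT_DIR, CORES_NONE, CORES_DIR };

static enum cores_mode parse_cores(const char *arg)
{
    if (arg == NULL)
        return CORES_DEFAULT_DIR;   /* option not given */
    /* strcmp() returns 0 on a match, so compare against 0 explicitly. */
    if (strcmp(arg, "none") == 0)
        return CORES_NONE;
    return CORES_DIR;               /* anything else is a directory path */
}
```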

Reviewed-on: https://gerrit.openafs.org/14559
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 000fe6b7e6e7bf140c7cca7becc8fb7e8006fec7)

Change-Id: I45ac645bb7cdd6f3cd1dfd81d6ccdc9cda4547a8
Reviewed-on: https://gerrit.openafs.org/14589
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  vol: always build vol-bless utility  01/14601/3
Mark Vitale [Mon, 7 Dec 2020 19:42:54 +0000]
vol: always build vol-bless utility

In order to avoid future bit-rot, always build vol-bless.  Also add it
to the clean rule.  However, continue to leave it undistributed and
uninstalled by default.

Reviewed-on: https://gerrit.openafs.org/14464
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 1efa4e49f2dabe2f3a1ef235e21a96ae9d5ff6bf)

Change-Id: I62b2f192e2bcb24221c94468e2e72aaa567568d4
Reviewed-on: https://gerrit.openafs.org/14601
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  vol: add vol-bless to .gitignore  02/14602/3
Mark Vitale [Mon, 7 Dec 2020 19:40:33 +0000]
vol: add vol-bless to .gitignore

No functional change is incurred by this commit.

Reviewed-on: https://gerrit.openafs.org/14463
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 986ee6a0a70d70f366baeb43670eb367f0525b97)

Change-Id: I1819537c8ac26101a81100871f85a6de78408cea
Reviewed-on: https://gerrit.openafs.org/14602
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  vol: make vol-bless buildable again  03/14603/3
Mark Vitale [Mon, 7 Dec 2020 18:13:28 +0000]
vol: make vol-bless buildable again

The vol-bless utility is not built by default and so is subject to
bit-rot.  Thus commit 170dbb3ce301329ff127bb23fb588db31439ae8d 'rx: Use
opr queues' overlooked vol-bless.c when adding includes for users of
struct rx_queue.

Add the required #include <rx/rx_queue.h> so vol-bless builds again.

Note to maintainers: this change is only required for 1.8.x and later;
vol-bless builds fine in 1.6.x and earlier releases.

Reviewed-on: https://gerrit.openafs.org/14462
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e1f20287a4d0cd80c6bfe7309b907fe5a4ac1464)

Change-Id: I1f9acb176758bd34b7f63d5ebde54e9af191ad77
Reviewed-on: https://gerrit.openafs.org/14603
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  FBSD: Add support for FreeBSD 12.2  90/14590/2
Tim Creech [Fri, 30 Oct 2020 01:29:10 +0000]
FBSD: Add support for FreeBSD 12.2

Reviewed-on: https://gerrit.openafs.org/14474
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 96ebee44c322934b9eda1bab5907ee87b03d571f)

Change-Id: I95dedbee8b67a2bb05a8bb3614045fa3a49f9a11
Reviewed-on: https://gerrit.openafs.org/14590
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

2 years ago  FBSD: Add support for FreeBSD 12.1  37/14537/3
Tim Creech [Tue, 10 Dec 2019 02:13:58 +0000]
FBSD: Add support for FreeBSD 12.1

Reviewed-on: https://gerrit.openafs.org/13982
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 71ce9fff8e682a77e17490a54e091656cbf96925)

Change-Id: I4214101d314cac6d677a08f760ccf990a4441306
Reviewed-on: https://gerrit.openafs.org/14537
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  bos: suppress unnecessary warn if -noauth  46/14546/2
Marcio Barbosa [Tue, 18 Aug 2020 13:56:26 +0000]
bos: suppress unnecessary warn if -noauth

Commit d008089a7 (Add interface to select client security objects)
consolidated the code that selects the client security objects into a
set of new interfaces. Before that commit, the "bos: running
unauthenticated" message, which warns the user when an unauthenticated
connection is established, was suppressed if the -noauth flag was
specified.

As in commit b3c16324e (ubik: Make ugen_ClientInit honor noAuthFlag),
restore the original behavior by suppressing warnings about
unauthenticated connections when the -noauth flag is provided.
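A hypothetical sketch of the restored behavior: the warning is printed
only when no token was obtained and -noauth was not given.
warn_unauthenticated() and its parameters are illustrative and do not
mirror the client security object interfaces.

```c
#include <assert.h>
#include <stdio.h>

static int warnings_printed;    /* counts warnings, for the example only */

static void warn_unauthenticated(int noauth, int got_token)
{
    if (got_token)
        return;                 /* authenticated: nothing to report */
    if (noauth)
        return;                 /* user asked for no auth: stay quiet */
    fprintf(stderr, "bos: running unauthenticated\n");
    warnings_printed++;
}
```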

Reviewed-on: https://gerrit.openafs.org/14306
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d5f0e16ac44475be55a7cc3e2895fc4a3a923ece)

Change-Id: Id78494c2a189f2e99e25111200cabde32a4add2b
Reviewed-on: https://gerrit.openafs.org/14546
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  Linux 5.12: Add user_namespace param to inode ops  65/14565/2
Cheyenne Wills [Fri, 5 Mar 2021 23:31:03 +0000]
Linux 5.12: Add user_namespace param to inode ops

The Linux commits:
"fs: make helpers idmap mount aware" (549c72977) and
"attr: handle idmapped mounts" (2f221d6f7) that were merged into
Linux-5.12-rc1 cause a build failure when creating the kernel module.

Several functions within the inode_operations structure had their
signature updated to include a user_namespace parameter.  This allows
a filesystem to support idmapped mounts.

OpenAFS only implements some of the changed functions.

   LINUX/vnodeops function inode_operation
   =====================   ===============
   afs_notify_change       setattr
   afs_linux_getattr       getattr
   afs_linux_create        create
   afs_linux_symlink       symlink
   afs_linux_mkdir         mkdir
   afs_linux_rename        rename
   afs_linux_permission    permission

Update the autoconf tests to determine if the Linux kernel requires
the user_namespace structure for inode_operations functions. If so,
define a generic "IOP_TAKES_USER_NAMESPACE" macro.

Update the above vnodeops functions to accept a 'struct user_namespace'
parameter.

When using the 'setattr_prepare' function, a user namespace must now
be provided. To remain compatible as a filesystem that does not support
idmapped mounts, the initial user namespace can be used. With OpenAFS,
the initial user namespace obtained at kernel module load time is
stored in a global variable 'afs_ns'.

Update the call to setattr_prepare to pass the user namespace pointed
to by the 'afs_ns' global variable.

Update calls to setattr to pass the user namespace pointed to by
the 'afs_ns' global variable.
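A compilable mock of the pattern, assuming IOP_TAKES_USER_NAMESPACE has
been defined by configure: the setattr handler gains a user_namespace
parameter but, since idmapped mounts are not supported, passes the
initial namespace saved in afs_ns to setattr_prepare(). The struct
definitions and the setattr_prepare() stubs are stand-ins for the
kernel's, kept just detailed enough to compile.

```c
#include <assert.h>

/* Minimal stand-ins for kernel types; only the fields used here. */
struct user_namespace { int id; };
struct iattr { int ia_mode; };
struct dentry { int d_mode; };

/* Normally set by configure on Linux 5.12+; hard-wired for the sketch. */
#define IOP_TAKES_USER_NAMESPACE 1

static struct user_namespace init_user_ns_mock = { 0 };
/* Initial namespace captured at module load time. */
static struct user_namespace *afs_ns = &init_user_ns_mock;

#if defined(IOP_TAKES_USER_NAMESPACE)
/* Stub: returns 0 only for the initial namespace, so the sketch can
 * show which namespace the caller passed. */
static int setattr_prepare(struct user_namespace *mnt_userns,
                           struct dentry *dp, struct iattr *attr)
{
    (void)dp;
    (void)attr;
    return mnt_userns == &init_user_ns_mock ? 0 : -1;
}
#else
static int setattr_prepare(struct dentry *dp, struct iattr *attr)
{
    (void)dp;
    (void)attr;
    return 0;
}
#endif

/* setattr handler whose signature follows the configure result. */
#if defined(IOP_TAKES_USER_NAMESPACE)
static int afs_notify_change(struct user_namespace *mnt_userns,
                             struct dentry *dp, struct iattr *attr)
#else
static int afs_notify_change(struct dentry *dp, struct iattr *attr)
#endif
{
#if defined(IOP_TAKES_USER_NAMESPACE)
    (void)mnt_userns;           /* idmapped mounts are not supported */
    if (setattr_prepare(afs_ns, dp, attr) != 0)
        return -1;
#else
    if (setattr_prepare(dp, attr) != 0)
        return -1;
#endif
    dp->d_mode = attr->ia_mode;
    return 0;
}
```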

Notes:

The changes introduced with Linux 5.12 allow a filesystem to support
idmapped mounts if desired. This commit does not implement support for
idmapped mounts; it continues to use the same initial user namespace
as before Linux 5.12.

With Linux 5.12 the following autoconf checks fail:

 HAVE_LINUX_INODE_OPERATIONS_RENAME_TAKES_FLAGS
 HAVE_LINUX_SETATTR_PREPARE
 IOP_CREATE_TAKES_BOOL
 IOP_GETATTR_TAKES_PATH_STRUCT
 IOP_MKDIR_TAKES_UMODE_T

The new macro 'IOP_TAKES_USER_NAMESPACE' covers the cases where these
macros were used.

Reviewed-on: https://gerrit.openafs.org/14549
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 1bd68506be3243c5670aaf53798b2e4e715d4c8b)

Change-Id: I8cd54042da4e0295f3cf8417c84138bb0458f881
Reviewed-on: https://gerrit.openafs.org/14565
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  Linux: Create wrapper for setattr_prepare  64/14564/2
Cheyenne Wills [Mon, 8 Mar 2021 16:22:04 +0000]
Linux: Create wrapper for setattr_prepare

Move the call to setattr_prepare/inode_change_ok into an osi_compat.h
wrapper called 'afs_setattr_prepare'.  This moves some of the #if logic
out of the mainline code.
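A mock of the wrapper shape, assuming HAVE_LINUX_SETATTR_PREPARE
selects between the two kernel APIs; the struct definitions and the
stub kernel functions are placeholders, and the real
afs_setattr_prepare() may differ in signature.

```c
#include <assert.h>

/* Placeholder kernel types; only the fields used here. */
struct inode { int i_mode; };
struct dentry { struct inode *d_inode; };
struct iattr { int ia_mode; };

/* Pretend configure found setattr_prepare(); undefine for the old API. */
#define HAVE_LINUX_SETATTR_PREPARE 1

#if defined(HAVE_LINUX_SETATTR_PREPARE)
/* Stub: rejects negative modes so the sketch has a failure case. */
static int setattr_prepare(struct dentry *dp, struct iattr *attr)
{
    (void)dp;
    return attr->ia_mode >= 0 ? 0 : -1;
}
#else
static int inode_change_ok(const struct inode *ip, struct iattr *attr)
{
    (void)ip;
    return attr->ia_mode >= 0 ? 0 : -1;
}
#endif

/* Compat wrapper: callers no longer need their own #if blocks. */
static inline int afs_setattr_prepare(struct dentry *dp, struct iattr *attr)
{
#if defined(HAVE_LINUX_SETATTR_PREPARE)
    return setattr_prepare(dp, attr);
#else
    return inode_change_ok(dp->d_inode, attr);
#endif
}
```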

Reviewed-on: https://gerrit.openafs.org/14548
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 12ae2beeeb172cebdfa24d5ea149f73fd85541f8)

Change-Id: I1c7806893daf2404a8b3ac1b5c88ca04e6409226
Reviewed-on: https://gerrit.openafs.org/14564
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  LINUX: Introduce afs_d_path  63/14563/2
Andrew Deason [Tue, 23 Jul 2019 18:50:31 +0000]
LINUX: Introduce afs_d_path

Move our preprocessor logic around d_path into an osi_compat.h
wrapper, called afs_d_path. This just makes it a little easier to use
d_path, and moves a tiny bit of #ifdef cruft away from real code.

Reviewed-on: https://gerrit.openafs.org/13721
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 4c4fb6e36634e5663c8be25acd4a1ac872e4738c)

Change-Id: I08763c71006e4ac6f2bf88d8ac71941fc44e6ab8
Reviewed-on: https://gerrit.openafs.org/14563
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  tests: Skip SIGBUS test on FreeBSD  36/14536/2
Andrew Deason [Mon, 13 Apr 2020 03:28:29 +0000]
tests: Skip SIGBUS test on FreeBSD

Currently, 'softsig-helper -buserror' causes a SIGBUS on most
platforms, but can result in SIGSEGV on FreeBSD by default (at least
on 11.3-RELEASE). Skip the test on FreeBSD, until we can provide a
more reliable way to generate SIGBUS.

Note that when the sysctl machdep.prot_fault_translation is set to 1,
'softsig-helper -buserror' generates a SIGBUS instead of SIGSEGV,
suggesting that generating a SIGBUS here is the old 'compat' behavior.
When machdep.prot_fault_translation is 0 (the default), the code path
in the FreeBSD kernel that dictates whether to send a SIGBUS or
SIGSEGV in this situation depends on some autodetection heuristics,
and so may produce different results depending on FreeBSD releases or
even compiler settings (due to detection of ABI based on some ELF
notes in the relevant binary).
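This is not softsig-helper itself: it sketches how a test could provoke
SIGBUS with one technique that is reliable on Linux (reading a
file-backed mapping that lies entirely past end of file) and skip the
check on FreeBSD, mirroring the decision above. Whether FreeBSD reports
this particular fault as SIGBUS is not something the sketch relies on;
fault_signal_past_eof() and sigbus_check_passes() are hypothetical
helpers.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that reads a page lying wholly past EOF of an empty file.
 * Returns the signal that killed it, 0 if it exited, -1 on setup error. */
static int fault_signal_past_eof(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0)
        return -1;
    FILE *fp = tmpfile();               /* backing file of size 0 */
    if (fp == NULL)
        return -1;
    void *map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED,
                     fileno(fp), 0);
    if (map == MAP_FAILED) {
        fclose(fp);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        volatile char *p = map;
        char c = p[0];                  /* expected to fault here */
        _exit(c == 0 ? 0 : 1);
    }

    int status = 0;
    int ok = pid > 0 && waitpid(pid, &status, 0) == pid;
    munmap(map, (size_t)pagesz);
    fclose(fp);
    if (!ok)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Mirror the commit: do not insist on SIGBUS on FreeBSD. */
static int sigbus_check_passes(void)
{
#if defined(__FreeBSD__)
    return 1;                           /* skipped */
#else
    return fault_signal_past_eof() == SIGBUS;
#endif
}
```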

For some details on this sysctl, see
<https://www.freebsd.org/news/status/report-2019-07-2019-09.html#Signals-delivered-on-unhandled-Page-Faults>
or the FreeBSD source code. In 11.3-RELEASE, the decision to issue a
SIGBUS or SIGSEGV can be found around sys/amd64/amd64/trap.c:355.

Reviewed-on: https://gerrit.openafs.org/14145
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit df5480057c2994914e22bd14b169dbcd8857485a)

Change-Id: Ifd2c17c52a7a9be7a8a09776cf15500fdc9ca62d
Reviewed-on: https://gerrit.openafs.org/14536
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  FBSD: Remove MA_* abstractions  35/14535/2
Andrew Deason [Sun, 8 Sep 2019 21:10:40 +0000]
FBSD: Remove MA_* abstractions

In FBSD/osi_vnops.c, we have a few abstractions (e.g. MA_VOP_UNLOCK)
that used to expand to different things for older FreeBSD versions.
Currently, they always expand to the same thing, so just remove the
abstractions.

While we are changing these calls, also change one instance of
MA_VOP_LOCK to vn_lock (instead of VOP_LOCK), since we're not usually
supposed to call VOP_LOCK directly, according to the VOP_LOCK(9)
manpage. The MA_VOP_LOCK call was added in commit bd707fb7
(freebsd-almost-working-client-20020216), seemingly by mistake.

Reviewed-on: https://gerrit.openafs.org/13843
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 7260c7164b9a2199c7b5f83279fa18af16e7d387)

Change-Id: I831e798546da97eeba923965c24dd79be14a9b89
Reviewed-on: https://gerrit.openafs.org/14535
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago  FBSD: Build vnode_if.h before libafs objs  34/14534/2
Tim Creech [Sat, 14 Dec 2019 03:24:57 +0000]
FBSD: Build vnode_if.h before libafs objs

Currently, if we are building with -j2 or higher, we can easily fail
to build some libafs objects because vnode_if.h does not exist yet.
vnode_if.h is generated by the FreeBSD build, but none of our objects
depend on it, so during parallel builds it may not be available by the
time we build, for example, src/external/heimdal/hcrypto/sha256.c.

This results in build errors that can look like this:

    --- sha256-kernel.o ---
    cc -I. -I.. -I../nfs [...]/src/external/heimdal/hcrypto/sha256.c
    In file included from [...]/src/external/heimdal/hcrypto/sha256.c:34:
    In file included from [...]/src/crypto/hcrypto/kernel/config.h:30:
    In file included from [...]/src/afs/sysincludes.h:354:
    /usr/src/sys/sys/vnode.h:588:10: fatal error: 'vnode_if.h' file not found
    #include "vnode_if.h"
             ^~~~~~~~~~~~
    1 error generated.
    *** [sha256-kernel.o] Error code 1

    make[4]: stopped in [...]/src/libafs/MODLOAD
    1 error

To avoid this, make all of our libafs objects depend on vnode_if.h.

[adeason@dson.org: Expanded commit message.]

Reviewed-on: https://gerrit.openafs.org/13983
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0ee53d2fe9341e60f420662749d5ae8c6d4b5f24)

Change-Id: I85696c23aeeabf8ebc38c8a9ea320fdcf8141ad9
Reviewed-on: https://gerrit.openafs.org/14534
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Remove LOCKPARENT/ISLASTCN lookup logic 33/14533/2
Tim Creech [Sun, 5 Mar 2017 23:15:58 +0000]
FBSD: Remove LOCKPARENT/ISLASTCN lookup logic

Currently, our afs_vop_lookup on FBSD tries to only lock 'dvp' for
ISDOTDOT requests when LOCKPARENT and ISLASTCN are set. There are a
couple of problems with this:

- The conditional locking logic involving LOCKPARENT/ISLASTCN is only
  relevant in very old FreeBSD releases (per-fs checking of these
  flags for parent locking went away around the FreeBSD 6 era).

- Our current logic here is wrong anyway, since we try to lock 'dvp'
  twice when those flags are set. This was mostly introduced by commit
  2f6be821 (FBSD: band-aid vnode locking in lookup), which added a
  lock/unlock pair for 'dvp' around the lock for 'vp', even though
  'dvp' was unlocked several lines earlier.

This means that if we hit the relevant code path, we will deadlock,
since we try to lock 'dvp' twice. To avoid this, just remove the
relevant logic for LOCKPARENT/ISLASTCN, since it is only relevant for
old FreeBSD releases that are not supported by us or FreeBSD.

Add and rearrange some comments around here to try to more explicitly
explain the relevant locking rules.

[adeason@dson.org: Commit message rewrite, adding comments, removing
old FreeBSD code.]

Reviewed-on: https://gerrit.openafs.org/12578
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 17a845c8d44f453b09b21afd59182e616234e872)

Change-Id: I105dfe397bb723b0939bb626103d857007e1a7bf
Reviewed-on: https://gerrit.openafs.org/14533
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Remove unused 'wantparent' logic 32/14532/2
Andrew Deason [Mon, 13 Apr 2020 03:40:14 +0000]
FBSD: Remove unused 'wantparent' logic

In afs_vop_lookup, the 'wantparent' variable doesn't actually change
any logic in the function. In the if() clause where it is used, the
value of 'wantparent' is only ever used if cnp->cn_nameiop is RENAME
and ISLASTCN is set. But if both of those are true, then the second
half of the if() conditional will always be true, so the value of
'wantparent' doesn't matter.

So to remove this confusing unused logic, remove the 'wantparent'
local var, and all its associated logic.

Issue spotted by kaduk@mit.edu.

Change-Id: Ia63b88d67d21cc2b81a0c25aa31ea60ab202b0a7
Reviewed-on: https://gerrit.openafs.org/14143
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7df5c003ed6eb17a693d67ffdfc0556f0c569cc1)
Reviewed-on: https://gerrit.openafs.org/14532
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Add support for FreeBSD 11.3 31/14531/2
Andrew Deason [Mon, 19 Aug 2019 00:59:50 +0000]
FBSD: Add support for FreeBSD 11.3

Reviewed-on: https://gerrit.openafs.org/13792
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7116de596a8f1d0be3da6eebe92d486f57aefd02)

Change-Id: I9bbf3a72041dda4220b63963b6fc9bd8bd2342e8
Reviewed-on: https://gerrit.openafs.org/14531
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Declare vnops/vfsops static 30/14530/2
Andrew Deason [Sun, 1 Dec 2019 21:39:04 +0000]
FBSD: Declare vnops/vfsops static

Declare our vnode and vfs operations as static functions, since they
are not referenced outside of osi_vfsops.c/osi_vnodeops.c. Shuffle
around the definitions in osi_vnodeops.c so that we don't need forward
declarations for the functions.

Reviewed-on: https://gerrit.openafs.org/13973
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 37c5db3ce767868803135c916b282ff2e541d052)

Change-Id: I8817e0e2a02bc4211dc84c0d9f8b418de756120e
Reviewed-on: https://gerrit.openafs.org/14530
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Remove pre-8 code 29/14529/2
Andrew Deason [Mon, 26 Aug 2019 04:21:23 +0000]
FBSD: Remove pre-8 code

Commit 123f0fb1 (config: remove support for old FreeBSD releases)
removed our support for FreeBSD releases before FreeBSD 8. However,
various areas of code still reference the symbols from those old
versions (e.g. AFS_FBSD53_ENV). Remove our ifdef logic for these old
symbols, according to the following rules:

- In FBSD-specific dirs, assume AFS_FBSD80_ENV is always true (as well
  as the symbols for earlier versions)

- In non-FBSD dirs, convert AFS_FBSD80_ENV to AFS_FBSD_ENV (and do the
  same for all earlier versions)

This allows us to remove code that was specific to older FreeBSD
versions, and simplify some ifdef conditionals.

Also remove the definitions for AFS_FBSD80_ENV and earlier versions in
our existing param.h files.

With this commit, the functions afs_start, afs_vop_lock,
afs_vop_unlock, and afs_vop_islocked are now always unreferenced, so
remove them.

Reviewed-on: https://gerrit.openafs.org/13812
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 847b63af92dd527de31675a0c3c82c9a57e6c4b3)

Change-Id: Icaaf660a95084a358d1ddf6fbc63944eff90492f
Reviewed-on: https://gerrit.openafs.org/14529
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agolibafs: Create debug KMODDIR for FBSD debug inst 28/14528/2
Andrew Deason [Sun, 23 Jun 2019 22:48:53 +0000]
libafs: Create debug KMODDIR for FBSD debug inst

Commit 99418024 (libafs: Create $(DESTDIR)$(KMODDIR) on FBSD inst)
made it so we create the kmod installation dir before copying our
module into it. However, if we build a 'debug' variant of our module,
the FreeBSD build process also installs debug symbols in a different
directory, ${DESTDIR}${KERN_DEBUGDIR}${KMODDIR}, which may not exist.
So do the same thing for that dir too, if --enable-debug-kernel is
turned on, so the build still works.

To do this, introduce the LIBAFS_REQ_DIRS var, to make it easier to
keep track of which dirs we may need to create.

Reviewed-on: https://gerrit.openafs.org/13690
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit 3bc03e7a5f8ef521e71a30cb8e66e07e2d1b4605)

Change-Id: Idee5614e92b99bd1140a3cef971537fb68eec151
Reviewed-on: https://gerrit.openafs.org/14528
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Use ucontext for FreeBSD 10+ on amd64 27/14527/2
Andrew Deason [Sun, 14 Jul 2019 22:31:30 +0000]
FBSD: Use ucontext for FreeBSD 10+ on amd64

Currently, running any LWP program on recent FreeBSD on amd64 causes
(or can cause) a SIGBUS very quickly. This is possibly because our
stack management code in LWP only ensures our stacks are 4 or 8-byte
aligned in most cases (except DARWIN, which gets 16-byte-aligned
stacks), according to the value of STACK_ALIGN. The amd64 ABI mandates
that stacks be 16-byte-aligned, and some function calls assume that
this is followed, causing a SIGBUS when it is not. FreeBSD on amd64
currently uses process.amd64.s for its savecontext() implementation,
which does not do any checking or fixup of the stack alignment.
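
To make the alignment requirement concrete, here is a minimal C sketch of rounding a downward-growing stack's top address down to the 16-byte boundary the amd64 ABI expects, plus a check that distinguishes 8-byte from 16-byte alignment. The helper names and the SKETCH_AMD64_STACK_ALIGN constant are hypothetical and do not come from LWP; this commit does not change LWP's stack code this way, but switches to ucontext instead.

```c
#include <stddef.h>
#include <stdint.h>

/* amd64 SysV ABI stack alignment, in bytes. */
#define SKETCH_AMD64_STACK_ALIGN 16

/* Hypothetical helper: round a stack top address down to 'align' bytes.
 * 'align' must be a power of two. Stacks grow downward on amd64, so
 * rounding down keeps the aligned top inside the allocated region. */
static uintptr_t sketch_align_stack_top(uintptr_t top, size_t align)
{
    return top & ~((uintptr_t)align - 1);
}

/* Hypothetical helper: nonzero if 'addr' is a multiple of 'align'. */
static int sketch_is_aligned(uintptr_t addr, size_t align)
{
    return (addr & ((uintptr_t)align - 1)) == 0;
}
```

An address such as 0x1008 passes an 8-byte check but fails the 16-byte one, which is the kind of stack the description above says LWP can hand out.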

This behavior has been observed on amd64 with FreeBSD 11 specifically,
but it probably happens on any FreeBSD release when using clang.
FreeBSD switched to clang as the default compiler with FreeBSD 10, so
this probably occurs with FreeBSD 10 and newer.

We could perhaps try to fix this by changing our stack management
code, but we can also avoid most of this nonsense by just using
ucontext instead of our custom assembly code. So, do that, by setting
USE_UCONTEXT for FreeBSD 10+. Also enable the same 'stackvar'-based
workaround in savecontext() as Linux uses, since otherwise 'topstack'
appears to always be NULL, and triggers our stack overflow checks.
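
For readers unfamiliar with ucontext, the following minimal sketch runs a function on a separately allocated stack using the standard getcontext(), makecontext() and swapcontext() calls. With this approach, makecontext() (that is, the C library) sets up the initial frame on the new stack, instead of hand-written assembly such as process.amd64.s. The names, the 64 KiB stack size and the single switch are illustrative assumptions; this is not LWP's actual USE_UCONTEXT code, and it omits the 'stackvar' workaround mentioned above.

```c
#include <ucontext.h>

#define SKETCH_STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, lwp_ctx;
static int lwp_ran = 0;

/* Body of the lightweight "process"; when it returns, execution resumes
 * in the context named by uc_link (main_ctx). */
static void lwp_body(void)
{
    lwp_ran = 1;
}

/* Run lwp_body on its own stack, then come back here. */
static int run_on_private_stack(void)
{
    static _Alignas(16) char stack[SKETCH_STACK_SIZE];

    if (getcontext(&lwp_ctx) == -1)
        return -1;
    lwp_ctx.uc_stack.ss_sp = stack;
    lwp_ctx.uc_stack.ss_size = sizeof(stack);
    lwp_ctx.uc_link = &main_ctx;
    makecontext(&lwp_ctx, lwp_body, 0);

    /* Save the current context and switch to lwp_ctx. */
    if (swapcontext(&main_ctx, &lwp_ctx) == -1)
        return -1;
    return 0;
}
```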

Note that while LWP use is deprecated, as of this commit many small
utilities (like 'fs') are still linked to LWP, and so are unusable
without a fix like this.

Reviewed-on: https://gerrit.openafs.org/13691
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 0a39efee224e8d4431ae79281ca353a7ba6fdce4)

Change-Id: I8cb4c20eb32c12310f41e38a3f40b132c919bace
Reviewed-on: https://gerrit.openafs.org/14527
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Set KERNBUILDDIR for --with-bsd-kernel-build 26/14526/2
Andrew Deason [Sun, 28 Jul 2019 20:03:43 +0000]
FBSD: Set KERNBUILDDIR for --with-bsd-kernel-build

Currently, specifying --with-bsd-kernel-build during configure causes
us to set BSD_KERNEL_BUILD, which sets KBLD in MakefileProto.FBSD.in,
but nothing ever uses KBLD. This means that when we use
--with-bsd-kernel-build, we don't actually build against the
configuration for that kernel, which can result in a libafs.ko that
cannot be loaded or causes other errors. Specifically, if trying to
build for a VIMAGE kernel, the kernel complains when trying to load
libafs:

    [...] kernel: link_elf_obj: symbol in_ifaddrhead undefined
    [...] kernel: linker_load_file: Unsupported file type

The FreeBSD module build system looks for KERNBUILDDIR for an
alternative build, which it uses to pull in opt_global.h and other
required pieces from the build tree. So just specify KERNBUILDDIR if
we have one.

At the same time, avoid setting our default value for BSD_KERNEL_BUILD
for FBSD when the calculated dir doesn't exist. At least for the
default GENERIC kernel on FreeBSD 11.2-RELEASE, there may not be a
build dir on the running machine, and so setting BSD_KERNEL_BUILD to
the calculated value causes the build to fail when it doesn't exist.

Reviewed-on: https://gerrit.openafs.org/13746
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 8f9c92a888df7b2fd61a3e84aaf1d2c96a8b10dd)

Change-Id: I7afc54121ac3a9d81a7a8005d53eb2ed5df489c1
Reviewed-on: https://gerrit.openafs.org/14526
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Call CURVNET_SET/CURVNET_RESTORE for VIMAGE 25/14525/2
Tim Creech [Sun, 5 Mar 2017 23:18:01 +0000]
FBSD: Call CURVNET_SET/CURVNET_RESTORE for VIMAGE

In commit 9703b023 (FBSD: VIMAGE support), we changed a couple of our
variable references to their V_* equivalents, to accommodate kernels
with VIMAGE turned on. This allows us to build, but causes us to crash
whenever we hit that code when VIMAGE is enabled, because the relevant
macros reference 'curvnet', which is NULL outside of networking code.

What we're supposed to do is to set 'curvnet' before entering
networking code by calling 'CURVNET_SET(xxx)', and reset it afterwards
by calling 'CURVNET_RESTORE()'. We must make exactly one _RESTORE call
for each _SET, and they are supposed to be run at the same level of
scope.

So to avoid the crashes, make the relevant CURVNET_* calls whenever we
look at networking info. We currently only do this in a few places:

- In afs_SetServerPrefs, to try to detect if a given server address is
  in the same network as one of our local interfaces (V_in_ifaddrhead)

- In rxi_GetIFInfo, for some MTU-related info (V_ifnet)

- In rxi_FindIfnet, for some MTU-related info (ifa_ifwithnet)

As for what vnet we actually set 'curvnet' to, we could set it to the
vnet of the current thread (TD_TO_VNET(curthread)), or we could set it
to the vnet of an associated network object (a socket, an interface,
etc). Since all of our network-related code goes through Rx, in this
commit we set curvnet to the vnet of the Rx socket
(rx_socket->so_vnet).
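
The CURVNET_SET/CURVNET_RESTORE pairing rule can be illustrated with a userspace sketch. In FreeBSD, CURVNET_SET declares a local variable that saves the previous curvnet and CURVNET_RESTORE reads it back, which is why the two must appear exactly once each and at the same scope level. The macros, the 'struct vnet' and 'struct sketch_socket' types, and the global below are simplified stand-ins, not the kernel definitions.

```c
#include <stddef.h>

struct vnet { int id; };                        /* stand-in for the kernel's struct vnet */
struct sketch_socket { struct vnet *so_vnet; }; /* stand-in for a socket with a vnet */

static struct vnet *curvnet = NULL;             /* NULL outside networking code */

/* Simplified versions of the kernel macros: SET saves the old value in a
 * local variable, and RESTORE reads that same local, so they must be used
 * once each and at the same scope level. */
#define SKETCH_CURVNET_SET(arg)  struct vnet *saved_vnet = curvnet; curvnet = (arg)
#define SKETCH_CURVNET_RESTORE() curvnet = saved_vnet

/* Hypothetical networking helper that, like the V_* macros, needs curvnet. */
static int sketch_read_vnet_id(void)
{
    return curvnet != NULL ? curvnet->id : -1;
}

/* Pattern used around networking lookups: use the vnet of the Rx socket. */
static int sketch_get_if_info(struct sketch_socket *rx_socket)
{
    int id;

    SKETCH_CURVNET_SET(rx_socket->so_vnet);
    id = sketch_read_vnet_id();
    SKETCH_CURVNET_RESTORE();
    return id;
}
```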

Note that VIMAGE is optional in 11-RELEASE, but is turned on by
default in 12.0-RELEASE. For more information, see:
https://wiki.freebsd.org/VIMAGE/porting-to-vimage

[adeason@dson.org: Reworded commit message; moved some code around.]

Reviewed-on: https://gerrit.openafs.org/12580
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 1effc3517fdb4b4653d47c59bf67076567209324)

Change-Id: I5fd8b2bf204790b1da6427fe72b8743a7aaa4f13
Reviewed-on: https://gerrit.openafs.org/14525
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Remove unnecessary explicit osi_fbsd_alloc 24/14524/2
Andrew Deason [Mon, 15 Jul 2019 03:53:39 +0000]
FBSD: Remove unnecessary explicit osi_fbsd_alloc

AFS_KALLOC is already defined to be osi_fbsd_alloc on FBSD, so this
extra #ifdef here is completely unnecessary. Remove it.

Do the same for AFS_KFREE/osi_fbsd_free.

Reviewed-on: https://gerrit.openafs.org/13708
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit ad1fe5e1a825a3b3f88c04fd84613e4105206443)

Change-Id: Ib62b52d040ffd4170a0bb556684244ee1f372401
Reviewed-on: https://gerrit.openafs.org/14524
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Give 0 'rootrefs' to vflush on unmount 23/14523/2
Andrew Deason [Sun, 21 Jul 2019 04:09:27 +0000]
FBSD: Give 0 'rootrefs' to vflush on unmount

Currently, in afs_unmount, we give vflush a 'rootrefs' arg of 1,
indicating that we hold 1 reference on the root vnode. But ever since
commit 6eb1088a (freebsd: properly track vcache references), we drop
the ref for the root vnode at the beginning of this function.

What happens currently in afs_unmount for a normal successful umount
is something like this (at least, on FreeBSD 11.2-RELEASE):

- We afs_PutVCache the afs_globalVp vcache, reducing its v_usecount
  and v_holdcnt to 0, and afs_globalVp is set to NULL.

- vflush calls afs_root() to get the root vnode, which sees that
  afs_globalVp is NULL, and so calls afs_GetVCache for the root fid
  and returns it (and sets afs_globalVp to that vcache), with a
  v_usecount of 1.

- vflush tries to vgonel() all of our vnodes, which calls our
  afs_vop_reclaim, which calls afs_FlushVCache(). For the root vnode
  specifically, vflush() sees that v_usecount is nonzero, and so skips
  calling vgonel() at first, but later calls vgone() on it
  specifically because we gave a nonzero 'rootrefs'. The resulting
  afs_FlushVCache() for the root vnode fails, because the root vnode's
  v_usecount is still 1. Since a failure from afs_vop_reclaim would
  cause a panic, we just log a warning and try to continue on anyway.

- vflush() calls vrele() on the root vnode, right before returning.

All of this allows the unmount to proceed, but this means that most of
afs_FlushVCache() doesn't actually run for the root vcache, and it
means we always log a warning like this on unmount:

    afs_vop_reclaim: afs_FlushVCache failed code 16 [...]

In addition, this means that setting afs_globalVp at the beginning of
afs_unmount() is largely pointless, since it gets set to a vcache
again near the beginning of vflush().

To avoid all of this, stop lying to vflush about how many references
to the root vnode we hold, and just say that we hold 0 references.

Reviewed-on: https://gerrit.openafs.org/13709
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit d13b647aa392e1d802be1023930a8e1a07fb11ab)

Change-Id: I7ca79ee5c10277d6ef94b5f317aa4ba091642ffd
Reviewed-on: https://gerrit.openafs.org/14523
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoFBSD: Handle F_UNLCK in VOP_ADVLOCK 22/14522/2
Tim Creech [Sun, 5 Mar 2017 23:17:23 +0000]
FBSD: Handle F_UNLCK in VOP_ADVLOCK

When a_fl->type is F_UNLCK, FreeBSD gives our VOP_ADVLOCK an a_op of
F_UNLCK, instead of F_SETLK like we expect. This causes afs_lockctl to
return EINVAL, since F_UNLCK isn't a normal fcntl lock op, and so
userspace requests to unlock fcntl-style locks always fail. This can
be seen, for example, when trying to use sqlite3 to access a database
that lives in afs.

This F_UNLCK behavior in FreeBSD seems a bit peculiar, but has been
around effectively forever (since 4.4BSD-Lite). So just work around
it.
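
A minimal sketch of the workaround, assuming a VOP_ADVLOCK-style entry point that receives the op and a struct flock: an op of F_UNLCK is rewritten to F_SETLK before it reaches the lock-control routine, since the lock type in the flock already requests an unlock. sketch_lockctl() and sketch_vop_advlock() are hypothetical stand-ins for afs_lockctl() and the real vnode operation, and use the userspace <fcntl.h> definitions.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical stand-in for afs_lockctl: only the normal fcntl lock ops
 * (F_GETLK, F_SETLK, F_SETLKW) are accepted; anything else is EINVAL. */
static int sketch_lockctl(struct flock *fl, int cmd)
{
    if (cmd != F_GETLK && cmd != F_SETLK && cmd != F_SETLKW)
        return EINVAL;
    (void)fl; /* a real implementation would act on fl->l_type, etc. */
    return 0;
}

/* Hypothetical VOP_ADVLOCK-style entry point. FreeBSD may pass an op of
 * F_UNLCK for an unlock request; the lock type in 'fl' already says
 * F_UNLCK, so treat it as a plain F_SETLK request. */
static int sketch_vop_advlock(int a_op, struct flock *fl)
{
    int op = a_op;

    if (op == F_UNLCK)
        op = F_SETLK;
    return sketch_lockctl(fl, op);
}
```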

[adeason@dson.org: minor style adjustments and commit message/comment
rewording.]

Reviewed-on: https://gerrit.openafs.org/12579
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f5acf1b1bfe940faf0a6f4bd11c55d6c90f60242)

Change-Id: I52d0c513aeabd54019fc6d7bb6c3b542e95b2dee
Reviewed-on: https://gerrit.openafs.org/14522
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agolibafs: Create $(DESTDIR)$(KMODDIR) on FBSD inst 21/14521/2
Andrew Deason [Sun, 23 Jun 2019 22:48:53 +0000]
libafs: Create $(DESTDIR)$(KMODDIR) on FBSD inst

We rely on bsd.kmod.mk for our actual rules during 'make install', but
that tries to install our kernel module into $(DESTDIR)$(KMODDIR),
without creating it first. If the user tries to 'make install
DESTDIR=/some/path' and that path doesn't exist, we will fail with
something like:

    make DESTDIR=/home/adeason/git/destdir single_instdir_libafs
    /usr/bin/install -c -T release -o root -g wheel -m 555   libafs.ko /home/adeason/git/destdir/boot/modules/
    install: /home/adeason/git/destdir/boot/modules/: No such file or directory
    *** Error code 71

To avoid this, add a dependency on the 'install' target which causes
our target dir to be created.

Reviewed-on: https://gerrit.openafs.org/13653
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 99418024276c94da5982d7dad6126a8d53924d7e)

Change-Id: I439b7b9514b3ab060c887003e0af19557fd2c812
Reviewed-on: https://gerrit.openafs.org/14521
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoAdd param.h files and sysnames for FreeBSD 11.2 20/14520/2
Stephan Wiesand [Fri, 22 Mar 2019 11:46:17 +0000]
Add param.h files and sysnames for FreeBSD 11.2

Thanks to Måns Nilsson for filing the bug. Note that this change
differs from the proposed patch in the report, in that it
doesn't define the 10.4 symbols in the 11.2 param.h files.

FIXES 134850

Reviewed-on: https://gerrit.openafs.org/13534
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
(cherry picked from commit 2ee35afa339731f6a60f1e5e99ccaf63baa6c891)

Change-Id: I6ba8ba41df12f1a5977f5b530aa1353902de5ebe
Reviewed-on: https://gerrit.openafs.org/14520
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoSOLARIS: provide cache manager stats via kstat 18/14518/2
Michael Meffie [Tue, 31 May 2016 20:23:41 +0000]
SOLARIS: provide cache manager stats via kstat

Provide statistical information via the Solaris kstat framework.  Data
can be examined with the kstat tool or the kstat userspace API.

The kstat module is called openafs. Three kstat names are provided.  The
"param" name provides cache manager parameters as reported by
cmdebug -cache.

    # kstat -m openafs -n param

The "cache" name provides cache manager statistics as given by the
xstats plus some additional cache related stats. The "cache" name also
provides the libafs kernel module version string and the current local
cellname.

    # kstat -m openafs -n cache

The "rx" name provides general rx statistics as given by rxdebug -rxstat.

    # kstat -m openafs -n rx

Reviewed-on: https://gerrit.openafs.org/13170
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 9338cb5fce2e38b864b8f957b6ea4c56c78d20f8)

Change-Id: Ic70d766d7a112d673b6c5898da43b3eea11b1065
Reviewed-on: https://gerrit.openafs.org/14518
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoafs: consolidate duplicated wait-for-cache-drain code 17/14517/2
Mark Vitale [Thu, 9 Aug 2018 21:40:09 +0000]
afs: consolidate duplicated wait-for-cache-drain code

Consolidate duplicated logic into a new routine
afs_MaybeWaitForCacheDrain().

Reviewed-on: https://gerrit.openafs.org/13278
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 23bd776b0140deb596287869872a41de555ba99a)

Change-Id: I31b04da2170dcdf795b8a50ea7ab78d964eeebf5
Reviewed-on: https://gerrit.openafs.org/14517
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoafs: more cache truncation stats 16/14516/2
Michael Meffie [Mon, 20 Jun 2016 19:29:45 +0000]
afs: more cache truncation stats

Add counters for occurrences of the cache being too full and of waiting
for the cache to drain. These will be used in later commits to indicate
how often cache truncation is required and how often the cache manager
waits for cache truncation to complete.

Reviewed-on: https://gerrit.openafs.org/13168
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 25792e246362a201743533a970f90dbc77d0ed5c)

Change-Id: I659cce58951c869ce40ff47b13aa79ab33cd26aa
Reviewed-on: https://gerrit.openafs.org/14516
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovol: prevent salvage segfault for orphaned vnode with out-of-range parent 15/14515/2
Mark Vitale [Thu, 20 Aug 2020 20:09:02 +0000]
vol: prevent salvage segfault for orphaned vnode with out-of-range parent

While salvaging a RW volume, the salvager may segfault if it encounters an
orphaned directory with a parent vnode that does not exist.  For
example, if the large vnode index contains a maximum vnode of 2901, any
parent vnode encountered that is larger than 2901 will result in an
out-of-bounds reference to our vnode essence array, leading to a
segfault or undefined behavior.

Modify the logic to check for out-of-bounds parent vnodes, and log them
rather than segfaulting.
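
The bounds check can be sketched as follows. The essence structure, the direct vnode-to-index mapping and the function name are simplified, hypothetical stand-ins for the salvager's real data structures; the point is only that an out-of-range parent is logged and skipped instead of being used as an array index.

```c
#include <stdio.h>

/* Hypothetical, simplified per-vnode summary kept by the salvager. */
struct sketch_vnode_essence {
    unsigned int parent;  /* parent directory vnode number */
    int orphaned;
};

/* Return the essence entry for 'vnode', or NULL (with a log message) if
 * it lies outside the array of 'nvnodes' entries, rather than indexing
 * past the end of the array. */
static struct sketch_vnode_essence *
sketch_lookup_essence(struct sketch_vnode_essence *essences,
                      unsigned int nvnodes, unsigned int vnode)
{
    if (vnode >= nvnodes) {
        fprintf(stderr, "parent vnode %u is out of range (%u entries); skipping\n",
                vnode, nvnodes);
        return NULL;
    }
    return &essences[vnode];
}
```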

Reviewed-on: https://gerrit.openafs.org/14385
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 750628da77bb71e24ed3061431bbb913ff8d5f72)

Change-Id: Ib0cabde440d59394704967dd3ab2eb73f07aec22
Reviewed-on: https://gerrit.openafs.org/14515
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: prevent leakage of non-cached rx_connections (pthread) 14/14514/2
Mark Vitale [Fri, 20 Apr 2018 04:57:28 +0000]
rx: prevent leakage of non-cached rx_connections (pthread)

The rxi_connectionCache (AFS_PTHREAD_ENV only) allows applications to
reuse rx_connection structs.  Cached rx_connections are obtained via
rx_GetCachedConnection and released via rx_ReleaseCachedConnection.
This feature is used most heavily by libadmin and kauth, but there are
other users in the tree as well.

For instance, ubikclient routines ubik_ClientInit and ubik_ClientDestroy
call rx_ReleaseCachedConnection (if AFS_PTHREAD_ENV) when disposing of
their rx_connections.  Unfortunately, in many cases these rx_connections
were obtained via rx_NewConnection, _not_ from the cache via
rx_GetCachedConnection.  In those cases, rx_ReleaseCachedConnection will
not find the rx_connection in the rxi_connectionCache, and thus it
returns without doing anything.

Therefore, when ubik_ClientInit is passed an existing ubik_client (for
re-initialization) that contains rx_connections NOT allocated via
rx_GetCachedConnection, those connections are not destroyed, but will be
silently leaked.  Similarly, ubik_ClientDestroy will leak its
rx_connections when it frees the ubik_client struct.

For example, the fileserver host package calls ubik_ClientInit (via
hpr_Initialize) and ubik_ClientDestroy (via hpr_End) to manage
connections to the ptserver.  However, these connections were obtained
via rx_NewConnection, not rx_GetCachedConnection.  If the fileserver has
a failed call to the ptserver that sets prfail=1, the next RPC scheduled
for that client (in CallPreamble) will refresh the thread's ubik_client
(viced_uclient_key) by calling hpr_End -> ubik_ClientDestroy ->
rx_ReleaseCachedConnection.  The "released" connections will be leaked.

This problem exists in all versions of OpenAFS going back to IBM 1.0.
Starting with 1.8.x, many components that were formerly LWP-only are now
pthreaded and thus susceptible to this leak.

It seems difficult and error-prone to identify all possible code paths
that may pass a non-cached rx_connection to rx_ReleaseCachedConnection,
and convert them to obtain connections via rx_GetCachedConnection.

Instead, prevent all existing and future leaks by modifying the connection
cache to:
- flag all rx_connections it allocates
- correctly release any rx_connection it is passed, whether they came
  from the cache or not.
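
A simplified sketch of the approach: connections handed out by the cache carry a flag, and the release routine destroys any connection without that flag instead of silently returning. All names here are hypothetical stand-ins for rx_NewConnection, rx_GetCachedConnection and rx_ReleaseCachedConnection, and a real cache would also keep idle connections on a list for reuse, which this sketch omits.

```c
#include <stdlib.h>

/* Hypothetical, simplified connection type. */
struct sketch_conn {
    int from_cache;   /* set only by the cache's allocator */
    int in_use;
};

static int sketch_destroyed = 0;  /* counts connections actually freed */

static void sketch_destroy_conn(struct sketch_conn *conn)
{
    free(conn);
    sketch_destroyed++;
}

/* Stand-in for rx_NewConnection: the cache flag is left clear. */
static struct sketch_conn *sketch_new_conn(void)
{
    return calloc(1, sizeof(struct sketch_conn));
}

/* Stand-in for rx_GetCachedConnection: flags what it allocates. */
static struct sketch_conn *sketch_get_cached_conn(void)
{
    struct sketch_conn *conn = sketch_new_conn();

    if (conn != NULL) {
        conn->from_cache = 1;
        conn->in_use = 1;
    }
    return conn;
}

/* Stand-in for rx_ReleaseCachedConnection: cached connections are marked
 * idle for reuse; anything else is destroyed instead of being leaked. */
static void sketch_release_cached_conn(struct sketch_conn *conn)
{
    if (conn->from_cache)
        conn->in_use = 0;
    else
        sketch_destroy_conn(conn);
}
```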

Reviewed-on: https://gerrit.openafs.org/13042
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit bb5397e4c409e3c075ee73d6bf54a3b6eacc0060)

Change-Id: Ia48e29a53a83211c1011eba24c16f78f7253d84b
Reviewed-on: https://gerrit.openafs.org/14514
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: fix out-of-range value for RX_CONN_NAT_PING 13/14513/2
Mark Vitale [Mon, 30 Apr 2018 22:34:28 +0000]
rx: fix out-of-range value for RX_CONN_NAT_PING

Commit 496fb87372555f6acddd4fd88b03c94c85f48511 ("rx: avoid nat ping
until connection is attached") introduced functionality to defer turning
on NAT ping for server connections until after reachability had been
established for the client.

Unfortunately, this feature could never work correctly because it
assigned an out-of-range flag value of 256 (0x100) for the u_char flags
field. Instead of calling this out as an error, both gcc and Solaris cc
elide this flag so that it is never set in
rx_SetConnSecondsUntilNatPing(). Furthermore, the test in
rxi_ConnClearAttachWait() will always fail; therefore
rxi_ScheduleNatKeepAliveEvent is never called after attach wait has
ended.

Fortunately, this bug is currently moot: it is not actually exposed in
OpenAFS (it was discovered by inspection). This is because there are
currently no rx_connection objects in the tree that have both NAT ping
and checkReach (rx_SetCheckReach) enabled. I also searched git history
and found no time when this bug could ever have been exposed. This does
raise the question of why the original commit was needed; but instead of
reverting the original commit, this commit attempts to fix it.

To prevent problems if NAT ping and checkReach are ever both enabled for
an rx_connection, enlarge the rx_connection flags member so that the
RX_CONN_NAT_PING value is no longer out of range.
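The truncation can be reproduced in ordinary C. In this sketch, `EX_NAT_PING` reuses the 0x100 value from the text, and both struct layouts are hypothetical. `uint32_t` is used only for illustration and is not necessarily the width the real rx_connection flags member was enlarged to.

```c
#include <stdint.h>

/* Same value as the out-of-range flag described above: bit 8. */
#define EX_NAT_PING 0x100u

/* Old-style layout: an 8-bit field can only hold bits 0..7. */
struct ex_old_conn { unsigned char flags; };

/* Widened layout (illustrative width). */
struct ex_new_conn { uint32_t flags; };

static int ex_old_flag_survives(void)
{
    struct ex_old_conn c = { 0 };
    /* The OR yields 0x100; storing it back into an unsigned char keeps
     * only the low 8 bits, so the stored value is 0. */
    c.flags |= EX_NAT_PING;
    return (c.flags & EX_NAT_PING) != 0;
}

static int ex_new_flag_survives(void)
{
    struct ex_new_conn c = { 0 };
    c.flags |= EX_NAT_PING;
    return (c.flags & EX_NAT_PING) != 0;
}
```

Because the conversion to an unsigned type is well-defined in C (reduction modulo 2^8 here), compilers are not required to diagnose it. That is why the bug went unnoticed.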

Reviewed-on: https://gerrit.openafs.org/13041
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 55fca11421055d0bcee79f118ea2a035393cc6e5)

Change-Id: I9b02ff06d7bf6ba0dfa30ed5ca17ddb89b517987
Reviewed-on: https://gerrit.openafs.org/14513
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoafs: Fix XBSD check for VNOVAL va_uid 12/14512/2
Andrew Deason [Wed, 23 Dec 2020 18:44:35 +0000]
afs: Fix XBSD check for VNOVAL va_uid

Commit e86eb73e (obsd-vattrs-20040125) introduced an XBSD-specific
check to detect some unchanged attributes. But the #ifdef for XBSD for
the va_uid section was added in the middle of an HPUX-specific block
by mistake.

Move this #ifdef one level higher, so it's actually used on BSD
platforms.
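A minimal sketch of why the misplaced #ifdef had no effect. It uses hypothetical `EX_XBSD`/`EX_HPUX` macros in place of the real platform macros and a simple flag in place of the actual va_uid check. When the BSD test is nested inside an HP-UX-only block, a BSD build discards it along with the enclosing block.

```c
/* Pretend build configuration: a BSD build, not HP-UX. */
#define EX_XBSD 1
/* EX_HPUX is deliberately left undefined. */

/* Before: the BSD-specific check sits inside the HP-UX-only block. */
static int ex_bsd_check_compiled_before(void)
{
    int compiled = 0;
#ifdef EX_HPUX
# ifdef EX_XBSD
    compiled = 1;   /* never reached on BSD: outer block is skipped */
# endif
#endif
    return compiled;
}

/* After: the check is moved one level up, outside the HP-UX block. */
static int ex_bsd_check_compiled_after(void)
{
    int compiled = 0;
#ifdef EX_XBSD
    compiled = 1;
#endif
    return compiled;
}
```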

Reviewed-on: https://gerrit.openafs.org/14473
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Tim Creech <tcreech@tcreech.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit cd35aa9e2aec16d622177eeea1e1b3ec8aacdd45)

Change-Id: I6a840cffc1e3dfc6df1237261aa3a21bb3b73fbc
Reviewed-on: https://gerrit.openafs.org/14512
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoLinux 5.11: Test 32bit compat with in_compat_syscall 11/14511/2
Cheyenne Wills [Fri, 22 Jan 2021 14:57:55 +0000]
Linux 5.11: Test 32bit compat with in_compat_syscall

Linux 5.11 removed the TIF_IA32 thread flag with commit:
  x86: Reclaim TIF_IA32 and TIF_X32 (8d71d2bf6efec)

The flag TIF_IA32 was being used by openafs to determine if the task was
handling a syscall request from a 32 bit process.  Building against a
Linux 5.11 kernel results in a build failure as TIF_IA32 is undefined.

The function 'in_compat_syscall' was introduced in Linux 4.6 as
the preferred method to determine if a syscall needs to handle a
compat call (e.g. from a 32-bit application).

To resolve the build problem, use 'in_compat_syscall' if present (Linux
4.6 and later) to determine if the syscall needs to handle a
compatibility mode call.

Add autoconf check for in_compat_syscall.

Notes about in_compat_syscall:

In Linux 4.6, 'in_compat_syscall' is defined for all architectures with
a generic implementation that returns 'is_compat_task', but architectures
may provide their own overriding implementations (x86 and sparc do).

At 4.6 (and later), the function 'is_compat_task' is defined only for
the following architectures to return:

Arch              Returns
=======           ==============================
arm64             test_thread_flag(TIF_32BIT);
mips              test_thread_flag(TIF_32BIT_ADDR)
parisc            test_ti_thread_flag(task_thread_info(t), TIF_32BIT)
powerpc           is_32bit_task()
s390              test_thread_flag(TIF_31BIT)
sparc             test_thread_flag(TIF_32BIT)

If the Linux kernel is not built with compat mode, is_compat_task and
in_compat_syscall are defined to always return 0.

Linux commit that introduced in_compat_syscall:
  compat: add in_compat_syscall to ask whether we're in a compat syscall
  (5180e3e24fd3e8e7)

Reviewed-on: https://gerrit.openafs.org/14499
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 78ef922612bef5f5fd6904896e84b9d2ea802404)

Change-Id: I4eca62f19ae58fd830915feff5098cec2825f099
Reviewed-on: https://gerrit.openafs.org/14511
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoLinux: Refactor test for 32bit compat 10/14510/2
Cheyenne Wills [Fri, 29 Jan 2021 18:32:36 +0000]
Linux: Refactor test for 32bit compat

Refactor the preprocessor checks for determining the method to test for
32bit compatibility (64bit kernel performing work for a 32bit task) into
a common inline function, 'afs_in_compat_syscall' that is defined in
LINUX/osi_machdep.h.  Update osi_ioctl.c and afs_syscall.c to use
afs_in_compat_syscall.
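A userspace sketch of the shape of such a helper. The `ex_`-prefixed functions are hypothetical stubs standing in for the kernel's `in_compat_syscall()` and the older thread-flag test. `EX_HAVE_IN_COMPAT_SYSCALL` stands in for whatever macro the configure check defines. Only the "decide once, in one inline function" structure is meant to carry over.

```c
/* Hypothetical stubs for kernel facilities; each pretends the current
 * syscall comes from a 32-bit task. */
int ex_kernel_in_compat_syscall(void) { return 1; }
int ex_tif_ia32_is_set(void) { return 1; }

/* Pretend the configure check found in_compat_syscall (Linux 4.6+). */
#define EX_HAVE_IN_COMPAT_SYSCALL 1

/* Single place that decides how to ask "is this a compat syscall?",
 * so callers (ioctl and syscall entry points) need no #ifdefs. */
static inline int ex_afs_in_compat_syscall(void)
{
#if defined(EX_HAVE_IN_COMPAT_SYSCALL)
    return ex_kernel_in_compat_syscall();  /* Linux 4.6 and later */
#else
    return ex_tif_ia32_is_set();           /* older x86_64-style test */
#endif
}
```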

Include afs/sysincludes in osi_machdep.h to ensure linux/compat.h
is pulled in for the functions called by afs_in_compat_syscall.

Reviewed-on: https://gerrit.openafs.org/14500
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 32cc6b0796495e596262d84c428172a511f757c4)

Change-Id: I746e5777737d49381c4a74627b79d2a61cbd4f8e
Reviewed-on: https://gerrit.openafs.org/14510
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoLINUX: Fix includes for fatal_signal_pending test 09/14509/2
Andrew Deason [Thu, 28 Jan 2021 22:59:47 +0000]
LINUX: Fix includes for fatal_signal_pending test

Commit 8b6ae289 (LINUX: Avoid lookup ENOENT on fatal signals) added a
configure test for fatal_signal_pending(). However, this check has
incorrectly failed ever since Linux 4.11, because fatal_signal_pending() was moved
from linux/sched.h to linux/sched/signal.h in Linux commit 2a1f062a
(sched/headers: Move signal wakeup [...]). Fix this by including
linux/sched/signal.h if we have it during the configure test.

A false negative on this configure test doesn't break the build, but
it disables one of our safeguards preventing incorrect negative
dentries at runtime. The function fatal_signal_pending() hasn't
changed in quite some time (except for what header it lives in); it
was introduced in Linux 2.6.25 via Linux commit f776d12d (Add
fatal_signal_pending). So to try to avoid this mistake again in the
future, make it so a missing fatal_signal_pending() breaks the build
if we're on Linux 2.6.25+.
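The version cutoff can be modelled as a small decision function. This is a sketch only: the real change lives in the configure and preprocessor logic. `EX_KERNEL_VERSION` packs major/minor/sublevel the same way as the kernel's `KERNEL_VERSION()` macro, without the sublevel clamp that newer kernels add. `ex_missing_fsp_is_fatal` is a hypothetical name.

```c
/* Same major/minor/sublevel packing as the kernel's KERNEL_VERSION(). */
#define EX_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns nonzero if "configure did not find fatal_signal_pending()"
 * should stop the build. The function has existed since 2.6.25, so a
 * miss on a newer kernel points at a broken configure test rather than
 * a genuinely missing function. */
static int ex_missing_fsp_is_fatal(unsigned int version_code, int have_fsp)
{
    return !have_fsp && version_code >= EX_KERNEL_VERSION(2u, 6u, 25u);
}
```

Turning a silent false negative into a hard build error trades a little convenience for the guarantee that the runtime safeguard against bad negative dentries is never quietly compiled out.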

Reviewed-on: https://gerrit.openafs.org/14508
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 0c1465e4f3310daa54f1e799f76237604222666d)

Change-Id: I1334c060f8ab5733461ebf7c191dffa7be830021
Reviewed-on: https://gerrit.openafs.org/14509
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: Avoid new server calls for big-seq DATA pkts 07/14507/2
Andrew Deason [Thu, 19 Sep 2019 17:18:08 +0000]
rx: Avoid new server calls for big-seq DATA pkts

We currently never open our receive window to more than 32 packets. If
we receive a DATA packet for an unrecognized call with a seq of 33 or
more, the packet is almost certainly from a call that was already
running when we were restarted.

As described in commit 7b204946 (rx: Avoid lastReceiveTime update for
invalid ACKs) and commit "rx: Avoid new server calls for non-DATA
packets", clients can get confused when we respond to calls in these
situations, so drop the packets instead.
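Taken together with the non-DATA rule referenced above, the filter for packets on unrecognized calls can be sketched as follows. The packet-type constants and `ex_may_start_server_call` are hypothetical stand-ins; only the window size of 32 comes from the text. The real checks live in the Rx server-call path around rxi_ReceiveServerCall.

```c
#include <stdint.h>

/* Hypothetical packet-type values for the sketch (not Rx's headers). */
#define EX_TYPE_DATA 1
#define EX_TYPE_ACK  2

/* Largest receive window we ever open, per the text above. */
#define EX_MAX_RECV_WINDOW 32u

/* May a packet for a call we do not recognize create a new server call? */
static int ex_may_start_server_call(int type, uint32_t seq)
{
    if (type != EX_TYPE_DATA)
        return 0;   /* ACKs and other non-DATA packets: ignore */
    if (seq > EX_MAX_RECV_WINDOW)
        return 0;   /* seq 33+: leftover from a call predating our restart */
    return 1;       /* plausible start of a genuinely new call */
}
```

Dropping rather than answering such packets leaves the client to time out the stale call on its own.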

Reviewed-on: https://gerrit.openafs.org/13876
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a36832e2d891caab8644a3b4641c7c94fab4105f)

Change-Id: I72f903b81a205bb3e64862da03f9c1c76cc37b75
Reviewed-on: https://gerrit.openafs.org/14507
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: Avoid new server calls for non-DATA packets 06/14506/2
Mark Vitale [Thu, 8 Aug 2019 22:18:22 +0000]
rx: Avoid new server calls for non-DATA packets

Normally, a client starts a new Rx call by sending DATA packets for
that call to a server, and rxi_ReceiveServerCall on the server creates
a new call struct for that call (since we don't recognize it as an
existing call).

Under certain circumstances, it's possible for a server to see a
non-DATA packet as the first packet for a call, and currently
rxi_ReceiveServerCall will create a new server call for any packet
type. The call cannot actually proceed until the server receives data
from the client (and goes through the challenge/response auth
handshake, if needed), but usually this is harmless, since the
existence of any packets for a particular call channel indicates that
the client is trying to run such a call. The server will respond to
the client with ACKs to indicate that it is missing the needed DATA
packet(s), and the client will send them and the call can proceed.

However, if a call is in the middle of running when the server is
restarted, the client may be sending ACKs for a pre-existing call that
the server doesn't know about. In this case, the server generates ACKs
that indicate the server has not received any DATA packets, which may
appear to violate the protocol, depending on the prior state of the
call (e.g. the server appears to try to move the window backwards).

Clients should be able to detect this and kill the call, but many do
not. For many OpenAFS releases before commit 7b204946 (rx: Avoid
lastReceiveTime update for invalid ACKs), the client will get confused
in this situation and will keep the call open forever, never making
progress.

There isn't any benefit to creating a new server call in these
situations, so just ignore non-DATA packets for unrecognized calls, to
avoid stalled calls from such clients. Those clients will not get a
response from the server, and so the call will eventually die from the
normal Rx call timeout.

Reviewed-on: https://gerrit.openafs.org/13758
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 3f9a08db86f951df3f6f69f1143f17dd7b43b150)

Change-Id: Iaf8ee360f8aa634b5a7728866e41de267389e1f3
Reviewed-on: https://gerrit.openafs.org/14506
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: Avoid lastReceiveTime update for invalid ACKs 05/14505/2
Andrew Deason [Wed, 28 Aug 2019 03:58:23 +0000]
rx: Avoid lastReceiveTime update for invalid ACKs

Currently, we ignore ACK packets in a few cases:

- If the ACK appears to move the window backwards (if firstPacket is
  smaller than call->tfirst).

- If the ACK appears to have been received out of order (if
  previousPacket is smaller than call->tprev).

- If the ACK packet appears truncated.

In all of these cases, we ignore the ACK packet completely in our ACK
processing code (rxi_ReceiveAckPacket), but we still process the
packet at higher levels (rxi_ReceivePacket). Notably, this means we
update call->lastReceiveTime after rxi_ReceiveAckPacket returns, even
for ACK packets we haven't really looked at.

Normally this does not cause any noticeable problems, because such
packets should either never be encountered, or be few in number and
mixed in with valid packets.

However, if our peer is a server, and it is restarted in the middle of
a call, our peer may exclusively send us packets that fall into the
above categories. (This does not happen if our peer is a client,
because clients just ignore packets for calls they do not recognize.)
For example:

Consider a call where a client is sending data to a server, and the
server restarts after the client has sent a DATA packet with sequence
number 1000. The server may then start responding to the client with ACKs
with firstPacket set to 1, since the restarted server has no knowledge
of the call's state.

In this case, a firstPacket of 1 is well below where our window was,
so all of the ACKs from the server are ignored. But we keep updating
call->lastReceiveTime for all of these packets, and so the call stays
alive forever until an idle-dead or hard-dead timeout activates (if
any are set).

As another example, consider the case where a client is sending data
to a server, and the server receives a full window of packets (say, 16
packets), has not yet passed any data to the application, and the
server restarts. The restarted server then starts responding to the
client with ACKs with firstPacket set to 1, and previousPacket set to
0. We also ignore all of the ACKs from the server in this case,
because even though firstPacket looks sane, it looks like
previousPacket has gone backwards. We still update
call->lastReceiveTime for each ignored ACK we get, keeping the call
alive.

Before commit 4e71409f (Rx: Reject out of order ACK packets) was
introduced in 1.6.0, neither of these issues could occur. That commit
introduced the issue specifically if previousPacket goes backwards;
that is, if the server restarts before firstPacket moves forwards.

Commit 8d359e6d (rx: Remove duplicate out of order ACK check) in 1.8.0
introduced the issue when 'firstPacket' goes backwards, since
previously the FIRSTACKOFFSET-based check caused us to ignore those
packets without updating call->lastReceiveTime; that is, if the server
restarts after firstPacket moves forwards.

In this commit, we still ignore packets in the above cases, but we
also avoid updating lastReceiveTime when we receive such packets, to
make sure that we do not keep a call alive solely from receiving these
invalid packets.
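A minimal sketch of the changed bookkeeping, assuming a pared-down call structure. An invalid ACK bumps a spurious-packet counter and returns before `lastReceiveTime` is touched. `ex_call`, `ex_receive_ack` and the file-level counter are hypothetical. In Rx the counter is the `spuriousPacketsRead` statistic, and validity is decided by the ACK checks listed above.

```c
#include <time.h>

/* Hypothetical, pared-down call state (not struct rx_call). */
struct ex_call {
    time_t lastReceiveTime;
};

/* Stands in for the spuriousPacketsRead statistic. */
static unsigned long ex_spuriousPacketsRead;

/* Invalid ACKs are still ignored, but they no longer refresh
 * lastReceiveTime, so a call that only ever sees such ACKs can
 * still time out normally. */
static void ex_receive_ack(struct ex_call *call, int ack_valid, time_t now)
{
    if (!ack_valid) {
        ex_spuriousPacketsRead++;   /* count it, so we are not silent */
        return;                     /* and leave lastReceiveTime alone */
    }
    /* Normal ACK processing would happen here. */
    call->lastReceiveTime = now;
}
```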

Alternatively, we could change our logic to immediately abort calls
where firstPacket moves backwards (since this violates the Rx
protocol), or to not ignore some packets where previousPacket goes
backwards (since these calls may be recoverable). And we could also
skip updating lastReceiveTime for invalid packets of other types. But
for now, this commit just avoids updating lastReceiveTime for invalid
ACK packets, in order to restore our pre-1.6.0 behavior, while still
retaining the benefits of ignoring out-of-order ACKs. Further changes
in this area can potentially be handled separately by future commits.

Also increment the spuriousPacketsRead counter for packets that we
ignore in this way (which we used to do for some packets before commit
8d359e6d), so we are not entirely silent about ignoring them.

Written in collaboration with mvitale@sinenomine.net.

Reviewed-on: https://gerrit.openafs.org/13875
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 7b204946010673506e0f74991f59a0865292199c)

Change-Id: I8e0299bdeedb005fe49a2d1c4a00a21301fbbb04
Reviewed-on: https://gerrit.openafs.org/14505
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: Introduce ack_is_valid 04/14504/2
Andrew Deason [Wed, 28 Aug 2019 22:12:53 +0000]
rx: Introduce ack_is_valid

Take some of our existing logic for ignoring invalid ACK packets and
split it out into a separate function, ack_is_valid. This just makes
it easier to add more complex logic in here and write longer comments
explaining the decisions.
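A sketch of the kind of predicate ack_is_valid centralizes. It is built from the three rejection rules listed in "rx: Avoid lastReceiveTime update for invalid ACKs": the window moving backwards, an out-of-order previousPacket, and a truncated packet. The structs and `ex_ack_is_valid` are hypothetical. The real function operates on `struct rx_call` and the parsed ACK, and its exact conditions may differ.

```c
#include <stdint.h>

/* Hypothetical subsets of the ACK header and the sender's call state. */
struct ex_ack        { uint32_t firstPacket; uint32_t previousPacket; };
struct ex_call_state { uint32_t tfirst;      uint32_t tprev; };

/* Returns 1 if the ACK should be processed, 0 if it should be ignored. */
static int ex_ack_is_valid(const struct ex_call_state *call,
                           const struct ex_ack *ap, int truncated)
{
    if (truncated)
        return 0;   /* not enough bytes to trust the contents */
    if (ap->firstPacket < call->tfirst)
        return 0;   /* window appears to move backwards */
    if (ap->previousPacket < call->tprev)
        return 0;   /* ACK appears to have arrived out of order */
    return 1;
}
```

The assertions below replay the two restart scenarios from that message: the server restarts after seq 1000, or after a full 16-packet window with firstPacket still at 1.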

Note that the bug mentioned regarding the previousPacket field was
introduced in IBM AFS 3.5, and was fixed in OpenAFS in commit bbf92017
(rx: rxi_ReceiveDataPacket do not set rprev on drop), included in
OpenAFS 1.6.23.

This commit incurs no functional change; it is just code
reorganization.

Reviewed-on: https://gerrit.openafs.org/13874
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f6490629e1239c412002f316804c656c9be61400)

Change-Id: I5d0ee9b7fc56659e445705a01d5d90141eb8cfe2
Reviewed-on: https://gerrit.openafs.org/14504
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: For AFS_RXERRQ_ENV, retry sendmsg on error 03/14503/2
Andrew Deason [Mon, 2 Nov 2020 19:52:25 +0000]
rx: For AFS_RXERRQ_ENV, retry sendmsg on error

When AFS_RXERRQ_ENV is defined, we currently end up doing something
like this for our sendmsg abstractions:

    if (sendmsg(...) < 0) {
        while (rxi_HandleSocketError(sock))
            ;
        return error;
    }
    return success;

This means that when sendmsg() returns an error, we process the socket
error queue before returning an error.

The problem with this is that when we receive an ICMP error on our
socket, it creates a pending socket error that is returned for any
operation on the socket. So, if we receive an ICMP error after trying
to contact any peer, sendmsg() could return an error when trying to
send to any other peer. Even though there is no issue preventing us
from sending the packet, we'll fail to actually send the packet
because sendmsg() returned an error. This effectively causes an extra
outgoing packet drop, possibly delaying the related RPC.

To avoid this, change Rx to retry the sendmsg call when it returns an
error, since the error may be due to an unrelated ICMP error.

To avoid needing to implement this retry loop in multiple places, move
around our sendmsg code for AFS_RXERRQ_ENV, so that the higher-level
function rxi_NetSend performs the retry and checks for socket errors
(instead of the lower-level rxi_Sendmsg or osi_NetSend). Also change
our functions to process socket errors to be more consistent between
kernel and userspace: now we always have rxi_HandleSocketErrors, which
runs a loop around the platform-specific osi_HandleSocketError.

With this commit, osi_HandleSocketError is now required to be
implemented when AFS_RXERRQ_ENV is defined. We hadn't been
implementing this for UKERNEL, so just turn off AFS_RXERRQ_ENV for
UKERNEL.
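The retry structure described above can be sketched with a mock socket whose queued errors make the next send fail. Everything here is hypothetical:

- `ex_sendmsg` stands in for the platform send.
- `ex_handle_socket_error` stands in for osi_HandleSocketError.
- `ex_handle_socket_errors` stands in for the rxi_HandleSocketErrors loop.
- The two-attempt limit is an assumption, not the value Rx uses.

```c
/* Mock socket: pending_errors models queued (possibly unrelated) ICMP
 * errors that make the next send fail even though it could succeed. */
struct ex_sock {
    int pending_errors;
    int sent;
};

static int ex_sendmsg(struct ex_sock *s)
{
    if (s->pending_errors > 0)
        return -1;          /* a pending error surfaces on any send */
    s->sent++;
    return 0;
}

/* Process one queued error; returns 1 if one was handled, else 0. */
static int ex_handle_socket_error(struct ex_sock *s)
{
    if (s->pending_errors == 0)
        return 0;
    s->pending_errors--;
    return 1;
}

/* Loop around the single-error handler; returns 1 if anything was queued. */
static int ex_handle_socket_errors(struct ex_sock *s)
{
    int handled = 0;
    while (ex_handle_socket_error(s))
        handled = 1;
    return handled;
}

#define EX_MAX_SEND_TRIES 2   /* assumed bound for the sketch */

/* Higher-level send: drain the error queue on failure and retry. */
static int ex_net_send(struct ex_sock *s)
{
    int code = -1;
    int tries;

    for (tries = 0; tries < EX_MAX_SEND_TRIES; tries++) {
        code = ex_sendmsg(s);
        if (code == 0)
            break;
        /* Only retry if the queue held something that may explain it. */
        if (!ex_handle_socket_errors(s))
            break;
    }
    return code;
}
```

Keeping the retry in one higher-level function means the lower-level send routines stay simple, which mirrors the motivation given for moving the logic into rxi_NetSend.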

Thanks to mbarbosa@sinenomine.net for discovering and providing
information about the relevant issue.

Reviewed-on: https://gerrit.openafs.org/14424
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 5c9234694543f206174d30e21886286d419fd8df)

Change-Id: I1b21ba4d2b98abae240cb683d6061462db028431
Reviewed-on: https://gerrit.openafs.org/14503
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorx: Save errno in pthread rxi_Sendmsg 02/14502/2
Andrew Deason [Mon, 2 Nov 2020 19:16:41 +0000]
rx: Save errno in pthread rxi_Sendmsg

Currently, our pthread version of rxi_Sendmsg uses 'errno' in some
logic if sendmsg fails, but we do so after calling functions that
might alter errno (e.g. fflush).

To make sure we get the correct errno value, save the value of errno
right after sendmsg returns an error. Reorganize this function a bit
to help make the logic easier to follow.
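A minimal POSIX sketch of the pattern: capture errno immediately after sendmsg() fails, and only then do diagnostics that may clobber it. `ex_sendmsg_errno` is a hypothetical helper, not the rxi_Sendmsg code itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 0 on success, or the errno value sendmsg() failed with. */
static int ex_sendmsg_errno(int sock, struct msghdr *msg)
{
    int saved_errno;

    if (sendmsg(sock, msg, 0) >= 0)
        return 0;

    /* Save errno before calling anything else that might change it. */
    saved_errno = errno;

    /* Diagnostics such as these may modify errno (fflush is one example). */
    fprintf(stderr, "sendmsg failed: %s\n", strerror(saved_errno));
    fflush(stderr);

    return saved_errno;
}
```

For example, sending on descriptor -1 reports EBADF regardless of what the logging calls do to errno afterwards.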

Reviewed-on: https://gerrit.openafs.org/14423
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit eff7fa4b2eb9a3001dc18dca157ccbd5f19f89b6)

Change-Id: Ie761bf8fbf930718d933fdc2d0ad6961b2660607
Reviewed-on: https://gerrit.openafs.org/14502
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: Make dummy write AIX-specific 83/14483/2
Andrew Deason [Thu, 9 Jan 2020 18:38:45 +0000]
aklog: Make dummy write AIX-specific

This weird write() call exists to work around some old AIX-specific
bug. The ifdef looks like it is intended to restrict this to pre-5
AIX, but it also turns this on for all non-AIX platforms.

Make this area AIX-specific, to avoid this weird write on other
platforms that have nothing to do with the relevant workaround.
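The logic error can be modelled with booleans standing in for the preprocessor tests. This is a sketch under stated assumptions: `is_aix` stands for the AIX platform macro being defined and `is_aix5` for an AIX 5-or-later macro. The "before" function is inferred from the description above rather than copied from aklog.

```c
/* Before (inferred): only "not AIX 5+" was tested, which is also true
 * on every non-AIX platform. */
static int ex_dummy_write_before(int is_aix, int is_aix5)
{
    (void)is_aix;           /* the old test never looked at this */
    return !is_aix5;
}

/* After: the workaround applies only to pre-5 AIX. */
static int ex_dummy_write_after(int is_aix, int is_aix5)
{
    return is_aix && !is_aix5;
}
```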

Reviewed-on: https://gerrit.openafs.org/14022
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 6ee2d6de7d87c93c849f3afbe4326906e4c10852)

Change-Id: Iaa84f1c1df57f9b0749c2577e496fbf8740e48c1
Reviewed-on: https://gerrit.openafs.org/14483
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: avoid infinite lifetime tokens by default 82/14482/2
Yadavendra Yadav [Wed, 28 Aug 2019 11:56:41 +0000]
aklog: avoid infinite lifetime tokens by default

Currently, tokens generated with aklog's akimpersonate feature have an
infinite lifetime. Based on input from Ben, this was done so that
server-to-server tickets would be valid forever. However, 1.8.x has
other mechanisms usable for server-to-server authentication with
strong enctypes, so we do not need user-level akimpersonate to
generate tokens with an infinite lifetime. To address this, add a new
option, -token-lifetime <hrs>, which accepts values from 0 to 720
hours; 0 means the tokens will have an infinite lifetime. By default,
akimpersonate tokens have a lifetime of 10 hours.
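A sketch of the option's documented semantics (0 to 720 hours, 0 meaning infinite, 10 hours by default), assuming the value arrives as a string. `ex_token_lifetime_seconds` is hypothetical. How aklog actually represents an infinite lifetime internally is not shown; here 0 is simply passed through.

```c
#include <stdlib.h>

#define EX_DEFAULT_LIFETIME_HOURS 10L
#define EX_MAX_LIFETIME_HOURS     720L

/* Convert a -token-lifetime argument (NULL if the option was not given)
 * to seconds. Returns 0 for "infinite" and -1 for an invalid value. */
static long ex_token_lifetime_seconds(const char *arg)
{
    char *end;
    long hours = EX_DEFAULT_LIFETIME_HOURS;

    if (arg != NULL) {
        hours = strtol(arg, &end, 10);
        if (end == arg || *end != '\0')
            return -1;                      /* not a plain integer */
        if (hours < 0 || hours > EX_MAX_LIFETIME_HOURS)
            return -1;                      /* outside 0..720 */
    }
    return hours * 3600L;
}
```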

Reviewed-on: https://gerrit.openafs.org/13828
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 1de602aaada15df1008140784092c2a76a2613a1)

Change-Id: I032431ab1b8b174ac8d80322b688dc2a7285b8fa
Reviewed-on: https://gerrit.openafs.org/14482
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: use any enctype in get_credv5 81/14481/2
Yadavendra Yadav [Wed, 28 Aug 2019 11:34:31 +0000]
aklog: use any enctype in get_credv5

We currently always pass DES as the requested enctype to
get_credv5_akimpersonate, but this means we will fail to use our
service principal if we're using another enctype (say, AES) with rxkad-k5.
To allow this to work with any enctype, don't pass any requested
enctypes; instead, use the enctype inside the 'entry' returned to us
from krb5_kt_get_entry.

Remove all of the logic associated with the now-unused
"allowed_enctypes" argument. Also remove the logic handling the case
where "service_principal" is NULL (since no callers pass a NULL
service_principal), to make it easier to take out the allowed_enctypes
related code.

Reviewed-on: https://gerrit.openafs.org/13827
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 276bd5c7f8a2ec7673d2ad084566203eb2055938)

Change-Id: Ia4b2cab3b2cd81214683dc00d7092a302d5af1bd
Reviewed-on: https://gerrit.openafs.org/14481
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: retry getting tokens for KRB5_KT_NOTFOUND error 80/14480/2
Yadavendra Yadav [Wed, 28 Aug 2019 11:13:35 +0000]
aklog: retry getting tokens for KRB5_KT_NOTFOUND error

If we're creating tokens with -keytab and our AFS service principal is
afs@<cellname>, we'll first try creating tokens with
afs/<cellname>@<cellname>, and krb5_kt_get_entry will fail with
KRB5_KT_NOTFOUND. Since we do not retry on the KRB5_KT_NOTFOUND error,
we never get tokens. To get tokens for the principal afs@<cellname>,
retry on KRB5_KT_NOTFOUND as well. Thanks to
jpjanosi@us.ibm.com for finding this issue and suggesting a fix.
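The retry rule can be sketched with a mock keytab that holds only afs@<cellname>. The names, the `EX_` codes (standing in for success and KRB5_KT_NOTFOUND), and the example cell are hypothetical. aklog's real retry logic also covers other error codes that are not modelled here.

```c
#include <string.h>

/* Hypothetical result codes standing in for success / KRB5_KT_NOTFOUND. */
enum { EX_OK = 0, EX_KT_NOTFOUND = 1 };

/* Mock keytab containing only the afs@<cellname> form. */
static const char *const ex_keytab[] = { "afs@EXAMPLE.COM" };

static int ex_keytab_lookup(const char *princ)
{
    size_t i;
    for (i = 0; i < sizeof(ex_keytab) / sizeof(ex_keytab[0]); i++)
        if (strcmp(ex_keytab[i], princ) == 0)
            return EX_OK;
    return EX_KT_NOTFOUND;
}

/* Try candidate service principals in order. "Not found in keytab" is
 * treated as retryable, so we move on to the next candidate. */
static const char *ex_pick_service_principal(const char *const *cands, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int code = ex_keytab_lookup(cands[i]);
        if (code == EX_OK)
            return cands[i];
        if (code != EX_KT_NOTFOUND)
            break;          /* other errors are not retried in this sketch */
    }
    return NULL;
}
```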

Reviewed-on: https://gerrit.openafs.org/13826
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 7a13bce2513baf5a3a61db94f3d88232241cea5b)

Change-Id: I4f4dfb4c1372aef88a938d1b96d012a3f6bb4218
Reviewed-on: https://gerrit.openafs.org/14480
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: Use HAVE_ENCODE_KRB5_ENC_TKT_PART for aklog impersonate 79/14479/2
Yadavendra Yadav [Wed, 28 Aug 2019 10:55:49 +0000]
aklog: Use HAVE_ENCODE_KRB5_ENC_TKT_PART for aklog impersonate

In get_credv5_akimpersonate we test HAVE_ENCODE_KRB5_ENC_TKT, which is
never defined; as a result, this routine always returns -1 in the
non-Heimdal case. A different macro, HAVE_ENCODE_KRB5_ENC_TKT_PART, is
defined if the encode_krb5_enc_tkt_part function is present. In the
current code, encode_krb5_enc_tkt_part is called from
krb5_encrypt_tkt_part, and krb5_encrypt_tkt_part is called from
get_credv5_akimpersonate in the non-Heimdal case. So change
HAVE_ENCODE_KRB5_ENC_TKT to HAVE_ENCODE_KRB5_ENC_TKT_PART.

Also, while we're here, add a declaration for the internal function
encode_krb5_ticket, so we can build this newly-enabled code without
warnings.

Reviewed-on: https://gerrit.openafs.org/13825
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit 6559297610de0f71c9050f3582d4d146e0cc1f3c)

Change-Id: Ia89cdbf23160c71e5b65b8220e1c1f73f1055064
Reviewed-on: https://gerrit.openafs.org/14479
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: Free client/server princs in get_credv5 78/14478/2
Yadavendra Yadav [Fri, 9 Aug 2019 21:24:38 +0000]
aklog: Free client/server princs in get_credv5

Inside get_credv5, client_principal is static so the first time
get_credv5 runs we'll allocate memory for it, and on subsequent calls
we'll reuse the same value.

However, if we call get_credv5_akimpersonate, we'll free
client_principal and never change what client_principal points to. If we
need to call get_credv5 again (because we need to retry getting creds),
we'll reuse the old value for client_principal, but since it points to
freed memory we'll segfault or cause other problems.

To avoid this, change get_credv5 so we allocate the client and server
principals on each invocation of get_credv5 and free them before
returning from get_credv5. Since we free the client and server
principals inside get_credv5, remove freeing the client and server
principals inside get_credv5_akimpersonate.
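A minimal sketch of the ownership rule after the fix. It uses a hypothetical `ex_princ` type in place of krb5_principal and a live-allocation counter to show that nothing leaks or is freed twice across repeated calls. The principals are locals allocated on each call, helpers only borrow them, and the caller frees both before returning.

```c
#include <stdlib.h>

/* Hypothetical stand-in for krb5_principal. */
struct ex_princ {
    const char *name;
};

static int ex_live_princs;   /* allocated but not yet freed */

static struct ex_princ *ex_parse_name(const char *name)
{
    struct ex_princ *p = malloc(sizeof(*p));
    if (p != NULL) {
        p->name = name;
        ex_live_princs++;
    }
    return p;
}

static void ex_free_princ(struct ex_princ *p)
{
    if (p != NULL) {
        free(p);
        ex_live_princs--;
    }
}

/* Helpers (such as the impersonation path) borrow the principals but
 * never free them; ownership stays with ex_get_cred. */
static int ex_use_princs(const struct ex_princ *client,
                         const struct ex_princ *server)
{
    return (client->name != NULL && server->name != NULL) ? 0 : -1;
}

static int ex_get_cred(const char *client, const char *server)
{
    /* Locals, not statics: fresh principals on every invocation. */
    struct ex_princ *client_princ = ex_parse_name(client);
    struct ex_princ *server_princ = ex_parse_name(server);
    int code = -1;

    if (client_princ != NULL && server_princ != NULL)
        code = ex_use_princs(client_princ, server_princ);

    ex_free_princ(client_princ);
    ex_free_princ(server_princ);
    return code;
}
```

Calling `ex_get_cred` twice in a row, as a retry would, is safe because no pointer outlives the call that allocated it.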

Reviewed-on: https://gerrit.openafs.org/13761
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit ab8b28540ef17d67db02d5dbcb7585443c164e45)

Change-Id: I818202660be4522bd49bf207c59d202ed8adf88d
Reviewed-on: https://gerrit.openafs.org/14478
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoaklog: free kbr5_creds before returning from rxkad_get_token 77/14477/2
Yadavendra Yadav [Fri, 9 Aug 2019 21:11:01 +0000]
aklog: free kbr5_creds before returning from rxkad_get_token

rxkad_get_ticket allocates 'v5cred' which should be freed when we
return from rxkad_get_token.

Reviewed-on: https://gerrit.openafs.org/13760
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 130a92214cc0b9a8f4ea24a3dcd3ed04575e3c4e)

Change-Id: I02720f37c71ee56b4bd3d79d5f3e06c3ee647c9b
Reviewed-on: https://gerrit.openafs.org/14477
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoWINNT: Restore missing '#ifdef PC' 90/14490/2
Andrew Deason [Sat, 9 Jan 2021 18:47:09 +0000]
WINNT: Restore missing '#ifdef PC'

Commit 339167ef (Remove dead code) meant to remove the '#ifdef notdef'
block in here, but we accidentally also removed the subsequent '#ifdef
PC'.

This file may not be very important, since WINNT still builds with
this mistake, but an unbalanced #ifdef is potentially super confusing,
so fix it.

Reviewed-on: https://gerrit.openafs.org/14487
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 2971dcb3b4da04fff3f4bd9c3d3e3f0ab7a94cae)

Change-Id: I273ad30d38d7a41e7ec662994d91e084c24194bb
Reviewed-on: https://gerrit.openafs.org/14490
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoRemove dead code 76/14476/2
Andrew Deason [Wed, 10 Jul 2019 20:14:28 +0000]
Remove dead code

There is a perhaps-surprisingly large amount of code disabled behind
directives like '#if 0', '#ifdef notdef', and '#ifdef notyet'. At
best, this code is clutter, and at worst some of it is
confusing/outdated, and/or confusingly nested inside other
preprocessor conditionals. Sometimes this disabled code shows up when
grepping the tree, and causes a nuisance when refactoring related
areas of code.

Get rid of all of it. If anyone ever wants this code back, it can
always be restored by reverting portions of this commit.

Also delete some comments that clearly refer to the disabled code, and
in some cases, adjust the adjacent comments to make sense accordingly.

This commit doesn't touch any files in src/external/.

Reviewed-on: https://gerrit.openafs.org/13683
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 339167ef1fda899655969f4572ff95271dfdb7cf)

Change-Id: I440b01de0fdb0ef2602557bf3fa35dcdf9a22cdc
Reviewed-on: https://gerrit.openafs.org/14476
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoMerge branch 'openafs-stable-1_8_7-branch' into openafs-stable-1_8_7
Benjamin Kaduk [Thu, 14 Jan 2021 22:24:38 +0000]
Merge branch 'openafs-stable-1_8_7-branch' into openafs-stable-1_8_7

Record the history of the 1.8.7 emergency patch release.
Resolve the nominal conflict in configure.ac due to 1.8.7 bumping
the version and openafs-stable-1_8_x removing the LINUX_PKGREL variable.

Change-Id: Ifa719bcec3948b2634841fba90e835f9db088dd6

3 years agoMake OpenAFS 1.8.7 openafs-stable-1_8_7
Benjamin Kaduk [Thu, 14 Jan 2021 21:08:41 +0000]
Make OpenAFS 1.8.7

Update version strings for the 1.8.7 emergency patch release.

Change-Id: I665bedad864b1c2cbbe55978d6b06e917ed26faa

3 years agoUpdate NEWS for 1.8.7
Benjamin Kaduk [Thu, 14 Jan 2021 21:06:18 +0000]
Update NEWS for 1.8.7

Add the release notes for the 1.8.7 emergency patch release.

Change-Id: I813f11e4e72c12cb927f66472b099febbf3d899f

3 years agoRemove overflow check from update_nextCid
Benjamin Kaduk [Thu, 14 Jan 2021 18:20:59 +0000]
Remove overflow check from update_nextCid

The rx_nextCid global has been an unsigned type since
http://gerrit.openafs.org/11106 (which was actually merged before
the refactoring of the overflow check to avoid signed integer overflow)
and thus there is no need to avoid signed overflow.  The per-connection
cid has been unsigned since the IBM import.

The natural unsigned behavior on overflow of wrapping is the desired
behavior here, so just remove the extra logic and always increment.

Reviewed-on: https://gerrit.openafs.org/14496
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 43ef1f2a5d80aa1c3f5b4831ada8e776ac0c7d13)

Change-Id: I64fabe5229039f7af040902ed2e6f03dba7bc14d
Reviewed-on: https://gerrit.openafs.org/14497
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
(cherry picked from commit 5004f888e32e8274fcd8a28a7bff6aa3a79f41c8)

3 years agorx: update_nextCid overflow handling is broken
Jeffrey Altman [Thu, 14 Jan 2021 14:57:13 +0000]
rx: update_nextCid overflow handling is broken

The overflow handling in update_nextCid() produces an rx_nextCid
value of 0x80000001, which is itself out of the valid range. When
used to construct the first call of a new connection the connection
id for the call becomes 0x80000002, and all subsequent connections
also trigger the overflow handling and thus also receive connection
id 0x80000002.

If the same connection id is used for multiple connections from
the same endpoint the accepting rx peer will be very confused.

When authenticated connections are used, the CHALLENGE/RESPONSE
will fail because of a mismatch in the connection's callNumber
array.

If an initiator makes only a single connection to a given rx peer,
that connection would succeed, but once multiple connections are
initiated all communication from a broken initiator to any rx peer
will fail.

The incorrect overflow calculation was introduced by
39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid
generation into the rx core").

This change corrects the overflow value to become

  1 << RX_CIDSHIFT

Reviewed-on: https://gerrit.openafs.org/14492
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 2c0a3901cbfcb231b7b67eb0899a3133516f33c8)

Change-Id: I74d70706ddf99022bed639891cb610fba9ef863d
Reviewed-on: https://gerrit.openafs.org/14494
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit f5ed8c2fac4c94914099881250f5f2e893f3f9f7)

3 years agorx: rx_InitHost do not overwrite RAND_bytes rx_nextCid
Jeffrey Altman [Thu, 14 Jan 2021 14:41:39 +0000]
rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid

39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid
generation into the rx core") introduced the use of RAND_bytes()
to generate the initial 'rx_nextCid' but failed to remove the

  rx_nextCid = ((tv.tv_sec ^ tv.tv_usec) << RX_CIDSHIFT);

assignment inherited from IBM/Transarc.

At Thu, 14 Jan 2021 08:25:36 GMT, the IBM-inherited calculation
overflows the valid CID range. This triggers the broken overflow
logic in update_nextCid().

Reviewed-on: https://gerrit.openafs.org/14491
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a3bc7ff1501d51ceb3b39d9caed62c530a804473)

Change-Id: If5f7d4ba1cacc6978c83fd512653fbaa0c1559d8
Reviewed-on: https://gerrit.openafs.org/14493
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
(cherry picked from commit a41fe24be574f35ca852fc3ea9750e370cdb71d0)

3 years agoRemove overflow check from update_nextCid 97/14497/3
Benjamin Kaduk [Thu, 14 Jan 2021 18:20:59 +0000]
Remove overflow check from update_nextCid

The rx_nextCid global has been an unsigned type since
http://gerrit.openafs.org/11106 (which was actually merged before
the refactoring of the overflow check to avoid signed integer overflow)
and thus there is no need to avoid signed overflow.  The per-connection
cid has been unsigned since the IBM import.

The natural unsigned behavior on overflow of wrapping is the desired
behavior here, so just remove the extra logic and always increment.

Reviewed-on: https://gerrit.openafs.org/14496
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 43ef1f2a5d80aa1c3f5b4831ada8e776ac0c7d13)

Change-Id: I64fabe5229039f7af040902ed2e6f03dba7bc14d
Reviewed-on: https://gerrit.openafs.org/14497
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>

3 years agorx: update_nextCid overflow handling is broken 94/14494/2
Jeffrey Altman [Thu, 14 Jan 2021 14:57:13 +0000]
rx: update_nextCid overflow handling is broken

The overflow handling in update_nextCid() produces an rx_nextCid
value of 0x80000001, which is itself out of the valid range. When
used to construct the first call of a new connection the connection
id for the call becomes 0x80000002, and all subsequent connections
also trigger the overflow handling and thus also receive connection
id 0x80000002.

If the same connection id is used for multiple connections from
the same endpoint the accepting rx peer will be very confused.

When authenticated connections are used, the CHALLENGE/RESPONSE
will fail because of a mismatch in the connection's callNumber
array.

If an initiator makes only a single connection to a given rx peer,
that connection would succeed, but once multiple connections are
initiated all communication from a broken initiator to any rx peer
will fail.

The incorrect overflow calculation was introduced by
39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid
generation into the rx core").

This change corrects the overflow value to become

  1 << RX_CIDSHIFT

Reviewed-on: https://gerrit.openafs.org/14492
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 2c0a3901cbfcb231b7b67eb0899a3133516f33c8)

Change-Id: I74d70706ddf99022bed639891cb610fba9ef863d
Reviewed-on: https://gerrit.openafs.org/14494
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>

3 years agorx: rx_InitHost do not overwrite RAND_bytes rx_nextCid 93/14493/2
Jeffrey Altman [Thu, 14 Jan 2021 14:41:39 +0000]
rx: rx_InitHost do not overwrite RAND_bytes rx_nextCid

39b165cdda941181845022c183fea1c7af7e4356 ("Move epoch and cid
generation into the rx core") introduced the use of RAND_bytes()
to generate the initial 'rx_nextCid' but failed to remove the

  rx_nextCid = ((tv.tv_sec ^ tv.tv_usec) << RX_CIDSHIFT);

assignment inherited from IBM/Transarc.

At Thu, 14 Jan 2021 08:25:36 GMT, the IBM-inherited calculation
overflows the valid CID range. This triggers the broken overflow
logic in update_nextCid().

Reviewed-on: https://gerrit.openafs.org/14491
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a3bc7ff1501d51ceb3b39d9caed62c530a804473)

Change-Id: If5f7d4ba1cacc6978c83fd512653fbaa0c1559d8
Reviewed-on: https://gerrit.openafs.org/14493
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>

3 years agovldb_check: Check for volume lock inconsistencies 50/14450/2
Michael Meffie [Mon, 17 Aug 2020 19:44:55 +0000]
vldb_check: Check for volume lock inconsistencies

Verify that a lock timestamp is set if, and only if, a volume lock
operation flag is also set.

When running vldb_check with the -fix option, fix the inconsistent
entries by setting the lock timestamp to the current time if a lock flag
is set, or by setting the VLOP_DELETE flag if the lock timestamp is set
but no lock flags are set. (The VLOP_DELETE flag is the flag set by the
'vos lock' command, and is shown in vos output as "delete/misc".)

Volume lock fields can be put into an inconsistent state by (at least)
interrupted vos rename operations, due to bugs in vos rename. When the
volume lock timestamp and lock flags are in this inconsistent state, the
volume is locked, but that is not indicated by 'vos listvldb'. The
volume can be unlocked by issuing 'vos unlock'.

Reviewed-on: https://gerrit.openafs.org/14307
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 4c33820525af510a8a937289005e39d5b6683b19)

Change-Id: Ia894139145d92948b2af43bd115792556131cd5a
Reviewed-on: https://gerrit.openafs.org/14450
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoauth: Close fd on SetExtendedCellInfo write error 61/14461/3
Andrew Deason [Mon, 18 May 2020 17:09:38 +0000]
auth: Close fd on SetExtendedCellInfo write error

Currently, and since OpenAFS 1.0, if write() fails here, we leak the
file descriptor. A write() failure should be very unlikely, but close
the fd to make sure we avoid the leak.

Reviewed-on: https://gerrit.openafs.org/14213
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c81579dc7b0c0ac6bc34f63384d705a4445c2bbd)

Change-Id: I4dd96cca2fd9c01390fb508ab12d507ab1a56c8b
Reviewed-on: https://gerrit.openafs.org/14461
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoRemove DUX/OSF code 52/14452/2
Andrew Deason [Thu, 26 Jul 2018 20:48:00 +0000]
Remove DUX/OSF code

Remove code for DUX/OSF platforms. DUX code was removed from the
libafs client in commit 392dcf67 ("Complete removal of DUX client
code") and the alpha_dux* param files were removed in dc4d9d64 ("afs:
Remove AFS_BOZONLOCK_ENV"). This code has always been disabled since
those commits, so remove any code referencing AFS_DUX*_ENV,
AFS_OSF_ENV, and related symbols.

Reviewed-on: https://gerrit.openafs.org/13260
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 6534b10a4180ec10bceebbc11405718e7969fa21)

Change-Id: I632636fe6c5111b60c5b586c346ecc10ccfa8f3c
Reviewed-on: https://gerrit.openafs.org/14452
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoLINUX: Return errors in our d_revalidate 51/14451/2
Andrew Deason [Mon, 26 Oct 2020 17:35:32 +0000]
LINUX: Return errors in our d_revalidate

In our d_revalidate callback (afs_linux_dentry_revalidate), we
currently 'goto bad_dentry' when we encounter any error. This can
happen if we can't allocate memory or some other internal errors, or
if the relevant afs_lookup call fails just due to plain network
errors.

For any of these cases, we'll treat the dentry as if it's no longer
valid, so we'll return '0' and call d_invalidate() on the dentry.
However, the behavior of d_invalidate changed, as mentioned in commit
afbc199f1 (LINUX: Avoid d_invalidate() during
afs_ShakeLooseVCaches()). After a certain point in the Linux kernel,
d_invalidate() will also effectively d_drop() the given dentry,
unhashing it. This can cause getcwd() calls to fail with ENOENT for
those directories (as mentioned in afbc199f1), and can cause
bind-mount calls to fail similarly during a small window.

To avoid all of this, when we encounter an error that prevents us from
checking if the dentry is valid or not, we need to return an error,
instead of saying 'yes' or 'no'. So, change
afs_linux_dentry_revalidate to jump to the 'done' label when we
encounter such errors, and avoid calling d_drop/d_invalidate in such
cases. This also lets us remove the 'lookup_good' variable and
consolidate some of the related logic.

Important note: in older Linux kernels, d_revalidate cannot return
errors; callers just interpreted its return value as either 'valid'
(non-zero) or 'not valid' (zero). The treatment of negative values as
errors was introduced in Linux commit
bcdc5e019d9f525a9f181a7de642d3a9c27c7610, which was included in
2.6.19. This is very old, but technically still above our stated
requirements for the Linux kernel, so try to handle this case by
jumping to 'bad_dentry' for those old kernels. Just do this with
a version check, since no configure check can detect this (no function
signatures changed), and the only Linux versions that are a concern
are quite old.

Reviewed-on: https://gerrit.openafs.org/14417
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 78e5e1b0e54b31bb08b7578e86a6a2a95770d94c)

Change-Id: I9f9e2cd3a10cc8fa30a770cabd6ae9757f412ce5
Reviewed-on: https://gerrit.openafs.org/14451
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovos: avoid 'half-locked' volume after interrupted 'vos rename' 49/14449/2
Mark Vitale [Mon, 20 Apr 2020 18:51:08 +0000]
vos: avoid 'half-locked' volume after interrupted 'vos rename'

Reported symptoms:

If a 'vos rename' is interrupted after it has locked the volume and
replaced the VLDB entry, but before it has unlocked the volume, the
volume will remain locked.  However, the locked volume will NOT be
listed as locked in any vos commands that display locked status (see
below for details).

Background:

Most vos write operations lock the VLDB volume entry before proceeding,
then release the volume lock when finished.  This is accomplished via
VL_SetLock and VL_ReleaseLock, respectively.

VL_SetLock always sets these members in the VLDB volume entry:
- flags is modified to set the required VLOP_* code bit as specified
- LockAFSid is set to 0 (never implemented)
- LockTimestamp is set to the current time

VL_ReleaseLock always sets them as follows:
- flags is cleared of any VLOP_* code bit
- LockAFSid is set to 0 (never implemented)
- LockTimestamp is set to 0

VL_ReplaceEntry(N) may also optionally clear each of these members:
- flags operation bits may be explicitly cleared via LOCKREL_OPCODE
- LockAFSid may be explicitly cleared via LOCKREL_AFSID
- LockTimestamp may be explicitly cleared via LOCKREL_TIMESTAMP

When all 3 options are specified, VL_ReplaceEntry also does the
functional equivalent of a VL_ReleaseLock.  Most vos operations use this
method.  However, when no lock release options are specified on
VL_ReplaceEntry(N), the VLDB entry is simply replaced with the supplied
entry.  This includes whatever flags values are specified in the
supplied entry; therefore, this amounts to an additional, implicit way
to set or modify the flags.

Root cause:

'vos rename' (UV_RenameVolume) is the only vos operation that does all
of the following things:
- accepts a replacement volume entry that was obtained before VL_SetLock
  (and thus does NOT have any lock flags set)
- issues VL_SetLock (which sets the lock flag in the VLDB)
- issues VL_ReplaceEntry(N) with the original unlocked entry, and with
  no lock release options (thus with explicit intent to leave the lock
  flag unchanged, but inadvertently doing an implicit clear of the lock
  flag in the VLDB)
- (performs some additional volserver work)
- issues VL_ReleaseLock to release the volume lock

Therefore, if 'vos rename' is cancelled or killed before reaching the
final VL_ReleaseLock step, the VLDB entry is left with the lock flags
cleared but the LockTimestamp still set.  As we will see below, this
'half-locked' state produces confusing results from other vos commands.

Detection of locked state:

The 'vos lock' command (and all other vos commands that issue
VL_SetLock) uses the lock timestamp to determine if a volume is locked.

However, several other vos commands ('vos listvldb <vol>', 'vos examine
<vol>', 'vos listvldb -locked') use the VLDB entry's lock flags (not the
lock timestamp) to determine if the volume is locked.  Therefore, if the
lock flags have been cleared but the lock timestamp is still set, these
commands fail to detect that the volume is still locked.  Yet an
administrator's 'vos lock <volume>' will still fail with:

  Could not lock VLDB entry for volume <volume>
  VLDB: vldb entry is already locked

This is the external manifestation of the 'half-locked' state.

Workaround and fix:

This scenario has a simple workaround: 'vos unlock <volume>'.  However,
to avoid this confusing outcome in the first place, modify the 'vos
rename' logic so that the lock flags are no longer inadvertently
cleared.  Now, if the 'vos rename' is interrupted before the volume is
unlocked, it will still appear locked in normal vos command output.

Change-Id: Iefc6ef54ea4b0e95e3ae8e8a43d3ded0f15af0fa
Reviewed-on: https://gerrit.openafs.org/14157
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-on: https://gerrit.openafs.org/14449
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovos: Cleanup function definitions 48/14448/2
Cheyenne Wills [Thu, 5 Nov 2020 20:50:59 +0000]
vos: Cleanup function definitions

The functions defined within vos.c are not referenced outside of vos.c
but are not declared as static.

Convert the functions within vos.c to static declarations.

Reviewed-on: https://gerrit.openafs.org/14009
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 56aa396d8359276d778d41aa509041c8c75b4e96)

Change-Id: Idca045431959bb3e4b31d12ef754a883d4118a89
Reviewed-on: https://gerrit.openafs.org/14448
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovos: Remove dead code 47/14447/2
Cheyenne Wills [Thu, 5 Nov 2020 20:49:54 +0000]
vos: Remove dead code

Clean out dead code from vos.c

GetVolumeType - not referenced anywhere
CompareVLDBEntry - commented out since 1st git commit
osi_audit - Comment indicates this might have been needed at one point.
            Builds without it.  Does not look like the vos executable
            is pulling in any of the audit code.
RestoreVolume - remove stale comment about typo previous to openafs 1.0
RemoveSite - remove commented out partition check

Reviewed-on: https://gerrit.openafs.org/14008
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit a3be2c74a95489f63837840af8ec42049ce021bf)

Change-Id: I71a78d2a46b8d64cdde9db05a0079e9db954d191
Reviewed-on: https://gerrit.openafs.org/14447
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovos: Cleanup indentation whitespace 46/14446/2
Cheyenne Wills [Tue, 10 Nov 2020 16:17:16 +0000]
vos: Cleanup indentation whitespace

Fix the indentation whitespace in vos.c, and remove double blank
lines.  No functional change.

Reviewed-on: https://gerrit.openafs.org/14007
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit c17c157641d83226fee5bc20f588f14bb132bb68)

Change-Id: Iecde7505a3f59c4b6e59d4644b7a1e56127c272d
Reviewed-on: https://gerrit.openafs.org/14446
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agovsprocs: Remove dead code 45/14445/2
Michael Meffie [Fri, 27 Dec 2019 16:53:05 +0000]
vsprocs: Remove dead code

Remove the dead code in UV_VolumeMove() commented out with the macro
ENABLE_BUGFIX_1165.

Remove two commented out lines of code in UV_ConvertRO().

Reviewed-on: https://gerrit.openafs.org/14004
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 6779e30d372b2cd5e7995da23ed5e2971124b79c)

Change-Id: Ibeddebdf24ca50341bba3031c6f8548cab245b8a
Reviewed-on: https://gerrit.openafs.org/14445
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agobozo: defer audit open until log dir is created and current 44/14444/2
Mark Vitale [Tue, 6 Oct 2020 14:18:11 +0000]
bozo: defer audit open until log dir is created and current

On a new OpenAFS install where the log directory has not yet been
created, 'bosserver -auditlog /usr/afs/logs/<auditlog>' (absolute path)
fails with ENOENT because the log directory doesn't exist yet.

Furthermore, 'bosserver -auditlog <auditlog>' (relative path) succeeds,
but the audit file is created in whatever directory was current when
bosserver was started, not in the expected log directory (Transarc
/usr/afs/logs).

Both problems have been present since bosserver audit log support was
introduced by commit 16d67791dce45e5d4ee9b854c796492ffcde2113
'auditlogs-for-everyone-20050702'.

Reorder the bosserver initialization steps to ensure that the log
directory has been created and is the current working directory, before
creating and opening the audit log.

Reviewed-on: https://gerrit.openafs.org/14381
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f372ec041a83288a5d096360f0ad8589e4db666a)

Change-Id: I14a0a4a2a23c8e9b3b658d52511872ecaa4010af
Reviewed-on: https://gerrit.openafs.org/14444
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agobozo: Properly detect presence of -auditlog 43/14443/2
Andrew Deason [Sun, 18 Oct 2020 01:51:51 +0000]
bozo: Properly detect presence of -auditlog

cmd_OptionAsString returns non-zero (CMD_MISSING) if the given option
_isn't_ present, so we need to call osi_audit_file only when
cmd_OptionAsString returns 0. Since commit f6cdf71 (bozo: Use libcmd
for command line options), bosserver has not done this, which causes it
to complain on startup if no -auditlog was given:

    $ bosserver
    Warning: auditlog (null) not writable, ignored.

To fix this, skip calling osi_audit_file if -auditlog was not given.

While we're changing this anyway, change our processing of our
audit-related options to more closely match what other daemons do,
like ptserver or viced, so it's easier to see if we're doing the right
thing. That is, just call cmd_OptionAsString() without a conditional,
and just test if auditFileName is non-NULL later on, after options
processing.

Reviewed-on: https://gerrit.openafs.org/14402
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 87041d676c93dfe35a085b9b5aaa73e74c08bc90)

Change-Id: Ic05e5453c28b4c408300ea35439a519adc282486
Reviewed-on: https://gerrit.openafs.org/14443
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agobozo: Use libcmd for command line options 42/14442/2
Cheyenne Wills [Fri, 21 Aug 2020 18:53:30 +0000]
bozo: Use libcmd for command line options

Update bosserver to use libcmd for command line parsing.

Reviewed-on: https://gerrit.openafs.org/13845
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit f6cdf7165b4e66772ee06314658b7c209928d611)

Change-Id: I8fdf27d099f81c08a37db728846bd7596a8cf62e
Reviewed-on: https://gerrit.openafs.org/14442
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agoafs: prevent double release of global lock afs_xvcb 41/14441/2
Mark Vitale [Fri, 28 Oct 2016 22:12:19 +0000]
afs: prevent double release of global lock afs_xvcb

afs_GetServer calls ReleaseWriteLock(&afs_xvcb) twice within a few
lines.  The second one is spurious.

Commits b18653de7ae90491c2e75f4a98410581655d776c 'xserver lock order
violation' and f2bf60ed4f1323cd6f74f2f01114f7e4f714db53 'xvcb lock order
violation' were written by the same author at the same time and
apparently were victims of a bad merge.

Discovered during a lock audit project as a panic during afsd startup:

  assertion failed: (&afs_xvcb)->excl_locked == WRITE_LOCK, file:
  /home/mvitale/src/sna-openafs/src/afs/afs_server.c, line: 2089

afs_GetServer is called frequently by many threads and so this bug could
easily have released another thread's write lock on afs_xvcb.

Remove the spurious second release.

Reviewed-on: https://gerrit.openafs.org/14411
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e8702e6a615a160cdbe464f76bd6f100667720d2)

Change-Id: I3165a63e774296b97e09c374b068b012224776e1
Reviewed-on: https://gerrit.openafs.org/14441
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agostats: incorrect clock square algorithm 40/14440/2
Mark Vitale [Mon, 18 Sep 2017 23:45:10 +0000]
stats: incorrect clock square algorithm

Since the original IBM code import, OpenAFS has had an algorithm for
squaring clock values, implemented identically in three different
places.  This algorithm does not account correctly for microsecs
overflow into seconds, resulting in incorrect "sum-of-squares" values
for queue and execution time in several OpenAFS performance utilities.

Specifically, this code:

        t1.tv_usec += (2 * t2.tv_sec * t2.tv_usec) % 1000000                   \
                      + (t2.tv_usec / 1000)*(t2.tv_usec / 1000)                \
                      + 2 * (t2.tv_usec / 1000) * (t2.tv_usec % 1000) / 1000   \
                      + (((t2.tv_usec % 1000) > 707) ? 1 : 0);                 \

Can allow for the tv_usec field to be increased by a theoretical max
of around:

        t1.tv_usec += 999998                                                   \
                      + 999*999                                                \
                      + 2 * 999 * 999 / 1000                                   \
                      + 1;                                                     \

Or:

        t1.tv_usec += 1999996;                                                 \

If t1.tv_usec is already 999999, after this calculation its value
could be as high as 2999995. So checking only once whether t1.tv_usec
is over 1000000 is not sufficient: after a single carry, the value
(1999995) is still over 1000000.

Correct all implementations by repeatedly checking if tv_usec is over
1000000 after the above calculation:

macro                   affected utility
=====================   ============================
afs_stats_SquareAddTo   xstat_cm_test
fs_stats_SquareAddTo    xstat_fs_test
clock_AddSq             rxstat_get_process and _peer

Reviewed-on: https://gerrit.openafs.org/14376
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e1e5df918fee00d4d9152c31c24cc1e7f23b71a6)

Change-Id: I4055ed61311ed7d6ac435b8660d5b7c55f467699
Reviewed-on: https://gerrit.openafs.org/14440
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years agorxstats: correctly report vlserver VL_* RPC stats 39/14439/2
Mark Vitale [Mon, 28 Sep 2020 20:35:38 +0000]
rxstats: correctly report vlserver VL_* RPC stats

Since the original IBM code import, rxstat_get_process and
rxstat_get_peer have reported vlserver VL_* RPC stats as for the
"volserver interface".

Correct this to read "vlserver interface".

Reviewed-on: https://gerrit.openafs.org/14375
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit e985d43d99d93172b5608a3c73fd3201d3b3a212)

Change-Id: Ifbbe4df8ede22b287ab7c67d20e9ccd951367765
Reviewed-on: https://gerrit.openafs.org/14439
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago rxstats: correctly distinguish client and server stats 38/14438/2
Mark Vitale [Mon, 28 Sep 2020 19:40:34 +0000]
rxstats: correctly distinguish client and server stats

Commit d3eaa39da3693bba708fa2fa951568009e929550 'rx: Make the rx_call
structure private' inadvertently caused all rxstats (aka rpcstats) to be
recorded as client stats by hardcoding the value for isServer to 1.

Therefore, when peer or process rxstats are enabled for an OpenAFS
component, the rxstat_get_process and rxstat_get_peer utilities will
erroneously report both client and server stats as "accessed as a client".

This is particularly problematic for ubik VOTE_* and DISK_* RPC stats,
for which a given ubik server may be both client and server over time.
In this case, both client and server stats are conflated into the same
"accessed as a client" counters.

Instead, properly pass the value of isServer from
rx_RecordCallStatistics through to rxi_IncrementTimeAndCount.

Note to maintainers:
This bug is only in master and all 1.8.x releases; no 1.6.x releases are
affected.

Note:
Confusingly, isServer=1 indicates client stats and isServer=0 indicates
server stats.  However, this is a quirk of the original implementation
and wire format of the RXSTATS_* RPCs and cannot be changed.  isServer
is actually shorthand for "remote is server"; thus all RPC client stubs
record their rxstats with isServer == 1, and all RPC server stubs record
their rxstats with isServer == 0.
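To make the inverted meaning concrete, here is a minimal C sketch of counters indexed by isServer. The struct and function names are hypothetical and do not mirror the real rx_RecordCallStatistics or rxi_IncrementTimeAndCount signatures; only the convention (isServer == 1 means the remote end is the server, so the local side acted as a client) comes from the note above.

```c
/* Hypothetical per-RPC counters, indexed by the isServer value as it
 * appears on the wire: slot 1 counts calls where the remote end was the
 * server (we were the client), slot 0 calls where we were the server. */
struct rpc_counters {
    unsigned long invocations[2];
};

/* isServer is shorthand for "remote is server". */
static void
record_call(struct rpc_counters *c, int isServer)
{
    c->invocations[isServer ? 1 : 0]++;
}

/* Which role the local process played, for reporting. */
static const char *
local_role(int isServer)
{
    return isServer ? "client" : "server";
}
```

In this model a client stub would call record_call(c, 1) and a server stub record_call(c, 0); the regression described above is equivalent to every caller passing a constant 1, so all calls land in the client slot.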

Reviewed-on: https://gerrit.openafs.org/14374
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 18c345a9f8ee9b2ff73f23dae68757b19d3283f5)

Change-Id: I6d41d015803967363f3702f5dda7083ccbf7508a
Reviewed-on: https://gerrit.openafs.org/14438
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago afs: Log pid with disk cache read errors 37/14437/2
Andrew Deason [Mon, 26 Oct 2020 17:19:19 +0000]
afs: Log pid with disk cache read errors

Log the current pid (and procname) when we complain about an error
when reading from CacheItems in afs_UFSGetDSlot. These errors can
result in confusing situations, so it can be helpful to know at least
what process saw the error.

Our logic for logging this information is getting a bit large, so also
move this to a new function, LogCacheError.

Reviewed-on: https://gerrit.openafs.org/14416
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 1caeeea43c038011306dd1c391680c24fc318e3d)

Change-Id: Ia159eeea47191f71fc5892cbc54af79b55bf4828
Reviewed-on: https://gerrit.openafs.org/14437
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago vlserver: Return VL_DBBAD on unhash failure 36/14436/2
Andrew Deason [Mon, 12 Nov 2018 21:06:09 +0000]
vlserver: Return VL_DBBAD on unhash failure

If we try to delete a vlentry, and the vlentry cannot be found on one
of its hash chains, we cannot unhash the vlentry properly and the
operation fails with VL_NOENT. This results in the following error
messages to the user:

        $ vos delentry 123456
        Could not delete entry for volume 123456
        You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
        VLDB: no such entry
        Deleted 0 VLDB entries

This is confusing, because VL_NOENT is also returned when the user
specifies a volume that actually does not exist. An entry missing from
one of its hash chains, by contrast, is indicative of database
corruption, usually because of a ubik transaction that was only
half-applied, or because of other ubik bugs in the past.

The situation can only really be fixed by repairing the database, so
return VL_DBBAD in this case instead, to more clearly indicate that
something is wrong with the database, and not a problem with the
arguments the caller provided.
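The distinction can be illustrated with simplified hash chains. This is a sketch under assumptions: the entry layout, the two-chain setup, the functions unhash and delete_entry, and the ERR_NOENT/ERR_DBBAD stand-ins are all hypothetical, not the real vlserver code or the real VL_NOENT and VL_DBBAD values. The point it shows is that "no such entry" is only reported when the lookup itself fails; an entry that exists but is missing from one of its chains is reported as database corruption.

```c
#include <stddef.h>

/* Hypothetical stand-ins for VL_NOENT / VL_DBBAD (not the real values). */
enum { ERR_OK = 0, ERR_NOENT = 1, ERR_DBBAD = 2 };

/* Each entry sits on two hash chains here: one keyed by volume ID and
 * one keyed by name.  The real VLDB has more chains. */
enum { CHAIN_ID = 0, CHAIN_NAME = 1, NCHAINS = 2 };

struct entry {
    unsigned int id;
    struct entry *next[NCHAINS];
};

/* Remove e from one chain; returns nonzero if e was not on it. */
static int
unhash(struct entry **head, struct entry *e, int chain)
{
    struct entry **pp;

    for (pp = head; *pp != NULL; pp = &(*pp)->next[chain]) {
        if (*pp == e) {
            *pp = e->next[chain];
            e->next[chain] = NULL;
            return 0;
        }
    }
    return 1;
}

static int
delete_entry(struct entry *heads[NCHAINS], unsigned int id)
{
    struct entry *e;
    int c;

    /* Find the entry via the ID chain. */
    for (e = heads[CHAIN_ID]; e != NULL; e = e->next[CHAIN_ID])
        if (e->id == id)
            break;
    if (e == NULL)
        return ERR_NOENT;       /* the volume really does not exist */

    /* The entry exists, so it must be on every chain.  If it is missing
     * from one, the database is inconsistent (a real server would also
     * abort the transaction here). */
    for (c = 0; c < NCHAINS; c++)
        if (unhash(&heads[c], e, c) != 0)
            return ERR_DBBAD;
    return ERR_OK;
}
```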

Reviewed-on: https://gerrit.openafs.org/13384
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit fd6add0aca03a5a17f7109c785b6027a76f13cdf)

Change-Id: Ib1cf72b7f0d6c65e37c13f00d6f6049a3049b644
Reviewed-on: https://gerrit.openafs.org/14436
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago vlserver: Add VL_DBBAD error code 35/14435/2
Andrew Deason [Mon, 12 Nov 2018 20:41:44 +0000]
vlserver: Add VL_DBBAD error code

The VL_ error table currently doesn't have an error code to indicate
that an operation cannot succeed because the database is corrupted.
There are a few error codes for specific cases of errors that are
probably the result of corruption (like VL_IDALREADYHASHED, or
VL_EMPTY), but these are only for specific cases and indicate rather
low-level internal problems.

There are some instances where the real problem preventing an
operation from succeeding is that the database is just corrupt or
inconsistent in some way, and the administrator must repair the
database before it can succeed. And we currently don't have any way of
indicating that situation via an error code.

So, introduce the VL_DBBAD code to indicate this situation. Error
codes already exist in other tables for similar situations, such as
PRDBBAD and KADATABASEINCONSISTENT.

This commit does not use the new error code anywhere; we just
introduce it into the VL_ error table, so comerr-using applications
will be able to interpret it.

Note that the VL_DBBAD error code has been recognized by the AFS
Assigned Numbers Registry as recorded in the ticket history of
<https://rt.central.org/rt/Ticket/Display.html?id=134817>
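For readers unfamiliar with com_err: each error table is identified by a short name, and the table's base code is derived from that name, which is how a comerr-using application maps a raw code back to its table. The sketch below reimplements that derivation in C following the conventions of the classic MIT com_err (6 bits per name character, then shifted left 8 bits, leaving 256 codes per table). It assumes the vlserver table is named "VL", as the VL_ prefix suggests; the function names are hypothetical, and it makes no claim about the numeric value of VL_DBBAD itself.

```c
#include <string.h>

/* Character set used to encode table names, 6 bits per character. */
static const char et_charset[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

/* Base error code for a table name of up to 4 characters.
 * Returns -1 for characters outside the charset. */
static long
et_table_base(const char *name)
{
    long num = 0;
    const char *p;

    for (p = name; *p != '\0'; p++) {
        const char *pos = strchr(et_charset, *p);
        if (pos == NULL)
            return -1;
        num = (num << 6) + (pos - et_charset) + 1;
    }
    return num << 8;    /* low 8 bits index codes within the table */
}

/* Offset of a code within its table. */
static long
et_offset(long code)
{
    return code & 0xff;
}

/* Recover the table name from an error code (buf needs 5 bytes). */
static void
et_table_name(long code, char buf[5])
{
    long num = (code >> 8) & 0xffffff;
    char *p = buf;
    int i;

    for (i = 3; i >= 0; i--) {
        int ch = (int)((num >> (6 * i)) & 077);
        if (ch != 0)
            *p++ = et_charset[ch - 1];
    }
    *p = '\0';
}
```

Under these assumptions, et_table_base("VL") evaluates to 363520, and any code in the range 363520 to 363775 decodes back to the table name "VL".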

Reviewed-on: https://gerrit.openafs.org/13383
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Mark Vitale <mvitale@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 878d27c845157bb64c32bbd6c3cacce17c681d70)

Change-Id: I93b4916890ec9e4f6f5453ecf28c8a8ce04af7ea
Reviewed-on: https://gerrit.openafs.org/14435
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago vlserver: Warn when we cannot unhash deleted entry 34/14434/2
Andrew Deason [Mon, 12 Nov 2018 21:01:18 +0000]
vlserver: Warn when we cannot unhash deleted entry

If we are trying to delete an entry from the vldb, we fail with
VL_NOENT if we cannot find the given entry on one of its hash chains.
This is indicative of corruption in the vldb (since we have an entry
not on a hash chain), but we don't really indicate this clearly. There
are no log messages, and the user running 'vos' only sees an error
like this:

    $ vos delentry 123456
    Could not delete entry for volume 123456
    You must specify a RW volume name or ID (the entire VLDB entry will be deleted)
    VLDB: no such entry
    Deleted 0 VLDB entries

This is exactly the same error message the user sees when trying to
delete a volume that does not actually exist.

We currently do not have an error code that clearly says that the
database appears corrupted and needs to be fixed, but we can at least
log an error in VLLog for this case, to give the administrator a
chance at fixing the situation. So, log a message in this situation.

Reviewed-on: https://gerrit.openafs.org/13382
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 3e3fce24da31a31ca9a3f4ad356c4e4eaf0ad897)

Change-Id: Ia76c5d7a19c3d21a89fc502e14922672afd8a84f
Reviewed-on: https://gerrit.openafs.org/14434
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago volser: take RO volume offline during convertROtoRW 33/14433/2
Marcio Barbosa [Thu, 3 Sep 2020 23:57:34 +0000]
volser: take RO volume offline during convertROtoRW

The vos convertROtoRW command converts a RO volume into a RW volume.
Unfortunately, the RO volume is not checked out from the fileserver
during this process. As a result, accesses to the volume being converted
can leave volume objects in an inconsistent state.

Moreover, consider the following scenario:

1. Create a volume on host_b and add replicas on host_a and host_b.

$ vos create host_b a vol_1
$ vos addsite host_b a vol_1
$ vos addsite host_a a vol_1

2. Mount the volume:

$ fs mkmount /afs/.mycell/vol_1 vol_1
$ vos release vol_1
$ vos release root.cell

3. Shut down dafs on host_b:

$ bos shutdown host_b dafs

4. Remove the RO reference to host_b from the vldb:

$ vos remsite host_b a vol_1

5. Attach the RO copy by touching it:

$ fs flushall
$ ls /afs/mycell/vol_1

6. Convert RO copy to RW:

$ vos convertROtoRW host_a a vol_1

Notice that FSYNC_com_VolDone fails silently (FSYNC_BAD_STATE), leaving
the volume object for the RO copy set as VOL_STATE_ATTACHED (on success,
this volume should be set as VOL_STATE_DELETED).

7. Add replica on host_a:

$ vos addsite host_a a vol_1

8. Wait until the "inUse" flag of the RO entry is cleared (or force this
to happen by attaching multiple volumes).

9. Release the volume:

$ vos release vol_1

Failed to start transaction on volume 536870922
Volume not attached, does not exist, or not on line
Error in vos release command.
Volume not attached, does not exist, or not on line

Notice that this happens because we cannot mark an attached volume as
destroyed (FSYNC_com_VolDone).

To avoid the problem mentioned above and to prevent accesses to the
volume being converted, take the RO volume offline before converting it
to RW.
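The failure in the scenario above comes down to a state check: a volume cannot be marked destroyed while it is still attached. The following C model is a deliberately simplified sketch. The states and results loosely correspond to VOL_STATE_ATTACHED, VOL_STATE_DELETED and FSYNC_BAD_STATE from the text, but every name here (vol_take_offline, vol_done, convert_ro_to_rw and the enums) is hypothetical and none of it reproduces the real FSYNC protocol.

```c
/* Hypothetical, simplified volume states (loosely after VOL_STATE_*). */
enum vol_state { ST_ATTACHED, ST_OFFLINE, ST_DELETED };

/* Hypothetical results (loosely after FSYNC_OK / FSYNC_BAD_STATE). */
enum fs_result { RES_OK, RES_BAD_STATE };

struct volume {
    enum vol_state state;
};

/* Take the volume offline so clients can no longer access it. */
static enum fs_result
vol_take_offline(struct volume *v)
{
    if (v->state == ST_DELETED)
        return RES_BAD_STATE;
    v->state = ST_OFFLINE;
    return RES_OK;
}

/* Mark the volume destroyed.  Refused while it is still attached, which
 * is what left the RO copy stuck in the attached state. */
static enum fs_result
vol_done(struct volume *v)
{
    if (v->state == ST_ATTACHED)
        return RES_BAD_STATE;
    v->state = ST_DELETED;
    return RES_OK;
}

/* Flow after the fix: offline first, then convert, then mark done. */
static enum fs_result
convert_ro_to_rw(struct volume *ro)
{
    enum fs_result res = vol_take_offline(ro);

    if (res != RES_OK)
        return res;
    /* ... the on-disk RO-to-RW conversion would happen here ... */
    return vol_done(ro);
}
```

Calling vol_done directly on an attached volume models the old behaviour (RES_BAD_STATE, state unchanged); going through convert_ro_to_rw models the fixed ordering.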

Reviewed-on: https://gerrit.openafs.org/14340
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
(cherry picked from commit 45a69b61133ae8ca8e49a002ddc1895386796d51)

Change-Id: I94e08d09d5044f3c0cac7c700f26ec6e7b111d6f
Reviewed-on: https://gerrit.openafs.org/14433
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>
Tested-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago volser: Close dirp on error in ConvertROtoRW 32/14432/2
Marcio Barbosa [Thu, 3 Sep 2020 20:11:34 +0000]
volser: Close dirp on error in ConvertROtoRW

Currently, if SAFSVolConvertROtoRWvolume cannot create a new transaction
for the volume to be converted, it returns without closing the directory
stream opened by it. To prevent this leak, go through a new 'goto done'
destructor if NewTrans fails.
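The 'goto done' destructor is the usual C idiom for releasing resources on every exit path. A minimal, self-contained illustration using opendir/closedir follows. The function convert_sketch, the new_trans stand-in for NewTrans, and the EBUSY error code are all hypothetical; this does not reproduce SAFSVolConvertROtoRWvolume.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for NewTrans(); fails when asked to. */
static void *
new_trans(int simulate_failure)
{
    static int dummy;

    return simulate_failure ? NULL : &dummy;
}

static int
convert_sketch(const char *path, int simulate_failure)
{
    DIR *dirp;
    void *tt;
    int code = 0;

    dirp = opendir(path);
    if (dirp == NULL)
        return errno;           /* nothing to clean up yet */

    tt = new_trans(simulate_failure);
    if (tt == NULL) {
        code = EBUSY;           /* illustrative error code */
        goto done;              /* a bare return here would leak dirp */
    }

    /* ... work that uses dirp and tt would go here ... */

done:
    closedir(dirp);             /* runs on both success and error */
    return code;
}
```

The single exit label keeps the cleanup in one place, so later error checks added between opendir and the end of the function cannot forget to close the directory stream.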

Reviewed-on: https://gerrit.openafs.org/14342
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit f18b58f8227df2ab420d69eb5937a99f747c7692)

Change-Id: I81b5f7a330548eaecba1acfdc7231d2a953a365b
Reviewed-on: https://gerrit.openafs.org/14432
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Cheyenne Wills <cwills@sinenomine.net>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>

3 years ago vlserver: fix missing read-only entries from ListAttributesN2 27/14427/2
Michael Meffie [Thu, 16 Apr 2020 20:29:09 +0000]
vlserver: fix missing read-only entries from ListAttributesN2

The ListAttributesN2() RPC can fail to list read-only entries under
certain circumstances. This RPC is used by the `vos listvldb` command to
retrieve vldb entries (unless the -name option is given). The `vos
listvldb` command fails to list volume entries when run with the
'-server' option for volumes that have read-only replicas but have not
been released.

Consider the following example volume:

    $ vos create fs1.example.com a test
    $ vos addsite fs1.example.com a test
    $ vos addsite fs2.example.com a test
    $ vos listvldb
    ...
    test
        RWrite: 536870921
        number of sites -> 3
           server fs1.example.com partition /vicepa RW Site
           server fs1.example.com partition /vicepa RO Site  -- Not released
           server fs2.example.com partition /vicepa RO Site  -- Not released

`vos listvldb` fails to find the volume when the search is limited to
server 'fs2':

    $ vos listvldb -server fs2.example.com
    VLDB entries for server fs2.example.com
    Total entries: 0

Instead of the expected results:

    $ vos listvldb -server fs2.example.com
    test
        RWrite: 536870921
        number of sites -> 3
           server fs1.example.com partition /vicepa RW Site
           server fs1.example.com partition /vicepa RO Site  -- Not released
           server fs2.example.com partition /vicepa RO Site  -- Not released

This situation makes it difficult to remove old server addresses from
the vldb.  In this situation, 'vos remaddrs' and 'vos changeaddr
-remove' will complain that the server addresses are still in use by
volume entries; however, running 'vos listvldb -server' will not show
which volume entries are using them.

The entries are not listed for unreleased volumes because the
ListAttributesN2() RPC is currently checking the volume VLF_ROEXISTS
flag, instead of the server site flags (serverFlags), to determine
whether a site is a read-only site. The volume VLF_ROEXISTS flag is set
when a volume is released.

To fix this, make ListAttributesN2 check for the VLSF_ROVOL site flag,
instead of the VLF_ROEXISTS entry flag.
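The difference between the two checks can be shown with a simplified entry structure. This is a sketch under assumptions: the flag names follow the commit message, but their numeric values, the vl_entry layout and the lists_entry helper are illustrative placeholders, not the VLDB header definitions or the actual ListAttributesN2 code. It models only the question "does this entry have a site on the requested server?".

```c
/* Placeholder flag values (illustrative, not the real definitions). */
#define VLF_ROEXISTS 0x2000   /* entry flag: set when the volume is released */
#define VLSF_ROVOL   0x02     /* site flag: this site holds an RO replica */
#define VLSF_RWVOL   0x04     /* site flag: this site holds the RW volume */

/* Simplified stand-in for a VLDB entry. */
struct vl_entry {
    unsigned int flags;             /* entry-level flags */
    int nServers;
    unsigned int serverNumber[8];   /* server index per site */
    unsigned int serverFlags[8];    /* per-site flags */
};

/* Should the entry be listed when filtering on 'server'?  With
 * use_site_flags == 0 this models the old check, which consulted the
 * entry-level VLF_ROEXISTS flag instead of the site's own flags. */
static int
lists_entry(const struct vl_entry *e, unsigned int server, int use_site_flags)
{
    int i;

    for (i = 0; i < e->nServers; i++) {
        int is_rw, is_ro;

        if (e->serverNumber[i] != server)
            continue;
        is_rw = (e->serverFlags[i] & VLSF_RWVOL) != 0;
        if (use_site_flags)
            is_ro = (e->serverFlags[i] & VLSF_ROVOL) != 0;  /* fixed */
        else
            is_ro = (e->flags & VLF_ROEXISTS) != 0;         /* old */
        if (is_rw || is_ro)
            return 1;
    }
    return 0;
}
```

Modelling the 'test' volume above (server 1 = fs1 with RW and RO sites, server 2 = fs2 with an RO site, never released), the old check skips the entry for server 2 while the site-flag check lists it.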

Reviewed-on: https://gerrit.openafs.org/14154
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Benjamin Kaduk <kaduk@mit.edu>
(cherry picked from commit 904f5bd398db248c11b30ef7e360ce5141dcd1f3)

Change-Id: Iea4bbbc9fb0c42ac5e109ee94688436fdcc42a67
Reviewed-on: https://gerrit.openafs.org/14427
Reviewed-by: Michael Meffie <mmeffie@sinenomine.net>
Reviewed-by: Andrew Deason <adeason@sinenomine.net>
Tested-by: BuildBot <buildbot@rampaginggeek.com>
Reviewed-by: Stephan Wiesand <stephan.wiesand@desy.de>