doc/txt/README.linux-nfstrans

## Introduction

This version works on Linux 2.6 and provides the following features:

- Basic AFS/NFS translator functionality, similar to other platforms
- Ability to distinguish PAGs assigned within each NFS client
- A new 'afspag' kernel module, which provides PAG management on
  NFS client systems and forwards AFS system calls to the translator
  system via the remote AFS system call (rmtsys) protocol.
- Support for transparent migration of an NFS client from one translator
  server to another, without loss of credentials or sysnames.
- The ability to force the translator to discard all credentials
  belonging to a specified NFS client host.


The patch applies to OpenAFS 1.4.1 and has been tested against the
kernel-2.6.9-22.0.2.EL kernel binaries provided by the CentOS project
(essentially rebuilds of Red Hat Enterprise Linux from source).  This
patch is not expected to apply cleanly to newer versions of OpenAFS
because of conflicting changes in parts of the kernel module source.
To apply this patch, use 'patch -p0'.

It has been integrated into OpenAFS 1.5.x.

## New in Version 1.4

- There was no version 1.3
- Defined a "sysname generation number", which changes whenever the
  sysname list is changed for the translator or any client.  This number
  is used as the nanoseconds part of the mtime of directories, which
  forces NFS clients to reevaluate directory lookups whenever the
  sysname list changes.
- Fixed several bugs related to sysname handling
- Fixed a bug preventing 'fs exportafs' from changing the flag that
  controls whether callbacks are made to NFS clients to obtain tokens
  and sysname lists.
- Starting in this version, when the PAG manager starts up, it asks the
  translator to discard any tokens belonging to that client.  This fixes
  a problem where newly created PAGs on the client would inherit tokens
  owned by an unrelated process from an earlier boot.
- Enabled the PAG manager to forward non-V-series pioctls.
- Forward-ported to OpenAFS 1.4.1 final
- Added a file, /proc/fs/openafs/unixusers, which reports information
  about "unixuser" structures.  These structures record tokens and bind
  translator-side PAGs to NFS client data and sysname lists.


## Finding the RPC server authtab

In order to correctly detect NFS clients and distinguish between them,
the translator must insert itself into the RPC authentication process.
This requires knowing the address of the RPC server authentication
dispatch table, which is not exported from standard kernels.  To address
this, either the kernel must be patched so that net/sunrpc/svcauth.c
exports the 'authtab' symbol, or the address of that symbol must be
provided when the OpenAFS kernel module is loaded, using the option
"authtab_addr=0xXXXXXXXX", where XXXXXXXX is the address of the authtab
symbol as listed in /proc/kallsyms.  The latter can be accomplished by
replacing 'modprobe openafs' in the openafs-client init script with the
following three lines:

    modprobe sunrpc
    authtab=`awk '/[ \t]authtab[ \t]/ { print $1 }' < /proc/kallsyms`
    modprobe openafs ${authtab:+authtab_addr=0x$authtab}
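
The awk pattern above matches only when whitespace follows "authtab".
That holds when sunrpc is a module, because /proc/kallsyms then lists
the module name (for example "[sunrpc]") after the symbol; with sunrpc
built into the kernel, the line ends at the symbol name and nothing is
found.  A slightly more defensive variant is sketched below.  The
function names are hypothetical and not part of OpenAFS, and the
all-zero check assumes a later kernel that may mask addresses through
kptr_restrict.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with OpenAFS.

# Print the address of the 'authtab' symbol from a kallsyms-format file
# (default /proc/kallsyms).  Lines look like "ADDRESS TYPE NAME [MODULE]",
# so matching the third field works whether or not a module column follows.
find_authtab_addr() {
    awk '$3 == "authtab" { print $1; exit }' "${1:-/proc/kallsyms}"
}

# Load sunrpc, then openafs with authtab_addr= when a usable address exists.
load_openafs() {
    local kallsyms="${1:-/proc/kallsyms}" addr
    modprobe sunrpc || return 1
    addr=$(find_authtab_addr "$kallsyms")
    case $addr in
        # At least one non-zero digit: pass the address to the module.
        *[!0]*) modprobe openafs "authtab_addr=0x$addr" ;;
        # Not found, or masked as zeros (e.g. by kptr_restrict on later
        # kernels): load without the option, as the snippet above does
        # when the symbol is missing.
        *) modprobe openafs ;;
    esac
}
```

In an init script, a single call to load_openafs would take the place of
the three lines above.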


## Exporting the NFS filesystem

For the translator to work correctly, /afs must be exported with
specific options.  The 'no_subtree_check' option is needed to prevent
the common NFS server code from performing unwanted access checks, and
an fsid option must be provided to set the filesystem identifier used in
NFS filehandles.  For live migration to work, the same filesystem ID
must be used on all translator systems.  The export can be configured
with a line in /etc/exports:

    /afs (rw,no_subtree_check,fsid=42)

Or with a command:

    exportfs -o rw,no_subtree_check,fsid=42 :/afs
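
Because the fsid value must match on every translator, a configuration
management script might check the export options before exporting /afs.
The following is a minimal sketch; the function name
check_afs_export_opts is hypothetical, and it only inspects an options
string like the ones in the examples above, not the kernel's live export
table.

```shell
#!/bin/sh
# Hypothetical helper: check_afs_export_opts OPTIONS EXPECTED_FSID
# Succeeds if the comma-separated export OPTIONS contain no_subtree_check
# and set fsid= to EXPECTED_FSID (the value shared by every translator).
check_afs_export_opts() {
    local opts="$1" want="$2" opt fsid= nst=no IFS=,
    for opt in $opts; do
        case $opt in
            no_subtree_check) nst=yes ;;
            subtree_check)    nst=no ;;   # in this sketch, the last one wins
            fsid=*)           fsid=${opt#fsid=} ;;
        esac
    done
    if [ "$nst" != yes ]; then
        echo "check_afs_export_opts: no_subtree_check is missing" >&2
        return 1
    fi
    if [ "$fsid" != "$want" ]; then
        echo "check_afs_export_opts: fsid is '${fsid:-unset}', want '$want'" >&2
        return 1
    fi
    return 0
}
```

For example, `check_afs_export_opts rw,no_subtree_check,fsid=42 42 &&
exportfs -o rw,no_subtree_check,fsid=42 :/afs` exports /afs only if the
options pass the check.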

The AFS/NFS translator code is enabled by default; no additional command
is required to activate it.  However, the 'fs exportafs nfs' command can
be used to disable or re-enable the translator and to set options.  Note
that support for client-assigned PAGs is not enabled by default and
must be enabled with the following command:

    fs exportafs nfs -clipags on

Support for making callbacks to obtain credentials and sysnames from
newly discovered NFS clients is also not enabled by default, because it
would cause long timeouts on requests from NFS clients that do not
support this feature (for example, clients not running the PAG manager).
To enable this feature, use the following command:

    fs exportafs nfs -pagcb on
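
On a translator whose clients all run the PAG manager, both options
would typically be turned on together once afsd is running.  A minimal
sketch, assuming it runs as root on the translator after afsd has
started; the function name enable_client_pag_support is hypothetical.
Turning on -pagcb while some clients lack the PAG manager brings back the
timeouts described above.

```shell
#!/bin/sh
# Hypothetical helper: run on the translator after afsd has started.
enable_client_pag_support() {
    # Honor PAGs assigned by the PAG manager on each NFS client.
    fs exportafs nfs -clipags on || return 1
    # Call back newly discovered clients for their tokens and sysnames.
    fs exportafs nfs -pagcb on
}
```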


## Client-Side PAG Management

Management of PAGs on individual NFS clients is provided by the kernel
module afspag.ko, which is automatically built alongside the libafs.ko
module on Linux 2.6 systems.  This component is not currently supported
on any other platform.

To activate the client PAG manager, simply load the module; no additional
parameters or commands are required.  Once the module is loaded, PAGs
can be acquired using the setpag() call, exactly as on systems running
the full cache manager.  Both the traditional system call and the
new-style ioctl entry points are supported.

In addition, the PAG manager can forward pioctl() calls to an AFS/NFS
translator system via the remote AFS system call service (rmtsys).  To
enable this feature, the kernel module must be loaded with a parameter
specifying the address of the translator system:

    insmod afspag.ko nfs_server_addr=0xAABBCCDD

In this example, 0xAABBCCDD is the IP address of the translator system
in network byte order, written in hexadecimal with one byte per octet
(AA is the first octet).  For example, if the translator has the IP
address 192.168.42.100, nfs_server_addr should be set to 0xc0a82a64.
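
The conversion prints each octet as two hexadecimal digits, first octet
first.  A minimal POSIX shell sketch that reproduces the example above
follows; the function name ip_to_hex is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to the
# 0xAABBCCDD form used for afspag.ko's nfs_server_addr parameter.
ip_to_hex() {
    local addr="$1" octet hex=0x IFS=.
    # Only digits and single dots are allowed, so the unquoted split
    # below cannot trigger globbing or produce empty fields.
    case $addr in
        ''|*[!0-9.]*|.*|*.|*..*)
            echo "ip_to_hex: invalid address '$addr'" >&2
            return 1 ;;
    esac
    set -- $addr
    if [ "$#" -ne 4 ]; then
        echo "ip_to_hex: expected four octets in '$addr'" >&2
        return 1
    fi
    for octet in "$@"; do
        # Strip leading zeros so printf does not read "042" as octal.
        octet=${octet#"${octet%%[!0]*}"}
        octet=${octet:-0}
        if [ "$octet" -gt 255 ]; then
            echo "ip_to_hex: octet out of range in '$addr'" >&2
            return 1
        fi
        hex=$hex$(printf '%02x' "$octet")
    done
    printf '%s\n' "$hex"
}
```

For instance, `insmod afspag.ko nfs_server_addr=$(ip_to_hex 192.168.42.100)`
loads the module with the value 0xc0a82a64 from the example.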

The PAG manager can be shut down using 'afsd -shutdown' (ironically, this
is the only circumstance in which that command is useful).  Once the
shutdown is complete, the kernel module can be removed using rmmod.
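
Loading and unloading the PAG manager can be wrapped in a pair of init
script helpers.  This sketch assumes that afspag.ko is in the current
directory (or at a path given in the hypothetical AFSPAG_KO variable) and
that 'afsd -shutdown' returns only after the shutdown has completed;
start_afspag and stop_afspag are hypothetical names.

```shell
#!/bin/sh
# Hypothetical init-script helpers for the client PAG manager.

# start_afspag [ADDR]: ADDR is the translator address in 0xAABBCCDD form.
# Without ADDR, PAGs are managed locally and nothing is forwarded.
start_afspag() {
    local ko="${AFSPAG_KO:-afspag.ko}"   # module path is an assumption
    if [ -n "$1" ]; then
        insmod "$ko" "nfs_server_addr=$1"
    else
        insmod "$ko"
    fi
}

# stop_afspag: shut the PAG manager down first, then remove the module.
stop_afspag() {
    afsd -shutdown || return 1
    rmmod afspag
}
```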


## Remote System Calls

The NFS translator allows NFS clients to perform various AFS-specific
operations via the remote system call interface (rmtsys).  To enable
this feature, afsd must be run on the translator with the -rmtsys
switch.  OpenAFS client utilities use this feature automatically if the
AFSSERVER environment variable is set to the address or hostname of the
translator system, or if the file ~/.AFSSERVER or /.AFSSERVER exists
and contains the translator's address or hostname.
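
When troubleshooting, it can help to see which translator a user's tools
will contact.  The sketch below mirrors the lookup described above.  The
precedence it uses (environment variable, then ~/.AFSSERVER, then
/.AFSSERVER) and reading only the first line of each file are
assumptions, and afs_rmtsys_server is a hypothetical name, not an
OpenAFS command.

```shell
#!/bin/sh
# Hypothetical helper: print the rmtsys server OpenAFS utilities would
# use, checking AFSSERVER, then ~/.AFSSERVER, then /.AFSSERVER.
afs_rmtsys_server() {
    local f server
    if [ -n "${AFSSERVER:-}" ]; then
        printf '%s\n' "$AFSSERVER"
        return 0
    fi
    for f in "${HOME:-}/.AFSSERVER" /.AFSSERVER; do
        [ -r "$f" ] || continue
        server=
        # Take the first line; read also trims surrounding blanks.
        read -r server < "$f"
        if [ -n "$server" ]; then
            printf '%s\n' "$server"
            return 0
        fi
    done
    echo "afs_rmtsys_server: no translator configured" >&2
    return 1
}
```

For example, `export AFSSERVER=192.168.42.100` in a user's shell would
point that session's OpenAFS utilities at the translator used in the
earlier afspag.ko example.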

On systems running the client PAG manager (afspag.ko), AFS system calls
made via the traditional methods are automatically forwarded to the NFS
translator system if the PAG manager has been configured to do so by
loading it with the nfs_server_addr parameter, as described above.


## Credential Caching

The client PAG manager maintains a cache of credentials belonging to each
PAG.  When an application makes a system call to set or remove AFS tokens,
the PAG manager updates its cache in addition to forwarding the request
to the NFS translator.

When the translator hears from a previously unknown client, it makes a
callback to the client to retrieve a copy of any cached credentials.
As a result, credentials belonging to an NFS client are not lost if the
translator is rebooted, or if the client moves on the network and starts
talking to a different translator.

This feature is automatically supported by the PAG manager if it has
been configured to forward system calls to an NFS translator.  However,
the PAG manager honors these callbacks only if they come from port 7001
on the NFS translator host.  In addition, the feature must be enabled on
the NFS translator with 'fs exportafs nfs -pagcb on', as described above.


## System Name List

When the NFS translator hears from a new NFS client whose system name
list it does not know, it can make a callback to the client to discover
the correct system name list.  This ability is enabled automatically
whenever credential caching and retrieval is enabled as described above.

The PAG manager maintains a system-wide sysname list, which is used to
satisfy callback requests from the NFS translator.  Initially this list
contains only the compiled-in default sysname, but the superuser can
change it using the VIOC_AFS_SYSNAME pioctl or the 'fs sysname' command.
Any changes are automatically propagated to the NFS translator.
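
On an NFS client running the PAG manager, the superuser might set the
list as in the following sketch.  The function name is hypothetical and
the sysnames in the usage line are only examples; per the text above,
the new list is then propagated to the translator.

```shell
#!/bin/sh
# Hypothetical helper: replace the PAG manager's system-wide sysname list.
set_client_sysnames() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "set_client_sysnames: must be run as the superuser" >&2
        return 1
    fi
    if [ "$#" -eq 0 ]; then
        echo "usage: set_client_sysnames SYSNAME..." >&2
        return 1
    fi
    # Names are tried in the order given when @sys is expanded.
    fs sysname -newsys "$@"
}
```

For example, `set_client_sysnames amd64_linux26 i386_linux26`; running
'fs sysname' with no arguments afterwards displays the current list.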


## Dynamic Mount Points

This patch introduces a special directory, ".:mount", which can be found
directly below the AFS root directory.  This directory always appears to
be empty, but any name of the form "cell:volume" will resolve to a mount
point for the specified volume.  The resulting mount points are always
RW-path mount points, so they resolve to an RW volume even if the
specified name refers to a replicated volume.  However, the ".readonly"
and ".backup" suffixes can be used to refer to volumes of those types,
and a numeric volume ID is always used as-is.
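
A path through the dynamic mount directory is just the AFS root,
".:mount", and "cell:volume" joined together.  The sketch below builds
one; it assumes AFS is mounted at /afs unless a root is given, and the
function name as well as the cell and volume names in the examples are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: afs_dynmount_path CELL VOLUME [AFSROOT]
# VOLUME may be a name (optionally ending in .readonly or .backup)
# or a numeric volume ID, which is used as-is.
afs_dynmount_path() {
    if [ "$#" -lt 2 ]; then
        echo "usage: afs_dynmount_path CELL VOLUME [AFSROOT]" >&2
        return 1
    fi
    printf '%s/.:mount/%s:%s\n' "${3:-/afs}" "$1" "$2"
}
```

For example, `ls "$(afs_dynmount_path example.com root.cell.readonly)"`
would list /afs/.:mount/example.com:root.cell.readonly, the read-only
root.cell volume of that cell.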

This feature allows the NFS translator to reconstruct a reachable path
for any valid filehandle presented by an NFS client.  Specifically, when
the path reconstruction algorithm is walking upward from a client-provided
filehandle and encounters the root directory of a volume that is no
longer in the cache (and thus has no known mount point), it completes the
path to the AFS root using the dynamic mount directory.

On non-Linux cache managers, this feature is available when dynamic
root and fake stat modes are enabled.

On Linux systems, it is available even when dynroot is not enabled, in
order to support the NFS translator.  It is currently not possible to
disable this feature, though that ability may be added in the future.
It would be difficult to hide this feature from users while keeping the
Linux NFS translator working, because the purpose of the check performed
by the NFS server is to ensure that the requested file is reachable by
the client.


## Security

The security of the NFS translator depends heavily on the underlying
network.  Proper configuration is required to prevent unauthorized
access to files, theft of credentials, or other forms of attack.

NFS, remote system call, and PAG callback traffic between an NFS client
host and the translator may contain sensitive file data and/or
credentials, and should be protected from snooping by unprivileged
users or other hosts.

Both the NFS translator and the remote system call service authorize
requests partly based on the IP address of the requesting client.  To
prevent an attacker from making requests on behalf of another host, the
network must be configured so that no client can spoof the IP address of
another.

In addition, both the NFS translator and the remote system call service
associate requests with specific users based on the user and group ID
data contained in each request.  To prevent users on the same client
from making filesystem access requests as one another, the NFS server
must be configured to accept requests only from privileged ports (on
Linux, the 'secure' export option, which is the default).  To prevent
users from making AFS system calls on each other's behalf, possibly
including retrieving credentials, the network must be configured so that
requests to the remote system call service (port 7009) are accepted
only from port 7001 on NFS clients.
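
On a Linux translator, the port restriction for rmtsys could be expressed
with iptables rules along these lines.  This is a sketch, not a complete
firewall policy: it assumes rmtsys traffic is UDP (as Rx traffic normally
is) and that the optional argument is the CIDR range of the NFS clients;
restrict_rmtsys is a hypothetical name.  It does nothing about IP
spoofing, which must still be prevented by the network itself.

```shell
#!/bin/sh
# Hypothetical helper for the translator: restrict_rmtsys [CLIENT_CIDR]
restrict_rmtsys() {
    if [ -n "$1" ]; then
        # Drop rmtsys requests from outside the NFS client network.
        iptables -A INPUT -p udp --dport 7009 ! -s "$1" -j DROP || return 1
    fi
    # Drop rmtsys requests that do not come from source port 7001.
    iptables -A INPUT -p udp --dport 7009 ! --sport 7001 -j DROP
}
```

Because -A appends to the chain, an earlier ACCEPT rule in INPUT would
take precedence over these rules; `iptables -L INPUT -n --line-numbers`
shows the resulting order.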

When a client is migrated away from a translator, any credentials the
translator holds on behalf of that client must be discarded before the
client's IP address can safely be reused.  The VIOC_NFS_NUKE_CREDS pioctl
and the 'fs nukenfscreds' command are provided for this purpose.  Both
take a single argument: the IP address of the NFS client whose
credentials should be discarded.
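
When several clients move to another translator at once, the old
translator's credentials for each can be discarded in a loop.  A minimal
sketch; nuke_nfs_clients is a hypothetical name, and it assumes it runs
with sufficient privilege on the translator the clients are leaving and
that 'fs nukenfscreds' takes the client address as its only argument, as
described above.

```shell
#!/bin/sh
# Hypothetical helper: nuke_nfs_clients ADDR...
# Discard this translator's credentials for each listed NFS client,
# continuing past failures and reporting them in the exit status.
nuke_nfs_clients() {
    local addr rc=0
    for addr in "$@"; do
        if ! fs nukenfscreds "$addr"; then
            echo "nuke_nfs_clients: failed for $addr" >&2
            rc=1
        fi
    done
    return "$rc"
}
```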


## Known Issues

  + Because NFS clients do not maintain active references on every inode
    they are using, portions of the directory tree in use by an NFS
    client may expire from the translator's AFS and Linux dentry caches.
    When this happens, the NFS server attempts to reconstruct the missing
    portion of the directory tree, but may fail if the client does not
    have sufficient access (for example, if the user's tokens have
    expired).  In these cases, a "stale NFS filehandle" error is
    returned.  This behavior is similar to that found on other translator
    platforms, but is triggered under a slightly different set of
    circumstances due to differences in the architecture of the Linux
    NFS server.

  + Due to limitations of the rmtsys protocol, some pioctl calls require
    large (several KB) transfers between the client and rmtsys server.
    Correcting this issue would require extensions to the rmtsys protocol
    outside the scope of this project.

  + The rmtsys interface requires that AFS be mounted in the same place
    on both the NFS client and translator system, or at least that the
    translator be able to correctly resolve absolute paths provided by
    the client.

  + If a client is migrated or an NFS translator host is unexpectedly
    rebooted while AFS filesystem access is in progress, there may be
    a short delay before the client recovers.  This is because the NFS
    client must time out any request it made to the old server before
    it can retransmit the request, which will then be handled by the
    new server.  The same applies to remote system call requests.