doc/arch/arch-overview.h

   1 /*!
   2         \page title AFS-3 Programmer's Reference:  Architectural Overview
   3
   4 \author Edward R. Zayas
   5 Transarc Corporation
   6 \version 1.0
   7 \date 2 September 1991 22:53 Copyright 1991 Transarc Corporation. All Rights
   8 Reserved. FS-00-D160
   9
  10
  11         \page chap1 Chapter 1: Introduction
  12
  13         \section sec1-1 Section 1.1: Goals and Background
  14
  15 \par
  16 This paper provides an architectural overview of Transarc's wide-area
  17 distributed file system, AFS. Specifically, it covers the current level of
  18 available software, the third-generation AFS-3 system. This document will
  19 explore the technological climate in which AFS was developed, the nature of
  20 the problems it addresses, and how its design attacks these problems in order to
  21 realize the inherent benefits in such a file system. It also examines a set of
  22 additional features for AFS, some of which are actively being considered.
  23 \par
  24 This document is a member of a reference suite providing programming
  25 specifications as to the operation of and interfaces offered by the various AFS
  26 system components. It is intended to serve as a high-level treatment of
  27 distributed file systems in general and of AFS in particular. This document
  28 should ideally be read before any of the others in the suite, as it provides
  29 the organizational and philosophical framework in which they may best be
  30 interpreted.
  31
  32         \section sec1-2 Section 1.2: Document Layout
  33
  34 \par
  35 Chapter 2 provides a discussion of the technological background and
  36 developments that created the environment in which AFS and related systems were
  37 inspired. Chapter 3 examines the specific set of goals that AFS was designed to
  38 meet, given the possibilities created by personal computing and advances in
  39 communication technology. Chapter 4 presents the core AFS architecture and how
  40 it addresses these goals. Finally, Chapter 5 considers how AFS functionality
  41 may be improved by certain design changes.
  42
  43         \section sec1-3 Section 1.3: Related Documents
  44
  45 \par
  46 The names of the other documents in the collection, along with brief summaries
  47 of their contents, are listed below.
  48 \li AFS-3 Programmer's Reference: File Server/Cache Manager Interface: This
  49 document describes the File Server and Cache Manager agents, which provide the
  50 backbone file management services for AFS. The collection of File Servers for a
  51 cell supplies centralized file storage for that site, and allows clients running
  52 the Cache Manager component to access those files in a high-performance, secure
  53 fashion.
  54 \li AFS-3 Programmer's Reference: Volume Server/Volume Location Server
  55 Interface: This document describes the services through which "containers" of
  56 related user data are located and managed.
  57 \li AFS-3 Programmer's Reference: Protection Server Interface: This paper
  58 describes the server responsible for mapping printable user names to and from
  59 their internal AFS identifiers. The Protection Server also allows users to
  60 create, destroy, and manipulate "groups" of users, which are suitable for
  61 placement on Access Control Lists (ACLs).
  62 \li AFS-3 Programmer's Reference: BOS Server Interface: This paper covers the
  63 "nanny" service which assists in the administrability of the AFS environment.
  64 \li AFS-3 Programmer's Reference: Specification for the Rx Remote Procedure Call
  65 Facility: This document specifies the design and operation of the remote
  66 procedure call and lightweight process packages used by AFS.
  67
  68         \page chap2 Chapter 2: Technological Background
  69
  70 \par
  71 Certain changes in technology over the past two decades greatly influenced the
  72 nature of computational resources, and the manner in which they were used.
  73 These developments created the conditions under which the notion of a
  74 distributed file system (DFS) was born. This chapter describes these
  75 technological changes, and explores how a distributed file system attempts to
  76 capitalize on the new computing environment's strengths and minimize its
  77 disadvantages.
  78
  79         \section sec2-1 Section 2.1: Shift in Computational Idioms
  80
  81 \par
  82 By the beginning of the 1980s, new classes of computing engines and new methods
  83 by which they could be interconnected were becoming firmly established. At this
  84 time, a shift was occurring away from the conventional mainframe-based,
  85 timeshared computing environment to one in which both workstation-class
  86 machines and the smaller personal computers (PCs) were a strong presence.
  87 \par
  88 The new environment offered many benefits to its users when compared with
  89 timesharing. These smaller, self-sufficient machines moved dedicated computing
  90 power and cycles directly onto people's desks. Personal machines were powerful
  91 enough to support a wide variety of applications, and allowed for a richer,
  92 more intuitive, more graphically-based interface for them. Learning curves were
  93 greatly reduced, cutting training costs and increasing new-employee
  94 productivity. In addition, these machines provided a constant level of service
  95 throughout the day. Since a personal machine was typically only executing
  96 programs for a single human user, it did not suffer from timesharing's
  97 load-based response time degradation. Expanding the computing services for an
  98 organization was often accomplished by simply purchasing more of the relatively
  99 cheap machines. Even small organizations could now afford their own computing
 100 resources, over which they exercised full control. This provided more freedom
 101 to tailor computing services to the specific needs of particular groups.
 102 \par
 103 However, many of the benefits offered by the timesharing systems were lost when
 104 the computing idiom first shifted to include personal-style machines. One of
 105 the prime casualties of this shift was the loss of the notion of a single name
 106 space for all files. Instead, workstation- and PC-based environments each had
 107 independent and completely disconnected file systems. The standardized
 108 mechanisms through which files could be transferred between machines (e.g.,
 109 FTP) were largely designed at a time when there were relatively few large
 110 machines that were connected over slow links. Although the newer multi-megabit
 111 per second communication pathways allowed for faster transfers, the problem of
 112 resource location in this environment was still not addressed. There was no
 113 longer a system-wide file system, or even a file location service, so
 114 individual users were more isolated from the organization's collective data.
 115 Overall, disk requirements ballooned, since lack of a shared file system was
 116 often resolved by replicating all programs and data to each machine that needed
 117 them. This proliferation of independent copies further complicated the problem of
 118 version control and management in this distributed world. Since computers were
 119 often no longer behind locked doors at a computer center, user authentication
 120 and authorization tasks became more complex. Also, since organizational
 121 managers were now in direct control of their computing facilities, they had to
 122 also actively manage the hardware and software upon which they depended.
 123 \par
 124 Overall, many of the benefits of the proliferation of independent,
 125 personal-style machines were partially offset by the communication and
 126 organizational penalties they imposed. Collaborative work and dissemination of
 127 information became more difficult now that the previously unified file system
 128 was fragmented among hundreds of autonomous machines.
 129
 130         \section sec2-2 Section 2.2: Distributed File Systems
 131
 132 \par
 133 As a response to the situation outlined above, the notion of a distributed file
 134 system (DFS) was developed. Basically, a DFS provides a framework in which
 135 access to files is permitted regardless of their locations. Specifically, a
 136 distributed file system offers a single, common set of file system operations
 137 through which those accesses are performed.
 138 \par
 139 There are two major variations on the core DFS concept, classified according to
 140 the way in which file storage is managed. These high-level models are defined
 141 below.
 142 \li Peer-to-peer: In this symmetrical model, each participating machine
 143 provides storage for a specific set of files on its own attached disk(s), and
 144 allows others to access them remotely. Thus, each node in the DFS is capable of
 145 both importing files (making reference to files resident on foreign machines)
 146 and exporting files (allowing other machines to reference files located
 147 locally).
 148 \li Server-client: In this model, a set of machines designated as servers
 149 provide the storage for all of the files in the DFS. All other machines, known
 150 as clients, must direct their file references to these machines. Thus, servers
 151 are the sole exporters of files in the DFS, and clients are the sole importers.
 152
 153 \par
 154 The notion of a DFS, whether organized using the peer-to-peer or server-client
 155 discipline, may be used as a conceptual base upon which the advantages of
 156 personal computing resources can be combined with the single-system benefits of
 157 classical timeshared operation.
 158 \par
 159 Many distributed file systems have been designed and deployed, operating on the
 160 fast local area networks available to connect machines within a single site.
 161 These systems include DOMAIN [9], DS [15], RFS [16], and Sprite [10]. Perhaps
 162 the most widespread of distributed file systems to date is a product from Sun
 163 Microsystems, NFS [13] [14], extending the popular unix file system so that it
 164 operates over local networks.
 165
 166         \section sec2-3 Section 2.3: Wide-Area Distributed File Systems
 167
 168 \par
 169 Improvements in long-haul network technology are allowing for faster
 170 interconnection bandwidths and smaller latencies between distant sites.
 171 Backbone services have been set up across the country, and T1 (1.5
 172 megabit/second) links are increasingly available to a larger number of
 173 locations. Long-distance channels are still at best approximately an order of
 174 magnitude slower than the typical local area network, and often two orders of
 175 magnitude slower. The narrowed difference between local-area and wide-area data
 176 paths opens the window for the notion of a wide-area distributed file system
 177 (WADFS). In a WADFS, the transparency of file access offered by a local-area
 178 DFS is extended to cover machines across much larger distances. Wide-area file
 179 system functionality facilitates collaborative work and dissemination of
 180 information in this larger theater of operation.
 181
 182         \page chap3 Chapter 3: AFS-3 Design Goals
 183
 184         \section sec3-1 Section 3.1: Introduction
 185
 186 \par
 187 This chapter describes the goals for the AFS-3 system, the first commercial
 188 WADFS in existence.
 189 \par
 190 The original AFS goals have been extended over the history of the project. The
 191 initial AFS concept was intended to provide a single distributed file system
 192 facility capable of supporting the computing needs of Carnegie Mellon
 193 University, a community of roughly 10,000 people. It was expected that most CMU
 194 users either had their own workstation-class machine on which to work, or had
 195 access to such machines located in public clusters. After being successfully
 196 implemented, deployed, and tuned in this capacity, it was recognized that the
 197 basic design could be augmented to link autonomous AFS installations located
 198 within the greater CMU campus. As described in Section 2.3, the long-haul
 199 networking environment developed to a point where it was feasible to further
 200 extend AFS so that it provided wide-area file service. The underlying AFS
 201 communication component was adapted to better handle the widely-varying channel
 202 characteristics encountered by intra-site and inter-site operations.
 203 \par
 204 A more detailed history of AFS evolution may be found in [3] and [18].
 205
 206         \section sec3-2 Section 3.2: System Goals
 207
 208 \par
 209 At a high level, the AFS designers chose to extend the single-machine unix
 210 computing environment into a WADFS service. The unix system, in all of its
 211 numerous incarnations, is an important computing standard, and is in very wide
 212 use. Since AFS was originally intended to service the heavily unix-oriented CMU
 213 campus, this decision served an important tactical purpose along with its
 214 strategic ramifications.
 215 \par
 216 In addition, the server-client discipline described in Section 2.2 was chosen
 217 as the organizational base for AFS. This provides the notion of a central file
 218 store serving as the primary residence for files within a given organization.
 219 These centrally-stored files are maintained by server machines and are made
 220 accessible to computers running the AFS client software.
 221 \par
 222 Listed in the following sections are the primary goals for the AFS system.
 223 Chapter 4 examines how the AFS design decisions, concepts, and implementation
 224 meet this list of goals.
 225
 226         \subsection sec3-2-1 Section 3.2.1: Scale
 227
 228 \par
 229 AFS differs from other existing DFSs in that it has the specific goal of
 230 supporting a very large user community with a small number of server machines.
 231 Unlike the rule-of-thumb ratio of approximately 20 client machines for every
 232 server machine (20:1) used by Sun Microsystems' widespread NFS distributed file
 233 system, the AFS architecture aims at smoothly supporting client/server ratios
 234 more along the lines of 200:1 within a single installation. In addition to
 235 providing a DFS covering a single organization with tens of thousands of users,
 236 AFS also aims at allowing thousands of independent, autonomous organizations to
 237 join in the single, shared name space (see Section 3.2.2 below) without a
 238 centralized control or coordination point. Thus, AFS envisions supporting the
 239 file system needs of tens of millions of users at interconnected yet autonomous
 240 sites.
 241
 242         \subsection sec3-2-2 Section 3.2.2: Name Space
 243
 244 \par
 245 One of the primary strengths of the timesharing computing environment is the
 246 fact that it implements a single name space for all files in the system. Users
 247 can walk up to any terminal connected to a timesharing service and refer to its
 248 files by the identical name. This greatly encourages collaborative work and
 249 dissemination of information, as everyone has a common frame of reference. One
 250 of the major AFS goals is the extension of this concept to a WADFS. Users
 251 should be able to walk up to any machine acting as an AFS client, anywhere in
 252 the world, and use the identical file name to refer to a given object.
 253 \par
 254 In addition to the common name space, it was also an explicit goal for AFS to
 255 provide complete access transparency and location transparency for its files.
 256 Access transparency is defined as the system's ability to use a single
 257 mechanism to operate on a file, regardless of its location, local or remote.
 258 Location transparency is defined as the inability to determine a file's
 259 location from its name. A system offering location transparency may also
 260 provide transparent file mobility, relocating files between server machines
 261 without visible effect to the naming system.
 262
 263         \subsection sec3-2-3 Section 3.2.3: Performance
 264
 265 \par
 266 Good system performance is a critical AFS goal, especially given the scale,
 267 client-server ratio, and connectivity specifications described above. The AFS
 268 architecture aims at providing file access characteristics which, on average,
 269 are similar to those of local disk performance.
 270
 271         \subsection sec3-2-4 Section 3.2.4: Security
 272
 273 \par
 274 A production WADFS, especially one which allows and encourages transparent file
 275 access between different administrative domains, must be extremely conscious of
 276 security issues. AFS assumes that server machines are "trusted" within their
 277 own administrative domain, being kept behind locked doors and only directly
 278 manipulated by reliable administrative personnel. On the other hand, AFS client
 279 machines are assumed to exist in inherently insecure environments, such as
 280 offices and dorm rooms. These client machines are recognized to be
 281 unsupervisable, and fully accessible to their users. This situation makes AFS
 282 servers open to attacks mounted by possibly modified client hardware, firmware,
 283 operating systems, and application software. In addition, while an organization
 284 may actively enforce the physical security of its own file servers to its
 285 satisfaction, other organizations may be lax in comparison. It is important to
 286 partition the system's security mechanism so that a security breach in one
 287 administrative domain does not allow unauthorized access to the facilities of
 288 other autonomous domains.
 289 \par
 290 The AFS system is targeted to provide confidence in the ability to protect
 291 system data from unauthorized access in the above environment, where untrusted
 292 client hardware and software may attempt to perform direct remote file
 293 operations from anywhere in the world, and where levels of physical security at
 294 remote sites may not meet the standards of other sites.
 295
 296         \subsection sec3-2-5 Section 3.2.5: Access Control
 297
 298 \par
 299 The standard unix access control mechanism associates mode bits with every file
 300 and directory, applying them based on the user's numerical identifier and the
 301 user's membership in various groups. This mechanism was considered too
 302 coarse-grained by the AFS designers. It was seen as insufficient for specifying
 303 the exact set of individuals and groups which may properly access any given
 304 file, as well as the operations these principals may perform. The unix group
 305 mechanism was also considered too coarse and inflexible. AFS was designed to
 306 provide more flexible and finer-grained control of file access, improving the
 307 ability to define the set of parties which may operate on files, and what their
 308 specific access rights are.
 309
 310         \subsection sec3-2-6 Section 3.2.6: Reliability
 311
 312 \par
 313 The crash of a server machine in any distributed file system causes the
 314 information it hosts to become unavailable to the user community. The same
 315 effect is observed when server and client machines are isolated across a
 316 network partition. Given the potential size of the AFS user community, a single
 317 server crash could potentially deny service to a very large number of people.
 318 The AFS design reflects a desire to minimize the visibility and impact of these
 319 inevitable server crashes.
 320
 321         \subsection sec3-2-7 Section 3.2.7: Administrability
 322
 323 \par
 324 Driven once again by the projected scale of AFS operation, one of the system's
 325 goals is to offer easy administrability. With the large projected user
 326 population, the amount of file data expected to be resident in the shared file
 327 store, and the number of machines in the environment, a WADFS could easily
 328 become impossible to administer unless its design allowed for easy monitoring
 329 and manipulation of system resources. It is also imperative to be able to apply
 330 security and access control mechanisms to the administrative interface.
 331
 332         \subsection sec3-2-8 Section 3.2.8: Interoperability/Coexistence
 333
 334 \par
 335 Many organizations currently employ other distributed file systems, most
 336 notably Sun Microsystems' NFS, which is also an extension of the basic
 337 single-machine unix system. It is unlikely that AFS will receive significant
 338 use if it cannot operate concurrently with other DFSs without mutual
 339 interference. Thus, coexistence with other DFSs is an explicit AFS goal.
 340 \par
 341 A related goal is to provide a way for other DFSs to interoperate with AFS to
 342 various degrees, allowing AFS file operations to be executed from these
 343 competing systems. This is advantageous, since it may extend the set of
 344 machines which are capable of interacting with the AFS community. Hardware
 345 platforms and/or operating systems to which AFS is not ported may thus be able
 346 to use their native DFS system to perform AFS file references.
 347 \par
 348 These two goals serve to extend AFS coverage, and to provide a migration path
 349 by which potential clients may sample AFS capabilities, and gain experience
 350 with AFS. This may result in data migration into native AFS systems, or the
 351 impetus to acquire a native AFS implementation.
 352
 353         \subsection sec3-2-9 Section 3.2.9: Heterogeneity/Portability
 354
 355 \par
 356 It is important for AFS to operate on a large number of hardware platforms and
 357 operating systems, since a large community of unrelated organizations will most
 358 likely utilize a wide variety of computing environments. The size of the
 359 potential AFS user community will be unduly restricted if AFS executes on a
 360 small number of platforms. Not only must AFS support a largely heterogeneous
 361 computing base, it must also be designed to be easily portable to new hardware
 362 and software releases in order to maintain this coverage over time.
 363
 364         \page chap4 Chapter 4: AFS High-Level Design
 365
 366         \section sec4-1 Section 4.1: Introduction
 367
 368 \par
 369 This chapter presents an overview of the system architecture for the AFS-3
 370 WADFS. Different treatments of the AFS system may be found in several
 371 documents, including [3], [4], [5], and [2]. Certain system features discussed
 372 here are examined in more detail in the set of accompanying AFS programmer
 373 specification documents.
 374 \par
 375 After the architectural overview, the system goals enumerated in Chapter 3 are
 376 revisited, and the contribution of the various AFS design decisions and
 377 resulting features is noted.
 378
 379         \section sec4-2 Section 4.2: The AFS System Architecture
 380
 381         \subsection sec4-2-1 Section 4.2.1: Basic Organization
 382
 383 \par
 384 As stated in Section 3.2, a server-client organization was chosen for the AFS
 385 system. A group of trusted server machines provides the primary disk space for
 386 the central store managed by the organization controlling the servers. File
 387 system operation requests for specific files and directories arrive at server
 388 machines from machines running the AFS client software. If the client is
 389 authorized to perform the operation, then the server proceeds to execute it.
 390 \par
 391 In addition to this basic file access functionality, AFS server machines also
 392 provide related system services. These include authentication service, mapping
 393 between printable and numerical user identifiers, file location service, time
 394 service, and such administrative operations as disk management, system
 395 reconfiguration, and tape backup.
 396
 397         \subsection sec4-2-2 Section 4.2.2: Volumes
 398
 399         \subsubsection sec4-2-2-1 Section 4.2.2.1: Definition
 400
 401 \par
 402 Disk partitions used for AFS storage do not directly host individual user files
 403 and directories. Rather, connected subtrees of the system's directory structure
 404 are placed into containers called volumes. Volumes vary in size dynamically as
 405 the objects they house are inserted, overwritten, and deleted. Each volume has
 406 an associated quota, or maximum permissible storage. A single unix disk
 407 partition may thus host one or more volumes, and in fact may host as many
 408 volumes as physically fit in the storage space. However, the practical maximum
 409 is currently 3,500 volumes per disk partition. This limitation is imposed by
 410 the salvager program, which examines and repairs file system metadata
 411 structures.
 412 \par
 413 There are two ways to identify an AFS volume. The first option is a 32-bit
 414 numerical value called the volume ID. The second is a human-readable character
 415 string called the volume name.
 416 \par
 417 Internally, a volume is organized as an array of mutable objects, representing
 418 individual files and directories. The file system object associated with each
 419 index in this internal array is assigned a uniquifier and a data version
 420 number. A subset of these values is used to compose an AFS file identifier, or
 421 FID. FIDs are not normally visible to user applications, but rather are used
 422 internally by AFS. They consist of ordered triplets, whose components are the
 423 volume ID, the index within the volume, and the uniquifier for the index.
 424 \par
 425 To understand AFS FIDs, let us consider the case where index i in volume v
 426 refers to a file named example.txt. This file's uniquifier is currently set to
 427 one (1), and its data version number is currently set to zero (0). The AFS
 428 client software may then refer to this file with the following FID: (v, i, 1).
 429 The next time a client overwrites the object identified with the (v, i, 1) FID,
 430 the data version number for example.txt will be promoted to one (1). Thus, the
 431 data version number serves to distinguish between different versions of the
 432 same file. A higher data version number indicates a newer version of the file.
 433 \par
 434 Consider the result of deleting file (v, i, 1). This causes the body of
 435 example.txt to be discarded, and marks index i in volume v as unused. Should
 436 another program create a file, say a.out, within this volume, index i may be
 437 reused. If it is, the creation operation will bump the index's uniquifier to
 438 two (2), and the data version number is reset to zero (0). Any client caching a
 439 FID for the deleted example.txt file thus cannot affect the completely
 440 unrelated a.out file, since the uniquifiers differ.
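\par
The uniquifier and data version rules described above can be sketched in C. This is a minimal illustration, not the AFS implementation: the type names (fid_t, slot_t), the helper functions, and the choice to start a fresh slot's uniquifier at zero are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical FID layout: the (volume ID, index, uniquifier) triplet.
typedef struct {
    uint32_t volume;      // 32-bit volume ID
    uint32_t index;       // position in the volume's internal object array
    uint32_t uniquifier;  // distinguishes successive occupants of one index
} fid_t;

// Hypothetical per-index state kept on the server side.
typedef struct {
    bool     in_use;
    uint32_t uniquifier;
    uint32_t data_version;
} slot_t;

// Creating an object bumps the uniquifier and resets the data version.
static fid_t slot_create(slot_t *slot, uint32_t volume, uint32_t index) {
    assert(!slot->in_use);
    slot->in_use = true;
    slot->uniquifier++;
    slot->data_version = 0;
    return (fid_t){ volume, index, slot->uniquifier };
}

// Overwriting an object promotes its data version; the FID is unchanged.
static void slot_overwrite(slot_t *slot) {
    assert(slot->in_use);
    slot->data_version++;
}

// Deleting an object marks the index as unused so that it may be reused.
static void slot_delete(slot_t *slot) {
    assert(slot->in_use);
    slot->in_use = false;
}

// A cached FID is honored only if it names the index's current occupant.
static bool fid_is_current(const fid_t *fid, const slot_t *slot) {
    return slot->in_use && fid->uniquifier == slot->uniquifier;
}
```

Under these assumptions, a FID cached for example.txt carries uniquifier 1. Once the index is reused for a.out, the slot's uniquifier is 2, so fid_is_current() rejects the stale FID while accepting the new one.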
 441
 442         \subsubsection sec4-2-2-2 Section 4.2.2.2: Attachment
 443
 444 \par
 445 The connected subtrees contained within individual volumes are attached to
 446 their proper places in the file space defined by a site, forming a single,
 447 apparently seamless unix tree. These attachment points are called mount points.
 448 These mount points are persistent file system objects, implemented as symbolic
 449 links whose contents obey a stylized format. Thus, AFS mount points differ from
 450 NFS-style mounts. In the NFS environment, the user dynamically mounts entire
 451 remote disk partitions using any desired name. These mounts do not survive
 452 client restarts, and do not ensure a uniform namespace between different
 453 machines.
 454 \par
 455 A single volume is chosen as the root of the AFS file space for a given
 456 organization. By convention, this volume is named root.afs. Each client machine
 457 belonging to this organization performs a unix mount() of this root volume (not
 458 to be confused with an AFS mount point) on its empty /afs directory, thus
 459 attaching the entire AFS name space at this point.
 460
 461         \subsubsection sec4-2-2-3 Section 4.2.2.3: Administrative Uses
 462
 463 \par
 464 Volumes serve as the administrative unit for AFS file system data, providing
 465 the basis for replication, relocation, and backup operations.
 466
 467         \subsubsection sec4-2-2-4 Section 4.2.2.4: Replication
 468
 469 Read-only snapshots of AFS volumes may be created by administrative personnel.
 470 These clones may be deployed on up to eight disk partitions, on the same server
 471 machine or across different servers. Each clone has the identical volume ID,
 472 which must differ from its read-write parent. Thus, at most one clone of any
 473 given volume v may reside on a given disk partition. File references to this
 474 read-only clone volume may be serviced by any of the servers which host a copy.
 475
 476         \subsubsection sec4-2-2-5 Section 4.2.2.5: Backup
 477
 478 \par
 479 Volumes serve as the unit of tape backup and restore operations. Backups are
 480 accomplished by first creating an on-line backup volume for each volume to be
 481 archived. This backup volume is organized as a copy-on-write shadow of the
 482 original volume, capturing the volume's state at the instant that the backup
 483 took place. Thus, the backup volume may be envisioned as being composed of a
 484 set of object pointers back to the original image. The first update operation
 485 on the file located in index i of the original volume triggers the
 486 copy-on-write association. This causes the file's contents at the time of the
 487 snapshot to be physically written to the backup volume before the newer version
 488 of the file is stored in the parent volume.
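\par
The copy-on-write association can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the fixed-size arrays, the type names (volume_t, backup_t), and the functions are hypothetical and do not reflect the on-disk structures used by AFS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SLOTS = 4, BODY = 32 };  // illustrative sizes only

typedef struct {
    char body[SLOTS][BODY];     // file contents in the parent (read-write) volume
} volume_t;

typedef struct {
    const volume_t *parent;     // original image that the snapshot points back to
    int  copied[SLOTS];         // 1 once the snapshot-time body has been saved
    char saved[SLOTS][BODY];    // bodies of files modified since the snapshot
} backup_t;

// Taking the snapshot copies no file data; every index refers to the parent.
static void backup_create(backup_t *b, const volume_t *parent) {
    memset(b, 0, sizeof *b);
    b->parent = parent;
}

// Update file i in the parent volume. The first update after the snapshot
// saves the old body into the backup volume before the new data is stored.
static void volume_write(volume_t *v, backup_t *b, size_t i, const char *data) {
    assert(i < SLOTS && strlen(data) < BODY);
    if (b != NULL && !b->copied[i]) {
        memcpy(b->saved[i], v->body[i], BODY);
        b->copied[i] = 1;
    }
    strcpy(v->body[i], data);
}

// Read file i as it existed when the snapshot was taken.
static const char *backup_read(const backup_t *b, size_t i) {
    assert(i < SLOTS);
    return b->copied[i] ? b->saved[i] : b->parent->body[i];
}
```

In this model an unmodified file costs the backup volume nothing beyond its pointer back to the parent, and only the first write to a file after the snapshot incurs a copy.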
 489 \par
 490 Thus, AFS on-line backup volumes typically consume little disk space. On
 491 average, they are composed mostly of links and to a lesser extent the bodies of
 492 those few files which have been modified since the last backup took place.
 493 Also, the system does not have to be shut down to ensure the integrity of the
 494 backup images. Dumps are generated from the unchanging backup volumes, and are
 495 transferred to tape at any convenient time before the next backup snapshot is
 496 performed.
 497
 498         \subsubsection sec4-2-2-6 Section 4.2.2.6: Relocation
 499
 500 \par
 501 Volumes may be moved transparently between disk partitions on a given file
 502 server, or between different file server machines. The transparency of volume
 503 motion comes from the fact that neither the user-visible names for the files
 504 nor the internal AFS FIDs contain server-specific location information.
 505 \par
 506 Interruption to file service while a volume move is being executed is typically
 507 on the order of a few seconds, regardless of the amount of data contained
 508 within the volume. This derives from the staged algorithm used to move a volume
 509 to a new server. First, a dump is taken of the volume's contents, and this
 510 image is installed at the new site. The second stage involves locking the
 511 original volume and taking an incremental dump to capture file updates since
 512 the first stage. The third stage installs the changes at the new site, and the
 513 fourth stage deletes the original volume. Further references to this volume
 514 will resolve to its new location.
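\par
The staged move can be modeled as follows. This is only a sketch: the vol_t type, the logical timestamps, and the function names are assumptions introduced for illustration, and the real Volume Server interfaces differ. The point it shows is that the full dump runs while the source is still writable, so only the small incremental dump happens under the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NFILES = 4 };       // illustrative volume size

typedef struct {
    int  data[NFILES];     // stand-in for file contents
    int  mtime[NFILES];    // logical time of each file's last update
    bool locked;           // updates are refused while set
    bool present;          // false once the volume has been deleted
} vol_t;

// Update a file; fails if the volume is locked or no longer exists.
static bool vol_write(vol_t *v, size_t i, int value, int now) {
    assert(i < NFILES);
    if (v->locked || !v->present) return false;
    v->data[i] = value;
    v->mtime[i] = now;
    return true;
}

// Copy every file updated after `since` to dst; return the number copied.
static int dump_since(const vol_t *src, vol_t *dst, int since) {
    int copied = 0;
    for (size_t i = 0; i < NFILES; i++) {
        if (src->mtime[i] > since) {
            dst->data[i] = src->data[i];
            dst->mtime[i] = src->mtime[i];
            copied++;
        }
    }
    return copied;
}

// Stage 1: full dump installed at the new site; the source stays writable.
static void move_stage1(const vol_t *src, vol_t *dst) {
    dst->present = true;
    dst->locked = false;
    dump_since(src, dst, -1);
}

// Stages 2 to 4: lock the source, take and install the incremental dump,
// then delete the original. Returns the number of files in the increment.
static int move_finish(vol_t *src, vol_t *dst, int stage1_time) {
    src->locked = true;
    int changed = dump_since(src, dst, stage1_time);
    src->present = false;
    return changed;
}
```

In this model the time spent with the source locked is proportional to the number of files changed since stage 1, not to the total size of the volume.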
 515
 516         \subsection sec4-2-3 Section 4.2.3: Authentication
 517
 518 \par
 519 AFS uses the Kerberos [22] [23] authentication system developed at MIT's
 520 Project Athena to provide reliable identification of the principals attempting
 521 to operate on the files in its central store. Kerberos provides for mutual
 522 authentication, not only assuring AFS servers that they are interacting with
 523 the stated user, but also assuring AFS clients that they are dealing with the
 524 proper server entities and not imposters. Authentication information is
 525 mediated through the use of tickets. Clients register passwords with the
 526 authentication system, and use those passwords during authentication sessions
 527 to secure these tickets. A ticket is an object which contains an encrypted
 528 version of the user's name and other information. The file server machines may
 529 request a caller to present their ticket in the course of a file system
 530 operation. If the file server can successfully decrypt the ticket, then it
 531 knows that it was created and delivered by the authentication system, and may
 532 trust that the caller is the party identified within the ticket.
 533 \par
 534 Such subjects as mutual authentication, encryption and decryption, and the use
 535 of session keys are complex ones. Readers are directed to the above references
 536 for a complete treatment of Kerberos-based authentication.
 537
 538         \subsection sec4-2-4 Section 4.2.4: Authorization
 539
 540         \subsubsection sec4-2-4-1 Section 4.2.4.1: Access Control Lists
 541
 542 \par
 543 AFS implements per-directory Access Control Lists (ACLs) to improve the ability
 544 to specify which sets of users have access to the files within the directory,
 545 and which operations they may perform. ACLs are used in addition to the
 546 standard unix mode bits. ACLs are organized as lists of one or more (principal,
 547 rights) pairs. A principal may be either the name of an individual user or a
 548 group of individual users. There are seven expressible rights, as listed below.
 549 \li Read (r): The ability to read the contents of the files in a directory.
 550 \li Lookup (l): The ability to look up names in a directory.
 551 \li Write (w): The ability to create new files and overwrite the contents of
 552 existing files in a directory.
 553 \li Insert (i): The ability to insert new files in a directory, but not to
 554 overwrite existing files.
 555 \li Delete (d): The ability to delete files in a directory.
 556 \li Lock (k): The ability to acquire and release advisory locks on a given
 557 directory.
 558 \li Administer (a): The ability to change a directory's ACL.
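\par
As a rough illustration of the (principal, rights) structure, the sketch below models an ACL as a list of pairs and computes the rights a caller holds. The function names are hypothetical, and the rule that a caller's rights are the union of all matching entries is an assumption of this sketch rather than something stated above.

```python
# The seven rights listed above, keyed by their one-letter abbreviations.
RIGHTS = {
    "r": "read", "l": "lookup", "w": "write", "i": "insert",
    "d": "delete", "k": "lock", "a": "administer",
}


def parse_rights(spec):
    """Turn a string such as "rli" into a set of right letters."""
    unknown = set(spec) - set(RIGHTS)
    if unknown:
        raise ValueError(f"unknown rights: {sorted(unknown)}")
    return set(spec)


def rights_for(acl, user, groups_of_user):
    """Union of the rights granted to the user directly and through groups."""
    principals = {user} | set(groups_of_user)
    granted = set()
    for principal, spec in acl:
        if principal in principals:
            granted |= parse_rights(spec)
    return granted


acl = [("erz", "rlidwka"), ("erz:friends", "rli")]
kazar_rights = rights_for(acl, "kazar", ["erz:friends"])
```

Here a member of erz:friends ends up with read, lookup, and insert rights on the directory, and nothing else.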
 559
 560         \subsubsection sec4-2-4-2 Section 4.2.4.2: AFS Groups
 561
 562 \par
 563 AFS users may create a certain number of groups, differing from the standard
 564 unix notion of group. These AFS groups are objects that may be placed on ACLs,
 565 and simply contain a list of AFS user names that are to be treated identically
 566 for authorization purposes. For example, user erz may create a group called
 567 erz:friends consisting of the kazar, vasilis, and mason users. Should erz wish
 568 to grant read, lookup, and insert rights to this group in directory d, he
 569 should create an entry reading (erz:friends, rli) in d's ACL.
 570 \par
 571 AFS offers three special, built-in groups, as described below.
 572 \par
 573 1. system:anyuser: Any individual who accesses AFS files is considered by the
 574 system to be a member of this group, whether or not they hold an authentication
 575 ticket. This group is unusual in that it doesn't have a stable membership. In
 576 fact, it doesn't have an explicit list of members. Instead, the system:anyuser
 577 "membership" grows and shrinks as file accesses occur, with users being
 578 (conceptually) added and deleted automatically as they interact with the
 579 system.
 580 \par
 581 The system:anyuser group is typically put on the ACL of those directories for
 582 which some specific level of completely public access is desired, covering any
 583 user at any AFS site.
 584 \par
 585 2. system:authuser: Any individual in possession of a valid Kerberos ticket
 586 minted by the organization's authentication service is treated as a member of
 587 this group. Just as with system:anyuser, this special group does not have a
 588 stable membership. If a user acquires a ticket from the authentication service,
 589 they are automatically "added" to the group. If the ticket expires or is
 590 discarded by the user, then the given individual will automatically be
 591 "removed" from the group.
 592 \par
 593 The system:authuser group is usually put on the ACL of those directories for
 594 which some specific level of intra-site access is desired. Anyone holding a
 595 valid ticket within the organization will be allowed to perform the set of
 596 accesses specified by the ACL entry, regardless of their precise individual ID.
 597 \par
 598 3. system:administrators: This built-in group defines the set of users capable
 599 of performing certain important administrative operations within the cell.
 600 Members of this group have explicit 'a' (ACL administration) rights on every
 601 directory's ACL in the organization. Members of this group are the only ones
 602 which may legally issue administrative commands to the file server machines
 603 within the organization. This group is not like the other two described above
 604 in that it does have a stable membership, where individuals are added and
 605 deleted from the group explicitly.
 606 \par
 607 The system:administrators group is typically put on the ACL of those
 608 directories which contain sensitive administrative information, or on those
 609 places where only administrators are allowed to make changes. All members of
 610 this group have implicit rights to change the ACL on any AFS directory within
 611 their organization. Thus, they don't have to actually appear on an ACL, or have
 612 'a' rights enabled in their ACL entry if they do appear, to be able to modify
 613 the ACL.
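\par
The behavior of the three built-in groups can be summarized in a short sketch. It is a simplified model with hypothetical function names: membership in system:anyuser and system:authuser is computed on the fly, system:administrators is an explicit list, and the sketch assumes that a caller without a valid ticket has no identity other than system:anyuser.

```python
def effective_principals(user, has_valid_ticket, explicit_groups):
    """Return every principal name the caller is treated as matching."""
    principals = {"system:anyuser"}  # everyone, ticket or not
    if has_valid_ticket:
        # Holding a valid ticket implies an identity and system:authuser.
        principals |= {user, "system:authuser"}
        # Explicit membership lists, including system:administrators.
        principals |= {g for g, members in explicit_groups.items() if user in members}
    return principals


def may_change_acl(acl, principals):
    """True if the caller may modify an ACL given as (principal, rights) pairs."""
    if "system:administrators" in principals:
        return True  # implicit 'a' right; no ACL entry is required
    return any(p in principals and "a" in rights for p, rights in acl)


groups = {
    "system:administrators": {"admin1"},
    "erz:friends": {"kazar", "vasilis", "mason"},
}
acl = [("erz", "rlidwka"), ("erz:friends", "rli")]

admin = effective_principals("admin1", True, groups)
friend = effective_principals("kazar", True, groups)
anonymous = effective_principals("erz", False, groups)
```

The administrator may change the ACL without appearing on it, the friend may not because the group entry lacks the 'a' right, and an unauthenticated caller is limited to whatever system:anyuser has been granted.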
 614
 615         \subsection sec4-2-5 Section 4.2.5: Cells
 616
 617 \par
 618 A cell is the set of server and client machines managed and operated by an
 619 administratively independent organization, as fully described in the original
 620 proposal [17] and specification [18] documents. The cell's administrators make
 621 decisions concerning such issues as server deployment and configuration, user
 622 backup schedules, and replication strategies on their own hardware and disk
 623 storage completely independently from those implemented by other cell
 624 administrators regarding their own domains. Every client machine belongs to
 625 exactly one cell, and uses that information to determine where to obtain
 626 default system resources and services.
 627 \par
 628 The cell concept allows autonomous sites to retain full administrative control
 629 over their facilities while allowing them to collaborate in the establishment
 630 of a single, common name space composed of the union of their individual name
 631 spaces. By convention, any file name beginning with /afs is part of this shared
 632 global name space and can be used at any AFS-capable machine. The original
 633 mount point concept was modified to contain cell information, allowing volumes
 634 housed in foreign cells to be mounted in the file space. Again by convention,
 635 the top-level /afs directory contains a mount point to the root.cell volume for
 636 each cell in the AFS community, attaching their individual file spaces. Thus,
 637 the top of the data tree managed by cell xyz is represented by the /afs/xyz
 638 directory.
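\par
The naming convention can be illustrated with a small helper that splits a global pathname into its cell name and the path within that cell. The function name split_afs_path() is hypothetical, and the sketch covers only the textual convention, not how mount points are actually resolved.

```python
from pathlib import PurePosixPath


def split_afs_path(path):
    """Split "/afs/<cell>/<rest>" into (cell, rest); cell is None for "/afs"."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "/" or parts[1] != "afs":
        raise ValueError(f"not in the shared /afs name space: {path}")
    if len(parts) == 2:
        return None, ""
    return parts[2], "/".join(parts[3:])


cell, rest = split_afs_path("/afs/transarc.com/usr/erz")
```

For the example path the cell is transarc.com and the remainder, usr/erz, is interpreted within that cell's own file tree.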
 639 \par
 640 Creating a new AFS cell is straightforward, with the operation taking three
 641 basic steps:
 642 \par
 643 1. Name selection: A prospective site has to first select a unique name for
 644 itself. Cell name selection is inspired by the hierarchical Domain naming
 645 system. Domain-style names are designed to be assignable in a completely
 646 decentralized fashion. Example cell names are transarc.com, ssc.gov, and
 647 umich.edu. These names correspond to the AFS installations at Transarc
 648 Corporation in Pittsburgh, PA, the Superconducting Supercollider Lab in Dallas,
 649 TX, and the University of Michigan at Ann Arbor, MI, respectively.
 650 \par
 651 2. Server installation: Once a cell name has been chosen, the site must bring
 652 up one or more AFS file server machines, creating a local file space and a
 653 suite of local services, including authentication (Section 4.2.6.4) and volume
 654 location (Section 4.2.6.2).
 655 \par
 656 3. Advertise services: In order for other cells to discover the presence of the
 657 new site, it must advertise its name and which of its machines provide basic
 658 AFS services such as authentication and volume location. An established site
 659 may then record the machines providing AFS system services for the new cell,
 660 and then set up its mount point under /afs. By convention, each cell places the
 661 top of its file tree in a volume named root.cell.
 662
 663         \subsection sec4-2-6 Section 4.2.6: Implementation of Server Functionality
 665
 666 \par
 667 AFS server functionality is implemented by a set of user-level processes which
 668 execute on server machines. This section examines the role of each of these
 669 processes.
 670
 671         \subsubsection sec4-2-6-1 Section 4.2.6.1: File Server
 672
 673 \par
 674 This AFS entity is responsible for providing a central disk repository for a
 675 particular set of files within volumes, and for making these files accessible
 676 to properly-authorized users running on client machines.
 677
 678         \subsubsection sec4-2-6-2 Section 4.2.6.2: Volume Location Server
 679
 680 \par
 681 The Volume Location Server maintains and exports the Volume Location Database
 682 (VLDB). This database tracks the server or set of servers on which volume
 683 instances reside. Among the operations it supports are queries returning volume
 684 location and status information, volume ID management, and creation, deletion,
 685 and modification of VLDB entries.
 686 \par
 687 The VLDB may be replicated to two or more server machines for availability and
 688 load-sharing reasons. A Volume Location Server process executes on each server
 689 machine on which a copy of the VLDB resides, managing that copy.
 690
 691         \subsubsection sec4-2-6-3 Section 4.2.6.3: Volume Server
 692
 693 \par
 694 The Volume Server allows administrative tasks and probes to be performed on the
 695 set of AFS volumes residing on the machine on which it is running. These
 696 operations include volume creation and deletion, renaming volumes, dumping and
 697 restoring volumes, altering the list of replication sites for a read-only
 698 volume, creating and propagating a new read-only volume image, creation and
 699 update of backup volumes, listing all volumes on a partition, and examining
 700 volume status.
 701
 702         \subsubsection sec4-2-6-4 Section 4.2.6.4: Authentication Server
 703
 704 \par
 705 The AFS Authentication Server maintains and exports the Authentication Database
 706 (ADB). This database tracks the encrypted passwords of the cell's users. The
 707 Authentication Server interface allows operations that manipulate ADB entries.
 708 It also implements the Kerberos mutual authentication protocol, supplying the
 709 appropriate identification tickets to successful callers.
 710 \par
 711 The ADB may be replicated to two or more server machines for availability and
 712 load-sharing reasons. An Authentication Server process executes on each server
 713 machine on which a copy of the ADB resides, managing that copy.
 714
 715         \subsubsection sec4-2-6-5 Section 4.2.6.5: Protection Server
 716
 717 \par
 718 The Protection Server maintains and exports the Protection Database (PDB),
 719 which maps between printable user and group names and their internal numerical
 720 AFS identifiers. The Protection Server also allows callers to create, destroy,
 721 query ownership and membership, and generally manipulate AFS user and group
 722 records.
 723 \par
 724 The PDB may be replicated to two or more server machines for availability and
 725 load-sharing reasons. A Protection Server process executes on each server
 726 machine on which a copy of the PDB resides, managing that copy.
 727
 728         \subsubsection sec4-2-6-6 Section 4.2.6.6: BOS Server
 729
 730 \par
 731 The BOS Server is an administrative tool which runs on each file server machine
 732 in a cell. This server is responsible for monitoring the health of the AFS
 733 agent processes on that machine. The BOS Server brings up the chosen set of
 734 AFS agents in the proper order after a system reboot, answers requests as to
 735 their status, and restarts them when they fail. It also accepts commands to
 736 start, suspend, or resume these processes, and install new server binaries.
 737
 738         \subsubsection sec4-2-6-7 Section 4.2.6.7: Update Server/Client
 739
 740 \par
 741 The Update Server and Update Client programs are used to distribute important
 742 system files and server binaries. For example, consider the case of
 743 distributing a new File Server binary to the set of Sparcstation server
 744 machines in a cell. One of the Sparcstation servers is declared to be the
 745 distribution point for its machine class, and is configured to run an Update
 746 Server. The new binary is installed in the appropriate local directory on that
 747 Sparcstation distribution point. Each of the other Sparcstation servers runs an
 748 Update Client instance, which periodically polls the proper Update Server. The
 749 new File Server binary will be detected and copied over to the client. Thus,
 750 new server binaries need only be installed manually once per machine type, and
 751 the distribution to like server machines will occur automatically.
 752
 753         \subsection sec4-2-7 Section 4.2.7: Implementation of Client Functionality
 755
 756         \subsubsection sec4-2-7-1 Section 4.2.7.1: Introduction
 757
 758 \par
 759 The portion of the AFS WADFS which runs on each client machine is called the
 760 Cache Manager. This code, running within the client's kernel, is a user's
 761 representative in communicating and interacting with the File Servers. The
 762 Cache Manager's primary responsibility is to create the illusion that the
 763 remote AFS file store resides on the client machine's local disk(s).
 764 \par
 765 As implied by its name, the Cache Manager supports this illusion by maintaining
 766 a cache of files referenced from the central AFS store on the machine's local
 767 disk. All file operations executed by client application programs on files
 768 within the AFS name space are handled by the Cache Manager and are realized on
 769 these cached images. Client-side AFS references are directed to the Cache
 770 Manager via the standard VFS and vnode file system interfaces pioneered and
 771 advanced by Sun Microsystems [21]. The Cache Manager stores and fetches files
 772 to and from the shared AFS repository as necessary to satisfy these operations.
 773 It is responsible for parsing unix pathnames on open() operations and mapping
 774 each component of the name to the File Server or group of File Servers that
 775 house the matching directory or file.
 776 \par
 777 The Cache Manager has additional responsibilities. It also serves as a reliable
 778 repository for the user's authentication information, holding on to their
 779 tickets and wielding them as necessary when challenged during File Server
 780 interactions. It caches volume location information gathered from probes to the
 781 VLDB, and keeps the client machine's local clock synchronized with a reliable
 782 time source.
 783
 784         \subsubsection sec4-2-7-2 Section 4.2.7.2: Chunked Access
 785
 786 \par
 787 In previous AFS incarnations, whole-file caching was performed. Whenever an AFS
 788 file was referenced, the entire contents of the file were stored on the
 789 client's local disk. This approach had several disadvantages. One problem was
 790 that no file larger than the amount of disk space allocated to the client's
 791 local cache could be accessed.
 792 \par
 793 AFS-3 supports chunked file access, allowing individual 64 kilobyte pieces to
 794 be fetched and stored. Chunking allows AFS files of any size to be accessed
 795 from a client. The chunk size is settable at each client machine, but the
 796 default chunk size of 64K was chosen so that most unix files would fit within a
 797 single chunk.
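\par
The arithmetic behind chunked access is simple enough to show directly. The helper below is a hypothetical illustration, not Cache Manager code: given a byte range, it returns the indices of the chunks that would have to be present in the cache, using the 64K default as the chunk size.

```python
CHUNK_SIZE = 64 * 1024  # default chunk size in bytes


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return the chunk indices covering bytes [offset, offset + length)."""
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be >= 0, chunk_size > 0")
    if length == 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))
```

A file of up to 65536 bytes lies entirely in chunk 0, while a 2-byte read starting at offset 65535 straddles chunks 0 and 1. Only the chunks actually touched need to be fetched, which is why file size is no longer bounded by cache size.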
 798
 799         \subsubsection sec4-2-7-3 Section 4.2.7.3: Cache Management
 800
 801 \par
 802 The use of a file cache by the AFS client-side code, as described above, raises
 803 the thorny issue of cache consistency. Each client must efficiently determine
 804 whether its cached file chunks are identical to the corresponding sections of
 805 the file as stored at the server machine before allowing a user to operate on
 806 those chunks.
 807 \par
 808 AFS employs the notion of a callback as the backbone of its cache consistency
 809 algorithm. When a server machine delivers one or more chunks of a file to a
 810 client, it also includes a callback "promise" that the client will be notified
 811 if any modifications are made to the data in the file at the server. Thus, as
 812 long as the client machine is in possession of a callback for a file, it knows
 813 it is correctly synchronized with the centrally-stored version, and allows its
 814 users to operate on it as desired without any further interaction with the
 815 server. Before a file server stores a more recent version of a file on its own
 816 disks, it will first break all outstanding callbacks on this item. A callback
 817 will eventually time out, even if there are no changes to the file or directory
 818 it covers.
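\par
The callback mechanism can be modeled in a few lines. The sketch below is a toy under stated assumptions: the class and method names are hypothetical, whole files stand in for chunks, callback timeouts are omitted, and the writer is assumed to be left holding the only valid promise after a store.

```python
class FileServer:
    """Toy server that tracks which clients hold a callback on each file."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.callbacks = {}  # path -> set of clients holding a promise

    def fetch(self, client, path):
        # Delivering data also registers a callback promise for the client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        # Break outstanding callbacks before the new version is stored.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.files[path] = data
        self.callbacks[path] = {writer}  # assumption: writer keeps a promise


class CacheManager:
    """Toy client that trusts its cache for as long as it holds a callback."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> contents, present only while a callback is held
        self.fetches = 0  # number of round trips made to the server

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
            self.fetches += 1
        return self.cache[path]

    def write(self, path, data):
        self.server.store(self, path, data)
        self.cache[path] = data

    def break_callback(self, path):
        self.cache.pop(path, None)  # the cached copy may no longer be used


server = FileServer()
server.files["/afs/xyz/f"] = "v1"
a, b = CacheManager(server), CacheManager(server)
first = a.read("/afs/xyz/f")
second = a.read("/afs/xyz/f")   # served from cache, no server contact
b.write("/afs/xyz/f", "v2")     # breaks a's callback
third = a.read("/afs/xyz/f")    # forces a re-fetch
```

Client a contacts the server only twice for three reads: once initially, and once after b's store has broken its callback.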
 819
 820         \subsection sec4-2-8 Section 4.2.8: Communication Substrate: Rx
 821
 822 \par
 823 All AFS system agents employ remote procedure call (RPC) interfaces. Thus,
 824 servers may be queried and operated upon regardless of their location.
 825 \par
 826 The Rx RPC package is used by all AFS agents to provide a high-performance,
 827 multi-threaded, and secure communication mechanism. The Rx protocol is
 828 adaptive, conforming itself to widely varying network communication media
 829 encountered by a WADFS. It allows user applications to define and insert their
 830 own security modules, allowing them to execute the precise end-to-end
 831 authentication algorithms required to suit their specific needs and goals. Rx
 832 offers two built-in security modules. The first is the null module, which does
 833 not perform any encryption or authentication checks. The second built-in
 834 security module is rxkad, which utilizes Kerberos authentication.
 835 \par
 836 Although pervasive throughout the AFS distributed file system, all of its
 837 agents, and many of its standard application programs, Rx is entirely separable
 838 from AFS and does not depend on any of its features. In fact, Rx can be used to
 839 build applications engaging in RPC-style communication under a variety of
 840 unix-style file systems. There are in-kernel and user-space implementations of
 841 the Rx facility, with both sharing the same interface.
 842
 843         \subsection sec4-2-9 Section 4.2.9: Database Replication: ubik
 844
 845 \par
 846 The three AFS system databases (VLDB, ADB, and PDB) may be replicated to
 847 multiple server machines to improve their availability and share access loads
 848 among the replication sites. The ubik replication package is used to implement
 849 this functionality. A full description of ubik and of the quorum completion
 850 algorithm it implements may be found in [19] and [20].
 851 \par
 852 The basic abstraction provided by ubik is that of a disk file replicated to
 853 multiple server locations. One machine is considered to be the synchronization
 854 site, handling all write operations on the database file. Read operations may
 855 be directed to any of the active members of the quorum, namely a subset of the
 856 replication sites large enough to ensure integrity across such failures as
 857 individual server crashes and network partitions. All of the quorum members
 858 participate in regular elections to determine the current synchronization site.
 859 The ubik algorithms allow server machines to enter and exit the quorum in an
 860 orderly and consistent fashion.
 861 \par
 862 All operations to one of these replicated "abstract files" are performed as
 863 part of a transaction. If all the related operations performed under a
 864 transaction are successful, then the transaction is committed, and the changes
 865 are made permanent. Otherwise, the transaction is aborted, and all of the
 866 operations for that transaction are undone.
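\par
The combination of a quorum requirement and all-or-nothing transactions can be illustrated with a toy model. None of this is the ubik interface: the ReplicatedFile class and its methods are hypothetical, the quorum is assumed to be a strict majority of the replication sites, the election is replaced by simply picking one reachable site, and the resynchronization of sites that rejoin later is not modeled.

```python
class ReplicatedFile:
    """Toy model of a database file replicated to several sites."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}  # site -> private copy
        self.up = set(site_names)                       # sites currently reachable

    def has_quorum(self):
        # Assumption: a quorum is a strict majority of all replication sites.
        return len(self.up) > len(self.sites) // 2

    def transact(self, operations):
        """Apply all operations or none; True on commit, False on abort."""
        if not self.has_quorum():
            return False
        sync_site = min(self.up)  # stand-in for the result of an election
        staged = dict(self.sites[sync_site])
        try:
            for op in operations:
                op(staged)
        except Exception:
            return False  # abort: no replica has been touched
        for name in self.up:
            self.sites[name] = dict(staged)  # commit to every reachable site
        return True


def failing_op(contents):
    raise RuntimeError("simulated failure")


db = ReplicatedFile(["a", "b", "c"])
committed = db.transact([lambda f: f.update(vol1="fs1")])
aborted = db.transact([lambda f: f.update(vol2="fs2"), failing_op])
db.up = {"a"}  # two of three sites become unreachable
no_quorum = db.transact([lambda f: f.update(vol3="fs3")])
```

The first transaction commits to all three copies, the second leaves them untouched because one of its operations fails, and the third is refused because a single site does not form a quorum.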
 867 \par
 868 Like Rx, the ubik facility may be used by client applications directly. Thus,
 869 user applications may easily implement the notion of a replicated disk file in
 870 this fashion.
 871
 872         \subsection sec4-2-10 Section 4.2.10: System Management
 873
 874 \par
 875 There are several AFS features aimed at facilitating system management. Some of
 876 these features have already been mentioned, such as volumes, the BOS Server,
 877 and the pervasive use of secure RPCs throughout the system to perform
 878 administrative operations from any AFS client machine in the worldwide
 879 community. This section covers additional AFS features and tools that assist in
 880 making the system easier to manage.
 881
 882         \subsubsection sec4-2-10-1 Section 4.2.10.1: Intelligent Access Programs
 884
 885 \par
 886 A set of intelligent user-level applications was written so that the AFS
 887 system agents could be more easily queried and controlled. These programs
 888 accept user input, then translate the caller's instructions into the proper
 889 RPCs to the responsible AFS system agents, in the proper order.
 890 \par
 891 An example of this class of AFS application programs is vos, which mediates
 892 access to the Volume Server and the Volume Location Server agents. Consider the
 893 vos move operation, which results in a given volume being moved from one site
 894 to another. The Volume Server does not support a complex operation like a
 895 volume move directly. In fact, this move operation involves the Volume Servers
 896 at the current and new machines, as well as the Volume Location Server, which
 897 tracks volume locations. Volume moves are accomplished by a combination of full
 898 and incremental volume dump and restore operations, and a VLDB update. The vos
 899 move command issues the necessary RPCs in the proper order, and attempts to
 900 recover from errors at each of the steps.
 901 \par
 902 The end result is that the AFS interface presented to system administrators is
 903 much simpler and more powerful than that offered by the raw RPC interfaces
 904 themselves. The learning curve for administrative personnel is thus flattened.
 905 Also, automatic execution of complex system operations is more likely to be
 906 successful, free from human error.
 907
 908         \subsubsection sec4-2-10-2 Section 4.2.10.2: Monitoring Interfaces
 909
 910 \par
 911 The various AFS agent RPC interfaces provide calls which allow for the
 912 collection of system status and performance data. This data may be displayed by
 913 such programs as scout, which graphically depicts File Server performance
 914 numbers and disk utilizations. Such monitoring capabilities allow for quick
 915 detection of system problems. They also support detailed performance analyses,
 916 which may indicate the need to reconfigure system resources.
 917
 918         \subsubsection sec4-2-10-3 Section 4.2.10.3: Backup System
 919
 920 \par
 921 A special backup system has been designed and implemented for AFS, as described
 922 in [6]. It is not sufficient to simply dump the contents of all File Server
 923 partitions onto tape, since volumes are mobile, and need to be tracked
 924 individually. The AFS backup system allows hierarchical dump schedules to be
 925 built based on volume names. It generates the appropriate RPCs to create the
 926 required backup volumes and to dump these snapshots to tape. A database is used
 927 to track the backup status of system volumes, along with the set of tapes on
 928 which backups reside.
 929
 930         \subsection sec4-2-11 Section 4.2.11: Interoperability
 931
 932 \par
 933 Since the client portion of the AFS software is implemented as a standard
 934 VFS/vnode file system object, AFS can be installed into client kernels and
 935 utilized without interference with other VFS-style file systems, such as
 936 vanilla unix and the NFS distributed file system.
 937 \par
 938 Certain machines either cannot or choose not to run the AFS client software
 939 natively. If these machines run NFS, it is still possible to access AFS files
 940 through a protocol translator. The NFS-AFS Translator may be run on any machine
 941 at the given site that runs both NFS and the AFS Cache Manager. All of the NFS
 942 machines that wish to access the AFS shared store proceed to NFS-mount the
 943 translator's /afs directory. File references generated at the NFS-based
 944 machines are received at the translator machine, which is acting in its
 945 capacity as an NFS server. The file data is actually obtained when the
 946 translator machine issues the corresponding AFS references in its role as an
 947 AFS client.
 948
 949         \section sec4-3 Section 4.3: Meeting AFS Goals
 950
 951 \par
 952 The AFS WADFS design, as described in this chapter, serves to meet the system
 953 goals stated in Chapter 3. This section revisits each of these AFS goals, and
 954 identifies the specific architectural constructs that bear on them.
 955
 956         \subsection sec4-3-1 Section 4.3.1: Scale
 957
 958 \par
 959 To date, AFS has been deployed to over 140 sites world-wide, with approximately
 960 60 of these cells visible on the public Internet. AFS sites are currently
 961 operating in several European countries, in Japan, and in Australia. While many
 962 sites are modest in size, certain cells contain more than 30,000 accounts. AFS
 963 sites have realized client/server ratios in excess of the targeted 200:1.
 964
 965         \subsection sec4-3-2 Section 4.3.2: Name Space
 966
 967 \par
 968 A single uniform name space has been constructed across all cells in the
 969 greater AFS user community. Any pathname beginning with /afs may indeed be used
 970 at any AFS client. A set of common conventions regarding the organization of
 971 the top-level /afs directory and several directories below it have been
 972 established. These conventions also assist in the location of certain per-cell
 973 resources, such as AFS configuration files.
 974 \par
 975 Both access transparency and location transparency are supported by AFS, as
 976 evidenced by the common access mechanisms and by the ability to transparently
 977 relocate volumes.
 978
 979         \subsection sec4-3-3 Section 4.3.3: Performance
 980
 981 \par
 982 AFS employs caching extensively at all levels to reduce the cost of "remote"
 983 references. Measured data cache hit ratios are very high, often over 95%. This
 984 indicates that the file images kept on local disk are very effective in
 985 satisfying the set of remote file references generated by clients. The
 986 introduction of file system callbacks has also been demonstrated to be very
 987 effective in the efficient implementation of cache synchronization. Replicating
 988 files and system databases across multiple server machines distributes load
 989 among the given servers. The Rx RPC subsystem has operated successfully at
 990 network speeds ranging from 19.2 kilobytes/second to experimental
 991 gigabit/second FDDI networks.
 992 \par
 993 Even at the intra-site level, AFS has been shown to deliver good performance,
 994 especially in high-load situations. One often-quoted study [1] compared the
 995 performance of an older version of AFS with that of NFS on a large file system
 996 task named the Andrew Benchmark. While NFS sometimes outperformed AFS at low
 997 load levels, its performance fell off rapidly at higher loads while AFS
 998 performance was not significantly affected.
 999
1000         \subsection sec4-3-4 Section 4.3.4: Security
1001
1002 \par
1003 The use of Kerberos as the AFS authentication system fits the security goal
1004 nicely. Access to AFS files from untrusted client machines is predicated on the
1005 caller's possession of the appropriate Kerberos ticket(s). Setting up per-site,
1006 Kerberos-based authentication services compartmentalizes any security breach to
1007 the cell which was compromised. Since the Cache Manager will store multiple
1008 tickets for its users, they may take on different identities depending on the
1009 set of file servers being accessed.
1010
1011         \subsection sec4-3-5 Section 4.3.5: Access Control
1012
1013 \par
1014 AFS extends the standard unix authorization mechanism with per-directory Access
1015 Control Lists. These ACLs allow specific AFS principals and groups of these
1016 principals to be granted a wide variety of rights on the associated files.
1017 Users may create and manipulate AFS group entities without administrative
1018 assistance, and place these tailored groups on ACLs.
1019
1020         \subsection sec4-3-6 Section 4.3.6: Reliability
1021
1022 \par
1023 A subset of file server crashes are masked by the use of read-only replication
1024 on volumes containing slowly-changing files. Availability of important,
1025 frequently-used programs such as editors and compilers may thus be greatly
1026 improved. Since the level of replication may be chosen per volume, and easily
1027 changed, each site may decide the proper replication levels for certain
1028 programs and/or data.
1029 Similarly, replicated system databases help to maintain service in the face of
1030 server crashes and network partitions.
1031
1032         \subsection sec4-3-7 Section 4.3.7: Administrability
1033
1034 \par
1035 Such features as pervasive, secure RPC interfaces to all AFS system components,
1036 volumes, overseer processes for monitoring and management of file system
1037 agents, intelligent user-level access tools, interface routines providing
1038 performance and statistics information, and an automated backup service
1039 tailored to a volume-based environment all contribute to the administrability
1040 of the AFS system.
1041
1042         \subsection sec4-3-8 Section 4.3.8: Interoperability/Coexistence
1043
1044 \par
1045 Due to its VFS-style implementation, the AFS client code may be easily
1046 installed in the machine's kernel, and may service file requests without
1047 interfering in the operation of any other installed file system. Machines
1048 either not capable of running AFS natively or choosing not to do so may still
1049 access AFS files via NFS with the help of a protocol translator agent.
1050
1051         \subsection sec4-3-9 Section 4.3.9: Heterogeneity/Portability
1052
1053 \par
1054 As most modern kernels use a VFS-style interface to support their native file
1055 systems, AFS may usually be ported to a new hardware and/or software
1056 environment in a relatively straightforward fashion. Such ease of porting
1057 allows AFS to run on a wide variety of platforms.
1058
1059         \page chap5 Chapter 5: Future AFS Design Refinements
1060
1061         \section sec5-1 Section 5.1: Overview
1062
1063 \par
1064 The current AFS WADFS design and implementation provides a high-performance,
1065 scalable, secure, and flexible computing environment. However, there is room
1066 for improvement on a variety of fronts. This chapter considers a set of topics,
1067 examining the shortcomings of the current AFS system and considering how
1068 additional functionality may be fruitfully constructed.
1069 \par
1070 Many of these areas are already being addressed in the next-generation AFS
1071 system which is being built as part of Open Software Foundation's (OSF)
1072 Distributed Computing Environment [7] [8].
1073
1074         \section sec5-2 Section 5.2: unix Semantics
1075
1076 \par
1077 Any distributed file system which extends the unix file system model to include
1078 remote file accesses presents its application programs with failure modes which
1079 do not exist in a single-machine unix implementation. This semantic difference
1080 is difficult to mask.
1081 \par
1082 The current AFS design varies from pure unix semantics in other ways. In a
1083 single-machine unix environment, modifications made to an open file are
1084 immediately visible to other processes with open file descriptors to the same
1085 file. AFS does not reproduce this behavior when programs on different machines
1086 access the same file. Changes made to one cached copy of the file are not made
1087 immediately visible to other cached copies. The changes are only made visible
1088 to other access sites when a modified version of a file is stored back to the
1089 server providing its primary disk storage. Thus, one client's changes may be
1090 entirely overwritten by another client's modifications. The situation is
1091 further complicated by the possibility that dirty file chunks may be flushed
1092 out to the File Server before the file is closed.
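\par
The store-on-close behavior described above can be illustrated with a small, single-process simulation. This is only a sketch under stated assumptions: the types and functions (server_file, client_cache, sim_open, sim_write, sim_close) are hypothetical names invented for this example and are not part of any AFS interface. The sketch also ignores callbacks and the early flushing of dirty chunks, and treats each store as replacing the whole file.

```c
#include <assert.h>
#include <string.h>

#define FILE_MAX 64

// Hypothetical stand-in for the File Server's copy of one file.
struct server_file {
    char data[FILE_MAX];
};

// Hypothetical stand-in for one client's cached copy of that file.
struct client_cache {
    char data[FILE_MAX];
    int dirty;              // nonzero if modified since it was fetched
};

// Open: fetch the server's current contents into the client's cache.
void sim_open(struct client_cache *c, const struct server_file *s)
{
    memcpy(c->data, s->data, FILE_MAX);
    c->dirty = 0;
}

// Write: replace the cached contents only; the server is not contacted.
void sim_write(struct client_cache *c, const char *text)
{
    assert(strlen(text) < FILE_MAX);
    memset(c->data, 0, FILE_MAX);
    strcpy(c->data, text);
    c->dirty = 1;
}

// Close: store the whole cached copy back to the server if it was modified.
void sim_close(struct client_cache *c, struct server_file *s)
{
    if (c->dirty) {
        memcpy(s->data, c->data, FILE_MAX);
        c->dirty = 0;
    }
}
```

If two simulated clients open the same file, both write, and then close in turn, the server ends up holding only the contents stored by the client that closed last. Neither client sees the other's writes in its own cached copy in the meantime.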
1093 \par
1094 The version of AFS created for the OSF offering extends the current, untyped
1095 callback notion to a set of multiple, independent synchronization guarantees.
1096 These synchronization tokens allow functionality not offered by AFS-3,
1097 including byte-range mandatory locking, exclusive file opens, and read and
1098 write privileges over portions of a file.
1099
1100         \section sec5-3 Section 5.3: Improved Name Space Management
1101
1102 \par
1103 Discovery of new AFS cells and their integration into each existing cell's name
1104 space is a completely manual operation in the current system. As the rate of
1105 new cell creations increases, the load imposed on system administrators also
1106 increases. Also, representing each cell's file space entry as a mount point
1107 object in the /afs directory leads to a potential problem. As the number of
1108 entries in the /afs directory increases, search time through the directory also
1109 grows.
1110 \par
1111 One improvement to this situation is to implement the top-level /afs directory
1112 through a Domain-style database. The database would map cell names to the set
1113 of server machines providing authentication and volume location services for
1114 that cell. The Cache Manager would query the cell database in the course of
1115 pathname resolution, and cache its lookup results.
1116 \par
1117 In this database-style environment, adding a new cell entry under /afs is
1118 accomplished by creating the appropriate database entry. The new cell
1119 information is then immediately accessible to all AFS clients.
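\par
The lookup path suggested above, in which a client consults a cell database and caches the result, might be sketched as follows. Everything in this example is an assumption made for illustration: the record layout, the function names (db_query, cell_lookup), the cell and host names, and the small round-robin cache are hypothetical, and the in-memory table merely stands in for a real distributed database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SERVERS 3
#define CACHE_SLOTS 4

// One database record: a cell name and the server machines for that cell.
struct cell_entry {
    const char *cell;
    const char *servers[MAX_SERVERS];   // unused slots are NULL
};

// Hypothetical database contents; cell and host names are invented.
static const struct cell_entry cell_db[] = {
    { "cell-a.example", { "db1.cell-a.example", "db2.cell-a.example", NULL } },
    { "cell-b.example", { "db1.cell-b.example", NULL, NULL } },
};

static const struct cell_entry *cache[CACHE_SLOTS];  // client-side cache
static int next_slot;                                // next slot to reuse
static int db_queries;                               // database queries made

// Simulated database query: a linear search of the table above.
static const struct cell_entry *db_query(const char *cell)
{
    size_t i;

    db_queries++;
    for (i = 0; i < sizeof cell_db / sizeof cell_db[0]; i++)
        if (strcmp(cell_db[i].cell, cell) == 0)
            return &cell_db[i];
    return NULL;
}

// Resolve a cell name, consulting the cache before the database.
const struct cell_entry *cell_lookup(const char *cell)
{
    const struct cell_entry *e;
    int i;

    assert(cell != NULL);
    for (i = 0; i < CACHE_SLOTS; i++)
        if (cache[i] != NULL && strcmp(cache[i]->cell, cell) == 0)
            return cache[i];            // cache hit: no database query

    e = db_query(cell);
    if (e != NULL) {                    // cache successful lookups only
        cache[next_slot] = e;
        next_slot = (next_slot + 1) % CACHE_SLOTS;
    }
    return e;
}
```

In this sketch, adding a cell amounts to adding one record to the table; no per-client directory entry is involved. A repeated lookup of the same cell is answered from the cache without a second query.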
1120
1121         \section sec5-4 Section 5.4: Read/Write Replication
1122
1123 \par
1124 The AFS-3 servers and databases are currently equipped to handle read/only
1125 replication exclusively. However, other distributed file systems have
1126 demonstrated the feasibility of providing full read/write replication of data
1127 in environments very similar to AFS [11]. Such systems can serve as models for
1128 the set of required changes.
1129
1130         \section sec5-5 Section 5.5: Disconnected Operation
1131
1132 \par
1133 Several facilities are provided by AFS so that server failures and network
1134 partitions may be completely or partially masked. However, AFS does not provide
1135 for completely disconnected operation of file system clients. Disconnected
1136 operation is a mode in which a client continues to access critical data during
1137 accidental or intentional inability to access the shared file repository. After
1138 some period of autonomous operation on the set of cached files, the client
1139 reconnects with the repository and resynchronizes the contents of its cache
1140 with the shared store.
1141 \par
1142 Studies of related systems provide evidence that such disconnected operation is
1143 feasible [11] [12]. Such a capability may be explored for AFS.
1144
1145         \section sec5-6 Section 5.6: Multiprocessor Support
1146
1147 \par
1148 The LWP lightweight thread package used by all AFS system processes assumes
1149 that individual threads may execute non-preemptively, and that all other
1150 threads are quiescent until control is explicitly relinquished from within the
1151 currently active thread. These assumptions conspire to prevent AFS from
1152 operating correctly on a multiprocessor platform.
1153 \par
1154 A solution to this restriction is to restructure the AFS code organization so
1155 that the proper locking is performed. Thus, critical sections which were
1156 previously only implicitly defined are explicitly specified.
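\par
As an illustration of what an explicitly specified critical section looks like, the following sketch protects a shared counter with a POSIX threads mutex. It is not taken from the AFS sources and does not use the LWP package; POSIX threads are swapped in here only because they are preemptive, and the names shared_counter, worker, and run_workers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

// Shared state together with the lock that guards it.
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

struct worker_arg {
    struct shared_counter *counter;
    int increments;
};

// Each worker updates the counter only while holding the lock.
static void *worker(void *p)
{
    struct worker_arg *arg = p;
    int i;

    for (i = 0; i < arg->increments; i++) {
        pthread_mutex_lock(&arg->counter->lock);    // enter critical section
        arg->counter->value++;
        pthread_mutex_unlock(&arg->counter->lock);  // leave critical section
    }
    return NULL;
}

// Run nthreads workers to completion and return the final counter value.
long run_workers(int nthreads, int increments)
{
    struct shared_counter counter;
    struct worker_arg arg;
    pthread_t tids[MAX_THREADS];
    int i, started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter.value = 0;
    pthread_mutex_init(&counter.lock, NULL);
    arg.counter = &counter;
    arg.increments = increments;

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;                      // stop starting threads on failure
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&counter.lock);
    return counter.value;
}
```

Under a non-preemptive package, the increment would be safe without the lock, since no other thread could run until the current one yielded. With preemptive threads or several processors, the lock is what makes the update safe.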
1157
1158         \page biblio Bibliography
1159
1160 \li [1] John H. Howard, Michael L. Kazar, Sherri G. Menees, David A. Nichols,
1161 M. Satyanarayanan, Robert N. Sidebotham, Michael J. West, Scale and Performance
1162 in a Distributed File System, ACM Transactions on Computer Systems, Vol. 6, No.
1163 1, February 1988, pp. 51-81.
1164 \li [2] Michael L. Kazar, Synchronization and Caching Issues in the Andrew File
1165 System, USENIX Proceedings, Dallas, TX, Winter 1988.
1166 \li [3] Alfred Z. Spector, Michael L. Kazar, Uniting File Systems, Unix
1167 Review, March 1989.
1168 \li [4] Johna Till Johnson, Distributed File System Brings LAN Technology to
1169 WANs, Data Communications, November 1990, pp. 66-67.
1170 \li [5] Michael Padovano, PADCOM Associates, AFS widens your horizons in
1171 distributed computing, Systems Integration, March 1991.
1172 \li [6] Steve Lammert, The AFS 3.0 Backup System, LISA IV Conference
1173 Proceedings, Colorado Springs, Colorado, October 1990.
1174 \li [7] Michael L. Kazar, Bruce W. Leverett, Owen T. Anderson, Vasilis
1175 Apostolides, Beth A. Bottos, Sailesh Chutani, Craig F. Everhart, W. Anthony
1176 Mason, Shu-Tsui Tu, Edward R. Zayas, DEcorum File System Architectural
1177 Overview, USENIX Conference Proceedings, Anaheim, California, Summer 1990.
1178 \li [8] AFS Drives DCE Selection, Digital Desktop, Vol. 1, No. 6,
1179 September 1990.
1180 \li [9] Levine, P.H., The Apollo DOMAIN Distributed File System, in NATO ASI
1181 Series: Theory and Practice of Distributed Operating Systems, Y. Paker, J-P.
1182 Banatre, M. Bozyigit, editors, Springer-Verlag, 1987.
1183 \li [10] M.N. Nelson, B.B. Welch, J.K. Ousterhout, Caching in the Sprite
1184 Network File System, ACM Transactions on Computer Systems, Vol. 6, No. 1,
1185 February 1988.
1186 \li [11] James J. Kistler, M. Satyanarayanan, Disconnected Operation in the Coda
1187 File System, CMU School of Computer Science technical report, CMU-CS-91-166, 26
1188 July 1991.
1189 \li [12] Puneet Kumar, M. Satyanarayanan, Log-Based Directory Resolution
1190 in the Coda File System, CMU School of Computer Science internal document, 2
1191 July 1991.
1192 \li [13] Sun Microsystems, Inc., NFS: Network File System Protocol
1193 Specification, RFC 1094, March 1989.
1194 \li [14] Sun Microsystems, Inc., Design and Implementation of the Sun Network
1195 File System, USENIX Summer Conference Proceedings, June 1985.
1196 \li [15] C.H. Sauer, D.W. Johnson, L.K. Loucks, A.A. Shaheen-Gouda, and T.A.
1197 Smith, RT PC Distributed Services Overview, Operating Systems Review, Vol. 21,
1198 No. 3, July 1987.
1199 \li [16] A.P. Rifkin, M.P. Forbes, R.L. Hamilton, M. Sabrio, S. Shah, and
1200 K. Yueh, RFS Architectural Overview, Usenix Conference Proceedings, Atlanta,
1201 Summer 1986.
1202 \li [17] Edward R. Zayas, Administrative Cells: Proposal for Cooperative Andrew
1203 File Systems, Information Technology Center internal document, Carnegie Mellon
1204 University, 25 June 1987.
1205 \li [18] Ed. Zayas, Craig Everhart, Design and Specification of the Cellular
1206 Andrew Environment, Information Technology Center, Carnegie Mellon University,
1207 CMU-ITC-070, 2 August 1988.
1208 \li [19] Kazar, Michael L., Information Technology Center, Carnegie Mellon
1209 University. Ubik - A Library For Managing Ubiquitous Data, ITCID, Pittsburgh,
1210 PA, Month, 1988.
1211 \li [20] Kazar, Michael L., Information Technology Center, Carnegie Mellon
1212 University. Quorum Completion, ITCID, Pittsburgh, PA, Month, 1988.
1213 \li [21] S. R. Kleinman. Vnodes: An Architecture for Multiple File
1214 System Types in Sun UNIX, Conference Proceedings, 1986 Summer Usenix Technical
1215 Conference, pp. 238-247, El Toro, CA, 1986.
1216 \li [22] S.P. Miller, B.C. Neuman, J.I. Schiller, J.H. Saltzer. Kerberos
1217 Authentication and Authorization System, Project Athena Technical Plan, Section
1218 E.2.1, M.I.T., December 1987.
1219 \li [23] Bill Bryant. Designing an Authentication System: a Dialogue in Four
1220 Scenes, Project Athena internal document, M.I.T., draft of 8 February 1988.
1221
1222
1223 */