1 <?xml version="1.0" encoding="UTF-8"?>
4 <title>An Overview of OpenAFS Administration</title>
6 <para>This chapter provides a broad overview of the concepts and
7 organization of AFS. It is strongly recommended that anyone involved in
administering an AFS cell read this chapter before beginning to issue commands.</para>
12 <title>A Broad Overview of AFS</title>
14 <para>This section introduces most of the key terms and concepts
15 necessary for a basic understanding of AFS. For a more detailed
16 discussion, see <link linkend="HDRWQ7">More Detailed Discussions of
17 Some Basic Concepts</link>.</para>
19 <sect2 renderas="sect3">
20 <title>AFS: A Distributed File System</title>
22 <para>AFS is a distributed file system that enables users to share
23 and access all of the files stored in a network of computers as
24 easily as they access the files stored on their local machines. The
25 file system is called distributed for this exact reason: files can
26 reside on many different machines (be distributed across them), but
27 are available to users on every machine.</para>
30 <sect2 renderas="sect3">
31 <title>Servers and Clients</title>
33 <para>In fact, AFS stores files on a subset of the machines in a
34 network, called file server machines. File server machines provide
35 file storage and delivery service, along with other specialized
36 services, to the other subset of machines in the network, the client
37 machines. These machines are called clients because they make use of
38 the servers' services while doing their own work. In a standard AFS
39 configuration, clients provide computational power, access to the
40 files in AFS and other "general purpose" tools to the users seated
41 at their consoles. There are generally many more client workstations
42 than file server machines.</para>
44 <para>AFS file server machines run a number of server processes, so
45 called because each provides a distinct specialized service: one
46 handles file requests, another tracks file location, a third manages
47 security, and so on. To avoid confusion, AFS documentation always
48 refers to server machines and server processes, not simply to
49 servers. For a more detailed description of the server processes,
50 see <link linkend="HDRWQ17">AFS Server Processes and the Cache
51 Manager</link>.</para>
54 <sect2 renderas="sect3">
<title>Cells</title>

<para>A cell is an administratively independent site running AFS. As
58 a cell's system administrator, you make many decisions about
59 configuring and maintaining your cell in the way that best serves
60 its users, without having to consult the administrators in other
61 cells. For example, you determine how many clients and servers to
have, where to put files, and how to allocate client machines to users.</para>
66 <sect2 renderas="sect3">
67 <title>Transparent Access and the Uniform Namespace</title>
69 <para>Although your AFS cell is administratively independent, you
70 probably want to organize the local collection of files (your
71 filespace or tree) so that users from other cells can also access
72 the information in it. AFS enables cells to combine their local
73 filespaces into a global filespace, and does so in such a way that
74 file access is transparent--users do not need to know anything about
75 a file's location in order to access it. All they need to know is
76 the pathname of the file, which looks the same in every cell. Thus
77 every user at every machine sees the collection of files in the same
way, meaning that AFS provides a uniform namespace to its users.</para>
82 <sect2 renderas="sect3">
83 <title>Volumes</title>
85 <para>AFS groups files into volumes, making it possible to
86 distribute files across many machines and yet maintain a uniform
87 namespace. A volume is a unit of disk space that functions like a
88 container for a set of related files, keeping them all together on
89 one partition. Volumes can vary in size, but are (by definition)
90 smaller than a partition.</para>
92 <para>Volumes are important to system administrators and users for
93 several reasons. Their small size makes them easy to move from one
94 partition to another, or even between machines. The system
95 administrator can maintain maximum efficiency by moving volumes to
96 keep the load balanced evenly. In addition, volumes correspond to
97 directories in the filespace--most cells store the contents of each
98 user home directory in a separate volume. Thus the complete contents
99 of the directory move together when the volume moves, making it easy
100 for AFS to keep track of where a file is at a certain time. Volume
101 moves are recorded automatically, so users do not have to keep track
102 of file locations.</para>
105 <sect2 renderas="sect3">
106 <title>Efficiency Boosters: Replication and Caching</title>
108 <para>AFS incorporates special features on server machines and
109 client machines that help make it efficient and reliable.</para>
111 <para>On server machines, AFS enables administrators to replicate
112 commonly-used volumes, such as those containing binaries for popular
113 programs. Replication means putting an identical read-only copy
114 (sometimes called a clone) of a volume on more than one file server
115 machine. The failure of one file server machine housing the volume
116 does not interrupt users' work, because the volume's contents are
117 still available from other machines. Replication also means that one
118 machine does not become overburdened with requests for files from a
119 popular volume.</para>
121 <para>On client machines, AFS uses caching to improve
122 efficiency. When a user on a client workstation requests a file, the
123 Cache Manager on the client sends a request for the data to the File
124 Server process running on the proper file server machine. The user
125 does not need to know which machine this is; the Cache Manager
126 determines file location automatically. The Cache Manager receives
127 the file from the File Server process and puts it into the cache, an
128 area of the client machine's local disk or memory dedicated to
129 temporary file storage. Caching improves efficiency because the
130 client does not need to send a request across the network every time
131 the user wants the same file. Network traffic is minimized, and
132 subsequent access to the file is especially fast because the file is
133 stored locally. AFS has a way of ensuring that the cached file stays
134 up-to-date, called a callback.</para>
137 <sect2 renderas="sect3">
<title>Security: Mutual Authentication and Access Control Lists</title>
141 <para>Even in a cell where file sharing is especially frequent and
142 widespread, it is not desirable that every user have equal access to
143 every file. One way AFS provides adequate security is by requiring
144 that servers and clients prove their identities to one another
145 before they exchange information. This procedure, called mutual
146 authentication, requires that both server and client demonstrate
147 knowledge of a "shared secret" (like a password) known only to the
148 two of them. Mutual authentication guarantees that servers provide
149 information only to authorized clients and that clients receive
150 information only from legitimate servers.</para>
152 <para>Users themselves control another aspect of AFS security, by
153 determining who has access to the directories they own. For any
154 directory a user owns, he or she can build an access control list
155 (ACL) that grants or denies access to the contents of the
156 directory. An access control list pairs specific users with specific
157 types of access privileges. There are seven separate permissions and
158 up to twenty different people or groups of people can appear on an
159 access control list.</para>
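
<para>For example, the following commands display and then modify an
access control list; the pathname and username are examples only. The
seven permissions are abbreviated r (read), l (lookup), i (insert),
d (delete), w (write), k (lock), and a (administer).</para>

<programlisting>
   % fs listacl /afs/example.com/usr/pat
   % fs setacl -dir /afs/example.com/usr/pat/public -acl smith rl
</programlisting>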
161 <para>For a more detailed description of AFS's mutual authentication
162 procedure, see <link linkend="HDRWQ75">A More Detailed Look at
163 Mutual Authentication</link>. For further discussion of ACLs, see
<link linkend="HDRWQ562">Managing Access Control Lists</link>.</para>
170 <title>More Detailed Discussions of Some Basic Concepts</title>
172 <para>The previous section offered a brief overview of the many
173 concepts that an AFS system administrator needs to understand. The
174 following sections examine some important concepts in more
175 detail. Although not all concepts are new to an experienced
176 administrator, reading this section helps ensure a common
understanding of terms and concepts.</para>
180 <title>Networks</title>
183 <primary>network</primary>
185 <secondary>defined</secondary>
188 <para>A <emphasis>network</emphasis> is a collection of
189 interconnected computers able to communicate with each other and
190 transfer information back and forth.</para>
192 <para>A networked computing environment contrasts with two types of
193 computing environments: <emphasis>mainframe</emphasis> and
194 <emphasis>personal</emphasis>.
196 <primary>network</primary>
198 <secondary>as computing environment</secondary>
201 <primary>environment</primary>
203 <secondary>types compared</secondary>
207 <para>A <emphasis>mainframe</emphasis> computing environment
208 is the most traditional. It uses a single powerful computer
209 (the mainframe) to do the majority of the work in the system,
210 both file storage and computation. It serves many users, who
211 access their files and issue commands to the mainframe via
212 terminals, which generally have only enough computing power to
accept input from a keyboard and to display data on the screen.</para>
217 <primary>mainframe</primary>
219 <secondary>computing environment</secondary>
224 <para>A <emphasis>personal</emphasis> computing environment is
225 a single small computer that serves one (or, at the most, a
226 few) users. Like a mainframe computer, the single computer
227 stores all the files and performs all computation. Like a
228 terminal, the personal computer provides access to the
229 computer through a keyboard and screen.</para>
232 <primary>personal</primary>
234 <secondary>computing environment</secondary>
240 <para>A network can connect computers of any kind, but the typical
241 network running AFS connects high-function personal
242 workstations. Each workstation has some computing power and local
243 disk space, usually more than a personal computer or terminal, but
244 less than a mainframe. For more about the classes of machines used
245 in an AFS environment, see <link linkend="HDRWQ10">Servers and
246 Clients</link>.</para>
250 <title>Distributed File Systems</title>
253 <primary>file system</primary>
255 <secondary>defined</secondary>
259 <primary>distributed file system</primary>
262 <para>A <emphasis>file system</emphasis> is a collection of files
263 and the facilities (programs and commands) that enable users to
264 access the information in the files. All computing environments have
265 file systems. In a mainframe environment, the file system consists
266 of all the files on the mainframe's storage disks, whereas in a
267 personal computing environment it consists of the files on the
268 computer's local disk.</para>
270 <para>Networked computing environments often use
271 <emphasis>distributed file systems</emphasis> like AFS. A
272 distributed file system takes advantage of the interconnected nature
273 of the network by storing files on more than one computer in the
274 network and making them accessible to all of them. In other words,
275 the responsibility for file storage and delivery is "distributed"
276 among multiple machines instead of relying on only one. Despite the
277 distribution of responsibility, a distributed file system like AFS
278 creates the illusion that there is a single filespace.</para>
282 <title>Servers and Clients</title>
285 <primary>server/client model</primary>
289 <primary>server</primary>
291 <secondary>definition</secondary>
295 <primary>client</primary>
297 <secondary>definition</secondary>
300 <para>AFS uses a server/client model. In general, a server is a
301 machine, or a process running on a machine, that provides
302 specialized services to other machines. A client is a machine or
303 process that makes use of a server's specialized service during the
304 course of its own work, which is often of a more general nature than
the server's. The functional distinction between clients and servers
306 is not always strict, however--a server can be considered the client
307 of another server whose service it is using.</para>
309 <para>AFS divides the machines on a network into two basic classes,
310 <emphasis>file server machines</emphasis> and <emphasis>client
311 machines</emphasis>, and assigns different tasks and
312 responsibilities to each.</para>
315 <title>File Server Machines</title>
318 <primary>file server machine</primary>
322 <primary>server</primary>
324 <secondary>process</secondary>
326 <tertiary>definition</tertiary>
329 <para><emphasis>File server machines</emphasis> store the files in
330 the distributed file system, and a <emphasis>server
331 process</emphasis> running on the file server machine delivers and
332 receives files. AFS file server machines run a number of
333 <emphasis>server processes</emphasis>. Each process has a special
334 function, such as maintaining databases important to AFS
335 administration, managing security or handling volumes. This
336 modular design enables each server process to specialize in one
337 area, and thus perform more efficiently. For a description of the
338 function of each AFS server process, see <link
339 linkend="HDRWQ17">AFS Server Processes and the Cache
340 Manager</link>.</para>
343 <para>Not all AFS server machines must run all of the server
344 processes. Some processes run on only a few machines because the
345 demand for their services is low. Other processes run on only one
346 machine in order to act as a synchronization site. See <link
347 linkend="HDRWQ90">The Four Roles for File Server
348 Machines</link>.</para>
351 <title>Client Machines</title>
354 <primary>client</primary>
356 <secondary>machine</secondary>
358 <tertiary>definition</tertiary>
361 <para>The other class of machines are the <emphasis>client
362 machines</emphasis>, which generally work directly for users,
363 providing computational power and other general purpose
364 tools. Clients also provide users with access to the files stored
365 on the file server machines. Clients do not run any special
366 processes per se, but do use a modified kernel that enables them
367 to communicate with the AFS server processes running on the file
368 server machines and to cache files. This collection of kernel
369 modifications is referred to as the Cache Manager; see <link
370 linkend="HDRWQ28">The Cache Manager</link>. There are usually many
371 more client machines in a cell than file server machines.</para>
375 <title>Client and Server Configuration</title>
378 <primary>personal</primary>
380 <secondary>workstation</secondary>
382 <tertiary>as typical AFS machine</tertiary>
385 <para>In the most typical AFS configuration, both file server
386 machines and client machines are high-function workstations with
387 disk drives. While this configuration is not required, it does
388 have some advantages.</para>
391 <para>There are several advantages to using personal workstations as
392 file server machines. One is that it is easy to expand the network
393 by adding another file server machine. It is also easy to increase
394 storage space by adding disks to existing machines. Using
395 workstations rather than more powerful mainframes makes it more
396 economical to use multiple file server machines rather than
397 one. Multiple file server machines provide an increase in system
398 availability and reliability if popular files are available on more
399 than one machine.</para>
401 <para>The advantage of using workstations as clients is that caching
402 on the local disk speeds the delivery of files to application
403 programs. (For an explanation of caching, see <link
404 linkend="HDRWQ16">Caching and Callbacks</link>.) Diskless machines
405 can access AFS if they are running NFS(R) and the NFS/AFS
406 Translator, an optional component of the AFS distribution.</para>
<title>Cells</title>

<primary>cell</primary>
416 <para>A <emphasis>cell</emphasis> is an independently administered
417 site running AFS. In terms of hardware, it consists of a collection
418 of file server machines and client machines defined as belonging to
419 the cell; a machine can only belong to one cell at a time. Users
420 also belong to a cell in the sense of having an account in it, but
421 unlike machines can belong to (have an account in) multiple
422 cells. To say that a cell is administratively independent means that
423 its administrators determine many details of its configuration
424 without having to consult administrators in other cells or a central
425 authority. For example, a cell administrator determines how many
426 machines of different types to run, where to put files in the local
427 tree, how to associate volumes and directories, and how much space
428 to allocate to each user.</para>
430 <para>The terms <emphasis>local cell</emphasis> and <emphasis>home
431 cell</emphasis> are equivalent, and refer to the cell in which a
432 user has initially authenticated during a session, by logging onto a
433 machine that belongs to that cell. All other cells are referred to
434 as <emphasis>foreign</emphasis> from the user's perspective. In
435 other words, throughout a login session, a user is accessing the
436 filespace through a single Cache Manager--the one on the machine to
437 which he or she initially logged in--whose cell membership defines
438 the local cell. All other cells are considered foreign during that
439 login session, even if the user authenticates in additional cells or
440 uses the <emphasis role="bold">cd</emphasis> command to change
441 directories into their file trees.</para>
444 <primary>local cell</primary>
448 <primary>cell</primary>
450 <secondary>local</secondary>
454 <primary>foreign cell</primary>
458 <primary>cell</primary>
460 <secondary>foreign</secondary>
463 <para>It is possible to maintain more than one cell at a single
464 geographical location. For instance, separate departments on a
465 university campus or in a corporation can choose to administer their
466 own cells. It is also possible to have machines at geographically
467 distant sites belong to the same cell; only limits on the speed of
468 network communication determine how practical this is.</para>
470 <para>Despite their independence, AFS cells generally agree to make
471 their local filespace visible to other AFS cells, so that users in
472 different cells can share files if they choose. If your cell is to
473 participate in the "global" AFS namespace, it must comply with a few
474 basic conventions governing how the local filespace is configured
475 and how the addresses of certain file server machines are advertised
476 to the outside world.</para>
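
<para>From any client machine, you can check which cell the machine
belongs to and which foreign cells its Cache Manager knows about; the
output depends on your cell's configuration.</para>

<programlisting>
   % fs wscell
   % fs listcells
</programlisting>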
480 <title>The Uniform Namespace and Transparent Access</title>
483 <primary>transparent access as AFS feature</primary>
487 <primary>access</primary>
489 <secondary>transparent (AFS feature)</secondary>
492 <para>One of the features that makes AFS easy to use is that it
493 provides transparent access to the files in a cell's
494 filespace. Users do not have to know which file server machine
495 stores a file in order to access it; they simply provide the file's
pathname, which AFS automatically translates into a machine location.</para>
499 <para>In addition to transparent access, AFS also creates a
500 <emphasis>uniform namespace</emphasis>--a file's pathname is
501 identical regardless of which client machine the user is working
502 on. The cell's file tree looks the same when viewed from any client
503 because the cell's file server machines store all the files
centrally and present them in an identical manner to all clients.</para>
507 <para>To enable the transparent access and the uniform namespace
508 features, the system administrator must follow a few simple
509 conventions in configuring client machines and file trees. For
510 details, see <link linkend="HDRWQ39">Making Other Cells Visible in
511 Your Cell</link>.</para>
515 <title>Volumes</title>
518 <primary>volume</primary>
520 <secondary>definition</secondary>
523 <para>A <emphasis>volume</emphasis> is a conceptual container for a
524 set of related files that keeps them all together on one file server
525 machine partition. Volumes can vary in size, but are (by definition)
526 smaller than a partition. Volumes are the main administrative unit
527 in AFS, and have several characteristics that make administrative
528 tasks easier and help improve overall system
529 performance. <itemizedlist>
531 <para>The relatively small size of volumes makes them easy to
move from one partition to another, or even between machines.</para>
537 <para>You can maintain maximum system efficiency by moving
538 volumes to keep the load balanced evenly among the different
539 machines. If a partition becomes full, the small size of
540 individual volumes makes it easy to find enough room on other
541 machines for them.</para>
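
<para>A single <emphasis role="bold">vos move</emphasis> command
relocates a volume; the volume, machine, and partition names below
are examples only.</para>

<programlisting>
   % vos move -id user.pat -fromserver fs1.example.com -frompartition /vicepa \
        -toserver fs2.example.com -topartition /vicepb
</programlisting>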
544 <primary>volume</primary>
546 <secondary>in load balancing</secondary>
551 <para>Each volume corresponds logically to a directory in the
552 file tree and keeps together, on a single partition, all the
553 data that makes up the files in the directory. By maintaining
554 (for example) a separate volume for each user's home
555 directory, you keep all of the user's files together, but
556 separate from those of other users. This is an administrative
557 convenience that is impossible if the partition is the
558 smallest unit of storage.</para>
561 <primary>volume</primary>
563 <secondary>correspondence with directory</secondary>
567 <primary>directory</primary>
569 <secondary>correspondence with volume</secondary>
573 <primary>correspondence</primary>
575 <secondary>of volumes and directories</secondary>
580 <para>The directory/volume correspondence also makes
581 transparent file access possible, because it simplifies the
582 process of file location. All files in a directory reside
583 together in one volume and in order to find a file, a file
584 server process need only know the name of the file's parent
585 directory, information which is included in the file's
586 pathname. AFS knows how to translate the directory name into
587 a volume name, and automatically tracks every volume's
588 location, even when a volume is moved from machine to
589 machine. For more about the directory/volume correspondence,
590 see <link linkend="HDRWQ14">Mount Points</link>.</para>
<para>Volumes increase file availability through replication and backup.</para>
598 <primary>volume</primary>
600 <secondary>as unit of</secondary>
602 <tertiary>replication</tertiary>
606 <primary>volume</primary>
608 <secondary>as unit of</secondary>
610 <tertiary>backup</tertiary>
615 <para>Replication (placing copies of a volume on more than one
616 file server machine) makes the contents more reliably
617 available; for details, see <link
618 linkend="HDRWQ15">Replication</link>. Entire sets of volumes
619 can be backed up to tape and restored to the file system; see
620 <link linkend="HDRWQ248">Configuring the AFS Backup
621 System</link> and <link linkend="HDRWQ283">Backing Up and
622 Restoring AFS Data</link>. In AFS, backup also refers to
623 recording the state of a volume at a certain time and then
624 storing it (either on tape or elsewhere in the file system)
625 for recovery in the event files in it are accidentally deleted
626 or changed. See <link linkend="HDRWQ201">Creating Backup
627 Volumes</link>.</para>
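
<para>For instance, the first command below creates a backup version
of a single volume, and the second creates backup versions of every
volume whose name begins with <emphasis role="bold">user.</emphasis>
(the names are examples only).</para>

<programlisting>
   % vos backup user.pat
   % vos backupsys -prefix user.
</programlisting>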
631 <para>Volumes are the unit of resource management. A space
632 quota associated with each volume sets a limit on the maximum
633 volume size. See <link linkend="HDRWQ234">Setting and
634 Displaying Volume Quota and Current Size</link>.</para>
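
<para>As an illustration, the following commands set a quota of
100,000 kilobyte blocks on a volume and then display the quota and
current usage; the pathname is an example only.</para>

<programlisting>
   % fs setquota -path /afs/example.com/usr/pat -max 100000
   % fs listquota -path /afs/example.com/usr/pat
</programlisting>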
637 <primary>volume</primary>
639 <secondary>as unit of</secondary>
641 <tertiary>resource management</tertiary>
649 <title>Mount Points</title>
652 <primary>mount point</primary>
654 <secondary>definition</secondary>
657 <para>The previous section discussed how each volume corresponds
658 logically to a directory in the file system: the volume keeps
659 together on one partition all the data in the files residing in the
660 directory. The directory that corresponds to a volume is called its
661 <emphasis>root directory</emphasis>, and the mechanism that
662 associates the directory and volume is called a <emphasis>mount
663 point</emphasis>. A mount point is similar to a symbolic link in the
664 file tree that specifies which volume contains the files kept in a
665 directory. A mount point is not an actual symbolic link; its
666 internal structure is different.</para>
669 <para>You must not create a symbolic link to a file whose name
670 begins with the number sign (#) or the percent sign (%), because
671 the Cache Manager interprets such a link as a mount point to a
672 regular or read/write volume, respectively.</para>
676 <primary>root directory</primary>
680 <primary>directory</primary>
682 <secondary>root</secondary>
686 <primary>volume</primary>
688 <secondary>root directory of</secondary>
692 <primary>volume</primary>
694 <secondary>mounting</secondary>
697 <para>The use of mount points means that many of the elements in an
698 AFS file tree that look and function just like standard UNIX file
699 system directories are actually mount points. In form, a mount point
700 is a one-line file that names the volume containing the data for
701 files in the directory. When the Cache Manager (see <link
702 linkend="HDRWQ28">The Cache Manager</link>) encounters a mount
703 point--for example, in the course of interpreting a pathname--it
704 looks in the volume named in the mount point. In the volume the
705 Cache Manager finds an actual UNIX-style directory element--the
706 volume's root directory--that lists the files contained in the
directory/volume. The next element in the pathname appears in that list.</para>
710 <para>A volume is said to be <emphasis>mounted</emphasis> at the
711 point in the file tree where there is a mount point pointing to the
volume. A volume's contents are not visible or accessible unless it is mounted.</para>
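
<para>For example, the following commands mount the volume user.pat at
a directory in the file tree, display the mount point, and remove it
again; the names are examples only.</para>

<programlisting>
   % fs mkmount -dir /afs/example.com/usr/pat -vol user.pat
   % fs lsmount /afs/example.com/usr/pat
   % fs rmmount -dir /afs/example.com/usr/pat
</programlisting>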
717 <title>Replication</title>
720 <primary>replication</primary>
722 <secondary>definition</secondary>
726 <primary>clone</primary>
729 <para><emphasis>Replication</emphasis> refers to making a copy, or
730 <emphasis>clone</emphasis>, of a source read/write volume and then
731 placing the copy on one or more additional file server machines in a
732 cell. One benefit of replicating a volume is that it increases the
733 availability of the contents. If one file server machine housing the
734 volume fails, users can still access the volume on a different
735 machine. No one machine need become overburdened with requests for a
popular file, either, because the file is available from several machines.</para>
739 <para>Replication is not necessarily appropriate for cells with
740 limited disk space, nor are all types of volumes equally suitable
741 for replication (replication is most appropriate for volumes that
742 contain popular files that do not change very often). For more
743 details, see <link linkend="HDRWQ50">When to Replicate
744 Volumes</link>.</para>
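
<para>As a sketch, replicating a volume involves defining a read-only
site with <emphasis role="bold">vos addsite</emphasis> and then
releasing the volume to it with <emphasis role="bold">vos
release</emphasis>; the server, partition, and volume names are
examples only.</para>

<programlisting>
   % vos addsite -server fs2.example.com -partition /vicepa -id root.cell
   % vos release root.cell
</programlisting>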
748 <title>Caching and Callbacks</title>
751 <primary>caching</primary>
754 <para>Just as replication increases system availability,
755 <emphasis>caching</emphasis> increases the speed and efficiency of
756 file access in AFS. Each AFS client machine dedicates a portion of
757 its local disk or memory to a cache where it stores data
758 temporarily. Whenever an application program (such as a text editor)
759 running on a client machine requests data from an AFS file, the
760 request passes through the Cache Manager. The Cache Manager is a
761 portion of the client machine's kernel that translates file requests
762 from local application programs into cross-network requests to the
763 <emphasis>File Server process</emphasis> running on the file server
764 machine storing the file. When the Cache Manager receives the
765 requested data from the File Server, it stores it in the cache and
766 then passes it on to the application program.</para>
768 <para>Caching improves the speed of data delivery to application
769 programs in the following ways:</para>
773 <para>When the application program repeatedly asks for data from
774 the same file, it is already on the local disk. The application
775 does not have to wait for the Cache Manager to request and
776 receive the data from the File Server.</para>
780 <para>Caching data eliminates the need for repeated request and
781 transfer of the same data, so network traffic is reduced. Thus,
initial requests and other traffic can get through more quickly.</para>
786 <primary>AFS</primary>
788 <secondary>reducing traffic in</secondary>
792 <primary>network</primary>
794 <secondary>reducing traffic through caching</secondary>
798 <primary>slowed performance</primary>
800 <secondary>preventing in AFS</secondary>
806 <primary>callback</primary>
810 <primary>consistency guarantees</primary>
812 <secondary>cached data</secondary>
815 <para>While caching provides many advantages, it also creates the
816 problem of maintaining consistency among the many cached copies of a
817 file and the source version of a file. This problem is solved using
818 a mechanism referred to as a <emphasis>callback</emphasis>.</para>
820 <para>A callback is a promise by a File Server to a Cache Manager to
821 inform the latter when a change is made to any of the data delivered
822 by the File Server. Callbacks are used differently based on the type
823 of file delivered by the File Server: <itemizedlist>
825 <para>When a File Server delivers a writable copy of a file
826 (from a read/write volume) to the Cache Manager, the File
827 Server sends along a callback with that file. If the source
828 version of the file is changed by another user, the File
829 Server breaks the callback associated with the cached version
830 of that file--indicating to the Cache Manager that it needs to
831 update the cached copy.</para>
835 <para>When a File Server delivers a file from a read-only
836 volume to the Cache Manager, the File Server sends along a
837 callback associated with the entire volume (so it does not
838 need to send any more callbacks when it delivers additional
839 files from the volume). Only a single callback is required per
840 accessed read-only volume because files in a read-only volume
841 can change only when a new version of the complete volume is
842 released. All callbacks associated with the old version of the
843 volume are broken at release time.</para>
844 </listitem> </itemizedlist></para>
846 <para>The callback mechanism ensures that the Cache Manager always
847 requests the most up-to-date version of a file. However, it does not
848 ensure that the user necessarily notices the most current version as
849 soon as the Cache Manager has it. That depends on how often the
application program requests additional data from the File Server or
851 how often it checks with the Cache Manager.</para>
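
<para>If you suspect that a cached copy is stale, you can force the
Cache Manager to discard cached data for a file or for an entire
volume; the pathnames are examples only.</para>

<programlisting>
   % fs flush /afs/example.com/usr/pat/notes.txt
   % fs flushvolume /afs/example.com/usr/pat
</programlisting>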
856 <title>AFS Server Processes and the Cache Manager</title>
859 <primary>AFS</primary>
861 <secondary>server processes used in</secondary>
865 <primary>server</primary>
867 <secondary>process</secondary>
869 <tertiary>list of AFS</tertiary>
872 <para>As mentioned in <link linkend="HDRWQ10">Servers and
873 Clients</link>, AFS file server machines run a number of processes,
874 each with a specialized function. One of the main responsibilities of
875 a system administrator is to make sure that processes are running
876 correctly as much of the time as possible, using the administrative
877 services that the server processes provide.</para>
879 <para>The following list briefly describes the function of each server
880 process and the Cache Manager; the following sections then discuss the
881 important features in more detail.</para>
883 <para>The <emphasis>File Server</emphasis>, the most fundamental of
884 the servers, delivers data files from the file server machine to local
885 workstations as requested, and stores the files again when the user
886 saves any changes to the files.</para>
888 <para>The <emphasis>Basic OverSeer Server (BOS Server)</emphasis>
889 ensures that the other server processes on its server machine are
890 running correctly as much of the time as possible, since a server is
891 useful only if it is available. The BOS Server relieves system
892 administrators of much of the responsibility for overseeing system
895 <para>The third-party <emphasis>Kerberos Server</emphasis> replaces
896 the old <emphasis>Authentication Server</emphasis> and helps ensure
897 that communications on the network are secure. It verifies user
898 identities at login and provides the facilities through which
899 participants in transactions prove their identities to one another
900 (mutually authenticate).</para>
<para>The <emphasis>Protection Server</emphasis> helps users control who has access to
903 their files and directories. Users can grant access to several other
904 users at once by putting them all in a group entry in the Protection
905 Database maintained by the Protection Server.</para>
907 <para>The <emphasis>Volume Server</emphasis> performs all types of
908 volume manipulation. It helps the administrator move volumes from one
server machine to another to balance the workload among the various machines.</para>
912 <para>The <emphasis>Volume Location Server (VL Server)</emphasis>
913 maintains the Volume Location Database (VLDB), in which it records the
914 location of volumes as they move from file server machine to file
server machine. This service is the key to transparent file access for users.</para>
918 <para>The <emphasis>Update Server</emphasis> distributes new versions
919 of AFS server process software and configuration information to all
920 file server machines. It is crucial to stable system performance that
921 all server machines run the same software.</para>
923 <para>The <emphasis>Backup Server</emphasis> maintains the Backup
924 Database, in which it stores information related to the Backup
925 System. It enables the administrator to back up data from volumes to
926 tape. The data can then be restored from tape in the event that it is
927 lost from the file system.</para>
929 <para>The <emphasis>Salvager</emphasis> is not a server in the sense
930 that others are. It runs only after the File Server or Volume Server
931 fails; it repairs any inconsistencies caused by the failure. The
932 system administrator can invoke it directly if necessary.</para>
934 <para>The <emphasis>Network Time Protocol Daemon (NTPD)</emphasis> is
935 not an AFS server process per se, but plays a vital role
936 nonetheless. It synchronizes the internal clock on a file server
937 machine with those on other machines. Synchronized clocks are
938 particularly important for correct functioning of the AFS distributed
939 database technology (known as Ubik); see <link
940 linkend="HDRWQ103">Configuring the Cell for Proper Ubik
Operation</link>. The NTPD is usually provided with the operating system.</para>
944 <para>The <emphasis>Cache Manager</emphasis> is the one component in
945 this list that resides on AFS client rather than file server
machines. It is not a process per se, but rather a part of the kernel on
947 AFS client machines that communicates with AFS server processes. Its
948 main responsibilities are to retrieve files for application programs
949 running on the client and to maintain the files in the cache.</para>
952 <title>The File Server</title>
955 <primary>File Server</primary>
957 <secondary>description</secondary>
960 <para>The <emphasis>File Server</emphasis> is the most fundamental
961 of the AFS server processes and runs on each file server machine. It
962 provides the same services across the network that the UNIX file
963 system provides on the local disk: <itemizedlist>
965 <para>Delivering programs and data files to client
966 workstations as requested and storing them again when the
967 client workstation finishes with them.</para>
971 <para>Maintaining the hierarchical directory structure that
972 users create to organize their files.</para> </listitem>
975 <para>Handling requests for copying, moving, creating, and
976 deleting files and directories.</para> </listitem>
979 <para>Keeping track of status information about each file and
directory (including its size and latest modification
time).</para> </listitem>
985 <para>Making sure that users are authorized to perform the
actions they request on particular files or
directories.</para> </listitem>
991 <para>Creating symbolic and hard links between files.</para>
995 <para>Granting advisory locks (corresponding to UNIX locks) on
996 request.</para> </listitem>
1001 <sect2 id="HDRWQ19">
1002 <title>The Basic OverSeer Server</title>
1005 <primary>BOS Server</primary>
1007 <secondary>description</secondary>
1010 <para>The <emphasis>Basic OverSeer Server (BOS Server)</emphasis>
1011 reduces the demands on system administrators by constantly
1012 monitoring the processes running on its file server machine. It can
1013 restart failed processes automatically and provides a convenient
1014 interface for administrative tasks.</para>
1016 <para>The BOS Server runs on every file server machine. Its primary
1017 function is to minimize system outages. It also</para>
1021 <para>Constantly monitors the other server processes (on the
1022 local machine) to make sure they are running correctly.</para>
1026 <para>Automatically restarts failed processes, without
1027 contacting a human operator. When restarting multiple server
1028 processes simultaneously, the BOS server takes interdependencies
1029 into account and initiates restarts in the correct order.</para>
1032 <primary>system outages</primary>
1034 <secondary>reducing</secondary>
1038 <primary>outages</primary>
1040 <secondary>BOS Server role in,</secondary>
1045 <para>Accepts requests from the system administrator. Common
1046 reasons to contact BOS are to verify the status of server
1047 processes on file server machines, install and start new
1048 processes, stop processes either temporarily or permanently, and
1049 restart dead processes manually.</para>
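
<para>For example, the first command below displays the status of the
server processes on a file server machine, and the second restarts a
single process; the machine and instance names are examples
only.</para>

<programlisting>
   % bos status fs1.example.com -long
   % bos restart fs1.example.com -instance vlserver
</programlisting>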
1053 <para>Helps system administrators to manage system configuration
1054 information. The BOS server automates the process of adding and
1055 changing <emphasis>server encryption keys</emphasis>, which are
1056 important in mutual authentication. The BOS Server also provides
1057 a simple interface for modifying two files that contain
1058 information about privileged users and certain special file
1059 server machines. For more details about these configuration
1060 files, see <link linkend="HDRWQ85">Common Configuration Files in
1061 the /usr/afs/etc Directory</link>.</para>
1062 </listitem> </itemizedlist>
1065 <sect2 id="HDRWQ20">
1066 <title>The Kerberos Server</title>
1069 <primary>Kerberos Server</primary>
1070 <secondary>description</secondary>
1073 <primary>Authentication Server</primary>
1074 <secondary>description</secondary>
1075 <seealso>Kerberos Server</seealso>
1078 <primary>Active Directory</primary>
1079 <secondary>Kerberos Server</secondary>
1082 <primary>MIT Kerberos</primary>
1083 <secondary>Kerberos Server</secondary>
1086 <primary>Heimdal</primary>
1087 <secondary>Kerberos Server</secondary>
1090 <para>The <emphasis>Kerberos Server</emphasis> performs two main
1091 functions related to network security: <itemizedlist>
1093 <para>Verifying the identity of users as they log into the
1094 system by requiring that they provide a password. The Kerberos
1095 Server grants the user a ticket, which is converted into a
1096 token to prove to AFS server processes that the user has
1097 authenticated. For more on tokens, see <link
1098 linkend="HDRWQ76">Complex Mutual Authentication</link>.</para>
1102 <para>Providing the means through which server and client
1103 processes prove their identities to each other (mutually
1104 authenticate). This helps to create a secure environment in
1105 which to send cross-network messages.</para>
1106 </listitem> </itemizedlist></para>
1108 <para>The Kerberos Server is a required service which is provided by
1109 a third-party Kerberos server that supports version 5 of the
1110 Kerberos protocol. Kerberos server software is included with some
1111 operating systems or may be acquired separately. MIT Kerberos,
1112 Heimdal, and Microsoft Active Directory are known to work with
1113 OpenAFS as a Kerberos Server. (Most Kerberos commands begin with
1114 the letter <emphasis role="bold">k</emphasis>). This technology was
1115 originally developed by the Massachusetts Institute of Technology's
1116 Project Athena.</para>
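
<para>A typical authentication sequence with a Kerberos 5 server
obtains a ticket with <emphasis role="bold">kinit</emphasis>, converts
it into an AFS token with <emphasis role="bold">aklog</emphasis>, and
verifies the result with <emphasis role="bold">tokens</emphasis>; the
principal, realm, and cell names are examples only.</para>

<programlisting>
   % kinit pat@EXAMPLE.COM
   % aklog -c example.com -k EXAMPLE.COM
   % tokens
</programlisting>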
1118 <para>The Kerberos Server also maintains the
1119 <emphasis>Authentication Database</emphasis>, in which it stores
1120 user passwords converted into encryption key form as well as the AFS
1121 server encryption key. To learn more about the procedures AFS uses
1122 to verify user identity and during mutual authentication, see <link
1123 linkend="HDRWQ75">A More Detailed Look at Mutual
1124 Authentication</link>.</para>
1126 <note><para>The <emphasis>Authentication Server</emphasis> known as
1127 kaserver which uses Kerberos 4 is obsolete and has been replaced by
1128 the Kerberos Server. All references to the <emphasis>Kerberos
1129 Server</emphasis> in this guide refer to a Kerberos 5
1130 server.</para></note>
1133 <primary>AFS</primary>
1135 <secondary></secondary>
1141 <primary>username</primary>
1143 <secondary>use by Kerberos</secondary>
1147 <primary>UNIX</primary>
1149 <secondary>UID</secondary>
1151 <tertiary>functional difference from AFS UID</tertiary>
1155 <primary>Kerberos</primary>
1157 <secondary>use of usernames</secondary>
1161 <sect2 id="HDRWQ21">
1162 <title>The Protection Server</title>
1165 <primary>protection</primary>
1167 <secondary>in AFS</secondary>
1171 <primary>Protection Server</primary>
1173 <secondary>description</secondary>
1177 <primary>protection</primary>
1179 <secondary>in UNIX</secondary>
1182 <para>The <emphasis>Protection Server</emphasis> is the key to AFS's
1183 refinement of the normal UNIX methods for protecting files and
1184 directories from unauthorized use. The refinements include the
1185 following: <itemizedlist>
1187 <para>Defining seven access permissions rather than the
1188 standard UNIX file system's three. In conjunction with the
1189 UNIX mode bits associated with each file and directory
1190 element, AFS associates an <emphasis>access control list
1191 (ACL)</emphasis> with each directory. The ACL specifies which
1192 users have which of the seven specific permissions for the
1193 directory and all the files it contains. For a definition of
1194 AFS's seven access permissions and how users can set them on
1195 access control lists, see <link linkend="HDRWQ562">Managing
1196 Access Control Lists</link>.</para>
1199 <primary>access</primary>
1201 <secondary></secondary>
1208 <para>Enabling users to grant permissions to numerous
1209 individual users--a different combination to each individual
if desired. UNIX protection distinguishes only three user classes:
the owner of the file, members of a single specified group, and
everyone who can access the local file system.</para>
1217 <para>Enabling users to define their own groups of users,
1218 recorded in the <emphasis>Protection Database</emphasis>
1219 maintained by the Protection Server. The groups then appear on
1220 directories' access control lists as though they were
1221 individuals, which enables the granting of permissions to many
1222 users simultaneously.</para>
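
<para>For example, the following commands create a group, add a member
to it, and place the group on an access control list; the group, user,
and directory names are examples only.</para>

<programlisting>
   % pts creategroup -name pat:writers
   % pts adduser -user smith -group pat:writers
   % fs setacl -dir /afs/example.com/usr/pat/drafts -acl pat:writers rlidw
</programlisting>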
1226 <para>Enabling system administrators to create groups
1227 containing client machine IP addresses to permit access when
1228 it originates from the specified client machines. These types
1229 of groups are useful when it is necessary to adhere to
1230 machine-based licensing restrictions.</para>
1236 <primary>group</primary>
1238 <secondary>definition</secondary>
1242 <primary>Protection Database</primary>
1245 <para>The Protection Server's main duty is to help the File Server
1246 determine if a user is authorized to access a file in the requested
1247 manner. The Protection Server creates a list of all the groups to
1248 which the user belongs. The File Server then compares this list to
1249 the ACL associated with the file's parent directory. A user thus
acquires access both as an individual and as a member of any groups.</para>
1253 <para>The Protection Server also maps usernames (the name typed at
1254 the login prompt) to <emphasis>AFS user ID</emphasis> numbers
1255 (<emphasis>AFS UIDs</emphasis>). These UIDs are functionally
1256 equivalent to UNIX UIDs, but operate in the domain of AFS rather
1257 than in the UNIX file system on a machine's local disk. This
conversion service is essential because the tokens derived from the
tickets that the Kerberos Server grants to authenticated users are
stamped with usernames (to comply with Kerberos standards). The AFS server
1261 processes identify users by AFS UID, not by username. Before they
1262 can understand whom the token represents, they need the Protection
1263 Server to translate the username into an AFS UID. For further
1264 discussion of tokens, see <link linkend="HDRWQ75">A More Detailed
1265 Look at Mutual Authentication</link>.</para>
1268 <sect2 id="HDRWQ22">
1269 <title>The Volume Server</title>
1272 <primary>Volume Server</primary>
1274 <secondary>description</secondary>
1277 <para>The <emphasis>Volume Server</emphasis> provides the interface
1278 through which you create, delete, move, and replicate volumes, as
1279 well as prepare them for archiving to tape or other media (backing
1280 up). <link linkend="HDRWQ13">Volumes</link> explained the advantages
1281 gained by storing files in volumes. Creating and deleting volumes
1282 are necessary when adding and removing users from the system; volume
1283 moves are done for load balancing; and replication enables volume
1284 placement on multiple file server machines (for more on replication,
1285 see <link linkend="HDRWQ15">Replication</link>).</para>
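
<para>For instance, the following command creates a volume with an
initial quota of 50,000 kilobyte blocks; the server, partition, and
volume names are examples only.</para>

<programlisting>
   % vos create -server fs1.example.com -partition /vicepa -name user.terry -maxquota 50000
</programlisting>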
1288 <sect2 id="HDRWQ23">
1289 <title>The Volume Location (VL) Server</title>
1292 <primary>VL Server</primary>
1294 <secondary>description</secondary>
1298 <primary>VLDB</primary>
1301 <para>The <emphasis>VL Server</emphasis> maintains a complete list
1302 of volume locations in the <emphasis>Volume Location Database
1303 (VLDB)</emphasis>. When the Cache Manager (see <link
1304 linkend="HDRWQ28">The Cache Manager</link>) begins to fill a file
1305 request from an application program, it first contacts the VL Server
1306 in order to learn which file server machine currently houses the
1307 volume containing the file. The Cache Manager then requests the file
from the File Server process running on that file server machine.</para>
1311 <para>The VLDB and VL Server make it possible for AFS to take
1312 advantage of the increased system availability gained by using
1313 multiple file server machines, because the Cache Manager knows where
1314 to find a particular file. Indeed, in a certain sense the VL Server
1315 is the keystone of the entire file system--when the information in
1316 the VLDB is inaccessible, the Cache Manager cannot retrieve files,
1317 even if the File Server processes are working properly. A list of
1318 the information stored in the VLDB about each volume is provided in
<link linkend="HDRWQ180">Volume Information in the VLDB</link>.</para>
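
<para>You can display a volume's VLDB entry, including the sites where
it resides, with commands such as the following; the volume name is an
example only.</para>

<programlisting>
   % vos examine root.cell
   % vos listvldb -name root.cell
</programlisting>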
1323 <primary>VL Server</primary>
1325 <secondary>importance to transparent access</secondary>
1329 <sect2 id="HDRWQ24">
1330 <title>The Update Server</title>
1333 <primary>Update Server</primary>
1335 <secondary>description</secondary>
1338 <para>The <emphasis>Update Server</emphasis> is an optional process
1339 that helps guarantee that all file server machines are running the
1340 same version of a server process. System performance can be
1341 inconsistent if some machines are running one version of the BOS
1342 Server (for example) and other machines were running another
1345 <para>To ensure that all machines run the same version of a process,
1346 install new software on a single file server machine of each system
1347 type, called the <emphasis>binary distribution machine</emphasis>
1348 for that type. The binary distribution machine runs the server
1349 portion of the Update Server, whereas all the other machines of that
1350 type run the client portion of the Update Server. The client
1351 portions check frequently with the <emphasis>server
1352 portion</emphasis> to see if they are running the right version of
1353 every process; if not, the <emphasis>client portion</emphasis>
1354 retrieves the right version from the binary distribution machine and
1355 installs it locally. The system administrator does not need to
1356 remember to install new software individually on all the file server
1357 machines: the Update Server does it automatically. For more on
1358 binary distribution machines, see <link linkend="HDRWQ93">Binary
1359 Distribution Machines</link>.</para>
1362 <primary>Update Server</primary>
1364 <secondary>server portion</secondary>
1368 <primary>Update Server</primary>
1370 <secondary>client portion</secondary>
1373 <para>The Update Server also distributes configuration files that
1374 all file server machines need to store on their local disks (for a
1375 description of the contents and purpose of these files, see <link
1376 linkend="HDRWQ85">Common Configuration Files in the /usr/afs/etc
1377 Directory</link>). As with server process software, the need for
1378 consistent system performance demands that all the machines have the
1379 same version of these files. The system administrator needs to make
1380 changes to these files on one machine only, the cell's
1381 <emphasis>system control machine</emphasis>, which runs a server
1382 portion of the Update Server. All other machines in the cell run a
1383 client portion that accesses the correct versions of these
1384 configuration files from the system control machine. Cells running
1385 the international edition of AFS do not use a system control machine
1386 to distribute configuration files. For more information, see <link
1387 linkend="HDRWQ94">The System Control Machine</link>.</para>
1390 <sect2 id="HDRWQ25">
1391 <title>The Backup Server</title>
1394 <primary>Backup System</primary>
1396 <secondary>Backup Server described</secondary>
1400 <primary>Backup Server</primary>
1402 <secondary>description</secondary>
1405 <para>The <emphasis>Backup Server</emphasis> maintains the
1406 information in the <emphasis>Backup Database</emphasis>. The Backup
1407 Server and the Backup Database enable administrators to back up data
1408 from AFS volumes to tape and restore it from tape to the file system
1409 if necessary. The server and database together are referred to as
1410 the Backup System.</para>
1412 <para>Administrators initially configure the Backup System by
1413 defining sets of volumes to be dumped together and the schedule by
1414 which the sets are to be dumped. They also install the system's tape
1415 drives and define the drives' <emphasis>Tape
Coordinators</emphasis>, which are the processes that control the drives.</para>
1419 <para>Once the Backup System is configured, user and system data can
1420 be dumped from volumes to tape or disk. In the event that data is
1421 ever lost from the system (for example, if a system or disk failure
1422 causes data to be lost), administrators can restore the data from
1423 tape. If tapes are periodically archived, or saved, data can also be
1424 restored to its state at a specific time. Additionally, because
1425 Backup System data is difficult to reproduce, the Backup Database
1426 itself can be backed up to tape and restored if it ever becomes
1427 corrupted. For more information on configuring and using the Backup
1428 System, see <link linkend="HDRWQ248">Configuring the AFS Backup
1429 System</link> and <link linkend="HDRWQ283">Backing Up and Restoring
1430 AFS Data</link>.</para>
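
<para>As a sketch of initial configuration, the following commands
define a volume set with one entry, define a dump level, and perform a
dump; the names are examples only, and a Tape Coordinator must already
be configured.</para>

<programlisting>
   % backup addvolset -name user
   % backup addvolentry -name user -server fs1.example.com -partition /vicepa -volumes user.*
   % backup adddump -dump /week
   % backup dump -volumeset user -dump /week
</programlisting>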
1433 <sect2 id="HDRWQ26">
1434 <title>The Salvager</title>
1437 <primary>Salvager</primary>
1439 <secondary>description</secondary>
1442 <para>The <emphasis>Salvager</emphasis> differs from other AFS
1443 Servers in that it runs only at selected times. The BOS Server
1444 invokes the Salvager when the File Server, Volume Server, or both
1445 fail. The Salvager attempts to repair disk corruption that can
1446 result from a failure.</para>
1448 <para>As a system administrator, you can also invoke the Salvager as
1449 necessary, even if the File Server or Volume Server has not
1450 failed. See <link linkend="HDRWQ232">Salvaging
1451 Volumes</link>.</para>
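
<para>For example, the following command salvages all volumes on one
partition of a file server machine; the machine and partition names
are examples only.</para>

<programlisting>
   % bos salvage -server fs1.example.com -partition /vicepa
</programlisting>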
1454 <sect2 id="HDRWQ27">
1455 <title>The Network Time Protocol Daemon</title>
1458 <primary>ntpd</primary>
1460 <secondary>description</secondary>
1463 <para>The <emphasis>Network Time Protocol Daemon (NTPD)</emphasis>
1464 is not an AFS server process per se, but plays an important role. It
1465 helps guarantee that all of the file server machines and client
1466 machines agree on the time. The NTPD on all file server machines
1467 learns the correct time from a parent NTPD source, which may be
1468 located inside or outside the cell.</para>
1470 <para>Keeping clocks synchronized is particularly important to the
1471 correct operation of AFS's distributed database technology, which
1472 coordinates the copies of the Backup, Protection, and Volume
1473 Location Databases; see <link linkend="HDRWQ52">Replicating the
1474 OpenAFS Administrative Databases</link>. Client machines may also
1475 refer to these clocks for the correct time; therefore, it is less
1476 confusing if all file server machines have the same time. For more
1477 technical detail about the NTPD, see <ulink
1478 url="http://www.ntp.org/">The NTP web site</ulink> or the
1479 documentation for your operating system.</para>
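
<para>With the reference NTPD implementation, you can typically verify
that a machine is synchronized to its time sources with the following
command; consult your operating system's documentation for the
equivalent on your platform.</para>

<programlisting>
   % ntpq -p
</programlisting>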
1481 <important><title>Clock Skew Impact</title> <para>Client machines
1482 that are authenticating to an OpenAFS cell with valid credentials
1483 may still fail when the clocks of the client machine, Kerberos
1484 server, and the fileserver machines are not in
1485 sync.</para></important>
1487 <note><title>Legacy runntp</title> <para>It is no longer recommended
1488 to run the legacy NTPD process called <emphasis>runntp</emphasis>
1489 that is part of the OpenAFS suite. Running the NTPD software that
1490 comes with your operating system or from <ulink
1491 url="http://www.ntp.org/">www.ntp.org</ulink> is
1492 preferred.</para></note>
1496 <sect2 id="HDRWQ28">
1497 <title>The Cache Manager</title>
1500 <primary>Cache Manager</primary>
1502 <secondary>functions of</secondary>
1505 <para>As already mentioned in <link linkend="HDRWQ16">Caching and
1506 Callbacks</link>, the <emphasis>Cache Manager</emphasis> is the one
1507 component in this section that resides on client machines rather
1508 than on file server machines. It is not technically a stand-alone
1509 process, but rather a set of extensions or modifications in the
1510 client machine's kernel that enable communication with the server
1511 processes running on server machines. Its main duty is to translate
1512 file requests (made by application programs on client machines) into
1513 <emphasis>remote procedure calls (RPCs)</emphasis> to the File
1514 Server. (The Cache Manager first contacts the VL Server to find out
1515 which File Server currently houses the volume that contains a
1516 requested file, as mentioned in <link linkend="HDRWQ23">The Volume
1517 Location (VL) Server</link>). When the Cache Manager receives the
1518 requested file, it caches it before passing data on to the
1519 application program.</para>
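
<para>On a client machine, you can display the cache size and current
usage with the following command.</para>

<programlisting>
   % fs getcacheparms
</programlisting>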
1521 <para>The Cache Manager also tracks the state of files in its cache
1522 compared to the version at the File Server by storing the callbacks
1523 sent by the File Server. When the File Server breaks a callback,
1524 indicating that a file or volume changed, the Cache Manager requests
a copy of the new version before providing more data to application programs.</para>