1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
5 >Administering Server Machines</TITLE
8 CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
10 TITLE="AFS Administration Guide"
11 HREF="book1.html"><LINK
13 TITLE="Managing File Server Machines"
14 HREF="p3023.html"><LINK
16 TITLE="Managing File Server Machines"
17 HREF="p3023.html"><LINK
19 TITLE="Monitoring and Controlling Server Processes"
20 HREF="c6449.html"></HEAD
31 SUMMARY="Header navigation table"
40 >AFS Administration Guide: Version 3.6</TH
77 >Chapter 3. Administering Server Machines</H1
79 >This chapter describes how to administer an AFS server machine. It describes the following configuration information and
80 administrative tasks: <UL
83 >The binary and configuration files that must reside in the subdirectories of the <SPAN
89 > directory on every server machine's local disk; see <A
90 HREF="c3025.html#HDRWQ83"
92 on a Server Machine</A
103 > or functions that an AFS server machine can perform, and how to determine which
104 machines perform each role; see <A
105 HREF="c3025.html#HDRWQ90"
106 >The Four Roles for File Server Machines</A
111 >How to maintain database server machines; see <A
112 HREF="c3025.html#HDRWQ101"
113 >Administering Database Server
119 >How to maintain the list of database server machines in the <SPAN
123 >/usr/afs/etc/CellServDB</B
127 HREF="c3025.html#HDRWQ118"
128 >Maintaining the Server CellServDB File</A
133 >How to control authorization checking on a server machine; see <A
134 HREF="c3025.html#HDRWQ123"
135 >Managing Authentication and
136 Authorization Requirements</A
141 >How to install new disks or partitions on a file server machine; see <A
142 HREF="c3025.html#HDRWQ130"
143 >Adding or Removing Disks
149 >How to change a server machine's IP addresses and manage VLDB server entries; see <A
150 HREF="c3025.html#HDRWQ138"
152 Server IP Addresses and VLDB Server Entries</A
157 >How to reboot a file server machine; see <A
158 HREF="c3025.html#HDRWQ139"
159 >Rebooting a Server Machine</A
165 >To learn how to install and configure a new server machine, see the <SPAN
169 >IBM AFS Quick Beginnings</I
173 >To learn how to administer the server processes themselves, see <A
175 >Monitoring and Controlling Server
179 >To learn how to administer volumes, see <A
189 >Summary of Instructions</A
192 >This chapter explains how to perform the following tasks by using the indicated commands:</P
194 CLASS="informaltable"
207 >Install new binaries</TD
219 >Examine binary check-and-restart time</TD
231 >Set binary check-and-restart time</TD
243 >Examine compilation dates on binary files</TD
255 >Restart a process to use new binaries</TD
267 >Revert to old version of binaries</TD
279 >Remove obsolete <SPAN
303 >List partitions on a file server machine</TD
315 >Shut down AFS server processes</TD
327 >List volumes on a partition</TD
339 >Move read/write volumes</TD
351 >List a cell's database server machines</TD
363 >Add a database server machine to server <SPAN
381 >Remove a database server machine from server <SPAN
399 >Set authorization checking requirements</TD
411 >Prevent authentication for <SPAN
442 >Prevent authentication for kas commands</TD
450 > flag on some commands or issue <SPAN
456 > while in interactive mode</TD
460 >Display all VLDB server entries</TD
472 >Remove a VLDB server entry</TD
484 >Reboot a server machine remotely</TD
510 >Local Disk Files on a Server Machine</A
513 >Several types of files must reside in the subdirectories of the <SPAN
520 AFS server machine's local disk. They include binaries, configuration files, the administrative database files (on database
521 server machines), log files, and volume header files.</P
527 >Note for Windows users:</B
529 > Some files described in this document might not exist on
530 machines that run a Windows operating system. Also, Windows uses a backslash (<SPAN
543 >) to separate the elements in a pathname.</P
550 >Binaries in the /usr/afs/bin Directory</A
559 > directory stores the AFS server process and command suite binaries
560 appropriate for the machine's system (CPU and operating system) type. If a process has both a server portion and a client
561 portion (as with the Update Server) or if it has separate components (as with the <SPAN
568 process), each component resides in a separate file.</P
570 >To ensure predictable system performance, all file server machines must run the same AFS build version of a given
571 process. To maintain consistency easily, use the Update Server process to distribute binaries from a binary distribution
572 machine of each system type, as described further in <A
573 HREF="c3025.html#HDRWQ93"
574 >Binary Distribution Machines</A
577 >It is best to keep the binaries for all processes in the <SPAN
584 if you do not actively run the process on the machine. Doing so simplifies reconfiguring machines (for example,
585 adding database server functionality to an existing file server machine). Similarly, it is best to keep the command suite
586 binaries in the directory, even if you do not often issue commands while working on the server machine. Doing so enables you to
587 issue commands during recovery from server and machine outages.</P
589 >The following lists the binary files in the <SPAN
595 > directory that are directly
596 related to the AFS server processes or command suites. Other binaries (for example, for the <SPAN
602 > command) sometimes appear in this directory on a particular file server machine's disk or in an
603 AFS distribution. <DIV
616 >The command suite for the AFS Backup System (the binary for the Backup Server is <SPAN
634 >The command suite for communicating with the Basic OverSeer (BOS) Server (the binary for the BOS Server is
653 >The binary for the Basic OverSeer (BOS) Server process.</P
665 >The binary for the Backup Server process.</P
677 >The binary for the File Server component of the <SPAN
695 >The command suite for communicating with the Authentication Server (the binary for the Authentication Server is
714 >The binary for the Authentication Server process.</P
726 >The binary for the Network Time Protocol Daemon (NTPD). AFS redistributes this binary and uses the <SPAN
732 > program to configure and initialize the NTPD process.</P
744 >A debugging utility furnished with the <SPAN
762 >The command suite for communicating with the Protection Server process (the binary for the Protection Server is
781 >The binary for the Protection Server process.</P
793 >The binary for the program used to configure NTPD most appropriately for use with AFS.</P
805 >The binary for the Salvager component of the <SPAN
823 >The binary for a program that reports the status of AFS's distributed database technology, Ubik.</P
835 >The binary for the client portion of the Update Server process.</P
847 >The binary for the server portion of the Update Server process.</P
859 >The binary for the Volume Location (VL) Server process.</P
871 >The binary for the Volume Server component of the <SPAN
889 >The command suite for communicating with the Volume and VL Server processes (the binaries for the servers are
914 >Common Configuration Files in the /usr/afs/etc Directory</A
923 > on every file server machine's local disk contains
924 configuration files in ASCII and machine-independent binary format. For predictable AFS performance throughout a cell, all
925 server machines must have the same version of each configuration file: <UL
928 >Cells that run the United States edition of AFS conventionally use the Update Server to distribute a common
929 version of each file from the cell's system control machine to other server machines (for more on the system control
931 HREF="c3025.html#HDRWQ94"
932 >The System Control Machine</A
933 >). Run the Update Server's server portion on the
934 system control machine, and the client portion on all other server machines. Update the files on the system control
935 machine only, except as directed by instructions for dealing with emergencies.</P
939 >Cells that run the international edition of AFS must not use the Update Server to distribute the contents of the
946 > directory. Due to United States government regulations, the data
947 encryption routines that AFS uses to protect the files in this directory as they cross the network are not available to
948 the Update Server in the international edition of AFS. You must instead update the files on each server machine
949 individually, taking extra care to issue exactly the same <SPAN
955 > command for each machine.
956 The necessary data encryption routines are available to the <SPAN
963 information is safe as it crosses the network from the machine where the <SPAN
970 issued to the server machines.</P
975 >Never directly edit any of the files in the <SPAN
981 > directory, except as directed
982 by instructions for dealing with emergencies. In normal circumstances, use the appropriate <SPAN
988 > commands to change the files. The following list includes pointers to instructions.</P
990 >The files in this directory include: <DIV
1003 >An ASCII file that names the cell's database server machines, which run the Authentication, Backup, Protection,
1004 and VL Server processes. You create the initial version of this file by issuing the <SPAN
1011 > command while installing your cell's first server machine. It is very important to update this
1012 file when you change the identity of your cell's database server machines.</P
1020 > file is not the same as the <SPAN
1026 > file stored in the <SPAN
1033 client machines. The client version lists the database server machines for every AFS cell that you choose to make
1034 accessible from the client machine. The server <SPAN
1040 > file lists only the local
1041 cell's database server machines, because server processes never contact processes in other cells.</P
1043 >For instructions on maintaining this file, see <A
1044 HREF="c3025.html#HDRWQ118"
1045 >Maintaining the Server CellServDB
1059 >A machine-independent, binary-format file that lists the server encryption keys the AFS server processes use to
1060 encrypt and decrypt tickets. The information in this file is the basis for secure communication in the cell, and so is
1061 extremely sensitive. The file is specially protected so that only privileged users can read or change it.</P
1063 >For instructions on maintaining this file, see <A
1065 >Managing Server Encryption
1079 >An ASCII file that consists of a single line defining the complete Internet domain-style name of the cell (such
1081 CLASS="computeroutput"
1083 >). You create this file with the <SPAN
1090 > command during the installation of your cell's first file server machine, as instructed in the
1095 >IBM AFS Quick Beginnings</I
1099 >Note that changing this file is only one step in changing your cell's name. For discussion, see <A
1100 HREF="c667.html#HDRWQ34"
1101 >Choosing a Cell Name</A
1114 >An ASCII file that lists the usernames of the system administrators authorized to issue privileged <SPAN
1133 commands. For instructions on maintaining the file, see <A
1134 HREF="c32432.html#HDRWQ592"
1135 >Administering the UserList
1149 >Local Configuration Files in the /usr/afs/local Directory</A
1152 >The directory <SPAN
1158 > contains configuration files that are different for each
1159 file server machine in a cell. Thus, they are not updated automatically from a central source like the files in <SPAN
1171 > directories. The most important file is
1178 > file; it defines which server processes are to run on that machine.</P
1180 >As with the common configuration files in <SPAN
1186 >, you must not edit these files
1187 directly. Use commands from the <SPAN
1193 > command suite where appropriate; some files never need to
1196 >The files in this directory include the following: <DIV
1197 CLASS="variablelist"
1209 >This file lists the server processes to run on the server machine, by defining which processes the BOS Server
1210 monitors and what it does if the process fails. It also defines the times at which the BOS Server automatically
1211 restarts processes for maintenance purposes.</P
1213 >As you create server processes during a file server machine's installation, their entries are defined in this
1214 file automatically. The <SPAN
1218 >IBM AFS Quick Beginnings</I
1220 > outlines the <SPAN
1226 > commands to use. For a more complete description of the file, and instructions for
1227 controlling process status by editing the file with commands from the <SPAN
1236 >Monitoring and Controlling Server Processes</A
1249 >This optional ASCII file lists one or more of the network interface addresses on the server machine. If it
1250 exists when the File Server initializes, the File Server uses it as the basis for the list of interfaces that it
1251 registers in its Volume Location Database (VLDB) server entry. See <A
1252 HREF="c3025.html#HDRWQ138"
1254 Addresses and VLDB Server Entries</A
1267 >This optional ASCII file lists one or more network interface addresses. If it exists when the File Server
1268 initializes, the File Server removes the specified addresses from the list of interfaces that it registers in its VLDB
1269 server entry. See <A
1270 HREF="c3025.html#HDRWQ138"
1271 >Managing Server IP Addresses and VLDB Server Entries</A
1284 >This zero-length file instructs all AFS server processes running on the machine not to perform authorization
1285 checking. Thus, they perform any action for any user, even <SPAN
1292 insecure state is useful only in rare instances, mainly during the installation of the machine.</P
1294 >The file is created automatically when you start the initial <SPAN
1307 > flag, or issue the <SPAN
1314 command to turn off authentication requirements. When you use the <SPAN
1321 to turn on authentication, the BOS Server removes this file. For more information, see <A
1322 HREF="c3025.html#HDRWQ123"
1323 >Managing Authentication and Authorization Requirements</A
1336 >This zero-length file controls how the BOS Server handles a crash of the File Server component of the <SPAN
1342 > process. The BOS Server creates this file each time it starts or restarts the <SPAN
1348 > process. If the file is present when the File Server crashes, then the BOS Server runs the
1349 Salvager before restarting the File Server and Volume Server. When the File Server exits normally, the BOS
1350 Server removes the file so that the Salvager does not run.</P
1352 >Do not create or remove this file yourself; the BOS Server does so automatically. If necessary, you can salvage
1353 a volume or partition by using the <SPAN
1360 HREF="c8420.html#HDRWQ232"
1361 >Salvaging Volumes</A
1374 >This file guarantees that only one Salvager process runs on a file server machine at a time (the single process
1375 can fork multiple subprocesses to salvage multiple partitions in parallel). As the Salvager initiates (when invoked by
1376 the BOS Server or by issue of the <SPAN
1382 > command), it creates this zero-length
1383 file and issues the <SPAN
1389 > system call on it. It removes the file when it completes
1390 the salvage operation. Because the Salvager must lock the file in order to run, only one Salvager can run at a
1403 >This file records the network interface addresses that the File Server (<SPAN
1409 > process) registers in its VLDB server entry. When the Cache Manager requests volume
1410 location information, the Volume Location (VL) Server provides all of the interfaces registered for each server
1411 machine that houses the volume. This enables the Cache Manager to make use of multiple addresses when accessing AFS
1412 data stored on a multihomed file server machine. For further information, see <A
1413 HREF="c3025.html#HDRWQ138"
1415 IP Addresses and VLDB Server Entries</A
1428 >Replicated Database Files in the /usr/afs/db Directory</A
1431 >The directory <SPAN
1437 > contains two types of files pertaining to the four replicated
1438 databases in the cell--the Authentication Database, Backup Database, Protection Database, and Volume Location Database (VLDB):
1442 >A file that contains each database, with a <SPAN
1452 >A log file for each database, with a <SPAN
1458 > extension. The database server
1459 process logs each database operation in this file before performing it. If the operation is interrupted, the process
1460 consults this file to learn how to finish it.</P
1465 >Each database server process (Authentication, Backup, Protection, or VL Server) maintains its own database and log
1466 files. The database files are in binary format, so you must always access or alter them using commands from the <SPAN
1472 > suite (for the Authentication Database), <SPAN
1479 Backup Database), <SPAN
1485 > suite (for the Protection Database), or <SPAN
1491 > suite (for the VLDB).</P
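>
<P>The log-before-apply behavior described above for each database's log file can be illustrated with a minimal Python sketch. It is a generic write-ahead-logging toy, not Ubik's actual file format or recovery procedure: the class name ToyLoggedDatabase, the JSON log records, and the in-memory dictionary standing in for the database file are all hypothetical.</P>
<PRE CLASS="programlisting">
```python
import json


class ToyLoggedDatabase:
    """Toy key-value store that logs each operation before applying it.

    Hypothetical illustration only: real AFS database servers use Ubik's
    own binary database and log formats, not JSON records.
    """

    def __init__(self):
        self.data = {}  # stands in for the database file
        self.log = []   # stands in for the database log file

    def begin(self, key, value):
        # Step 1: record the intended operation in the log first.
        record = {"op": "set", "key": key, "value": value, "done": False}
        self.log.append(json.dumps(record))
        return len(self.log) - 1

    def apply(self, index):
        # Step 2: perform the logged operation, then mark it complete.
        record = json.loads(self.log[index])
        self.data[record["key"]] = record["value"]
        record["done"] = True
        self.log[index] = json.dumps(record)

    def recover(self):
        # After an interruption, consult the log and finish any operation
        # that was recorded but never marked complete.
        for index, raw in enumerate(self.log):
            if not json.loads(raw)["done"]:
                self.apply(index)


db = ToyLoggedDatabase()
db.begin("volume.root", "server1")
# Simulate an interruption: apply() is never called for the logged operation.
db.recover()
print(db.data)  # prints {'volume.root': 'server1'}
```
</PRE>
<P>Because the operation is written to the log before the data changes, the recovery step has enough information to finish work that an interruption left incomplete.</P>
<SPAN></SPAN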
1493 >If a cell runs more than one database server machine, each database server process keeps its own copy of its database on
1494 its machine's hard disk. However, it is important that all the copies of a given database are the same. To synchronize them,
1495 the database server processes call on AFS's distributed database technology, Ubik, as described in <A
1496 HREF="c3025.html#HDRWQ102"
1497 >Replicating the AFS Administrative Databases</A
1500 >The files listed here appear in this directory only on database server machines. On non-database server machines, this
1501 directory is empty. <DIV
1502 CLASS="variablelist"
1514 >The Backup Database file.</P
1526 >The Backup Database log file.</P
1538 >The Authentication Database file.</P
1550 >The Authentication Database log file.</P
1562 >The Protection Database file.</P
1574 >The Protection Database log file.</P
1586 >The Volume Location Database file.</P
1598 >The Volume Location Database log file.</P
1610 >Log Files in the /usr/afs/logs Directory</A
1619 > directory contains log files from various server processes. The files
1620 detail interesting events that occur during normal operations. For instance, the Volume Server can record volume moves in the
1627 > file. Events are recorded at completion, so the server processes do not use these
1628 files to reconstruct failed operations, unlike the files in the <SPAN
1636 >The information in log files can be very useful as you evaluate process failures and other problems. For instance, if
1637 you receive a timeout message when you try to access a volume, checking the <SPAN
1644 might provide an explanation, showing that the File Server was unable to attach the volume. To examine a log file
1645 remotely, use the <SPAN
1651 > command as described in <A
1652 HREF="c6449.html#HDRWQ173"
1654 Server Process Log Files</A
1657 >This directory also contains the core image files generated if a process being monitored by the BOS Server crashes. The
1658 BOS Server attempts to add an extension to the standard <SPAN
1664 > name to indicate which process
1665 generated the core file (for example, naming a core file generated by the Protection Server <SPAN
1671 >). If two processes fail at about the same time, the BOS Server cannot always assign the correct
1672 extension, so the extension on a core file is not guaranteed to identify the right process.</P
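>
<P>A minimal Python sketch of reading the process name back out of a core file name, assuming names of the form core.process as described above (for example, core.ptserver). The function name is hypothetical, and because the extension is not guaranteed to be correct, treat the result as a hint only.</P>
<PRE CLASS="programlisting">
```python
def process_from_core_name(filename):
    """Return the process suffix of a core file name, or None.

    Assumes names of the form "core.PROCESS"; a bare "core" with no
    extension yields None because no process could be identified.
    """
    prefix = "core."
    if filename.startswith(prefix) and len(filename) > len(prefix):
        return filename[len(prefix):]
    return None


for name in ["core.ptserver", "core", "FileLog"]:
    print(name, "->", process_from_core_name(name))
```
</PRE>
<SPAN></SPAN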
1674 >The directory contains the following files: <DIV
1675 CLASS="variablelist"
1687 >The Authentication Server's log file.</P
1699 >The Backup Server's log file.</P
1711 >The BOS Server's log file.</P
1723 >The File Server's log file.</P
1735 >The Salvager's log file.</P
1747 >The Volume Location (VL) Server's log file.</P
1759 >The Volume Server's log file.</P
1771 >If present, a core image file produced when an AFS server process on the machine crashed (probably the process
1772 named by process).</P
1784 >To prevent log files from growing unmanageably large, restart the server processes periodically, particularly the
1785 database server processes. To avoid restarting the processes, use the UNIX <SPAN
1792 remove the file as the process runs; it re-creates it automatically.</P
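>
<P>As a minimal sketch of spotting log files that have grown large, the following Python function lists files in a log directory above a size threshold. The function name and the 1 MB default threshold are assumptions for illustration, not AFS settings; the sketch only reports files and leaves the decision to restart a process or remove a log file to the administrator.</P>
<PRE CLASS="programlisting">
```python
import os


def oversized_logs(log_dir="/usr/afs/logs", limit_bytes=1_000_000):
    """Return (name, size) pairs for files larger than limit_bytes, largest first.

    The default directory follows the layout described above; the size
    limit is an arbitrary example value.
    """
    results = []
    with os.scandir(log_dir) as entries:
        for entry in entries:
            if entry.is_file():
                size = entry.stat().st_size
                if size > limit_bytes:
                    results.append((entry.name, size))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Report only; nothing is removed or restarted.
if os.path.isdir("/usr/afs/logs"):
    for name, size in oversized_logs():
        print(f"{name}: {size} bytes")
```
</PRE>
<SPAN></SPAN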
1802 >Volume Headers on Server Partitions</A
1805 >A partition that houses AFS volumes must be mounted at a subdirectory of the machine's root ( / ) directory (not, for
1806 instance, under the <SPAN
1812 > directory). The file server machine's file system registry file
1819 > or equivalent) must correctly map the directory name and the partition's device
1820 name. The directory name is of the form <SPAN
1826 >index, where each index is one or two lowercase
1827 letters. By convention, the first AFS partition on a machine is mounted at <SPAN
1840 >, and so on. If there are more than 26 partitions, continue with <SPAN
1852 > and so on. The <SPAN
1859 > specifies the number of supported partitions per server machine.</P
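>
<P>The partition naming convention can be expressed as a short Python sketch that maps a zero-based partition index to its mount point: single letters first, then two-letter suffixes in alphabetical order. The function name is hypothetical, and the sketch does not enforce the per-machine partition limit mentioned above, which depends on the AFS release.</P>
<PRE CLASS="programlisting">
```python
import string


def vice_partition_name(index):
    """Return the conventional mount point for a zero-based partition index.

    Indexes 0-25 map to /vicepa through /vicepz; later indexes use two-letter
    suffixes in order (aa, ab, ..., az, ba, ...). Any release-specific limit
    on the number of partitions is not enforced here.
    """
    letters = string.ascii_lowercase
    if index < 0:
        raise ValueError("partition index must be non-negative")
    if index < 26:
        return "/vicep" + letters[index]
    first, second = divmod(index - 26, 26)
    if first >= 26:
        raise ValueError("index is beyond two-letter suffixes")
    return "/vicep" + letters[first] + letters[second]


print([vice_partition_name(i) for i in (0, 1, 25, 26, 27, 52)])
# prints ['/vicepa', '/vicepb', '/vicepz', '/vicepaa', '/vicepab', '/vicepba']
```
</PRE>
<SPAN></SPAN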
1861 >Do not store non-AFS files on AFS partitions. The File Server and Volume Server expect to have available all of the
1862 space on the partition.</P
1870 > directories contain two types of files: <DIV
1871 CLASS="variablelist"
1883 >Each such file is a volume header. The vol_ID corresponds to the volume ID number displayed in the output from
1914 >This zero-length file triggers the Salvager to salvage the entire partition. The AFS-modified version of the
1921 > program creates this file if it discovers corruption.</P
1933 >For most system types, it is important never to run the standard <SPAN
1940 provided with the operating system on an AFS file server machine. It removes all AFS volume data from server partitions
1941 because it does not recognize their format.</P
1952 >The Four Roles for File Server Machines</A
1955 >In cells that have more than one server machine, not all server machines have to perform exactly the same functions. There
1956 are four possible <SPAN
1962 > a machine can assume, determined by which server processes it is running. A machine
1963 can assume more than one role by running all of the relevant processes. The following list summarizes the four roles, which are
1964 described more completely in subsequent sections. <UL
1971 >simple file server</I
1973 > machine runs only the processes that store and deliver AFS files to client
1974 machines. You can run as many simple file server machines as you need to satisfy your cell's performance and disk space
1983 >database server machine</I
1985 > runs the four database server processes that maintain AFS's
1986 replicated administrative databases: the Authentication, Backup, Protection, and Volume Location (VL) Server
1995 >binary distribution machine</I
1997 > distributes the AFS server binaries for its system type to all
1998 other server machines of that system type.</P
2006 >system control machine</I
2008 > distributes common server configuration files to all other
2009 server machines in the cell, in a cell that runs the United States edition of AFS (cells that use the international
2010 edition of AFS must not use the system control machine for this purpose). The machine conventionally also serves as the
2011 time synchronization source for the cell, adjusting its clock according to a time source outside the cell.</P
2016 >If a cell has a single server machine, it assumes the simple file server and database server roles. The instructions in
2021 >IBM AFS Quick Beginnings</I
2023 > also have you configure it as the system control machine and binary
2024 distribution machine for its system type, but it does not actually perform those functions until you install another server
2027 >It is best to keep the binaries for all of the AFS server processes in the <SPAN
2034 directory, even if not all processes are running. You can then change which roles a machine assumes simply by starting or
2035 stopping the processes that define the role.</P
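>
<P>Because a machine's roles are determined by the processes it runs, a minimal Python sketch can infer roles from a list of process instance names, such as the names that bos status reports. The instance names used here (fs, upserver, runntp, upclientbin, upclientetc, and the four database server names) are the conventional ones and are assumptions about your cell's configuration. Note that process names alone cannot distinguish a binary distribution machine from the system control machine, because both run the server portion of the Update Server.</P>
<PRE CLASS="programlisting">
```python
DATABASE_PROCESSES = {"buserver", "kaserver", "ptserver", "vlserver"}


def roles_for(instances):
    """Infer server roles from the names of the instances a machine runs.

    Assumes conventional instance names; a real cell may use other names.
    """
    running = set(instances)
    roles = []
    if "fs" in running:
        roles.append("file server")
    if DATABASE_PROCESSES.issubset(running):
        roles.append("database server")
    if "upserver" in running:
        # Binary distribution and system control machines both run the
        # server portion of the Update Server, so they are reported together.
        roles.append("binary distribution or system control")
    return roles


print(roles_for(["fs", "runntp", "upclientbin", "upclientetc"]))
print(roles_for(["fs", "buserver", "kaserver", "ptserver", "vlserver", "upserver"]))
```
</PRE>
<P>A machine running only some of the four database server processes is not reported as a database server machine, because all four processes together define that role.</P>
<SPAN></SPAN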
2042 >Simple File Server Machines</A
2049 >simple file server machine</I
2051 > runs only the server processes that store and deliver AFS files to
2052 client machines, monitor process status, and pick up binaries and configuration files from the cell's binary distribution and
2053 system control machines.</P
2055 >In general, only cells with more than three server machines need to run simple file server machines. In cells with three
2056 or fewer machines, all of them are usually database server machines (to benefit from replicating the administrative
2058 HREF="c3025.html#HDRWQ92"
2059 >Database Server Machines</A
2062 >The following processes run on a simple file server machine: <UL
2065 >The BOS Server (<SPAN
2081 > process, which combines the File Server, Volume Server, and Salvager
2082 processes so that they can coordinate their operations on the data in volumes and avoid the inconsistencies that can
2083 result from multiple simultaneous operations on the same data</P
2087 >The NTP coordinator (<SPAN
2093 > process), which helps keep the machine's clock
2094 synchronized with the clocks on the other server machines in the cell</P
2098 >A client portion of the Update Server that picks up binary files from the binary distribution machine of its AFS
2099 system type (the <SPAN
2109 >A client portion of the Update Server that picks up common configuration files from the system control machine, in
2110 cells running the United States edition of AFS (the <SPAN
2127 >Database Server Machines</A
2134 >database server machine</I
2136 > runs the four processes that maintain the AFS replicated administrative
2137 databases: the Authentication Server, Backup Server, Protection Server, and Volume Location (VL) Server, which maintain the
2138 Authentication Database, Backup Database, Protection Database, and Volume Location Database (VLDB), respectively. To review
2139 the functions of these server processes and their databases, see <A
2140 HREF="c130.html#HDRWQ17"
2141 >AFS Server Processes and the Cache
2145 >If a cell has more than one server machine, it is best to run more than one database server machine, but more than three
2146 are rarely necessary. Replicating the databases in this way yields the same benefits as replicating volumes: increased
2147 availability and reliability of information. If one database server machine or process goes down, the information in the
2148 database is still available from others. The load of requests for database information is spread across multiple machines,
2149 preventing any one from becoming overloaded.</P
2151 >Unlike replicated volumes, however, replicated databases do change frequently. Consistent system performance demands
2152 that all copies of the database always be identical, so it is not possible to record changes in only some of them. To
2153 synchronize the copies of a database, the database server processes use AFS's distributed database technology, Ubik. See <A
2154 HREF="c3025.html#HDRWQ102"
2155 >Replicating the AFS Administrative Databases</A
2158 >It is critical that the AFS server processes on every server machine in a cell know which machines are the database
2159 server machines. The database server processes in particular must maintain constant contact with their peers in order to
2160 coordinate the copies of the database. The other server processes often need information from the databases. Every file server
2161 machine keeps a list of its cell's database server machines in its local <SPAN
2165 >/usr/afs/etc/CellServDB</B
2167 > file. Cells that use the United States edition of AFS can use the system control
2168 machine to distribute this file (see <A
2169 HREF="c3025.html#HDRWQ94"
2170 >The System Control Machine</A
2173 >The following processes define a database server machine: <UL
2176 >The Authentication Server (<SPAN
2186 >The Backup Server (<SPAN
2196 >The Protection Server (<SPAN
2206 >The VL Server (<SPAN
2217 >Database server machines can also run the processes that define a simple file server machine, as listed in <A
2218 HREF="c3025.html#HDRWQ91"
2219 >Simple File Server Machines</A
2220 >. One database server machine can act as the cell's system control
2221 machine, and any database server machine can serve as the binary distribution machine for its system type; see <A
2222 HREF="c3025.html#HDRWQ94"
2223 >The System Control Machine</A
2225 HREF="c3025.html#HDRWQ93"
2226 >Binary Distribution Machines</A
2235 >Binary Distribution Machines</A
2242 >binary distribution machine</I
2244 > stores and distributes the binary files for the AFS processes and
2245 command suites to all other server machines of its system type. Each file server machine keeps its own copy of AFS server
2246 process binaries on its local disk, by convention in the <SPAN
2253 consistent system performance, however, all server machines must run the same version (build level) of a process. For
2254 instructions for checking a binary's build level, see <A
2255 HREF="c3025.html#HDRWQ117"
2256 >Displaying A Binary File's Build Level</A
2258 The easiest way to keep the binaries consistent is to have a binary distribution machine of each system type distribute them
2259 to its system-type peers.</P
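>
<P>Because all server machines must run the same build level of a process, a quick consistency check can compare the build strings collected from each machine. The following minimal Python sketch assumes you have already gathered one build string per machine for a given binary (for example, by following the build-level instructions referenced above); the function name, machine names, and build strings are hypothetical, and the sketch simply treats the most common build as the reference.</P>
<PRE CLASS="programlisting">
```python
from collections import Counter


def inconsistent_builds(builds):
    """Return machines whose build string differs from the most common one.

    builds maps a machine name to the build-level string reported for one
    process binary on that machine.
    """
    if not builds:
        return {}
    reference, _count = Counter(builds.values()).most_common(1)[0]
    return {machine: build for machine, build in builds.items() if build != reference}


# Hypothetical machine names and build strings, for illustration only.
sample = {
    "fs1.example.com": "build 3.6 2.0",
    "fs2.example.com": "build 3.6 2.0",
    "fs3.example.com": "build 3.6 1.9",
}
print(inconsistent_builds(sample))  # prints {'fs3.example.com': 'build 3.6 1.9'}
```
</PRE>
<SPAN></SPAN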
2261 >The process that defines a binary distribution machine is the server portion of the Update Server (<SPAN
2267 > process). The client portion of the Update Server (<SPAN
2273 > process) runs on the other server machines of that system type and references the binary
2274 distribution machine.</P
2276 >Binary distribution machines usually also run the processes that define a simple file server machine, as listed in <A
2277 HREF="c3025.html#HDRWQ91"
2278 >Simple File Server Machines</A
2279 >. One binary distribution machine can act as the cell's system control
2280 machine, and any binary distribution machine can serve as a database server machine; see <A
2281 HREF="c3025.html#HDRWQ94"
2285 HREF="c3025.html#HDRWQ92"
2286 >Database Server Machines</A
2295 >The System Control Machine</A
2298 >In cells that run the United States edition of AFS, the <SPAN
2302 >system control machine</I
2305 distributes system configuration files shared by all of the server machines in the cell. Each file server machine keeps its
2306 own copy of the configuration files on its local disk, by convention in the <SPAN
2313 directory. For consistent system performance, however, all server machines must use the same files. The easiest way to keep
2314 the files consistent is to have the system control machine distribute them. You make changes only to the copy stored on the
2315 system control machine, as directed by the instructions in this document. The United States edition of AFS is available to
2316 cells in the United States and Canada and to selected institutions in other countries, as determined by United States
2317 government regulations.</P
2319 >Cells that run the international edition of AFS do not use the system control machine to distribute system configuration
2320 files. Some of the files contain information that is too sensitive to cross the network unencrypted, and United States
2321 government regulations forbid the export of the necessary encryption routines in the form that the Update Server uses. You
2322 must instead update the configuration files on each file server machine individually. The <SPAN
2329 commands that you use to update the files encrypt the information using an exportable form of the encryption routines.</P
2331 >For a list of the configuration files stored in the <SPAN
2338 HREF="c3025.html#HDRWQ85"
2339 >Common Configuration Files in the /usr/afs/etc Directory</A
2346 >IBM AFS Quick Beginnings</I
2348 > configures a cell's first server machine as the system control
2349 machine. If you wish, you can reassign the role to a different machine that you install later, but you must then change the
2350 client portion of the Update Server (<SPAN
2356 >) process running on all other server
2357 machines to refer to the new system control machine.</P
2359 >The following processes define the system control machine: <UL
2362 >The server portion of the Update Server (<SPAN
2368 >) process, in cells using the
2369 United States edition of AFS. The client portion of the Update Server (<SPAN
2376 process) runs on the other server machines and references the system control machine.</P
2380 >The NTP coordinator (<SPAN
> process), which points to a time source outside the
2387 cell, if the cell uses NTPD to synchronize its clocks. The <SPAN
2394 machines reference the system control machine as their main time source.</P
2399 >The system control machine can also run the processes that define a simple file server machine, as listed in <A
2400 HREF="c3025.html#HDRWQ91"
2401 >Simple File Server Machines</A
>. It can also serve as a database server machine, and by convention acts
2403 as the binary distribution machine for its system type. A single <SPAN
2410 distribute both configuration files and binaries. See <A
2411 HREF="c3025.html#HDRWQ92"
2412 >Database Server Machines</A
2414 HREF="c3025.html#HDRWQ93"
2415 >Binary Distribution Machines</A
2424 >To locate database server machines</A
2437 CLASS="programlisting"
2451 >The machines listed in the output are the cell's database server machines. For complete instructions and example
2453 HREF="c3025.html#HDRWQ120"
2454 >To display a cell's database server machines</A
2472 that a machine listed in the output of the <SPAN
2478 > command is actually running the
2479 processes that define it as a database server machine. For complete instructions, see <A
2480 HREF="c6449.html#HDRWQ158"
2482 Process Status and Information from the BosConfig File</A
2484 CLASS="programlisting"
2498 >buserver kaserver ptserver vlserver</B
2504 >If the specified machine is a database server machine, the output from the <SPAN
2511 > command includes the following lines:</P
2513 CLASS="programlisting"
2514 > Instance buserver, currently running normally.
2515 Instance kaserver, currently running normally.
2516 Instance ptserver, currently running normally.
2517 Instance vlserver, currently running normally.
2528 >To locate the system control machine</A
2540 > command for any server machine. Complete instructions appear
2542 HREF="c6449.html#HDRWQ158"
2543 >Displaying Process Status and Information from the BosConfig File</A
2545 CLASS="programlisting"
2559 >upserver upclientbin upclientetc</B
2571 >The output you see depends on the machine you have contacted: a simple file server machine, the system control
2572 machine, or a binary distribution machine. See <A
2573 HREF="c3025.html#HDRWQ98"
2574 >Interpreting the Output from the bos status
2586 >To locate the binary distribution machine for a system type</A
2598 > command for a file server machine of the system type you are
2599 checking (to determine a machine's system type, issue the <SPAN
2611 > command as described in <A
2612 HREF="c21473.html#HDRWQ417"
2613 >Displaying and Setting the System Type
2615 >. Complete instructions for the <SPAN
2621 > command appear in <A
2622 HREF="c6449.html#HDRWQ158"
2623 >Displaying Process Status and Information from the BosConfig File</A
2625 CLASS="programlisting"
2639 >upserver upclientbin upclientetc -long</B
2645 >The output you see depends on the machine you have contacted: a simple file server machine, the system control
2646 machine, or a binary distribution machine. See <A
2647 HREF="c3025.html#HDRWQ98"
2648 >Interpreting the Output from the bos status
2660 >Interpreting the Output from the bos status Command</A
2663 >Interpreting the output of the <SPAN
2669 > command is most straightforward for a simple
2670 file server machine. There is no <SPAN
2676 > process, so the output includes the following
2679 CLASS="programlisting"
2680 > bos: failed to get instance info for 'upserver' (no such entity)
2683 >A simple file server machine runs the <SPAN
2689 > process, so the output includes a
2690 message like the following. It indicates that <SPAN
2696 > is the binary distribution machine
2697 for this system type.</P
2699 CLASS="programlisting"
2700 > Instance upclientbin, (type is simple) currently running normally.
2701 Process last started at Wed Mar 10 23:37:09 1999 (1 proc start)
2702 Command 1 is '/usr/afs/bin/upclient fs7.abc.com -t 60 /usr/afs/bin'
2705 >If you run the United States edition of AFS, a simple file server machine also runs the <SPAN
2711 > process, so the output includes a message like the following. It indicates that <SPAN
2717 > is the system control machine.</P
2719 CLASS="programlisting"
2720 > Instance upclientetc, (type is simple) currently running normally.
2721 Process last started at Mon Mar 22 05:23:49 1999 (1 proc start)
2722 Command 1 is '/usr/afs/bin/upclient fs1.abc.com -t 60 /usr/afs/etc'
2730 >The Output on the System Control Machine</A
2733 >If you run the United States edition of AFS and have issued the <SPAN
2740 for the system control machine, the output includes an entry for the <SPAN
2747 similar to the following:</P
2749 CLASS="programlisting"
2750 > Instance upserver, (type is simple) currently running normally.
2751 Process last started at Mon Mar 22 05:23:54 1999 (1 proc start)
2752 Command 1 is '/usr/afs/bin/upserver'
2755 >If you are using the default configuration recommended in the <SPAN
2759 >IBM AFS Quick Beginnings</I
2762 system control machine is also the binary distribution machine for its system type, and a single <SPAN
2768 > process distributes both kinds of updates. In that case, the output includes the following
2771 CLASS="programlisting"
2772 > bos: failed to get instance info for 'upclientbin' (no such entity)
2773 bos: failed to get instance info for 'upclientetc' (no such entity)
2776 >If the system control machine is not a binary distribution machine, the output includes an error message for the
> process, but a complete listing for the <SPAN
2789 > process (in this case it refers to the machine <SPAN
2795 > as the binary distribution machine):</P
2797 CLASS="programlisting"
2798 > Instance upclientbin, (type is simple) currently running normally.
2799 Process last started at Mon Mar 22 05:23:49 1999 (1 proc start)
2800 Command 1 is '/usr/afs/bin/upclient fs5.abc.com -t 60 /usr/afs/bin'
2801 bos: failed to get instance info for 'upclientetc' (no such entity)
2810 >The Output on a Binary Distribution Machine</A
2813 >If you have issued the <SPAN
2819 > command for a binary distribution machine, the
2820 output includes an entry for the <SPAN
> process similar to the following and an error
2827 message for the <SPAN
2835 CLASS="programlisting"
2836 > Instance upserver, (type is simple) currently running normally.
2837 Process last started at Mon Apr 5 05:23:54 1999 (1 proc start)
2838 Command 1 is '/usr/afs/bin/upserver'
2839 bos: failed to get instance info for 'upclientbin' (no such entity)
2842 >Unless this machine also happens to be the system control machine, a message like the following references the system
2843 control machine (in this case, <SPAN
2851 CLASS="programlisting"
2852 > Instance upclientetc, (type is simple) currently running normally.
2853 Process last started at Mon Apr 5 05:23:49 1999 (1 proc start)
2854 Command 1 is '/usr/afs/bin/upclient fs3.abc.com -t 60 /usr/afs/etc'
2865 >Administering Database Server Machines</A
2868 >This section explains how to administer database server machines. For installation instructions, see the <SPAN
2882 >Replicating the AFS Administrative Databases</A
2885 >There are several benefits to replicating the AFS administrative databases (the Authentication, Backup, Protection, and
2886 Volume Location Databases), as discussed in <A
2887 HREF="c667.html#HDRWQ52"
2888 >Replicating the AFS Administrative Databases</A
2890 correct cell functioning, the copies of each database must be identical at all times. To keep the databases synchronized, AFS
uses a library of utilities called <SPAN
2897 >. Each database server process runs an associated lightweight Ubik
2898 process, and client-side programs call Ubik's client-side subroutines when they submit requests to read and change the
2901 >Ubik is designed to work with minimal administrator intervention, but there are several configuration requirements, as
2903 HREF="c3025.html#HDRWQ103"
2904 >Configuring the Cell for Proper Ubik Operation</A
2905 >. The following brief overview of
2906 Ubik's operation is helpful for understanding the requirements. For more details, see <A
2907 HREF="c3025.html#HDRWQ104"
2909 Operates Automatically</A
2912 >Ubik is designed to distribute changes made in an AFS administrative database to all copies as quickly as possible. Only
2913 one copy of the database, the <SPAN
2917 >synchronization site</I
2919 >, accepts change requests from clients; the lightweight
2920 Ubik process running there is the <SPAN
2924 >Ubik coordinator</I
2926 >. To maintain maximum availability, there is a separate
2927 Ubik coordinator for each database, and the synchronization site for each of the four databases can be on a different machine.
2928 The synchronization site for a database can also move from machine to machine in response to process, machine, or network
2931 >The other copies of a database, and the Ubik processes that maintain them, are termed <SPAN
2938 The secondary sites do not accept database changes directly from client-side programs, but only from the synchronization
2941 >After the Ubik coordinator records a change in its copy of a database, it immediately sends the change to the secondary
2942 sites. During the brief distribution period, clients cannot access any of the copies of the database, even for reading. If the
2943 coordinator cannot reach a majority of the secondary sites, it halts the distribution and informs the client that the
2944 attempted change failed.</P
2946 >To avoid distribution failures, the Ubik processes maintain constant contact by exchanging time-stamped messages. As
2947 long as a majority of the secondary sites respond to the coordinator's messages, there is a <SPAN
2954 sites that are synchronized with the coordinator. If a process, machine, or network outage breaks the quorum, the Ubik
2955 processes attempt to elect a new coordinator in order to establish a new quorum among the highest possible number of sites.
2957 HREF="c3025.html#HDRWQ106"
2958 >A Flexible Coordinator Boosts Availability</A
2966 >Configuring the Cell for Proper Ubik Operation</A
2969 >This section describes how to configure your cell to maintain proper Ubik operation. <UL
2972 >Run all four database server processes--Authentication Server, Backup Server, Protection Server, and VL
2973 Server--on all database server machines.</P
2975 >Both the client and server portions of Ubik expect that all the database server machines listed in the <SPAN
2981 > file are running all of the database server processes. There is no mechanism for
2982 indicating that only some database server processes are running on a machine.</P
2986 >Maintain correct information in the <SPAN
2990 >/usr/afs/etc/CellServDB</B
2995 >Ubik consults the <SPAN
2999 >/usr/afs/etc/CellServDB</B
3001 > file to determine the sites with
3002 which to establish and maintain a quorum. Incorrect information can result in unsynchronized databases or election of
3003 a coordinator in each of several subgroups of machines, because the Ubik processes on various machines do not agree on
3004 which machines need to participate in the quorum.</P
3006 >If you run the United States version of AFS and use the Update Server, it is simplest to maintain the <SPAN
3010 >/usr/afs/etc/CellServDB</B
3012 > file on the system control machine, which distributes its copy to all
3013 other server machines. The <SPAN
3017 >IBM AFS Quick Beginnings</I
3019 > explains how to configure the Update Server.
3020 If you run the international version of AFS, you must update the file on each machine individually.</P
>You need to alter the file only when configuring or decommissioning a database server machine. Use the
3029 > commands rather than editing the file by hand. For instructions, see
3031 HREF="c3025.html#HDRWQ118"
3032 >Maintaining the Server CellServDB File</A
3033 >. The instructions in <A
3035 >Monitoring and Controlling Server Processes</A
3036 > for stopping and starting processes remind you
3043 > file when appropriate, as do the instructions in the
3048 >IBM AFS Quick Beginnings</I
3050 > for installing or decommissioning a database server machine.</P
3052 >(Client processes and the server processes that do not maintain databases also rely on correct information in
3059 > file for proper operation, but their use of the information does not
3060 affect Ubik's operation. See <A
3061 HREF="c3025.html#HDRWQ118"
3062 >Maintaining the Server CellServDB File</A
3064 HREF="c21473.html#HDRWQ406"
3065 >Maintaining Knowledge of Database Server Machines</A
3070 >Keep the clocks synchronized on all machines in the cell, especially the database server machines.</P
3072 >In the conventional configuration specified in the <SPAN
3076 >IBM AFS Quick Beginnings</I
3085 > process to supervise the local Network Time Protocol Daemon (NTPD) on every
3086 AFS server machine. The NTPD on the system control machine synchronizes its clock with a reliable source outside the
3087 cell and broadcasts the time to the NTPDs on the other server machines. You can choose to run a different time
3088 synchronization protocol if you wish.</P
3090 >Keeping clocks synchronized is important because the Ubik processes at a database's sites timestamp the messages
3091 which they exchange to maintain constant contact. Timestamping the messages is necessary because in a networked
3092 environment it is not safe to assume that a message reaches its destination instantly. Ubik compares the timestamp on
an incoming message with the current time. If the difference is too great, an outage is possibly preventing
reliable communication between the Ubik sites, which can result in unsynchronized databases. Ubik therefore considers
the message invalid, which can prompt it to attempt election of a different coordinator.</P
3097 >Electing a new coordinator is appropriate if a timestamped message is expired due to actual interruption of
3098 communication, but not if a message appears expired only because the sender and recipient do not share the same time.
3099 For detailed examples of how unsynchronized clocks can destabilize Ubik operation, see <A
3100 HREF="c3025.html#HDRWQ105"
3102 Ubik Uses Timestamped Messages</A
3114 >How Ubik Operates Automatically</A
3117 >The following Ubik features help keep its maintenance requirements to a minimum: <UL
3120 >Ubik's server and client portions operate automatically.</P
3122 >Each database server process runs a lightweight process to call on the server portion of the Ubik library. It is
3123 common to refer to this lightweight process itself as Ubik. Because it is lightweight, the Ubik process does not
3124 appear in process listings such as those generated by the UNIX <SPAN
3131 Client-side programs that need to read and change the databases directly call the subroutines in the Ubik library's
3132 client portion, rather than running a separate lightweight process. Examples of such programs are the <SPAN
3138 > command and the commands in the <SPAN
3148 >Ubik tracks database version numbers.</P
3150 >As the coordinator records a change to a database, it increments the database's version number. The version
3151 number makes it easy for the coordinator to determine if a site has the most recent version or not. The version number
3152 speeds the return to normal functioning after election of a new coordinator or when communication is restored after an
3153 outage, because it makes it easy to determine which site has the most current database and which need to be
3158 >Ubik's use of timestamped messages guarantees that database copies are always synchronized during normal
3161 >Replicating a database to increase data availability is pointless if all copies of the database are not the
3162 same. Inconsistent performance can result if clients receive different information depending on which copy of the
3163 database they access. As previously noted, Ubik sites constantly track the status of their peers by exchanging
3164 timestamped messages. For a detailed description, see <A
3165 HREF="c3025.html#HDRWQ105"
3166 >How Ubik Uses Timestamped
3172 >The ability to move the coordinator maximizes database availability.</P
3174 >Suppose, for example, that in a cell with three database server machines a network partition separates the two
3175 secondary sites from the coordinator. The coordinator retires because it is no longer in contact with a majority of
3176 the sites listed in the <SPAN
3182 > file. The two sites on the other side of the
3183 partition can elect a new coordinator among themselves, and it can then accept database changes from clients. If the
3184 coordinator cannot move in this way, the database has to be read-only until the network partition is repaired. For a
3185 detailed description of Ubik's election procedure, see <A
3186 HREF="c3025.html#HDRWQ106"
3187 >A Flexible Coordinator Boosts
3199 >How Ubik Uses Timestamped Messages</A
3202 >Ubik synchronizes the copies of a database by maintaining constant contact between the synchronization site and the
3203 secondary sites. The Ubik coordinator frequently sends a time-stamped <SPAN
3209 > message to each of
3210 the secondary sites. When the secondary site receives the message, it concludes that it is in contact with the
3211 coordinator. It considers its copy of the database to be valid until time <SPAN
3217 >, which is usually 60
3218 seconds from the time the coordinator sent the message. In response, the secondary site returns a
3225 > message that acknowledges the coordinator as valid until a certain time X, which is usually 120
3226 seconds in the future.</P
3228 >The coordinator sends guarantee messages more frequently than every <SPAN
3234 > seconds, so that the
3235 expiration periods overlap. There is no danger of expiration unless a network partition or other outage actually
interrupts communication. If the guarantee expires, the secondary site's copy of the database is not necessarily current.
3237 Nonetheless, the database server continues to service client requests. It is considered better for overall cell
3238 functioning that a secondary site remains accessible even if the information it is distributing is possibly out of date.
3239 Most of the AFS administrative databases do not change that frequently, in any case, and making a database inaccessible
3240 causes a timeout for clients that happen to access that copy.</P
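><P
>The following Python sketch illustrates why sending guarantees more often than the guarantee period keeps the validity windows overlapping. It is not AFS source code. It assumes times measured in whole seconds and the usual 60-second guarantee period. The 15-second and 90-second resend intervals are hypothetical values chosen only to contrast overlapping and non-overlapping guarantees. The helper names guarantee_gaps and secondary_state are likewise illustrative. As described above, a secondary site keeps serving clients even after its guarantee lapses.</P
><PRE
CLASS="programlisting"
>
GUARANTEE_SECS = 60  # usual lifetime of a guarantee message


def guarantee_gaps(send_times, guarantee_secs=GUARANTEE_SECS):
    """Return (start, end) periods during which no guarantee is in force."""
    gaps = []
    for sent, next_sent in zip(send_times, send_times[1:]):
        expires = sent + guarantee_secs
        if next_sent > expires:
            gaps.append((expires, next_sent))
    return gaps


def secondary_state(now, last_guarantee_sent, guarantee_secs=GUARANTEE_SECS):
    """Describe a secondary site; it keeps serving clients either way."""
    if now > last_guarantee_sent + guarantee_secs:
        return "serving, possibly out of date"
    return "serving, guaranteed current"


print(guarantee_gaps(list(range(0, 300, 15))))  # []
print(guarantee_gaps(list(range(0, 300, 90))))  # [(60, 90), (150, 180), (240, 270)]
print(secondary_state(100, 30))                 # serving, possibly out of date
</PRE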
3242 >As previously mentioned, Ubik's use of timestamped messages makes it vital to synchronize the clocks on database
3243 server machines. There are two ways that skewed clocks can interrupt normal Ubik functioning, depending on which clock is
3244 ahead of the others.</P
3246 >Suppose, for example, that the Ubik coordinator's clock is ahead of the secondary sites: the coordinator's clock
says 9:35:30, but the secondary clocks say 9:31:30. The secondary sites send vote messages that acknowledge the
3248 coordinator as valid until 9:33:30. This is two minutes in the future according to the secondary clocks, but is already in
3249 the past from the coordinator's perspective. The coordinator concludes that it no longer has enough support to remain
3250 coordinator and forces election of a new coordinator. Election takes about three minutes, during which time no copy of the
3251 database accepts changes.</P
3253 >The opposite possibility is that a secondary site's clock (14:50:00) is ahead of the coordinator's (14:46:30). When
the coordinator sends a guarantee message good until 14:47:30, it has already expired according to the secondary clock.
Believing that it is out of contact with the coordinator, the secondary site stops sending votes for the coordinator and
tries to get itself elected as coordinator. This is appropriate if the coordinator has actually failed, but is inappropriate
3257 when there is no actual outage.</P
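><P
>A minimal Python sketch of the two clock-skew cases follows. It uses the times from the preceding examples, with the usual 60-second guarantee period and 120-second vote period. This is an illustration under simplifying assumptions, not Ubik's actual code. It ignores network transit time, and the function names are hypothetical. The point it shows is that each message is stamped with the sender's clock but judged against the recipient's clock.</P
><PRE
CLASS="programlisting"
>
from datetime import datetime, timedelta

GUARANTEE = timedelta(seconds=60)  # usual guarantee period
VOTE = timedelta(seconds=120)      # usual vote period


def clock(hms):
    """Parse an HH:MM:SS reading; strptime fills in the date 1900-01-01."""
    return datetime.strptime(hms, "%H:%M:%S")


def vote_expired_at_coordinator(coord_now, secondary_now):
    # The secondary stamps its vote using its own clock...
    vote_valid_until = secondary_now + VOTE
    # ...but the coordinator checks it against the coordinator's clock.
    return coord_now > vote_valid_until


def guarantee_expired_at_secondary(coord_now, secondary_now):
    # The coordinator stamps its guarantee using its own clock...
    guarantee_valid_until = coord_now + GUARANTEE
    # ...but the secondary checks it against the secondary's clock.
    return secondary_now > guarantee_valid_until


# Coordinator ahead: votes look stale on arrival, so it forces an election.
print(vote_expired_at_coordinator(clock("09:35:30"), clock("09:31:30")))    # True
# Secondary ahead: the guarantee looks stale on arrival, so it seeks election.
print(guarantee_expired_at_secondary(clock("14:46:30"), clock("14:50:00")))  # True
# Synchronized clocks: the vote is still valid.
print(vote_expired_at_coordinator(clock("09:31:30"), clock("09:31:30")))    # False
</PRE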
3259 >The attempt of a single secondary site to get elected as the new coordinator usually does not affect the performance
3260 of the other sites. As long as their clocks agree with the coordinator's, they ignore the other secondary site's request
for votes and continue voting for the current coordinator. However, if enough of the secondary sites' clocks get ahead of
3262 the coordinator's, they can force election of a new coordinator even though the current one is actually working
3271 >A Flexible Coordinator Boosts Availability</A
3274 >Ubik uses timestamped messages to determine when coordinator election is necessary, just as it does to keep the
3275 database copies synchronized. As long as the coordinator receives vote messages from a majority of the sites (it
3276 implicitly votes for itself), it is appropriate for it to continue as coordinator because it is successfully distributing
3277 database changes. A majority is defined as more than 50% of all database sites when there are an odd number of sites; with
an even number of sites, the site with the lowest Internet address has an extra vote for breaking ties as necessary. If the
3279 coordinator is not receiving sufficient votes, it retires and the Ubik sites elect a new coordinator. This does not happen
3280 spontaneously, but only when the coordinator really fails or stops receiving a majority of the votes. The secondary sites
3281 have a built-in bias to continue voting for an existing coordinator, which prevents undue elections.</P
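><P
>The majority rule can be sketched in Python as follows. This illustrates the rule as stated above and is not Ubik's implementation. The function has_quorum and the sample addresses are hypothetical. Addresses are compared numerically with the standard ipaddress module, because comparing them as strings would rank 10.0.0.10 below 10.0.0.9.</P
><PRE
CLASS="programlisting"
>
import ipaddress


def has_quorum(all_sites, voters):
    """Return True if voters form a majority of all_sites.

    all_sites: addresses of every database server machine.
    voters: addresses whose votes the coordinator holds, including its
            implicit vote for itself.
    """
    n = len(all_sites)
    votes = sum(1 for site in all_sites if site in voters)
    if 2 * votes > n:
        return True  # strictly more than half of the sites
    if n % 2 == 0 and 2 * votes == n:
        # Exact tie with an even number of sites: the lowest-addressed
        # site's extra vote decides it.
        lowest = min(all_sites, key=ipaddress.ip_address)
        return lowest in voters
    return False


four = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
print(has_quorum(four, {"10.0.0.1", "10.0.0.3"}))  # True: tie broken by 10.0.0.1
print(has_quorum(four, {"10.0.0.2", "10.0.0.4"}))  # False: tie, 10.0.0.1 absent
</PRE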
3283 >The election of the new coordinator is by majority vote. The Ubik subprocesses have a bias to vote for the site with
the lowest Internet address, which helps that site gather the necessary majority more quickly than if all the sites were competing to
3285 receive votes themselves. During the election (which normally lasts less than three minutes), clients can read information
3286 from the database, but cannot make any changes.</P
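><P
>The following Python sketch models the voting bias in a simplified election. Every live site votes for the lowest-addressed site it can reach, counting itself, and a candidate wins with more than half of all sites. It is a hypothetical illustration only. It omits the even-site tie-breaking vote and the bias toward an existing coordinator described above. The helper least_favored reports the highest-addressed site, the one the bias favors least. This is why the backup procedure in this chapter suggests working on the database server machine with the highest IP address.</P
><PRE
CLASS="programlisting"
>
import ipaddress


def elect(all_sites, reachable):
    """Return the elected coordinator, or None if no candidate gets a majority.

    all_sites: every database server machine listed in CellServDB.
    reachable: maps each live site to the set of sites it can contact,
               including itself.
    """
    tally = {}
    for peers in reachable.values():
        # Bias: vote for the lowest Internet address among reachable sites.
        choice = min(peers, key=ipaddress.ip_address)
        tally[choice] = tally.get(choice, 0) + 1
    for candidate, votes in tally.items():
        if 2 * votes > len(all_sites):
            return candidate
    return None


def least_favored(all_sites):
    """The highest-addressed site is the last one the bias favors."""
    return max(all_sites, key=ipaddress.ip_address)


sites = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
everyone = {site: set(sites) for site in sites}
print(elect(sites, everyone))   # 10.0.0.1
# A partition isolates 10.0.0.1; the other two sites elect 10.0.0.2.
partition = {
    "10.0.0.1": {"10.0.0.1"},
    "10.0.0.2": {"10.0.0.2", "10.0.0.3"},
    "10.0.0.3": {"10.0.0.2", "10.0.0.3"},
}
print(elect(sites, partition))  # 10.0.0.2
print(least_favored(sites))     # 10.0.0.3
</PRE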
3288 >Ubik's election procedure makes it possible for each database server process's coordinator to be on a different
3289 machine. For example, if the Ubik coordinators for all four processes start out on machine A and the Protection Server on
3290 machine A fails for some reason, then a different site (say machine B) must be elected as the new Protection Database Ubik
3291 coordinator. Machine B remains the coordinator for the Protection Database even after the Protection Server on machine A
3292 is working again. The failure of the Protection Server has no effect on the Authentication, Backup, or VL Servers, so
3293 their coordinators remain on machine A.</P
3303 >Backing Up and Restoring the Administrative Databases</A
3306 >The AFS administrative databases store information that is critical for AFS operation in your cell. If a database
becomes corrupted due to a hardware failure or other problem on a database server machine, it is likely to be difficult and
time-consuming to recreate all of the information from scratch. To protect yourself against loss of data, back up the
administrative databases to permanent media, such as tape, on a regular basis. The recommended method is to use a standard
3310 local disk backup utility such as the UNIX <SPAN
3318 >When deciding how often to back up a database, consider the amount of data that you are willing to recreate by hand if
3319 it becomes necessary to restore the database from a backup copy. In most cells, the databases differ quite a bit in how often
3320 and how much they change. Changes to the Authentication Database are probably the least frequent, and consist mostly of
3321 changed user passwords. Protection Database and VLDB changes are probably more frequent, as users add or delete groups and
change group memberships, and as you and other administrators create or move volumes. The number and frequency of changes are
3323 probably greatest in the Backup Database, particularly if you perform backups every day.</P
3325 >The ease with which you can recapture lost changes also differs for the different databases: <UL
3328 >If regular users make a large proportion of the changes to the Authentication Database and Protection Database in
3329 your cell, then recovering them possibly requires a large amount of detective work and interviewing of users, assuming
3330 that they can even remember what changes they made at what time.</P
3334 >Recovering lost changes to the VLDB is more straightforward, because you can use the <SPAN
3347 > commands to correct any discrepancies between the
3348 VLDB and the actual state of volumes on server machines. Running these commands can be time-consuming, however.</P
3352 >The configuration information in the Backup Database (Tape Coordinator port offsets, volume sets and entries, the
3353 dump hierarchy, and so on) probably does not change that often, in which case it is not that hard to recover a few
3354 recent changes. In contrast, there are likely to be a large number of new dump records resulting from dump operations.
3355 You can recover these records by using the <SPAN
3361 > argument to the <SPAN
3367 > command, reading in information from the backup tapes themselves. This can take a
3368 long time and require numerous tape changes, however, depending on how much data you back up in your cell and how you
3369 append dumps. Furthermore, the <SPAN
3375 > command is subject to several
3376 restrictions. The most basic is that it halts if it finds that an existing dump record in the database has the same dump
3377 ID number as a dump on the tape it is scanning. If you want to continue with the scanning operation, you must locate and
3378 remove the existing record from the database. For further discussion, see the <SPAN
3385 > command's reference page in the <SPAN
3389 >IBM AFS Administration Reference</I
>These differences between the databases possibly suggest backing up the databases at different frequencies, ranging from
3397 every few days or weekly for the Backup Database to every few weeks for the Authentication Database. On the other hand, it is
3398 probably simpler from a logistical standpoint to back them all up at the same time (and frequently), particularly if tape
3399 consumption is not a major concern. Also, it is not generally necessary to keep backup copies of the databases for a long
3400 time, so you can recycle the tapes fairly frequently.</P
3408 >To back up the administrative databases</A
3414 >Log in as the local superuser <SPAN
3420 > on a database server machine that is not the
3421 synchronization site. The machine with the highest IP address is normally the best choice, since it is least likely to
3422 become the synchronization site in an election.</P
3427 NAME="LIDBBK_SHUTDOWN"
3435 > command to shut down the
3436 relevant server process on the local machine. For a complete description of the command, see <A
3437 HREF="c6449.html#HDRWQ168"
3439 stop processes temporarily</A
3448 > argument, specify one or more database server process names
3455 > for the Backup Server, <SPAN
3462 Authentication Server, <SPAN
3468 > for the Protection Server, or <SPAN
3474 > for the Volume Location Server. Include the <SPAN
3481 flag because you are logged in as the local superuser <SPAN
3487 > but do not necessarily have
3488 administrative tokens.</P
3490 CLASS="programlisting"
3526 >Use a local disk backup utility, such as the UNIX <SPAN
3532 > command, to transfer one or
3533 more database files to tape. If the local database server machine does not have a tape device attached, use a remote copy
3534 command to transfer the file to a machine with a tape device, then use the <SPAN
3543 >The following command sequence backs up the complete contents of the <SPAN
3552 CLASS="programlisting"
3575 >To back up individual database files, substitute their names for the period in the preceding <SPAN
3590 > for the Backup Database</P
3600 > for the Authentication Database</P
3610 > for the Protection Database</P
3633 > command to restart the server processes on the local machine.
3634 For a complete description of the command, see <A
3635 HREF="c6449.html#HDRWQ166"
3636 >To start processes by changing their status flags
3638 >. Provide the same values for the <SPAN
3644 > argument as in Step <A
3645 HREF="c3025.html#LIDBBK_SHUTDOWN"
3653 > flag for the same reason.
3655 CLASS="programlisting"
3673 >server process name</VAR
3692 >To restore an administrative database</A
3698 >Log in as the local superuser <SPAN
3704 > on each database server machine in the
3710 NAME="LIDBREST_SHUTDOWN"
3712 >Working on one of the machines, issue the <SPAN
3719 > command once for each database server machine, to shut down the relevant server process on all of
3720 them. For a complete description of the command, see <A
3721 HREF="c6449.html#HDRWQ168"
3722 >To stop processes temporarily</A
3731 > argument, specify one or more database server process names
3738 > for the Backup Server, <SPAN
3745 Authentication Server, <SPAN
3751 > for the Protection Server, or <SPAN
3757 > for the Volume Location Server. Include the <SPAN
3764 flag because you are logged in as the local superuser <SPAN
3770 > but do not necessarily have
3771 administrative tokens.</P
3773 CLASS="programlisting"
>Remove the database from each database server machine by issuing the following commands on each one.
3811 CLASS="programlisting"
3822 >For the Backup Database:</P
3824 CLASS="programlisting"
3841 >For the Authentication Database:</P
3843 CLASS="programlisting"
3855 >rm kaserver.DBSYS1</B
3860 >For the Protection Database:</P
3862 CLASS="programlisting"
3881 CLASS="programlisting"
3900 >Using the local disk backup utility that you used to back up the database, copy the most recently backed-up version
3901 of it to the appropriate file on the database server machine with the lowest IP address. The following is an appropriate
3908 > command if the synchronization site has a tape device attached: <PRE
3909 CLASS="programlisting"
3923 > tape_device database_file
3933 > is one of the following: <UL
3942 > for the Backup Database</P
3952 > for the Authentication Database</P
3962 > for the Protection Database</P
3979 >Working on one of the machines, issue the <SPAN
3985 > command to restart the server
3986 process on each of the database server machines in turn. Start with the machine with the lowest IP address, which becomes
the synchronization site for the restored database. Wait for it to establish itself as the synchronization site before
3988 repeating the command to restart the process on the other database server machines. For a complete description of the
3990 HREF="c6449.html#HDRWQ166"
3991 >To start processes by changing their status flags to Run</A
3993 values for the <SPAN
3999 > argument as in Step <A
4000 HREF="c3025.html#LIDBREST_SHUTDOWN"
4009 > flag for the same reason. <PRE
4010 CLASS="programlisting"
4028 >server process name</VAR
4041 >If the database has changed since you last backed it up, issue the appropriate commands from the instructions in the
4042 indicated sections to recreate the information in the restored database. If issuing <SPAN
4049 commands, you must first obtain administrative tokens. The <SPAN
4061 > commands accept the <SPAN
4067 > flag if you are logged in as
4068 the local superuser <SPAN
4074 >, so you do not need administrative tokens. The Authentication
4075 Server always performs a separate authentication anyway, so you only need to include the <SPAN
4081 > argument if issuing <SPAN
4090 >To define or remove volume sets and volume entries in the Backup Database, see <A
4091 HREF="c12776.html#HDRWQ265"
4092 >Defining and Displaying Volume Sets and Volume Entries</A
4097 >To edit the dump hierarchy in the Backup Database, see <A
4098 HREF="c12776.html#HDRWQ267"
4099 >Defining and Displaying the
4105 >To define or remove Tape Coordinator port offset entries in the Backup Database, see <A
4106 HREF="c12776.html#HDRWQ261"
4107 >Configuring Tape Coordinator Machines and Tape Devices</A
4112 >To restore dump records in the Backup Database, see <A
4113 HREF="c15383.html#HDRWQ305"
4114 >To scan the contents of a
4120 >To recreate Authentication Database entries or password changes for users, see the appropriate section of
4123 >Administering User Accounts</A
4128 >To recreate Protection Database entries or group membership information, see the appropriate section of <A
4130 >Administering the Protection Database</A
4135 >To synchronize the VLDB with volume headers, see <A
4136 HREF="c8420.html#HDRWQ227"
4137 >Synchronizing the VLDB and Volume
4153 >Installing Server Process Software</A
4156 >This section explains how to install new server process binaries on file server machines, how to revert to a previous
4157 version if the current version is not working properly, and how to install new disks to house AFS volumes on a file server
4160 >The most frequent reason to replace a server process's binaries is to upgrade AFS to a new version. In general,
4161 installation instructions accompany the updated software, but this chapter provides an additional reference.</P
4163 >Each AFS server machine must store the server process binaries in a local disk directory, called <SPAN
4169 > by convention. For predictable system performance, it is best that all server machines run
4170 the same build level, or at least the same version, of the server software. For instructions on checking AFS build level, see
4172 HREF="c3025.html#HDRWQ117"
4173 >Displaying A Binary File's Build Level</A
4176 >The Update Server makes it easy to distribute a consistent version of software to all server machines. You designate one
4177 server machine of each system type as the <SPAN
4181 >binary distribution machine</I
4183 > by running the server portion of the
4184 Update Server (<SPAN
4190 > process) on it. All other server machines of that system type run the
4191 client portion of the Update Server (<SPAN
4197 > process) to retrieve updated software from the
4198 binary distribution machine. The <SPAN
4202 >IBM AFS Quick Beginnings</I
4204 > explains how to install the appropriate
4205 processes. For more on binary distribution machines, see <A
4206 HREF="c3025.html#HDRWQ93"
4207 >Binary Distribution Machines</A
4210 >When you use the Update Server, you install new binaries on binary distribution machines only. If you install binaries
4211 directly on a machine that is running the <SPAN
4217 > process, they are overwritten the next
4218 time the process compares the contents of the local <SPAN
4224 > directory to the contents on
4225 the binary distribution machine, by default within five minutes.</P
4227 >The following instructions explain how to use the appropriate commands from the <SPAN
4234 to install and uninstall server binaries.</P
4241 >Installing New Binaries</A
4244 >An AFS server process does not automatically switch to a new process binary file as soon as it is installed in the
4251 > directory. The process continues to use the previous version of the binary file
4252 until the process next restarts. By default, the BOS Server restarts processes for which there are new binary files every
4253 day at 5:00 a.m., as specified in the <SPAN
4257 >/usr/afs/local/BosConfig</B
4259 > file. To display or change
4264 >binary restart time</I
4278 > commands, as described in <A
4279 HREF="c6449.html#HDRWQ171"
4280 >Setting the BOS Server's Restart
4284 >You can force the server machine to start using new server process binaries immediately by issuing the <SPAN
4290 > command as described in the following instructions.</P
4292 >You do not need to restart processes when you install new command suite binaries. The new binary is invoked
4293 automatically the next time a command from the suite is issued.</P
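><P
>The default binary restart schedule described earlier in this section can be modeled in a few lines of Python. This is a minimal sketch that assumes only the documented default restart time of 5:00 a.m. and a binary installed at a known time. The helper next_binary_restart is hypothetical and is not part of AFS.</P
><PRE
CLASS="programlisting"
>
```python
from datetime import datetime, time, timedelta

# Documented default binary restart time. A cell can change it with the
# bos setrestart command, so treat this value as an assumption.
DEFAULT_BINARY_RESTART = time(5, 0)


def next_binary_restart(installed_at, restart_at=DEFAULT_BINARY_RESTART):
    """Return the first daily binary restart strictly after installed_at.

    Hypothetical helper: it models the schedule only and does not
    contact the BOS Server.
    """
    candidate = datetime.combine(installed_at.date(), restart_at)
    if installed_at >= candidate:
        # Today's restart time has already passed, so wait for tomorrow.
        candidate += timedelta(days=1)
    return candidate


print(next_binary_restart(datetime(2024, 1, 1, 3, 0)))   # prints 2024-01-01 05:00:00
print(next_binary_restart(datetime(2024, 1, 1, 12, 0)))  # prints 2024-01-02 05:00:00
```
</PRE
><P
>Under these assumptions, a binary installed at 3:00 a.m. is picked up at 5:00 a.m. the same day, while one installed at noon waits until 5:00 a.m. the next day, unless you restart the process sooner with the bos restart command.</P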
4295 >When you use the <SPAN
4301 > command, the BOS Server automatically saves the current
4302 version of a binary file by adding a <SPAN
4308 > extension to its name. It renames the current
4315 > version, if any, to the <SPAN
4321 > version, if there is no
4328 > version already. If there is a current <SPAN
4341 > version must be at least seven days old to replace it.</P
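><P
>The renaming rules in the preceding paragraph can be expressed as a small state transition. The following Python sketch models those rules; it is not the BOS Server's implementation. Each version is represented as a (label, age in days) pair, and the helper name model_bos_install is hypothetical. Where the text is silent (a .BAK version younger than seven days when a .OLD version already exists), the sketch assumes that the young .BAK version is simply overwritten.</P
><PRE
CLASS="programlisting"
>
```python
# Minimum age, in days, of a .BAK version that may replace an existing .OLD.
MIN_BAK_AGE_DAYS = 7


def model_bos_install(versions, new_label):
    """Model the renames performed when a new binary is installed.

    versions maps "" (current), ".BAK" and ".OLD" to (label, age_in_days).
    Renaming a file keeps its age, so ages travel with their labels.
    Returns a new dict and leaves the input unchanged.
    """
    result = dict(versions)
    bak = versions.get(".BAK")
    if bak is not None:
        # The existing .BAK becomes .OLD if there is no .OLD yet, or if it
        # is old enough to replace the existing .OLD.
        if ".OLD" not in versions or bak[1] >= MIN_BAK_AGE_DAYS:
            result[".OLD"] = bak
        # Otherwise (assumption) it is overwritten by the step below.
    if "" in versions:
        # The current version is saved with a .BAK extension.
        result[".BAK"] = versions[""]
    # The newly installed file becomes the current version.
    result[""] = (new_label, 0)
    return result


before = {"": ("build-3", 3), ".BAK": ("build-2", 2), ".OLD": ("build-1", 90)}
print(model_bos_install(before, "build-4"))
# prints {'': ('build-4', 0), '.BAK': ('build-3', 3), '.OLD': ('build-1', 90)}
```
</PRE
><P
>In this example, build-2 is dropped because it is only two days old and a .OLD version already exists, so it is not eligible to replace build-1.</P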
4343 >It is best to store AFS binaries in the <SPAN
4349 > directory, because that is the
4350 only directory the BOS Server automatically checks for new binaries. You can, however, use the <SPAN
4363 > argument to install non-AFS binaries into other directories
4364 on a server machine's local disk. See the command's reference page in the <SPAN
4368 >IBM AFS Administration
4371 > for further information.</P
4379 >To install new server binaries</A
4385 >Verify that you are listed in the <SPAN
4389 >/usr/afs/etc/UserList</B
4391 > file. If necessary, issue
4398 > command, which is fully described in <A
4399 HREF="c32432.html#HDRWQ593"
4401 display the users in the UserList file</A
4403 CLASS="programlisting"
4419 >Verify that the binaries are available in the source directory from which you are installing them. If the machine is
4420 also an AFS client, you can retrieve the binaries from a central directory in AFS. Otherwise, you can obtain them directly
4421 from the AFS distribution media, from a local disk directory where you previously installed them, or from a remote machine
4422 using a transfer utility such as the <SPAN
4441 > command for the binary distribution
4442 machine. (If you have forgotten which machine is performing that role, see <A
4443 HREF="c3025.html#HDRWQ97"
4444 >To locate the binary
4445 distribution machine for a system type</A
4447 CLASS="programlisting"
4459 >files to install</VAR
4465 CLASS="variablelist"
4477 >Is the shortest acceptable abbreviation of <SPAN
4495 >Names the binary distribution machine.</P
4502 >files to install</B
4507 >Names each binary file to install into the local <SPAN
4514 Partial pathnames are interpreted relative to the current working directory. The last element in each pathname
4515 (the filename itself) matches the name of the file it is replacing, such as <SPAN
4527 > for server processes, <SPAN
4541 >Each AFS server process other than the <SPAN
4547 > process uses a single binary
4554 > process uses three binary files: <SPAN
4572 >. Installing a new version of one component does not necessarily mean that you need
4573 to replace all three.</P
4582 HREF="c3025.html#LIWQ112"
4584 > for each binary distribution machine.</P
4594 > If you want to restart processes to use the new binaries immediately,
4595 wait until the <SPAN
4601 > process retrieves them from the binary distribution machine.
4602 You can verify the timestamps on binary files by using the <SPAN
4610 HREF="c3025.html#HDRWQ115"
4611 >Displaying Binary Version Dates</A
4612 >. When the binary files are available on each
4613 server machine, issue the <SPAN
4619 > command, for which complete instructions appear in
4621 HREF="c6449.html#HDRWQ170"
4622 >Stopping and Immediately Restarting Processes</A
4625 >If you are working on an AFS client machine, it is a wise precaution to have a copy of the <SPAN
4631 > command suite binaries on the local disk before restarting server processes. In the
4632 conventional configuration, the <SPAN
4638 > directory that houses the <SPAN
4644 > command binary on client machines is a symbolic link into AFS, which conserves local disk
4645 space. However, restarting certain processes (particularly the database server processes) can make the AFS filespace
4646 inaccessible, particularly if a problem arises during the restart. Having a local copy of the <SPAN
4652 > binary enables you to uninstall or reinstall process binaries or restart processes even in this
4659 > command to copy the <SPAN
4672 > directory to a local directory such as <SPAN
4680 >Restarting a process causes a service outage. It is best to perform the restart at times of low system usage if
4683 CLASS="programlisting"
4707 >Reverting to the Previous Version of Binaries</A
4710 >In rare cases, installing a new binary can cause problems serious enough to require reverting to the previous version.
4711 Just as with installing binaries, consistent system performance requires reverting every server machine back to the same
4712 version. Issue the <SPAN
4718 > command described here on each binary distribution
4721 >When you use the <SPAN
4727 > command, the BOS Server discards the current version of
4728 a binary file and promotes the <SPAN
4734 > version of the file by removing the extension. It renames
4741 > version, if any, to <SPAN
4749 >If there is no current <SPAN
4755 > version, the <SPAN
4762 command operation fails and generates an error message. If a <SPAN
4768 > version still exists, issue
4775 > command to rename it to <SPAN
4781 > before reissuing the
4790 >Just as when you install new binaries, the server processes do not start using a reverted version immediately.
4791 Presumably you are reverting because the current binaries do not work, so the following instructions have you restart the
4792 relevant processes.</P
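><P
>The uninstall rules can be modeled in the same way. This Python sketch, which uses the hypothetical names model_bos_uninstall and UninstallError, mirrors the preceding paragraphs. The current version is discarded, the .BAK version is promoted, any .OLD version becomes the new .BAK version, and the operation fails when no .BAK version exists.</P
><PRE
CLASS="programlisting"
>
```python
class UninstallError(Exception):
    """Raised when there is no .BAK version to promote."""


def model_bos_uninstall(versions):
    """Model the renames performed when reverting a binary.

    versions maps "" (current), ".BAK" and ".OLD" to version labels.
    Returns a new dict and leaves the input unchanged.
    """
    if ".BAK" not in versions:
        # Mirrors the documented failure; rename .OLD to .BAK first.
        raise UninstallError("no .BAK version to promote")
    # The current version is discarded and .BAK loses its extension.
    result = {"": versions[".BAK"]}
    if ".OLD" in versions:
        # Any .OLD version moves up to become the new .BAK.
        result[".BAK"] = versions[".OLD"]
    return result


print(model_bos_uninstall({"": "build-3", ".BAK": "build-2", ".OLD": "build-1"}))
# prints {'': 'build-2', '.BAK': 'build-1'}
```
</PRE
><P
>In this model, two reverts in a row use up the saved versions, so a third attempt raises UninstallError. This corresponds to the documented error case.</P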
4800 >To revert to the previous version of binaries</A
4806 >Verify that you are listed in the <SPAN
4810 >/usr/afs/etc/UserList</B
4812 > file. If necessary, issue
4819 > command, which is fully described in <A
4820 HREF="c32432.html#HDRWQ593"
4822 display the users in the UserList file</A
4824 CLASS="programlisting"
4840 >Verify that the <SPAN
4846 > version of each relevant binary is available in the <SPAN
4852 > directory on each binary distribution machine. If necessary, you can use the <SPAN
4858 > command as described in <A
4859 HREF="c3025.html#HDRWQ115"
4860 >Displaying Binary Version
4862 >. If necessary, rename the <SPAN
4887 > command for a binary distribution
4888 machine. (If you have forgotten which machine is performing that role, see <A
4889 HREF="c3025.html#HDRWQ97"
4890 >To locate the binary
4891 distribution machine for a system type</A
4893 CLASS="programlisting"
4905 >files to uninstall</VAR
4911 CLASS="variablelist"
4923 >Is the shortest acceptable abbreviation of <SPAN
4941 >Names the binary distribution machine.</P
4948 >files to uninstall</B
4953 >Names each binary file in the <SPAN
4959 > directory to replace with its
4966 > version. The file name alone is sufficient, because the <SPAN
4972 > directory is assumed.</P
4981 HREF="c3025.html#LIWQ114"
4983 > for each binary distribution machine.</P
4987 >Wait until the <SPAN
4993 > process on each server machine retrieves the reverted binaries
4994 from the binary distribution machine. You can verify the timestamps on binary files by using the <SPAN
5001 > command as described in <A
5002 HREF="c3025.html#HDRWQ115"
5003 >Displaying Binary Version Dates</A
5005 binary files are available on each server machine, issue the <SPAN
5012 which complete instructions appear in <A
5013 HREF="c6449.html#HDRWQ170"
5014 >Stopping and Immediately Restarting
5018 >If you are working on an AFS client machine, it is a wise precaution to have a copy of the <SPAN
5024 > command suite binaries on the local disk before restarting server processes. In the
5025 conventional configuration, the <SPAN
5031 > directory that houses the <SPAN
5037 > command binary on client machines is a symbolic link into AFS, which conserves local disk
5038 space. However, restarting certain processes (particularly the database server processes) can make the AFS filespace
5039 inaccessible, particularly if a problem arises during the restart. Having a local copy of the <SPAN
5045 > binary enables you to uninstall or reinstall process binaries or restart processes even in this
5052 > command to copy the <SPAN
5065 > directory to a local directory such as <SPAN
5073 CLASS="programlisting"
5097 >Displaying Binary Version Dates</A
5100 >You can check the compilation dates for all three versions of a binary file in the <SPAN
5106 > directory: the current, <SPAN
5118 > versions. This is useful for verifying that new binaries have been copied to a file server machine
5119 from its binary distribution machine before restarting a server process to use the new binaries.</P
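><P
>If you have shell access to the server machine itself, you can get a rough local equivalent by reading the modification times of the three versions directly. The Python sketch below assumes that file modification times approximate the dates the BOS Server reports. The helper binary_dates is hypothetical, and /usr/afs/bin is used only because it is the conventional location.</P
><PRE
CLASS="programlisting"
>
```python
import os
from datetime import datetime

# Current, backup and old versions of a binary, in that order.
SUFFIXES = ("", ".BAK", ".OLD")


def binary_dates(directory, name):
    """Map each suffix to the file's modification time, or None if absent."""
    dates = {}
    for suffix in SUFFIXES:
        path = os.path.join(directory, name + suffix)
        if os.path.exists(path):
            dates[suffix] = datetime.fromtimestamp(os.path.getmtime(path))
        else:
            dates[suffix] = None
    return dates


if __name__ == "__main__":
    for suffix, stamp in binary_dates("/usr/afs/bin", "fileserver").items():
        print("fileserver" + suffix, stamp if stamp else "not present")
```
</PRE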
5121 >To check dates on binaries in a directory other than <SPAN
5133 > argument. See the <SPAN
5137 >IBM AFS Administration Reference</I
5147 >To display binary version dates</A
5160 CLASS="programlisting"
5172 >files to check</VAR
5178 CLASS="variablelist"
5190 >Is the shortest acceptable abbreviation of <SPAN
5208 >Names the file server machine for which to display binary dates.</P
5220 >Names each binary file to display.</P
5234 >Removing Obsolete Binary Files</A
5237 >When processes with new binaries have been running without problems for a number of days, it is generally safe to remove
5250 > versions from the <SPAN
5256 > directory, both to reduce clutter and to free space on the file server machine's local
5259 >You can use the <SPAN
5265 > command's flags to remove the following types of files:
5269 >To remove files in the <SPAN
5275 > directory with a <SPAN
5281 > extension, use the <SPAN
5291 >To remove files in the <SPAN
5297 > directory with a <SPAN
5303 > extension, use the <SPAN
5313 >To remove files in the <SPAN
5319 > directory called <SPAN
5325 >, with any extension, use the <SPAN
5335 >To remove all three types of files, use the <SPAN
5352 >To remove obsolete binaries</A
5358 >Verify that you are listed in the <SPAN
5362 >/usr/afs/etc/UserList</B
5364 > file. If necessary, issue
5371 > command, which is fully described in <A
5372 HREF="c32432.html#HDRWQ593"
5374 display the users in the UserList file</A
5376 CLASS="programlisting"
5398 > command with one or more of its flags. <PRE
5399 CLASS="programlisting"
5438 CLASS="variablelist"
5450 >Is the shortest acceptable abbreviation of <SPAN
5468 >Names the file server machine on which to remove obsolete files.</P
5480 >Removes all the files with a <SPAN
5486 > extension from the <SPAN
5492 > directory. Do not combine this flag with the <SPAN
5510 >Removes all the files with a .<SPAN
5516 > extension from the <SPAN
5522 > directory. Do not combine this flag with the <SPAN
5540 >Removes all core files from the <SPAN
5546 > directory. Do not combine
5547 this flag with the <SPAN
5565 >Combines the effect of the other three flags. Do not combine it with the other three flags.</P
5579 >Displaying A Binary File's Build Level</A
5582 >For the most consistent performance on a server machine, and cell-wide, it is best for all server processes to come from
5583 the same AFS distribution. Every AFS binary includes an ASCII string that specifies its version, or <SPAN
5590 >. To display it, use the <SPAN
5603 commands, which are included in most UNIX distributions.</P
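><P
>For illustration, you can approximate the combined effect of the strings and grep commands in Python. This sketch assumes the strings default of reporting runs of at least four printable ASCII characters. It filters on the text Base configuration, which is taken from the sample output in this section. The helper build_levels is hypothetical.</P
><PRE
CLASS="programlisting"
>
```python
import re
import sys

# strings(1) reports runs of at least four printable ASCII characters by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Marker taken from the sample build-level output shown in this section.
MARKER = b"Base configuration"


def build_levels(data):
    """Return every embedded build-level string found in a binary's bytes."""
    return [run.group().decode("ascii").strip()
            for run in PRINTABLE_RUN.finditer(data)
            if MARKER in run.group()]


if __name__ == "__main__":
    # Example: python3 build_level.py /usr/afs/bin/fileserver
    for path in sys.argv[1:]:
        with open(path, "rb") as binary:
            for line in build_levels(binary.read()):
                print(path + ": " + line)
```
</PRE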
5611 >To display an AFS binary's build level</A
5617 >Change to the directory that houses the binary file. If you are not sure where the binary resides, issue the
5625 CLASS="programlisting"
5633 /bin_dir_path/binary_file
5652 > command to extract all ASCII strings from the binary file. Pipe
5653 the output to the <SPAN
5659 > command to locate the relevant line. <PRE
5660 CLASS="programlisting"
5677 >The output reports the AFS build level in a format like the following:</P
5679 CLASS="programlisting"
5680 > @(#)Base configuration afsversion build_level
5683 >For example, the following string indicates the binary is from AFS 3.6 build 3.0:</P
5685 CLASS="programlisting"
5686 > @(#)Base configuration afs3.6 3.0
5698 >Maintaining the Server CellServDB File</A
5701 >Every file server machine maintains a list of its home cell's database server machines in the file <SPAN
5705 >/usr/afs/etc/CellServDB</B
5707 > on its local disk. Both database server processes and non-database server
5708 processes consult the file: <UL
5711 >The database server processes (the Authentication, Backup, Protection, and Volume Location Servers) maintain
5712 constant contact with their peers in order to keep their copies of the replicated administrative databases
5716 HREF="c3025.html#HDRWQ102"
5717 >Replicating the AFS Administrative Databases</A
5718 >, the database server
5719 processes use the Ubik utility to synchronize the information in the databases they maintain. The Ubik coordinator at the
5720 synchronization site for each database maintains the single read/write copy of the database and distributes changes to the
5721 secondary sites as necessary. It must maintain contact with a majority of the secondary sites to remain the coordinator,
5722 and consults the <SPAN
5728 > file to learn how many peers it has and on which machines
5729 they are running.</P
5731 >If the coordinator loses contact with the majority of its peers, they all cooperate to elect a new coordinator by
5732 majority vote. During the election, all of the Ubik processes consult the <SPAN
5739 to learn where to send their votes, and what number constitutes a majority.</P
5743 >The non-database server processes must know which machines are running the database server processes in order to
5744 retrieve information from the databases. For example, the first time that a user accesses an AFS file, the File Server
5745 that houses it contacts the Protection Server to obtain a list of the user's group memberships (the list is called a
5746 current protection subgroup, or CPS). The File Server uses the CPS as it determines if the access control list (ACL)
5747 protecting the file grants the required permissions to the user (for more details, see <A
5748 HREF="c29323.html#HDRWQ534"
5750 Protection Database</A
5756 >The consequences of missing or incorrect information in the <SPAN
5766 >If the file does not list a machine, then it is effectively not a database server machine even if the database
5767 server processes are running. The Ubik coordinator does not send it database updates or include it in the count that
5768 establishes a majority. It does not participate in Ubik elections, and so refuses to distribute database information to
5769 any client machines that happen to contact it (which they can do if their <SPAN
5773 >/usr/vice/etc/CellServDB</B
5775 > file lists it). Users of the client machine must wait for a timeout before
5776 they can contact a correctly functioning database server machine.</P
5780 >If the file lists a machine that is not running the database server processes, the consequences can be serious. The
5781 Ubik coordinator cannot send it database updates, but includes it in the count that establishes a majority. If valid
5782 secondary sites go down and stop sending their votes to the coordinator, it can wrongly appear that the coordinator no
5783 longer has the majority it needs. The resulting election of a new coordinator causes a service outage during which
5784 information from the database becomes unavailable. Furthermore, the lack of a vote from the incorrectly listed site can
5785 disturb the election, if it makes the other sites believe that a majority of sites are not voting for the new
5788 >A more minor consequence is that non-database server processes attempt to contact the database server processes on
5789 the machine. They experience a timeout delay because the processes are not running.</P
5794 >Note that the <SPAN
5798 >/usr/afs/etc/CellServDB</B
5800 > file on a server machine is not the same as the
5805 >/usr/vice/etc/CellServDB</B
5807 > file on client machines. The client version includes entries for
5808 foreign cells as well as the local cell. However, it is important to update both versions of the file whenever you change your
5809 cell's database server machines. A server machine that is also a client needs to have both files, and you need to update them
5810 both. For more information on maintaining the client version of the <SPAN
5817 HREF="c21473.html#HDRWQ406"
5818 >Maintaining Knowledge of Database Server Machines</A
5826 >Distributing the Server CellServDB File</A
5829 >To avoid the negative consequences of incorrect information in the <SPAN
5833 >/usr/afs/etc/CellServDB</B
5835 > file, you must update it on all of your cell's server machines every time you
5836 add or remove a database server machine. The <SPAN
5840 >IBM AFS Quick Beginnings</I
5842 > provides complete instructions for
5843 installing or removing a database server machine and for updating the <SPAN
5850 context. This section explains how to distribute the file to your server machines and how to make other cells aware of the
5851 changes if you participate in the AFS global name space.</P
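><P
>To see why a stale entry matters, the following Python sketch reads a server CellServDB file and computes the majority implied by the machines it lists. It is a simplified model that rests on several assumptions. It expects the usual layout: a cell line that starts with a greater-than sign, followed by one address and hostname line per machine. It treats a majority as more than half of the listed sites, and it ignores any additional rules in Ubik's actual election algorithm. The helper names, the retired.abc.com entry, and the 192.0.2.x documentation addresses are all hypothetical.</P
><PRE
CLASS="programlisting"
>
```python
def parse_server_cellservdb(text):
    """Return (cell_name, [(ip_address, hostname), ...]) from file contents."""
    cell = None
    hosts = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell line, for example ">abc.com   #Cell name".
            cell = line[1:].split("#", 1)[0].strip()
            continue
        # Host line, for example "192.0.2.1   #fs1.abc.com".
        address, _, hostname = line.partition("#")
        hosts.append((address.strip(), hostname.strip()))
    return cell, hosts


def majority(listed_sites):
    """Smallest number of sites that is more than half of those listed."""
    return listed_sites // 2 + 1


SAMPLE = """>abc.com   #Cell name
192.0.2.1   #fs1.abc.com
192.0.2.7   #fs7.abc.com
192.0.2.4   #fs4.abc.com
192.0.2.9   #retired.abc.com
"""

cell, hosts = parse_server_cellservdb(SAMPLE)
needed = majority(len(hosts))
print(cell, len(hosts), "listed sites; majority is", needed)
# prints abc.com 4 listed sites; majority is 3
```
</PRE
><P
>In this hypothetical file, retired.abc.com is still listed even though it no longer runs the database server processes. Under this simplified counting, four listed sites require three votes. If one of the three genuine database server machines then goes down, the remaining two cannot reach that count, whereas a correct three-entry file would require only two.</P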
5853 >If you use the United States edition of AFS, use the Update Server to distribute the central copy of the server
5860 > file stored on the cell's system control machine. If you use the international
5861 edition of AFS, instead change the file on each server machine individually. For further discussion of the system control
5862 machine and why international cells must not use it for files in the <SPAN
5870 HREF="c3025.html#HDRWQ94"
5871 >The System Control Machine</A
5872 >. For instructions on configuring the Update Server when using
5873 the United States version of AFS, see the <SPAN
5877 >IBM AFS Quick Beginnings</I
5881 >To avoid introducing formatting errors into the file, always use the <SPAN
5894 > commands, rather than editing the file directly. You must also restart the
5895 database server processes running on every database server machine, to initiate a coordinator election among the new set of database server
5896 machines. This step is included in the instructions that appear in <A
5897 HREF="c3025.html#HDRWQ121"
5898 >To add a database server machine
5899 to the CellServDB file</A
5901 HREF="c3025.html#HDRWQ122"
5902 >To remove a database server machine from the CellServDB
5904 >. For instructions on displaying the contents of the file, see <A
5905 HREF="c3025.html#HDRWQ120"
5906 >To display a cell's
5907 database server machines</A
5910 >If you make your cell accessible to foreign users as part of the AFS global name space, you also need to inform other
5911 cells when you change your cell's database server machines. The AFS Support group maintains a <SPAN
5917 > file that lists all cells that participate in the AFS global name space, and can change your
5918 cell's entry at your request. For further details, see <A
5919 HREF="c667.html#HDRWQ38"
5920 >Making Your Cell Visible to
5924 >Another way to advertise your cell's database server machines is to maintain a copy of the file at the conventional
5925 location in your AFS filespace, <SPAN
5941 >/service/etc/CellServDB.local</B
5943 >. For further discussion, see <A
5944 HREF="c667.html#HDRWQ43"
5955 >To display a cell's database server machines</A
5967 > command. If you have maintained the file properly, the
5968 output is the same on every server machine, but the <SPAN
5974 > argument enables you to check
5975 various machines if you wish. <PRE
5976 CLASS="programlisting"
5994 CLASS="variablelist"
6006 >Is the shortest acceptable abbreviation of <SPAN
6024 >Specifies the server machine from which to display the <SPAN
6028 >/usr/afs/etc/CellServDB</B
6042 >Specifies the complete Internet domain name of a foreign cell. You must already know the name of at least
6043 one server machine in the cell, to provide as the <SPAN
6057 >The output lists the machines in the order they appear in the <SPAN
6064 specified server machine. It assigns each one a <SAMP
6065 CLASS="computeroutput"
6067 > index number, as in the following
6068 example. There is no implied relationship between the index and a machine's IP address, name, or role as Ubik coordinator or
6071 CLASS="programlisting"
6076 >bos listhosts fs1.abc.com</B
6079 Cell name is abc.com
6080 Host 1 is fs1.abc.com
6081 Host 2 is fs7.abc.com
6082 Host 3 is fs4.abc.com
6085 >The output lists machines by name rather than IP address as long as the naming service (such as the Domain Name Service
6086 or local host table) is functioning properly. To display IP addresses, log in to a server machine as the local superuser
6093 > and use a text editor or display command, such as the <SPAN
6099 > command, to view the <SPAN
6103 >/usr/afs/etc/CellServDB</B
6113 >To add a database server machine to the CellServDB file</A
6119 >Verify that you are listed in the <SPAN
6123 >/usr/afs/etc/UserList</B
6125 > file. If necessary, issue
6132 > command, which is fully described in <A
6133 HREF="c32432.html#HDRWQ593"
6135 display the users in the UserList file</A
6137 CLASS="programlisting"
6159 > command to add each new database server machine to the
6166 > file. If you use the United States edition of AFS, specify the system control
6173 >. (If you have forgotten which machine is the system control machine, see
6175 HREF="c3025.html#HDRWQ99"
6176 >The Output on the System Control Machine</A
6177 >.) If you use the international edition of AFS,
6178 repeat the command on each of your cell's server machines in turn by substituting its name for <SPAN
6186 CLASS="programlisting"
6204 CLASS="variablelist"
6216 >Is the shortest acceptable abbreviation of <SPAN
6234 >Names the system control machine, if you are using the United States edition of AFS. If you are using the
6235 international edition of AFS, it names each of your server machines in turn.</P
6247 >Specifies the fully qualified hostname of each database server machine to add to the <SPAN
6253 > file (for example: <SPAN
6266 > routine to obtain each machine's IP address and records
6267 both the name and address automatically.</P
6275 >Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine,
6276 so that the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the
6277 conventional names for the processes; make the appropriate substitution if you use different process names. For complete
6279 HREF="c6449.html#HDRWQ170"
6280 >Stopping and Immediately Restarting Processes</A
6289 > Repeat the following command in quick succession on all of the database
6292 CLASS="programlisting"
6306 >buserver kaserver ptserver vlserver</B
6317 >/usr/vice/etc/CellServDB</B
6319 > file on each of your cell's client machines. For
6320 instructions, see <A
6321 HREF="c21473.html#HDRWQ406"
6322 >Maintaining Knowledge of Database Server Machines</A
6327 >If you participate in the AFS global name space, please have one of your cell's designated site contacts register
6328 the changes you have made with the AFS Product Support group.</P
6330 >If you maintain a central copy of your cell's server <SPAN
6337 conventional location (<SPAN
6353 >/service/etc/CellServDB.local</B
6355 >), edit the file to reflect the change.</P
6365 >To remove a database server machine from the CellServDB file</A
6371 >Verify that you are listed in the <SPAN
6375 >/usr/afs/etc/UserList</B
6377 > file. If necessary, issue
6384 > command, which is fully described in <A
6385 HREF="c32432.html#HDRWQ593"
6387 display the users in the UserList file</A
6389 CLASS="programlisting"
6411 > command to remove each database server machine from the
6418 > file. If you use the United States edition of AFS, specify the system control
6425 >. (If you have forgotten which machine is the system control machine, see
6427 HREF="c3025.html#HDRWQ99"
6428 >The Output on the System Control Machine</A
6429 >.) If you use the international edition of AFS,
6430 repeat the command on each of your cell's server machines in turn by substituting its name for <SPAN
6438 CLASS="programlisting"
6456 CLASS="variablelist"
6468 >Is the shortest acceptable abbreviation of <SPAN
6486 >Names the system control machine, if you are using the United States edition of AFS. If you are using the
6487 international edition of AFS, it names each of your server machines in turn.</P
6499 >Specifies the fully qualified hostname of each database server machine to remove from the <SPAN
6505 > file (for example: <SPAN
6519 >Restart the Authentication Server, Backup Server, Protection Server, and VL Server on every database server machine,
6520 so that the new set of machines participate in the election of a new Ubik coordinator. The instruction uses the
6521 conventional names for the processes; make the appropriate substitution if you use different process names. For complete
6523 HREF="c6449.html#HDRWQ170"
6524 >Stopping and Immediately Restarting Processes</A
6533 > Repeat the following command in quick succession on all of the database
6536 CLASS="programlisting"
6550 >buserver kaserver ptserver vlserver</B
6561 >/usr/vice/etc/CellServDB</B
6563 > file on each of your cell's client machines. For
6564 instructions, see <A
6565 HREF="c21473.html#HDRWQ406"
6566 >Maintaining Knowledge of Database Server Machines</A
6571 >If you participate in the AFS global name space, please have one of your cell's designated site contacts register
6572 the changes you have made with the AFS Product Support group.</P
6574 >If you maintain a central copy of your cell's server <SPAN
6581 conventional location (<SPAN
6597 >/service/etc/CellServDB.local</B
6599 >), edit the file to reflect the change.</P
6610 >Managing Authentication and Authorization Requirements</A
6613 >This section describes how the AFS server processes guarantee that only properly authorized users perform privileged
6614 commands, by performing authorization checking and by mutually authenticating with their clients. It explains how you can control
6615 authorization checking requirements on a per-machine or per-cell basis, and how to bypass mutual authentication when issuing
6623 >Authentication versus Authorization</A
6626 >Many AFS commands are <SPAN
6632 > in that the AFS server process invoked by the command performs it
6633 only for a properly authorized user. The server process performs the following two tests to determine if someone is properly
6643 > test, the server process mutually authenticates with the command
6644 interpreter, Cache Manager, or other client process that is acting on behalf of a user or application. The goal of this
6645 test is to determine who is issuing the command. The server process verifies that the issuer really is who he or she
6646 claims to be, by examining the server ticket and other components of the issuer's token. (Secondarily, it allows the
6647 client process to verify that the server process is genuine.) If the issuer has no token or otherwise fails the test,
6648 the server process assigns him or her the identity <SPAN
6654 >, a completely unprivileged
6655 user. For a more complete description of mutual authentication, see <A
6656 HREF="c667.html#HDRWQ75"
6657 >A More Detailed Look at
6658 Mutual Authentication</A
6661 >Many individual commands enable you to bypass the authentication test by assuming the <SPAN
6667 > identity without even attempting to mutually authenticate. Note, however, that this is
6668 futile if the command is privileged and the server process is still performing the <SPAN
6675 test, because in that case the process refuses to execute privileged commands for the <SPAN
6685 >In the authorization test, the server process determines if the issuer is authorized to use the command by
6686 consulting a list of privileged users. The goal of this test is to determine what the issuer is allowed to do. Different
6687 server processes consult different lists of users, as described in <A
6689 >Managing Administrative
6691 >. The server process refuses to execute any privileged command for an unauthorized issuer. If a command
6692 has no privilege requirements, the server process skips this step and executes the command immediately.</P
6700 >Never place the <SPAN
6712 > group on a privilege list; it makes authorization checking meaningless.</P
6714 >You can use the <SPAN
6720 > command to control whether the server processes on
6721 a server machine check for authorization. Other server machines are not affected. Keep in mind that turning off
6722 authorization checking is a grave security risk, because the server processes on that machine perform any action for
6736 >Controlling Authorization Checking on a Server Machine</A
6739 >Disabling authorization checking is a serious breach of security because it means that the AFS server processes on a
6740 file server machine perform any action for any user, even the <SPAN
6748 >The only time it is common to disable authorization checking is when installing a new file server machine (see the IBM
6749 AFS Quick Beginnings). It is necessary then because it is not possible to configure all of the necessary security mechanisms
6750 before performing other actions that normally make use of them. For greatest security, work at the console of the machine you
6751 are installing and enable authorization checking as soon as possible.</P
6753 >During normal operation, the only reason to disable authorization checking is if an error occurs with the server
6754 encryption keys, leaving the servers unable to authenticate users properly. For instructions on handling key-related
6756 HREF="c20494.html#HDRWQ370"
6757 >Handling Server Encryption Key Emergencies</A
6760 >You control authorization checking on each file server machine separately; turning it on or off on one machine does not
6761 affect the others. Because client machines generally choose a server process at random, it is hard to predict what
6762 authorization checking conditions prevail for a given command unless you make the requirement the same on all machines. To
6763 turn authorization checking on or off for the entire cell, you must repeat the appropriate command on every file server
6766 >The server processes constantly monitor the directory <SPAN
6773 disks to determine if they need to check for authorization. If the file called <SPAN
6780 in that directory, then the servers do not check for authorization. When it is not present (the usual case), they perform
6781 authorization checking.</P
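><P
>The check a server process makes can be sketched as a simple file-existence test. This sketch assumes the marker file is named NoAuth and lives in /usr/afs/local, its conventional location. The function name authorization_required is hypothetical. The sketch checks once per call, whereas the server processes monitor the directory continually.</P
><PRE
CLASS="programlisting"
>
# Sketch: does this server machine require authorization checking?
# Assumes the marker file is /usr/afs/local/NoAuth; the function name
# authorization_required is hypothetical.
from pathlib import Path


def authorization_required(local_dir="/usr/afs/local"):
    """Return True unless the NoAuth marker file exists in local_dir."""
    # Marker present: skip authorization checking.
    # Marker absent (the usual case): check authorization.
    return not (Path(local_dir) / "NoAuth").exists()
</PRE
><P
>Because the decision depends only on a file on the local disk, the result on one machine says nothing about the other server machines in the cell.</P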
6783 >Control the presence of the <SPAN
6789 > file through the BOS Server. When you disable
6790 authorization checking with the <SPAN
6796 > command (or, during installation, by putting the
6803 > flag on the command that starts up the BOS Server), the BOS Server creates the
6810 > file. When you reenable authorization checking, the BOS Server removes the
6819 >To disable authorization checking on a server machine</A
6825 >Verify that you are listed in the <SPAN
6829 >/usr/afs/etc/UserList</B
6831 > file. If necessary, issue
6838 > command, which is fully described in <A
6839 HREF="c32432.html#HDRWQ593"
6841 display the users in the UserList file</A
6843 CLASS="programlisting"
6865 > command to disable authorization checking. <PRE
6866 CLASS="programlisting"
6887 CLASS="variablelist"
6899 >Is the shortest acceptable abbreviation of <SPAN
6917 >Specifies the file server machine on which server processes do not check for authorization.</P
6931 >To enable authorization checking on a server machine</A
6937 >Reenable authorization checking. (No privilege is required because the machine is not currently checking for
6938 authorization.) For detailed syntax information, see the preceding section. <PRE
6939 CLASS="programlisting"
6967 >Bypassing Mutual Authentication for an Individual Command</A
6970 >Several of the server processes allow any user (not just system administrators) to disable mutual authentication when
6971 issuing a command. The server process treats the issuer as the unauthenticated user <SPAN
6979 >The facilities for preventing mutual authentication are provided for use in emergencies (such as the key emergency
6981 HREF="c20494.html#HDRWQ370"
6982 >Handling Server Encryption Key Emergencies</A
6983 >). Under normal circumstances,
6984 authorization checking is turned on, making it useless to prevent authentication: the server processes refuse to perform
6985 privileged commands for the user <SPAN
6993 >It can be useful to prevent authentication when authorization checking is turned off. The very act of trying to
6994 authenticate can cause problems if the server cannot understand a particular encryption key, as is likely to happen in a key
7003 >To bypass mutual authentication for bos, kas, pts, and vos commands</A
7012 > flag which is available on many of the commands in the suites. To
7013 verify that a command accepts the flag, issue the <SPAN
7019 > command in its suite, or consult the
7020 command's reference page in the <SPAN
7024 >IBM AFS Administration Reference</I
7026 > (the reference page also specifies the
7027 shortest acceptable abbreviation for the flag on each command). The suites' <SPAN
7040 > commands do not themselves accept the flag.</P
7042 >You can bypass mutual authentication for all <SPAN
7048 > commands issued during an interactive
7049 session by including the <SPAN
7062 command. If you have already entered interactive mode with an authenticated identity, issue the <SPAN
7069 > command to assume the <SPAN
7083 >To bypass mutual authentication for fs commands</A
7086 >This is not possible, except by issuing the <SPAN
7092 > command to discard your tokens before
7108 >Adding or Removing Disks and Partitions</A
7111 >AFS makes it easy to add storage space to your cell simply by adding disks to existing file server machines. This
7112 section explains how to install or remove a disk used to store AFS volumes. (Another way to add storage space is to install
7113 additional server machines, as instructed in the <SPAN
7117 >IBM AFS Quick Beginnings</I
7121 >Both adding and removing a disk cause at least a brief file system outage, because you must restart the <SPAN
7127 > process to have it recognize the new set of server partitions. Some operating systems require that you
7128 shut the machine off before adding or removing a disk, in which case you must shut down all of the AFS server processes first.
7129 Otherwise, the AFS-related aspects of adding or removing a disk are not complicated, so the duration of the outage depends
7130 mostly on how long it takes to install or remove the disk itself.</P
7132 >The following instructions for installing a new disk completely prepare it to house AFS volumes. You can then use the
7139 > command to create new volumes, or the <SPAN
7146 command to move existing ones from other partitions. For instructions, see <A
7147 HREF="c8420.html#HDRWQ185"
7148 >Creating Read/write
7151 HREF="c8420.html#HDRWQ226"
7153 >. The instructions for removing a disk are basically the
7154 reverse of the installation instructions, but include extra steps that protect against data loss.</P
7156 >A server machine can house up to 256 AFS server partitions, each one mounted at a directory with a name of the form <SPAN
7174 > is one or two lowercase letters. By
7175 convention, the first partition on a machine is mounted at <SPAN
7181 >, the second at <SPAN
7187 >, and so on to the twenty-sixth at <SPAN
7193 >. Additional partitions
7194 are mounted at <SPAN
7213 >. Using the letters consecutively is not required, but is simpler.</P
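><P
>The naming convention can be expressed as a mapping from a partition's position to its mount point. This sketch assumes the /vicep prefix followed by one or two lowercase letters, with the two-letter names continuing alphabetically after /vicepz (aa, ab, and so on). The helper partition_name is hypothetical and is not an AFS command.</P
><PRE
CLASS="programlisting"
>
# Sketch of the conventional partition mount points; partition_name is
# a hypothetical helper, not part of AFS.
import string

MAX_PARTITIONS = 256  # maximum number of server partitions per machine


def partition_name(index):
    """Return the conventional mount point for a 0-based partition index."""
    if index not in range(MAX_PARTITIONS):
        raise ValueError("partition index out of range")
    letters = string.ascii_lowercase
    if index in range(26):
        # First 26 partitions: /vicepa through /vicepz.
        suffix = letters[index]
    else:
        # Later partitions use two letters: /vicepaa, /vicepab, ...
        first, second = divmod(index - 26, 26)
        suffix = letters[first] + letters[second]
    return "/vicep" + suffix
</PRE
><P
>Under these assumptions the 256th and last partition is mounted at /vicepiv.</P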
7221 > directory directly under the local file system's root directory (
7228 > ), not as a subdirectory of any other directory; for example, <SPAN
7234 > is not an acceptable location. You must also map the directory to the partition's device name
7235 in the file server machine's file systems registry file (<SPAN
7241 > or equivalent).</P
7243 >These instructions assume that the machine's AFS initialization file includes the following command to restart the BOS
7244 Server after each reboot. The BOS Server starts the other AFS server processes listed in the local <SPAN
7248 >/usr/afs/local/BosConfig</B
7250 > file. For information on the <SPAN
7257 command's optional arguments, see its reference page in the <SPAN
7261 >IBM AFS Administration Reference</I
7265 CLASS="programlisting"
7266 > /usr/afs/bin/bosserver &
7274 >To add and mount a new disk to house AFS volumes</A
7280 >Become the local superuser <SPAN
7286 > on the machine, if you are not already, by issuing
7294 CLASS="programlisting"
7311 >Decide how many AFS partitions to divide the new disk into and the names of the directories at which to mount them
7312 (the introduction to this section describes the naming conventions). To display the names of the existing server
7313 partitions on the machine, issue the <SPAN
7319 > command. Include the <SPAN
7325 > flag because you are logged in as the local superuser <SPAN
7331 > but do not necessarily have administrative tokens. <PRE
7332 CLASS="programlisting"
7353 CLASS="variablelist"
7365 >Is the shortest acceptable abbreviation of <SPAN
7383 >Names the local file server machine.</P
7395 >Constructs a server ticket using a key from the local <SPAN
7399 >/usr/afs/etc/KeyFile</B
7408 > command interpreter presents it to the BOS Server during mutual
7417 >Create each directory at which to mount a partition. <PRE
7418 CLASS="programlisting"
7440 >Using a text editor, create an entry in the machine's file systems registry file (<SPAN
7446 > or equivalent) for each new disk partition, mapping its device name to the directory you
7447 created in the previous step. Refer to existing entries in the file to learn the proper format, which varies for different
7448 operating systems.</P
7455 >If the operating system requires that you shut off the machine to install a new disk, issue
7462 > command to shut down all AFS server processes other than the BOS Server
7463 (it terminates safely when you shut off the machine). Include the <SPAN
7470 you are logged in as the local superuser <SPAN
7476 > but do not necessarily have administrative
7477 tokens. For a complete description of the command, see <A
7478 HREF="c6449.html#HDRWQ168"
7479 >To stop processes temporarily</A
7482 CLASS="programlisting"
7513 >If necessary, shut off the machine. Install and format the new disk according to the
7514 instructions provided by the disk and operating system vendors. If necessary, edit the disk's partition table to reflect
7515 the changes you made to the file systems registry file in step <A
7516 HREF="c3025.html#LIWQ132"
7518 >; consult the operating
7519 system documentation for instructions.</P
7523 >If you shut off the machine in step <A
7524 HREF="c3025.html#LIWQ134"
7526 >, turn it on. Otherwise, issue the
7533 > command to restart the <SPAN
7540 it to recognize the new set of server partitions. Include the <SPAN
7547 are logged in as the local superuser <SPAN
7553 > but do not necessarily have administrative
7554 tokens. For complete instructions for the <SPAN
7561 HREF="c6449.html#HDRWQ170"
7562 >Stopping and Immediately Restarting Processes</A
7564 CLASS="programlisting"
7592 > command to verify that all server processes are running
7593 correctly. For more detailed instructions, see <A
7594 HREF="c6449.html#HDRWQ158"
7595 >Displaying Process Status and Information from the
7598 CLASS="programlisting"
7620 >To unmount and remove a disk housing AFS volumes</A
7626 >Verify that you are listed in the <SPAN
7630 >/usr/afs/etc/UserList</B
7632 > file. If necessary, issue
7639 > command, which is fully described in <A
7640 HREF="c32432.html#HDRWQ593"
7642 display the users in the UserList file</A
7644 CLASS="programlisting"
7666 > command to list the volumes housed on each partition of each
7667 disk you are about to remove, in preparation for removing them or moving them to other partitions. For detailed
7668 instructions, see <A
7669 HREF="c8420.html#HDRWQ219"
7670 >Displaying Volume Headers</A
7672 CLASS="programlisting"
7684 >partition name</VAR
7691 >Move any volume you wish to retain in the file system to another partition. You can move only read/write volumes.
7692 For more detailed instructions, and for instructions on moving read-only and backup volumes, see <A
7693 HREF="c8420.html#HDRWQ226"
7696 CLASS="programlisting"
7705 >volume name or ID</VAR
7709 >machine name on source</VAR
7712 >partition name on source</VAR
7716 >machine name on destination</VAR
7719 >partition name on destination</VAR
7732 > If there are any volumes you do not wish to retain, back them up using
7739 > command or the AFS Backup System. See <A
7740 HREF="c8420.html#HDRWQ240"
7742 Restoring Volumes</A
7744 HREF="c15383.html#HDRWQ296"
7750 >Become the local superuser <SPAN
7756 > on the machine, if you are not already, by issuing
7764 CLASS="programlisting"
7787 > command, repeating it for each partition on the disk to be
7789 CLASS="programlisting"
7805 >partition_block_device_name</VAR
7815 >Using a text editor, remove or comment out each partition's entry from the machine's file
7816 systems registry file (<SPAN
7822 > or equivalent).</P
7832 > directory associated with each partition. <PRE
7833 CLASS="programlisting"
7846 >If the operating system requires that you shut off the machine to remove a disk, issue the <SPAN
7853 > command to shut down all AFS server processes other than the BOS Server (it terminates safely when you
7854 shut off the machine). Include the <SPAN
7860 > flag because you are logged in as the local
7867 > but do not necessarily have administrative tokens. For a complete
7868 description of the command, see <A
7869 HREF="c6449.html#HDRWQ168"
7870 >To stop processes temporarily</A
7872 CLASS="programlisting"
7903 >If necessary, shut off the machine. Remove the disk according to the instructions provided by
7904 the disk and operating system vendors. If necessary, edit the disk's partition table to reflect the changes you made to
7905 the file systems registry file in step <A
7906 HREF="c3025.html#LIWQ136"
7908 >; consult the operating system documentation for
7913 >If you shut off the machine in step <A
7914 HREF="c3025.html#LIWQ137"
7916 >, turn it on. Otherwise, issue the
7923 > command to restart the <SPAN
7930 it to recognize the new set of server partitions. Include the <SPAN
7937 are logged in as the local superuser <SPAN
7943 > but do not necessarily have administrative
7944 tokens. For complete instructions for the <SPAN
7951 HREF="c6449.html#HDRWQ170"
7952 >Stopping and Immediately Restarting Processes</A
7954 CLASS="programlisting"
7982 > command to verify that all server processes are running
7983 correctly. For more detailed instructions, see <A
7984 HREF="c6449.html#HDRWQ158"
7985 >Displaying Process Status and Information from the
7988 CLASS="programlisting"
8011 >Managing Server IP Addresses and VLDB Server Entries</A
8014 >The AFS support for multihomed file server machines is largely automatic. The File Server process records the IP addresses
8015 of its file server machine's network interfaces in the local <SPAN
8019 >/usr/afs/local/sysid</B
8022 registers them in a <SPAN
8028 > in the Volume Location Database (VLDB). The <SPAN
8034 > file and server entry are identified by the same unique number, which creates an association
8037 >When the Cache Manager requests volume location information, the Volume Location (VL) Server provides all of the
8038 interfaces registered for each server machine that houses the volume. This enables the Cache Manager to make use of multiple
8039 addresses when accessing AFS data stored on a multihomed file server machine.</P
8041 >If you wish, you can control which interfaces the File Server registers in its VLDB server entry by creating two files in
8060 >. Each time the File Server restarts, it builds a list of the local machine's interfaces by
8067 > file, if it exists. If you do not create the file, the File Server uses the
8068 list of network interfaces configured with the operating system. It then removes from the list any addresses that appear in the
8075 > file, if it exists. The File Server records the resulting list in the <SPAN
8081 > file and registers the interfaces in the VLDB server entry that has the same unique
8084 >On database server machines, the <SPAN
8097 files also determine which interfaces the Ubik database synchronization library uses when communicating with the database server
8098 processes running on other database server machines.</P
8100 >There is a maximum number of IP addresses in each server entry, as documented in the <SPAN
8107 >. If a multihomed file server machine has more interfaces than the maximum, AFS simply ignores the excess ones.
8108 It is probably appropriate for such machines to use the <SPAN
8120 > files to control which interfaces are registered.</P
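><P
>The list-building procedure described above can be sketched as follows. This is a simplified model, not the File Server's code. The name registered_interfaces is hypothetical, and addresses are plain strings compared exactly, so the wildcard form of NetRestrict entries is not handled. The text does not say which excess interfaces are ignored, so the sketch simply keeps the first max_addrs entries. The actual limit is left as a parameter.</P
><PRE
CLASS="programlisting"
>
# Simplified model of how the File Server builds its interface list.
# registered_interfaces is a hypothetical name; the real logic differs.


def registered_interfaces(os_addrs, netinfo=None, netrestrict=(), max_addrs=None):
    """Return the addresses the File Server would register.

    os_addrs    -- interfaces configured with the operating system
    netinfo     -- contents of NetInfo, or None if the file does not exist
    netrestrict -- contents of NetRestrict (exact matches only here)
    max_addrs   -- per-entry maximum; None means no limit in this sketch
    """
    # Start from NetInfo if it exists, otherwise from the OS interfaces.
    candidates = list(netinfo) if netinfo is not None else list(os_addrs)
    # Remove every address that appears in NetRestrict.
    restricted = set(netrestrict)
    kept = [addr for addr in candidates if addr not in restricted]
    # Assumption: interfaces beyond the maximum are dropped from the end.
    if max_addrs is not None:
        kept = kept[:max_addrs]
    return kept
</PRE
><P
>The resulting list corresponds to what the File Server records in its sysid file and registers in its VLDB server entry.</P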
8122 >If for some reason the <SPAN
8128 > file no longer exists, the File Server creates a new one
8129 with a new unique identifier. When the File Server registers the contents of the new file, the Volume Location (VL) Server
8130 normally recognizes automatically that the new file corresponds to an existing server entry, and overwrites the existing server
8131 entry with the new file contents and identifier. However, it is best not to remove the <SPAN
8138 file if that can be avoided.</P
8140 >Similarly, it is important not to copy the <SPAN
8146 > file from one file server machine to
8147 another. If you commonly copy the contents of the <SPAN
8153 > directory from an existing machine
8154 as part of installing a new file server machine, be sure to remove the <SPAN
8167 > directory on the new machine before starting the File Server.</P
8169 >There are certain cases where the VL Server cannot determine whether it is appropriate to overwrite an existing server
8170 entry with a new <SPAN
8176 > file's contents and identifier. It then refuses to allow the File Server
8177 to register the interfaces, which prevents the File Server from starting. This can happen if, for example, a new <SPAN
8183 > file includes two interfaces that currently are registered by themselves in separate server
8184 entries. In such cases, error messages in the <SPAN
8188 >/usr/afs/log/VLLog</B
8190 > file on the VL Server machine
8195 >/usr/afs/log/FileLog</B
8197 > file on the file server machine indicate that you need to use
8204 > command to resolve the problem. Contact the AFS Product Support group for
8205 instructions and assistance.</P
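><P
>The example conflict given above can be sketched as a check over existing server entries. This is a hypothetical model, not the VL Server's algorithm. Server entries are represented as a dictionary that maps an entry identifier to its set of addresses. The only rule modeled is a simplified form of the example in the text: if a new sysid file's interfaces are spread across more than one existing entry, they cannot be registered automatically.</P
><PRE
CLASS="programlisting"
>
# Hypothetical model of one registration conflict the VL Server detects.
# Server entries map an entry identifier to the set of addresses it holds.


def overlapping_entries(entries, new_addrs):
    """Return the sorted IDs of entries that share an address with new_addrs."""
    new = set(new_addrs)
    return sorted(eid for eid, addrs in entries.items() if new.intersection(addrs))


def can_register_automatically(entries, new_addrs):
    # Zero overlapping entries: a new server entry is needed.
    # One overlapping entry: it can be overwritten unambiguously.
    # Two or more: the VL Server cannot decide which entry to keep.
    return len(overlapping_entries(entries, new_addrs)) in (0, 1)
</PRE
><P
>In this model, a False result corresponds to the situation described above, in which the File Server cannot start and you need assistance from AFS Product Support.</P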
8207 >Except in this type of rare error case, the only appropriate use of the <SPAN
8214 command is to remove a VLDB server entry completely when you remove a file server machine from service. The VLDB can accommodate
8215 a maximum number of server entries, as specified in the <SPAN
8219 >IBM AFS Release Notes</I
8221 >. Removing obsolete entries
8222 makes it possible to allocate server entries for new file server machines as required. See the instructions that follow.</P
8224 >Do not use the <SPAN
8230 > command to change the list of interfaces registered in a
8231 VLDB server entry. To change a file server machine's IP addresses and server entry, see the instructions that follow.</P
8238 >To create or edit the server NetInfo file</A
8244 >Become the local superuser <SPAN
8250 > on the machine, if you are not already, by issuing
8258 CLASS="programlisting"
8275 >Using a text editor, open the <SPAN
8279 >/usr/afs/local/NetInfo</B
8281 > file. Place one IP address in
8282 dotted decimal format (for example, <SAMP
8283 CLASS="computeroutput"
8284 >192.12.107.33</SAMP
8285 >) on each line. The order of entries is
8290 >If you want the File Server to start using the revised list immediately, use the <SPAN
8297 > command to restart the <SPAN
8303 > process. For instructions, see <A
8304 HREF="c6449.html#HDRWQ170"
8305 >Stopping and Immediately Restarting Processes</A
8316 >To create or edit the server NetRestrict file</A
8322 >Become the local superuser <SPAN
8328 > on the machine, if you are not already, by issuing
8336 CLASS="programlisting"
8353 >Using a text editor, open the <SPAN
8357 >/usr/afs/local/NetRestrict</B
8359 > file. Place one IP address
8360 in dotted decimal format on each line. The order of the addresses is not significant. Use the value <SPAN
8366 > as a wildcard that represents all possible addresses in that field. For example, the entry
8368 CLASS="computeroutput"
8369 >192.12.105.255</SAMP
8370 > indicates that the File Server does not register any of the addresses in
8371 the 192.12.105 subnet.</P
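><P
>The wildcard rule can be sketched as an octet-by-octet comparison. In this hypothetical helper, is_restricted, an octet of 255 in the NetRestrict entry matches any value in that position. The standard ipaddress module is used only to validate and normalize the dotted decimal strings.</P
><PRE
CLASS="programlisting"
>
# Sketch of NetRestrict wildcard matching; is_restricted is hypothetical.
import ipaddress


def is_restricted(addr, entry):
    """Return True if addr matches a NetRestrict entry.

    An octet of 255 in the entry acts as a wildcard for that field.
    """
    addr_octets = str(ipaddress.IPv4Address(addr)).split(".")
    entry_octets = str(ipaddress.IPv4Address(entry)).split(".")
    return all(e == "255" or e == a for a, e in zip(addr_octets, entry_octets))
</PRE
><P
>For example, is_restricted("192.12.105.7", "192.12.105.255") is true, while is_restricted("192.12.107.33", "192.12.105.255") is false.</P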
8375 >If you want the File Server to start using the revised list immediately, use the <SPAN
8382 > command to restart the <SPAN
8388 > process. For instructions, see <A
8389 HREF="c6449.html#HDRWQ170"
8390 >Stopping and Immediately Restarting Processes</A
8401 >To display all server entries from the VLDB</A
8413 > command to display all server entries from the VLDB.
8415 CLASS="programlisting"
8432 > is the shortest acceptable abbreviation of <SPAN
8440 >The output displays all server entries from the VLDB, each on its own line. If a file server machine is multihomed,
8441 all of its registered addresses appear on the line. The first one is the one reported as a volume's site in the output
8456 >VLDB server entries record IP addresses, and the command interpreter has the local name service (either a process
8457 like the Domain Name System or a local host table) translate them to hostnames before displaying them. If an IP address
8458 appears in the output instead of a hostname, the name service could not translate it.</P
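><P
>The translation step can be sketched with Python's standard socket.gethostbyaddr function, which raises an OSError subclass when a reverse lookup fails. The helper display_name is hypothetical. The resolver is a parameter so that the fallback behavior can be exercised without a live name service.</P
><PRE
CLASS="programlisting"
>
# Sketch: translate an address to a hostname, falling back to the address.
# display_name is a hypothetical helper, not part of the vos command suite.
import socket


def display_name(ip, resolve=socket.gethostbyaddr):
    """Return the hostname for ip, or ip itself if it cannot be translated."""
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
        return resolve(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return ip
</PRE
><P
>As in the command output, an address returned in place of a hostname indicates only that the lookup failed.</P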
8460 >The existence of an entry does not necessarily indicate that the machine is still an active file server
8461 machine. To remove obsolete server entries, see the following instructions.</P
8471 >To remove obsolete server entries from the VLDB</A
8477 >Verify that you are listed in the <SPAN
8481 >/usr/afs/etc/UserList</B
8483 > file. If necessary, issue
8490 > command, which is fully described in <A
8491 HREF="c32432.html#HDRWQ593"
8493 display the users in the UserList file</A
8495 CLASS="programlisting"
8517 > command to remove a server entry from the VLDB.
8519 CLASS="programlisting"
8528 >original IP address</VAR
8540 CLASS="variablelist"
8552 >Is the shortest acceptable abbreviation of <SPAN
8565 >original IP address</B
8570 >Specifies one of the IP addresses currently registered for the file server machine in the VLDB. Any of a
8571 multihomed file server machine's addresses can be used to identify it.</P
8583 >Removes the server entry.</P
8597 >To change a server machine's IP addresses</A
8603 >Verify that you are listed in the <SPAN
8607 >/usr/afs/etc/UserList</B
8609 > file. If necessary, issue
8616 > command, which is fully described in <A
8617 HREF="c32432.html#HDRWQ593"
8619 display the users in the UserList file</A
8621 CLASS="programlisting"
8637 >If the machine is the system control machine or a binary distribution machine, and you are also changing its
8638 hostname, redefine all relevant <SPAN
8644 > processes on other server machines to refer to
8645 the new hostname. Use the <SPAN
8658 commands as instructed in <A
8659 HREF="c6449.html#HDRWQ161"
8660 >Creating and Removing Processes</A
8665 >If the machine is a database server machine, edit its entry in the <SPAN
8669 >/usr/afs/etc/CellServDB</B
8671 > file on every server machine in the cell to list one of the new IP
8672 addresses. If you use the United States edition of AFS, you can edit the file on the system control machine and wait the
8673 required time (by default, five minutes) for the Update Server to distribute the changed file to all server
8678 >If the machine is a database server machine, issue the <SPAN
8685 all server processes. If the machine is also a file server, the volumes on it are inaccessible during this time. For a
8686 complete description of the command, see <A
8687 HREF="c6449.html#HDRWQ168"
8688 >To stop processes temporarily</A
8690 CLASS="programlisting"
8706 >Use the utilities provided with the operating system to change one or more of the machine's IP addresses.</P
8710 >If appropriate, edit the <SPAN
8714 >/usr/afs/local/NetInfo</B
8720 >/usr/afs/local/NetRestrict</B
8722 > file, or both, to reflect the changed addresses. Instructions appear
8723 earlier in this section.</P
8727 >If the machine is a database server machine, issue the <SPAN
8734 restart all server processes on the machine. For complete instructions for the <SPAN
8742 HREF="c6449.html#HDRWQ170"
8743 >Stopping and Immediately Restarting Processes</A
8746 CLASS="programlisting"
8766 >At the same time, issue the <SPAN
8772 > command on all other database server
8773 machines in the cell to restart the database server processes only (the Authentication, Backup, Protection, and Volume
8774 Location Servers). Issue the commands in quick succession so that all of the database server processes vote in the quorum
8777 CLASS="programlisting"
8791 >kaserver buserver ptserver vlserver</B
8796 >If you are changing IP addresses on every database server machine in the cell, you must also issue the <SPAN
8802 > command on every file server machine in the cell to restart the <SPAN
8812 >If the machine is not a database server machine, issue the <SPAN
8825 > process (if the machine is a database server, you already restarted the
8826 process in the previous step). The File Server automatically compiles a new list of interfaces, records them in the
8831 >/usr/afs/local/sysid</B
8833 > file, and registers them in its VLDB server entry. <PRE
8834 CLASS="programlisting"
8856 >If the machine is a database server machine, edit its entry in the <SPAN
8860 >/usr/vice/etc/CellServDB</B
8862 > file on every client machine in the cell to list one of the new IP
8863 addresses. Instructions appear in <A
8864 HREF="c21473.html#HDRWQ406"
8865 >Maintaining Knowledge of Database Server
8871 >If there are machine entries in the Protection Database for the machine's previous IP addresses, use the <SPAN
8877 > command to change them to the new addresses. For instructions, see <A
8878 HREF="c29323.html#HDRWQ556"
8879 >Changing a Protection Database Entry's Name</A
8891 >Rebooting a Server Machine</A
8894 >You can reboot a server machine either by typing the appropriate commands at its console or by issuing the <SPAN
8900 > command on a remote machine. Remote rebooting can be more convenient, because you do not need to
8901 leave your present location, but you cannot track the progress of the reboot as you can at the console. Remote rebooting is
8902 possible because the server machine's operating system recognizes the BOS Server, which executes the <SPAN
8909 > command, as the local superuser <SPAN
8917 >Rebooting server machines is part of routine maintenance in some cells, and some instructions in the AFS documentation
8918 include it as a step. It is not, however, intended to be the standard method for recovering from AFS-related problems,
8919 but only as a last resort when the machine is unresponsive and you have tried all other reasonable options.</P
8921 >Rebooting causes a service outage. If the machine stores volumes, they are all inaccessible until the reboot completes and
8922 the File Server reattaches them. If the machine is a database server machine, information from the databases can become
8923 unavailable during the reelection of the synchronization site for each database server process; the VL Server outage generally
8924 has the greatest impact, because the Cache Manager must be able to access the VLDB to fetch AFS data.</P
8926 >By convention, a server machine's AFS initialization file includes the following command to restart the BOS Server after
8927 each reboot. It starts the other AFS server processes listed in the local <SPAN
8931 >/usr/afs/local/BosConfig</B
8933 > file. These instructions assume that the initialization file includes the
8936 CLASS="programlisting"
8937 > /usr/afs/bin/bosserver &
8945 >To reboot a file server machine from its console</A
8951 >Become the local superuser <SPAN
8957 > on the machine, if you are not already, by issuing
8965 CLASS="programlisting"
8988 > command to shut down all AFS server processes other than the
8989 BOS Server, which terminates safely when you reboot the machine. Include the <SPAN
8996 flag because you are logged in as the local superuser <SPAN
9002 > but do not necessarily have
9003 administrative tokens. For a complete description of the command, see <A
9004 HREF="c6449.html#HDRWQ168"
9008 CLASS="programlisting"
9036 >Reboot the machine. On many system types, the appropriate command is <SPAN
9043 the appropriate options vary; consult your UNIX administrator's guide. <PRE
9044 CLASS="programlisting"
9063 >To reboot a file server machine remotely</A
9069 >Verify that you are listed in the <SPAN
9073 >/usr/afs/etc/UserList</B
9075 > file on the machine you are
9076 rebooting. If necessary, issue the <SPAN
9082 > command, which is fully described in
9084 HREF="c32432.html#HDRWQ593"
9085 >To display the users in the UserList file</A
9087 CLASS="programlisting"
9109 > to halt AFS server processes other than the BOS Server,
9110 which terminates safely when you turn off the machine. For a complete description of the command, see <A
9111 HREF="c6449.html#HDRWQ168"
9112 >To stop processes temporarily</A
9114 CLASS="programlisting"
9142 > command to reboot the machine remotely. <PRE
9143 CLASS="programlisting"
9153 >> reboot_command
9158 CLASS="variablelist"
9170 >Names the file server machine to reboot.</P
9182 >Is the rebooting command for the machine's operating system. The <SPAN
9189 command is appropriate on many system types, but consult your operating system documentation.</P
9204 SUMMARY="Footer navigation table"
9243 >Managing File Server Machines</TD
9257 >Monitoring and Controlling Server Processes</TD