[[!toc levels=3]]

# Demand-Attach File-Server (DAFS)

OpenAFS 1.5 contains the Demand-Attach File-Server (DAFS). DAFS is a significant departure from the more _traditional_ AFS file-server, and this document details those changes.
## Why Demand-Attach File-Server (DAFS)?

On a traditional file-server, volumes are attached at start-up and detached only at shutdown. Any attached volume can be modified, and changes are periodically flushed to disk or on shutdown. When a file-server isn't shut down cleanly, the integrity of every attached volume has to be verified by the salvager, whether the volume had been modified or not. As file-servers grow larger (and the number of volumes increases), the length of time required to salvage and attach volumes increases, e.g. it takes around two hours for a file-server housing 512GB of data to salvage and attach volumes!

The primary objective of the demand-attach file-server was to dramatically reduce the time required to restart a file-server.

Large portions of this document were taken from / influenced by the presentation entitled [Demand Attach / Fast-Restart Fileserver](http://workshop.openafs.org/afsbpw06/talks/tkeiser-dafs.pdf) given by Tom Keiser at the [AFS and Kerberos Best Practices Workshop](http://workshop.openafs.org/) in [2006](http://workshop.openafs.org/afsbpw06/).

## An Overview of Demand-Attach File-Server

Demand-attach necessitated a significant re-design of certain aspects of the AFS code. The changes implemented for demand-attach include:

- [[volume finite-state automata|DemandAttach#Volume_Finite_State_Automata]]
  - volumes are attached on demand
  - a volume _garbage collector_ detaches unused volumes
  - the notion of volume state means read-only volumes aren't salvaged
- [[vnode finite-state automata|DemandAttach#Vnode_Finite_State_Automata]]
- the global lock is only held when required and never held across high-latency operations
- automatic salvaging of volumes
- shutdown is done in parallel (maximum number of threads utilized)
- callbacks are no longer broken on shutdown; instead, host / callback state is preserved across restarts

## The Gory Details of the Demand-Attach File-Server

### Bos Configuration

A traditional file-server uses the `bnode` type `fs` and has a definition similar to

    bnode fs fs 1
    parm /usr/afs/bin/fileserver -p 123 -L -busyat 200 -rxpck 2000 -cb 4000000
    parm /usr/afs/bin/volserver -p 127 -log
    parm /usr/afs/bin/salvager -parallel all32
    end

Since the demand-attach file-server requires an additional component (the `salvageserver`), a new `bnode` type (`dafs`) is required. The definition should be similar to

    bnode dafs dafs 1
    parm /usr/afs/bin/dafileserver -p 123 -L -busyat 200 -rxpck 2000 -cb 4000000 -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11
    parm /usr/afs/bin/davolserver -p 64 -log
    parm /usr/afs/bin/salvageserver
    parm /usr/afs/bin/dasalvager -parallel all32
    end
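On an existing server, the traditional `fs` instance can be replaced with a `dafs` instance using the standard `bos` commands. The sketch below assumes a hypothetical server `fs1.example.org` and reuses the parm lines from the definition above; adjust both for your cell:

    # stop and remove the traditional fs instance
    bos stop fs1.example.org fs -localauth
    bos delete fs1.example.org fs -localauth

    # create the demand-attach instance with its four processes
    bos create fs1.example.org dafs dafs \
        "/usr/afs/bin/dafileserver -p 123 -L -busyat 200 -rxpck 2000 -cb 4000000" \
        "/usr/afs/bin/davolserver -p 64 -log" \
        /usr/afs/bin/salvageserver \
        "/usr/afs/bin/dasalvager -parallel all32" \
        -localauth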
The instance for a demand-attach file-server is therefore `dafs` instead of `fs`. For a complete list of configuration options, see the [dafileserver man page](http://docs.openafs.org/Reference/8/dafileserver.html).

### File-server Start-up / Shutdown Sequence

The table below compares the start-up sequence for a traditional file-server and a demand-attach file-server.

| Traditional | Demand-Attach |
|-------------|---------------|
| | host / callback state restored |
| | host / callback state consistency verified |
| build vice partition list | build vice partition list |
| volumes are attached | volume headers read |
| | volumes placed into pre-attached state |

The [[host / callback state|DemandAttach#FSStateDat]] is covered later. The _pre-attached_ state indicates that the file-server has read the volume headers and is aware that the volume exists, but that it has not been attached (and hence is not on-line).

The shutdown sequence for both file-server types is:

| Traditional | Demand-Attach |
|-------------|---------------|
| break callbacks | quiesce host / callback state |
| shutdown volumes | shutdown on-line volumes |
| | verify host / callback state consistency |
| | save host / callback state |

On a traditional file-server, volumes are off-lined (detached) serially. In demand-attach, as many threads as possible are used to detach volumes, which is possible because each volume has an associated state.

### Volume Finite-State Automata

The volume finite-state automaton is available in the source tree under `doc/arch/dafs-fsa.dot`. See [[fssync-debug|DemandAttach#fssync_debug]] for information on debugging the volume package.

### Volume Least Recently Used (VLRU) Queues

The Volume Least Recently Used (VLRU) queue is a garbage-collection facility which automatically off-lines volumes in the background. The purpose of this facility is to pro-actively off-line infrequently used volumes to improve shutdown and salvage times. The process of off-lining a volume from the "attached" state to the "pre-attached" state is called soft detachment.

VLRU works in a manner similar to a generational garbage collector. There are five queues; the three generational queues for active volumes are:

| Queue | Meaning |
|-------|---------|
| intermediate (mid) | Volumes transitioning from new to old (see [[state transitions|DemandAttach#VLRUStateTransitions]] for details). |
| new | Volumes which have been accessed (see [[state transitions|DemandAttach#VLRUStateTransitions]] for details). |
| old | Volumes which are continually accessed (see [[state transitions|DemandAttach#VLRUStateTransitions]] for details). |

The state of the various VLRU queues is dumped with the file-server state and at shutdown. The VLRU queues new, mid (intermediate) and old are generational queues for active volumes; state transitions between them are controlled by inactivity timers (see [[state transitions|DemandAttach#VLRUStateTransitions]]).

`vlruthresh` has been optimized for RO file-servers, where volumes are frequently accessed once a day and soft detaching has little effect (RO volumes are not salvaged, and avoiding salvages is one of the main reasons for soft detaching).

### Vnode Finite-State Automata

The vnode finite-state automaton is available in the source tree under `doc/arch/dafs-vnode-fsa.dot`.

`/usr/afs/bin/fssync-debug` provides low-level inspection and control of the file-server volume package. **Indiscriminate use of `fssync-debug` can lead to extremely bad things occurring. Use with care.**

### Demand Salvaging

Demand salvaging is implemented by the `salvageserver`. The actual code for salvaging a volume remains largely unchanged; however, the method for invoking salvaging with demand-attach has changed:

- the file-server automatically requests that volumes be salvaged as required, i.e. they are marked as requiring salvaging when attached.
- manual initiation of salvaging may be required when access is through the `volserver` (this may be addressed at some later date).
- `bos salvage` requires the `-forceDAFS` flag to initiate salvaging with DAFS. However, **salvaging should not be initiated using this method**.
- infinite salvage, attach, salvage, ... loops are possible. There is therefore a hard limit on the number of times a volume will be salvaged; it is reset when the volume is removed or the file-server is restarted.
- volumes are salvaged in parallel; the degree of parallelism is controlled by the `-Parallel` argument to the `salvageserver` and defaults to 4.
- the `salvageserver` and the `inode` file-server are incompatible:
  - because volumes are inter-mingled on a partition (rather than being separated), a lock on the entire partition on which the volume is located is held throughout salvaging. Both the `fileserver` and `volserver` will block if they require this lock, e.g. to restore / dump a volume located on the partition.
  - inodes for a particular volume can be located anywhere on a partition.
    Salvaging therefore results in **every** inode on a partition having to be read to determine whether it belongs to the volume. This is extremely I/O intensive and leads to horrendous salvaging performance.
- `/usr/afs/bin/salvsync-debug` provides low-level inspection and control over the `salvageserver`. **Indiscriminate use of `salvsync-debug` can lead to extremely bad things occurring. Use with care.**
- See [[salvsync-debug|DemandAttach#salvsync_debug]] for information on debugging problems with the salvageserver.

### File-Server Host / Callback State

Host / callback information is persistent across restarts with demand-attach. On shutdown, the file-server writes the data to `/usr/afs/local/fsstate.dat`. The contents of this file are read and verified at start-up; it is therefore unnecessary to break callbacks on shutdown with demand-attach.

The contents of `fsstate.dat` can be inspected using `/usr/afs/bin/state_analyzer`.

## File-Server Arguments (relating to Demand-Attach)

These are available in the man pages (section 8) for the fileserver; some details are provided here for convenience.

Arguments controlling the host / callback state:

| Argument | Units | Default | Meaning |
|----------|-------|---------|---------|
| `fs-state-verify` | n/a | both | Controls the behavior of the state verification mechanism. Before saving or restoring the fileserver state information, the internal host and callback data structures are verified. A value of `none` turns off all verification. A value of `save` only performs the verification steps prior to saving state to disk. A value of `restore` only performs the verification steps after restoring state from disk. A value of `both` performs all verification steps both prior to saving and after restoring state. |
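As an illustration of these switches (names as given in the dafileserver man page; the rest of the parm line is an example), a file-server that should verify state only before saving it would use:

    parm /usr/afs/bin/dafileserver -p 123 -L -busyat 200 -fs-state-verify save

The related switches `-fs-state-dont-save` and `-fs-state-dont-restore` disable saving / restoring of the host and callback state entirely.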
Arguments controlling the VLRU (e.g. `-vlruthresh` and `-vlrumax`, as used in the `dafs` bnode definition above) are likewise described in the [dafileserver man page](http://docs.openafs.org/Reference/8/dafileserver.html).
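To connect these options to the `dafs` bnode shown earlier: with those values, a volume must be idle for 1440 minutes (24 hours, matching the once-a-day access pattern discussed above) before it becomes a candidate for soft detachment, and at most 8 volumes are soft detached per VLRU cycle. A sketch of the relevant fragment, with meanings as given in the man page:

    parm /usr/afs/bin/dafileserver -vattachpar 128 -vlruthresh 1440 -vlrumax 8 -vhashsize 11

Here `-vattachpar` sets the number of volume attachment (and shutdown) threads and `-vhashsize` sets the log(2) of the number of volume hash buckets; both affect attach parallelism rather than the VLRU itself.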
## Tools for Debugging Demand-Attach File-Server

Several tools aid debugging problems with demand-attach file-servers. They operate at an extremely low level and hence require a detailed knowledge of the architecture / code.

### **fssync-debug**

**Indiscriminate use of `fssync-debug` can have extremely dire consequences. Use with care.**

`fssync-debug` provides low-level inspection and control over the volume package of the file-server. It can be used to display the file-server information associated with a volume, e.g.

    fssync-debug query -volumeid 537119916 -partition /vicepb

Note that the `volumeid` argument must be the numeric ID and the `partition` argument is the full partition name. Fields of particular interest in the output include:

- `VOL_IN_HASH` indicates that the volume has been added to the volume linked-list
- `VOL_ON_VBYP_LIST` indicates that the volume is linked off the partition list
- `VOL_ON_VLRU` means the volume is on a VLRU queue
- the `salvage` structure (detailed [[here|DemandAttach#salvsync_debug]])
- the `stats` structure, particularly the volume operation times (`last_*`)
- the `vlru` structure, particularly the VLRU queue

An understanding of the [volume finite-state machine](http://www.dementia.org/twiki//view/dafs-fsa.png) is required before manipulating the state of a volume.

### **salvsync-debug**

**Indiscriminate use of `salvsync-debug` can have extremely dire consequences. Use with care.**

`salvsync-debug` provides low-level inspection and control of the salvageserver process, including the scheduling order of volumes.

To initiate the salvaging of a volume, use

    salvsync-debug salvage -vol 537119916 -part /vicepb

This is the method that should be used on demand-attach file-servers to initiate the manual salvage of volumes. It should be used with care.

Under normal circumstances, the priority (`prio`) of a salvage request is the number of times the volume has been requested by clients. Modifying the priority (and hence the order in which volumes are salvaged) under heavy demand-salvaging usually leads to extremely bad things happening. To modify the priority of a request, use

    salvsync-debug priority -vol 537119916 -part /vicepb -priority 999999

(where `priority` is a 32-bit integer).

### **state\_analyzer**

`state_analyzer` allows the contents of the host / callback state file (`/usr/afs/local/fsstate.dat`) to be inspected.

#### Header Information

Header information is gleaned through the `hdr` command.

#### Host Information

Host information is gleaned through the `h` command, which drops into a per-host sub-prompt (of the form `fs state analyzer: h(1)>`) from which individual host entries can be examined.
#### Callback Information

Callback information is available through the `cb` command.
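For orientation, a minimal session using the commands described above might look like the following, assuming the path to the state file is passed as an argument (output elided; consult the state_analyzer man page for exact invocation details):

    /usr/afs/bin/state_analyzer /usr/afs/local/fsstate.dat
    fs state analyzer> hdr
    fs state analyzer> h
    fs state analyzer> cb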