+++ /dev/null
-# CERN backup setup ca. 2011
-
-## Base numbers
-
- * ~55 file servers
- * 850 million files (50% yearly growth)
- * total backup size on tapes: ~ 650TB
- * total number of objects on tapes: ~5 million
- * daily transfer to tape: 1-2.5 TB
-
-## Requirements
- * backup dumps: daily
- * backup retention policy: 6 months
- * restore directly done by the users (that is w/o admin intervention)
- * same backup policy for all volumes: all volumes are equal and no volumes are more equal than others
-
-## Current organization
-
- * components:
- * frontend: AFS "native" backup system (butc, backup commands)
- * tape backend: TSM backup service and tape manager
- * granularity:
- * full backup cycle to tape ~ every 30-50 days
- * 1 or 2 differential backups to tape per full backup cycle (differential = "diff to the last full")
- * daily incremental backups to tape ("diff to the last incremental")
- * "yesterday" also available as “BK” clone on disk
- * for home directories these clones are already mounted and directly accessible to users
- * servers are backed-up in parallel by a cronjob every evening (i.e. every 24 hours)
- * partitions on a server are backup-up in sequence
- * load-balancing the data flow to the backup service:
- * avoid all servers doing full backups at the same time
-
-## Problems/Issues
- * scalability: need 24 hour window to complete a dump of all partitions on a server (may be a problem for increasingly larger servers)
- * more parallelism in dumping the volumes?
- * using more features of the TSM (e.g. more accounts) may be required in the future
- * maintenance:
- * current administration layer for backup dump/restore is overly complex and drags a lot of historical/obsolete code
-
-## Detailed details
-
- * AFS backup system frontend
- * backup volsets: one volset per partition
- * database model: a volset dump contains a list of volume dumps
- * dump hierarchy:
- * simply defined as a chain: full-i1-...-i52 (no differentials)
- * differential backups are determined based on additional state information stored locally via perl Catalog module
- * once a differential is created, the backup database keeps track of the parentship of subsequent incrementals
- * several layers of wrappers perl/ksh/arc/... to manage dumps and restores
- * butc wrapped by expect script and fired on demand (no permanent butc processes), requires locking
- * end-user backup restore integrated into the afs_admin interface
- * restore done in a “synchronous” mode: an authenticated RPC to a restore server (one of afs file server nodes) is made using "arc"
- * the RPC connection is kept alive until user leaves a temporary shell which is positioned where recovered volumes are temporarily mounted
- * TSM backend
-
- * TSM server (one TSM account at the moment, using "archive" mode)
- * butc linked against XBSA/TSM library
- * each volume dump is stored as a separate object in TSM with the volset dump id prefix ("/{volsetdumpid}/{volname}.backup")