B<-offline-timeout>. Otherwise, the default value is C<-1>.
For the LWP fileserver, the only valid value for this option is C<-1>.
+
+=item B<-sync> <always | onclose | none>
+
+This option changes how hard the fileserver tries to ensure that data written
+to volumes actually hits the physical disk.
+
+Normally, when the fileserver writes to disk, the underlying filesystem or
+Operating System may delay writes from actually going to disk, and reorder
+which writes hit the disk first. So, during an unclean shutdown of the machine
+(if the power goes out, or the machine crashes, etc), file data may become lost
+that the server previously told clients was already successfully written.
+
+To try to mitigate this, the fileserver will try to "sync" file data to the
+physical disk at numerous points during various I/O. However, this can result
+in significantly reduced performance. Depending on the usage patterns, this may
+or may not be acceptable. This option dictates specifically what the fileserver
+does when it wants to perform a "sync".
+
+There are several options; pass one of these as the argument to -sync. The
+default is C<onclose>.
+
+=over 4
+
+=item always
+
+This causes a sync operation to always sync immediately and synchronously.
+This is the slowest option that provides the greatest protection against data
+loss in the event of a crash.
+
+Note that this is still not a 100% guarantee that data will not be lost or
+corrupted during a crash. The underlying filesystem itself may cause data to
+be lost or corrupt in such a situation. And OpenAFS itself does not (yet) even
+guarantee that all data is consistent at any point in time; so even if the
+filesystem and OS do not buffer or reorder any writes, you are not guaranteed
+that all data will be okay after a crash.
+
+This was the only behavior allowed in OpenAFS releases prior to 1.4.5.
+
+=item onclose
+
+This causes a sync to do nothing immediately, but causes the relevant file to
+be flagged as potentially needing a sync. When a volume is detached, volume
+metadata files flaged for synced are synced, as well as data files that have
+been accessed recently. Events that cause a volume to detach include:
+performing volume operations (dump, restore, clone, etc), a clean shutdown
+of the fileserver, or during DAFS "soft detachment".
+
+Effectively this option is the same as C<never> while a volume is attached and
+actively being used, but if a volume is detached, there is an additional
+guarantee for the data's consistency.
+
+After the removal of the C<delayed> option after the OpenAFS 1.6 series, this
+option became the default.
+
+=item never
+
+This causes all syncs to never do anything. This is the fastest option, with
+the weakest guarantees for data consistency.
+
+Depending on the underlying filesystem and Operating System, there may be
+guarantees that any data written to disk will hit the physical media after a
+certain amount of time. For example, Linux's pdflush process usually makes this
+guarantee, and ext3 can make certain various consistency guarantees according
+to the options given. ZFS on Solaris can also provide similar guarantees, as
+can various other platforms and filesystems. Consult the documentation for
+your platform if you are unsure.
+
+=item delayed
+
+This option used to exist in OpenAFS 1.6, but was later removed due to issues
+encountered with data corruption during normal operation. Outside of the
+OpenAFS 1.6 series, it is not a valid option, and the fileserver will fail to
+start if you specify this (or any other unknown option). It caused syncs to
+occur in a background thread, executing every 10 seconds.
+
+This was the only behavior allowed in OpenAFS releases starting from 1.4.5 up
+to and including 1.6.2. It was also the default for the 1.6 series starting in
+OpenAFS 1.6.3.
+
+=back
+
+Which option you choose is not an easy decision to make. Various developers
+and experts sometimes disagree on which option is the most reasonable, and it
+may depend on the specific scenario and workload involved. Some argue that
+the C<always> option does not provide significantly greater guarantees over
+any other option, whereas others argue that choosing anything besides the
+C<always> option allows for an unacceptable risk of data loss. This may
+depend on your usage patterns, your platform and filesystem, and who you talk
+to about this topic.
S<<< [B<-lock>] >>>
S<<< [B<-offline-timeout> <I<timeout in seconds>>] >>>
S<<< [B<-offline-shutdown-timeout> <I<timeout in seconds>>] >>>
+ S<<< [B<-sync> <I<sync behavior>>] >>>
By default, volume access statistics are reset during volume restore and reclone
operations.
+=item B<-sync> <I<sync behavior>>
+
+This is the same as the B<-sync> option in L<fileserver(8)>. See
+L<fileserver(8)>.
+
=item B<-help>
Prints the online help for this command. All other valid options are
[B<-nojumbo>] [B<-jumbo>]
[B<-enable_peer_stats>] [B<-enable_process_stats>]
[B<-allow-dotted-principals>] [B<-preserve-vol-stats>] [B<-help>]
+ [B<-sync> <I<sync behavior>>]
OPT_rxmaxmtu,
OPT_udpsize,
OPT_dotted,
- OPT_realm
+ OPT_realm,
+ OPT_sync
};
static int
int lwps_max;
char *auditFileName = NULL;
+ char *sync_behavior = NULL;
#if defined(AFS_AIX32_ENV)
extern int aixlow_water;
"permit Kerberos 5 principals with dots");
cmd_AddParmAtOffset(opts, OPT_realm, "-realm",
CMD_LIST, CMD_OPTIONAL, "local realm");
+ cmd_AddParmAtOffset(opts, OPT_sync, "-sync",
+ CMD_SINGLE, CMD_OPTIONAL, "always | onclose | never");
code = cmd_Parse(argc, argv, &opts);
if (code)
&vol_io_params.fd_max_cachesize);
cmd_OptionAsUint(opts, OPT_vhandle_initial_cachesize,
&vol_io_params.fd_initial_cachesize);
+ if (cmd_OptionAsString(opts, OPT_sync, &sync_behavior) == 0) {
+ if (ih_SetSyncBehavior(sync_behavior)) {
+ printf("Invalid -sync value %s\n", sync_behavior);
+ return -1;
+ }
+ }
#ifdef AFS_DEMAND_ATTACH_FS
if (cmd_OptionPresent(opts, OPT_fs_state_dont_save))
/* fd cache size that will be used if/when ih_UseLargeCache()
* is called */
vol_io_params.fd_max_cachesize = FD_MAX_CACHESIZE;
+
+ vol_io_params.sync_behavior = IH_SYNC_ONCLOSE;
+}
+
+int
+ih_SetSyncBehavior(const char *behavior)
+{
+ int val;
+
+ if (strcmp(behavior, "always") == 0) {
+ val = IH_SYNC_ALWAYS;
+
+ } else if (strcmp(behavior, "onclose") == 0) {
+ val = IH_SYNC_ONCLOSE;
+
+ } else if (strcmp(behavior, "never") == 0) {
+ val = IH_SYNC_NEVER;
+
+ } else {
+ /* invalid behavior name */
+ return -1;
+ }
+
+ vol_io_params.sync_behavior = val;
+ return 0;
}
#ifdef AFS_PTHREAD_ENV
ihP->ih_refcnt++; /* must not disappear over unlock */
if (ihP->ih_synced) {
FdHandle_t *fdP;
+ opr_Assert(vol_io_params.sync_behavior != IH_SYNC_ALWAYS);
+ opr_Assert(vol_io_params.sync_behavior != IH_SYNC_NEVER);
ihP->ih_synced = 0;
IH_UNLOCK;
return 0;
}
#endif /* !AFS_NT40_ENV */
+
+int
+ih_fdsync(FdHandle_t *fdP)
+{
+ switch (vol_io_params.sync_behavior) {
+ case IH_SYNC_ALWAYS:
+ return OS_SYNC(fdP->fd_fd);
+ case IH_SYNC_ONCLOSE:
+ if (fdP->fd_ih) {
+ fdP->fd_ih->ih_synced = 1;
+ return 0;
+ }
+ return 1;
+ case IH_SYNC_NEVER:
+ return 0;
+ default:
+ opr_Assert(0);
+ }
+}
#define FD_HANDLE_MALLOCSIZE ((size_t)((4096/sizeof(FdHandle_t))))
#define STREAM_HANDLE_MALLOCSIZE 1
+/* Possible values for the vol_io_params.sync_behavior option.
+ * These dictate what actually happens when you call FDH_SYNC or IH_CONDSYNC. */
+#define IH_SYNC_ALWAYS (1) /* This makes FDH_SYNCs do what you'd probably
+ * expect: a synchronous fsync() */
+#define IH_SYNC_ONCLOSE (2) /* This makes FDH_SYNCs just flag the ih as "I
+ * need to sync", and does not perform the actual
+ * fsync() until we IH_REALLYCLOSE. This provides a
+ * little assurance over IH_SYNC_NEVER when a volume
+ * has gone offline, and a few other situations. */
+#define IH_SYNC_NEVER (3) /* This makes FDH_SYNCs do nothing. Faster, but
+ * obviously less durable. The OS may ensure that
+ * our data hits the disk eventually, depending on
+ * the platform and various OS-specific tuning
+ * parameters. */
+
/* READ THIS.
*
afs_uint32 fd_handle_setaside; /* for non-cached i/o, trad. was 128 */
afs_uint32 fd_initial_cachesize; /* what was 'default' */
afs_uint32 fd_max_cachesize; /* max open files if large-cache activated */
-} ih_init_params;
+ int sync_behavior; /* one of the IH_SYNC_* constants */
+} ih_init_params;
/* Number of file descriptors needed for non-cached I/O */
#define FD_HANDLE_SETASIDE 128 /* Match to MAX_FILESERVER_THREAD */
extern void ih_PkgDefaults(void);
extern void ih_Initialize(void);
extern void ih_UseLargeCache(void);
+extern int ih_SetSyncBehavior(const char *behavior);
extern IHandle_t *ih_init(int /*@alt Device@ */ dev, int /*@alt VolId@ */ vid,
Inode ino);
extern IHandle_t *ih_copy(IHandle_t * ihP);
#define FDH_WRITE(H, B, S) OS_WRITE((H)->fd_fd, B, S)
#define FDH_SEEK(H, O, F) OS_SEEK((H)->fd_fd, O, F)
-#define FDH_SYNC(H) ((H->fd_ih!=NULL) ? ( H->fd_ih->ih_synced = 1) - 1 : 1)
+#define FDH_SYNC(H) ih_fdsync(H)
#define FDH_TRUNC(H, L) OS_TRUNC((H)->fd_fd, L)
#define FDH_SIZE(H) OS_SIZE((H)->fd_fd)
#define FDH_LOCKFILE(H, O) OS_LOCKFILE((H)->fd_fd, O)
#define FDH_UNLOCKFILE(H, O) OS_UNLOCKFILE((H)->fd_fd, O)
#define FDH_ISUNLINKED(H) OS_ISUNLINKED((H)->fd_fd)
+extern int ih_fdsync(FdHandle_t *fdP);
+
#ifdef AFS_NT40_ENV
# define afs_stat_st __stat64
# define afs_stat _stat64
vnodeIndexOffset(vcp, dir->vnodeNumber), (char *)&vnode,
sizeof(vnode));
opr_Assert(lcode == sizeof(vnode));
-#if 0
-#ifdef AFS_NT40_ENV
- nt_sync(salvinfo->fileSysDevice);
-#else
- sync(); /* this is slow, but hopefully rarely called. We don't have
- * an open FD on the file itself to fsync.
- */
-#endif
-#else
- salvinfo->vnodeInfo[vLarge].handle->ih_synced = 1;
-#endif
+ IH_CONDSYNC(salvinfo->vnodeInfo[vLarge].handle);
+
/* make sure old directory file is really closed */
fdP = IH_OPEN(dir->dirHandle.dirh_handle);
FDH_REALLYCLOSE(fdP);
rx_enableProcessRPCStats();
} else if (strcmp(argv[code], "-preserve-vol-stats") == 0) {
DoPreserveVolumeStats = 1;
+ } else if (strcmp(argv[code], "-sync") == 0) {
+ if ((code + 1) >= argc) {
+ printf("You have to specify -sync <sync_behavior>\n");
+ exit(1);
+ }
+ ih_PkgDefaults();
+ if (ih_SetSyncBehavior(argv[++code])) {
+ printf("Invalid -sync value %s\n", argv[code]);
+ exit(1);
+ }
}
#ifndef AFS_NT40_ENV
else if (strcmp(argv[code], "-syslog") == 0) {
"[-udpsize <size of socket buffer in bytes>] "
"[-syslog[=FACILITY]] "
"[-enable_peer_stats] [-enable_process_stats] "
+ "[-sync <always | onclose | never>] "
"[-help]\n");
#else
printf("Usage: volserver [-log] [-p <number of processes>] "
"[-nojumbo] [-jumbo] [-rxmaxmtu <bytes>] [-rxbind] [-allow-dotted-principals] "
"[-udpsize <size of socket buffer in bytes>] "
"[-enable_peer_stats] [-enable_process_stats] "
+ "[-sync <always | onclose | never>] "
"[-help]\n");
#endif
VS_EXIT(1);