From: Andrew Deason
Date: Fri, 1 May 2020 20:02:08 +0000 (-0500)
Subject: vlserver: Return error when growing beyond 2 GiB
X-Git-Tag: openafs-devel-1_9_0~78
X-Git-Url: https://git.openafs.org/?p=openafs.git;a=commitdiff_plain;h=d01398731550b8a93b293800642c3e1592099114

vlserver: Return error when growing beyond 2 GiB

In the vlserver, when we add a new vlentry or extent block, we grow the
VLDB by doing something like this:

    vital_header.eofPtr += sizeof(item);

Since we don't check for overflow, and all of our offset-related
variables are signed 32-bit integers, this can cause some odd behavior
if we try to grow the database to be over 2 GiB in size.

To avoid this, change the two places in vlserver code that grow the
database to use a new function, grow_eofPtr(), which checks for 31-bit
overflow. If we are about to overflow, log a message and return an
error.

See the following for a specific example of our "odd behavior" when we
overflow the 2 GiB limit in the VLDB:

With 1 extent block, we can create 14509076 vlentries successfully. On
the 14509077th vlentry, we attempt to write the entry to offset
2147483560 (0x7FFFFFA8). Since a vlentry is 148 bytes long, we write
all the way through offset 2147483707 (0x8000003B), which is over the
31-bit limit.

In the udisk subsystem, this results in writing to page numbers 2097151
and -2097152 (since our ubik pages are 1 KiB, and going over the 31-bit
limit causes us to treat offsets as negative). These pages start at
physical offsets 2147482688 (0x7FFFFC40) and -2147483584 (-0x7FFFFFC0)
in our vldb.DB0 (where a page's physical offset is page*1024 + 64).

Modifying each of these pages involves reading in the existing page
first, modifying the parts we are changing, and writing it back. This
works just fine for page 2097151, but of course fails for page
-2097152. The latter fails in DReadBuffer, when our pread() eventually
fails with EINVAL and causes ubik to log the message:

    Ubik: Error reading database file: errno=22

But when DReadBuffer fails, DReadBufferForWrite assumes the failure is
due to EOF, and just creates a new buffer for the given page
(DNewBuffer). So the udisk_write() call ultimately succeeds.

When we flush the dirty data to disk while committing the transaction,
after we have successfully written the transaction log, DFlush() fails
for the -2097152 page when the pwrite() call eventually fails with
EINVAL, causing ubik to panic, logging the message:

    Ubik PANIC: Writing Ubik DB modifications

When the vlserver gets restarted by the bosserver, we then process the
transaction log, performing the operations in the log before starting
up (ReplayLog). The log records the actual data we wrote, not split
into pages, and the log-replaying code writes directly to the database
using uphys_write instead of udisk_write. Because of this, the write
actually succeeds when replaying the log, since we just write 148 bytes
to offset 2147483624 (0x7FFFFFE8), and no negative offsets are used.

The vlserver will then be able to run, but will be unable to read the
newly-created vlentry, since doing so involves reading a ubik page
beyond the 31-bit boundary. Trying to look up that entry will fail with
i/o errors, as will looking up any entry on the same hash chains as the
new entry (since the new entry is added to the head of the hash chain).
Listing all entries in the database will also just show an empty
database, since our vital_header.eofPtr will be negative, and we
determine EOF by comparing our current blockindex to the value in
eofPtr.
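To make the wraparound arithmetic concrete, here is a minimal
standalone C sketch (not OpenAFS code: the 1 KiB page size and the
page*1024 + 64 physical-offset rule are taken from the description
above, and the arithmetic right shift by 10 is an assumed stand-in for
udisk's page computation):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* First and last byte of the 14509077th vlentry, computed in
         * 64 bits to avoid signed-overflow undefined behavior in C. */
        int64_t start = 2147483560;        /* 0x7FFFFFA8 */
        int64_t end = start + 148 - 1;     /* 0x8000003B, past 2^31 - 1 */

        /* Carried in a signed 32-bit afs_int32, the high offset wraps
         * negative (two's-complement truncation, as on every platform
         * OpenAFS runs on). */
        int32_t lo = (int32_t)start;            /*  2147483560 */
        int32_t hi = (int32_t)(uint32_t)end;    /* -2147483589 */

        /* Page number via arithmetic shift (1 KiB pages); a page's
         * physical offset in vldb.DB0 is page * 1024 + 64. */
        printf("pages: %d and %d\n", lo >> 10, hi >> 10);
        printf("offsets: %lld and %lld\n",
               (long long)(lo >> 10) * 1024 + 64,
               (long long)(hi >> 10) * 1024 + 64);
        return 0;
    }

This prints pages 2097151 and -2097152 and physical offsets 2147482688
and -2147483584, matching the figures above; the EINVAL failures from
pread() and pwrite() follow directly from the negative offset.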
Change-Id: Ie0b7ac61f9121fa265686449efbae8e18edb1896
Reviewed-on: https://gerrit.openafs.org/14180
Tested-by: BuildBot
Reviewed-by: Benjamin Kaduk
Reviewed-by: Cheyenne Wills
---

diff --git a/src/vlserver/vlutils.c b/src/vlserver/vlutils.c
index 9067f08..d11917f 100644
--- a/src/vlserver/vlutils.c
+++ b/src/vlserver/vlutils.c
@@ -397,6 +397,31 @@ CheckInit(struct ubik_trans *trans, int builddb)
     return 0;
 }
 
+/**
+ * Grow the eofPtr in the header by 'bump' bytes.
+ *
+ * @param[inout] cheader      VL header
+ * @param[in]    bump         How many bytes to add to eofPtr
+ * @param[out]   a_blockindex On success, set to the original eofPtr before we
+ *                            bumped it
+ * @return VL error codes
+ */
+static afs_int32
+grow_eofPtr(struct vlheader *cheader, afs_int32 bump, afs_int32 *a_blockindex)
+{
+    afs_int32 blockindex = ntohl(cheader->vital_header.eofPtr);
+
+    if (blockindex < 0 || blockindex >= MAX_AFS_INT32 - bump) {
+        VLog(0, ("Error: Tried to grow the VLDB beyond the 2GiB limit. Either "
+                 "find a way to trim down your VLDB, or upgrade to a release "
+                 "and database format that supports a larger VLDB.\n"));
+        return VL_IO;
+    }
+
+    *a_blockindex = blockindex;
+    cheader->vital_header.eofPtr = htonl(blockindex + bump);
+    return 0;
+}
 afs_int32
 GetExtentBlock(struct vl_ctx *ctx, afs_int32 base)
 {
@@ -418,16 +443,17 @@ GetExtentBlock(struct vl_ctx *ctx, afs_int32 base)
 
     /* Write the full extension block at end of vldb */
     ctx->ex_addr[base]->ex_hdrflags = htonl(VLCONTBLOCK);
-    blockindex = ntohl(ctx->cheader->vital_header.eofPtr);
+    code = grow_eofPtr(ctx->cheader, VL_ADDREXTBLK_SIZE, &blockindex);
+    if (code)
+        ERROR_EXIT(VL_IO);
+
     code =
         vlwrite(ctx->trans, blockindex, (char *)ctx->ex_addr[base],
                 VL_ADDREXTBLK_SIZE);
     if (code)
         ERROR_EXIT(VL_IO);
 
-    /* Update the cheader.vitalheader structure on disk */
-    ctx->cheader->vital_header.eofPtr = blockindex + VL_ADDREXTBLK_SIZE;
-    ctx->cheader->vital_header.eofPtr = htonl(ctx->cheader->vital_header.eofPtr);
+
     code = write_vital_vlheader(ctx);
     if (code)
         ERROR_EXIT(VL_IO);
@@ -572,9 +598,11 @@ AllocBlock(struct vl_ctx *ctx, struct nvlentry *tentry)
             return 0;
         ctx->cheader->vital_header.freePtr = htonl(tentry->nextIdHash[0]);
     } else {
+        afs_int32 code;
         /* hosed, nothing on free list, grow file */
-        blockindex = ntohl(ctx->cheader->vital_header.eofPtr); /* remember this guy */
-        ctx->cheader->vital_header.eofPtr = htonl(blockindex + sizeof(vlentry));
+        code = grow_eofPtr(ctx->cheader, sizeof(vlentry), &blockindex);
+        if (code)
+            return 0;
     }
     ctx->cheader->vital_header.allocs++;
     if (write_vital_vlheader(ctx))
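For illustration, here is a standalone sketch of the boundary condition
the new guard enforces (not part of the patch: MAX_AFS_INT32 is assumed
to be 0x7FFFFFFF, and would_overflow() is a hypothetical stand-in for
the check inside grow_eofPtr()):

    #include <stdio.h>
    #include <stdint.h>

    #define MAX_AFS_INT32 0x7FFFFFFF

    /* Hypothetical stand-in for grow_eofPtr()'s check: reject growing
     * the database if eofPtr is already negative, or if adding 'bump'
     * would push it past the 31-bit limit. */
    static int
    would_overflow(int32_t blockindex, int32_t bump)
    {
        return blockindex < 0 || blockindex >= MAX_AFS_INT32 - bump;
    }

    int main(void)
    {
        /* The 14509077th vlentry from the commit message now trips the
         * guard, since 2147483560 >= 0x7FFFFFFF - 148. */
        printf("%d\n", would_overflow(2147483560, 148));  /* 1: rejected */

        /* An entry comfortably below the limit is still allowed. */
        printf("%d\n", would_overflow(1000000, 148));     /* 0: allowed  */

        /* A wrapped-negative eofPtr is rejected outright. */
        printf("%d\n", would_overflow(-2147483584, 148)); /* 1: rejected */
        return 0;
    }

On failure, the two call sites behave differently, as the diff shows:
GetExtentBlock() turns the error into ERROR_EXIT(VL_IO), while
AllocBlock() returns 0, meaning no block was allocated.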