* [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous
@ 2017-06-23 0:00 Theodore Ts'o
2017-06-23 17:56 ` Tahsin Erdogan
2017-06-23 19:19 ` Eric Biggers
0 siblings, 2 replies; 4+ messages in thread
From: Theodore Ts'o @ 2017-06-23 0:00 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: Theodore Ts'o
Commit 4f868703f6f2: "libext2fs: use fallocate for creating journals
and hugefiles" introduced a regression for the mke2fs hugefile
feature. The problem is that the fallocate library function
intersperses the extent tree metadata blocks with the data blocks, and
this violates the hugefile guarantee that the created files should be
physically contiguous on disk.
Unfortuantely the m_hugefile regression test was flawed, and didn't
pick up the regression.
This commit fixes the regression test so that it detects the problem
before fixing mke2fs, and also fixes the mke2fs hugefile by reverting
the mke2fs hugefile portion of commit 4f868703f6f2.
Google-Bug-Id: 62791459
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
misc/mk_hugefiles.c | 107 ++++++++++++++++++++++++++++++++++++++++++------
misc/mke2fs.conf.5.in | 6 ++-
tests/m_hugefile/expect | 23 ++---------
tests/m_hugefile/script | 52 ++++++++++++++++++++++-
4 files changed, 152 insertions(+), 36 deletions(-)
diff --git a/misc/mk_hugefiles.c b/misc/mk_hugefiles.c
index 5882394d..f34fa411 100644
--- a/misc/mk_hugefiles.c
+++ b/misc/mk_hugefiles.c
@@ -262,8 +262,12 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
{
errcode_t retval;
+ blk64_t lblk, bend = 0;
+ __u64 size;
+ blk64_t left;
+ blk64_t count = 0;
struct ext2_inode inode;
- int falloc_flags;
+ ext2_extent_handle_t handle;
retval = ext2fs_new_inode(fs, 0, LINUX_S_IFREG, NULL, ino);
if (retval)
@@ -283,20 +287,93 @@ static errcode_t mk_hugefile(ext2_filsys fs, blk64_t num,
ext2fs_inode_alloc_stats2(fs, *ino, +1, 0);
- if (ext2fs_has_feature_extents(fs->super))
- inode.i_flags |= EXT4_EXTENTS_FL;
-
- falloc_flags = EXT2_FALLOCATE_FORCE_INIT;
- if (zero_hugefile)
- falloc_flags |= EXT2_FALLOCATE_ZERO_BLOCKS;
- retval = ext2fs_fallocate(fs, falloc_flags, *ino, &inode, goal, 0, num);
+ retval = ext2fs_extent_open2(fs, *ino, &inode, &handle);
if (retval)
return retval;
- retval = ext2fs_inode_size_set(fs, &inode, num * fs->blocksize);
+
+ /*
+ * We don't use ext2fs_fallocate() here because hugefiles are
+ * designed to be physically contiguous (if the block group
+ * descriptors are configured to be in a single block at the
+ * beginning of the file system, by using the
+ * packed_meta_blocks layout), with the extent tree blocks
+ * allocated near the beginning of the file system.
+ */
+ lblk = 0;
+ left = num ? num : 1;
+ while (left) {
+ blk64_t pblk, end;
+ blk64_t n = left;
+
+ retval = ext2fs_find_first_zero_block_bitmap2(fs->block_map,
+ goal, ext2fs_blocks_count(fs->super) - 1, &end);
+ if (retval)
+ goto errout;
+ goal = end;
+
+ retval = ext2fs_find_first_set_block_bitmap2(fs->block_map, goal,
+ ext2fs_blocks_count(fs->super) - 1, &bend);
+ if (retval == ENOENT) {
+ bend = ext2fs_blocks_count(fs->super);
+ if (num == 0)
+ left = 0;
+ }
+ if (!num || bend - goal < left)
+ n = bend - goal;
+ pblk = goal;
+ if (num)
+ left -= n;
+ goal += n;
+ count += n;
+ ext2fs_block_alloc_stats_range(fs, pblk, n, +1);
+
+ if (zero_hugefile) {
+ blk64_t ret_blk;
+ retval = ext2fs_zero_blocks2(fs, pblk, n,
+ &ret_blk, NULL);
+
+ if (retval)
+ com_err(program_name, retval,
+ _("while zeroing block %llu "
+ "for hugefile"), ret_blk);
+ }
+
+ while (n) {
+ blk64_t l = n;
+ struct ext2fs_extent newextent;
+
+ if (l > EXT_INIT_MAX_LEN)
+ l = EXT_INIT_MAX_LEN;
+
+ newextent.e_len = l;
+ newextent.e_pblk = pblk;
+ newextent.e_lblk = lblk;
+ newextent.e_flags = 0;
+
+ retval = ext2fs_extent_insert(handle,
+ EXT2_EXTENT_INSERT_AFTER, &newextent);
+ if (retval)
+ return retval;
+ pblk += l;
+ lblk += l;
+ n -= l;
+ }
+ }
+
+ retval = ext2fs_read_inode(fs, *ino, &inode);
if (retval)
- return retval;
+ goto errout;
- retval = ext2fs_write_inode(fs, *ino, &inode);
+ retval = ext2fs_iblk_add_blocks(fs, &inode,
+ count / EXT2FS_CLUSTER_RATIO(fs));
+ if (retval)
+ goto errout;
+ size = (__u64) count * fs->blocksize;
+ retval = ext2fs_inode_size_set(fs, &inode, size);
+ if (retval)
+ goto errout;
+
+ retval = ext2fs_write_new_inode(fs, *ino, &inode);
if (retval)
goto errout;
@@ -314,7 +391,13 @@ retry:
goto retry;
}
+ if (retval)
+ goto errout;
+
errout:
+ if (handle)
+ ext2fs_extent_free(handle);
+
return retval;
}
@@ -499,8 +582,6 @@ errcode_t mk_hugefiles(ext2_filsys fs, const char *device_name)
printf(_("with %llu blocks each"), num_blocks);
fputs(": ", stdout);
}
- if (num_blocks == 0)
- num_blocks = ext2fs_blocks_count(fs->super) - goal;
for (i=0; i < num_files; i++) {
ext2_ino_t ino;
diff --git a/misc/mke2fs.conf.5.in b/misc/mke2fs.conf.5.in
index 1ce0f5eb..da105d6b 100644
--- a/misc/mke2fs.conf.5.in
+++ b/misc/mke2fs.conf.5.in
@@ -441,7 +441,11 @@ command line option to
.TP
.I make_hugefiles
This boolean relation enables the creation of pre-allocated files as
-part of formatting the file system.
+part of formatting the file system. If the file system is configured so
+that the block group descriptors are located at beginning file system
+space (by using the packed_meta_blocks option), the data blocks of the
+huge files will be contiguous, with the extent tree blocks allocated
+near the beginning of the file system space.
.TP
.I hugefiles_uid
This relation controls the user ownership for all of the files and
diff --git a/tests/m_hugefile/expect b/tests/m_hugefile/expect
index 82a60319..831d31ad 100644
--- a/tests/m_hugefile/expect
+++ b/tests/m_hugefile/expect
@@ -14,23 +14,6 @@ Pass 4: Checking reference counts
Pass 5: Checking group summary information
Exit status is 0
-debugfs -R "extents /store/big-data" test.img | head
-Level Entries Logical Physical Length Flags
- 0/ 2 1/ 1 0 - 1073610751 131070 1073610752
- 1/ 2 1/ 97 0 - 11108351 131071 11108352
- 2/ 2 1/339 0 - 32767 131072 - 163839 32768
- 2/ 2 2/339 32768 - 65535 163840 - 196607 32768
- 2/ 2 3/339 65536 - 98303 196608 - 229375 32768
- 2/ 2 4/339 98304 - 131071 229376 - 262143 32768
- 2/ 2 5/339 131072 - 163839 262144 - 294911 32768
- 2/ 2 6/339 163840 - 196607 294912 - 327679 32768
- 2/ 2 7/339 196608 - 229375 327680 - 360447 32768
- 2/ 2 8/339 229376 - 262143 360448 - 393215 32768
- 2/ 2 9/339 262144 - 294911 393216 - 425983 32768
- 2/ 2 10/339 294912 - 327679 425984 - 458751 32768
- 2/ 2 11/339 327680 - 360447 458752 - 491519 32768
- 2/ 2 12/339 360448 - 393215 491520 - 524287 32768
- 2/ 2 13/339 393216 - 425983 524288 - 557055 32768
- 2/ 2 14/339 425984 - 458751 557056 - 589823 32768
- 2/ 2 15/339 458752 - 491519 589824 - 622591 32768
- 2/ 2 16/339 491520 - 524287 622592 - 655359 32768
+debugfs -R "extents /store/big-data" test.img
+Last logical block: 1073610751
+Last physical block: 1073741823
diff --git a/tests/m_hugefile/script b/tests/m_hugefile/script
index 2750d538..d2b92e2f 100644
--- a/tests/m_hugefile/script
+++ b/tests/m_hugefile/script
@@ -44,9 +44,57 @@ $FSCK $FSCK_OPT -N test_filesys $TMPFILE >> $OUT 2>&1
status=$?
echo Exit status is $status >> $OUT
-echo 'debugfs -R "extents /store/big-data" test.img | head' >> $OUT
+echo 'debugfs -R "extents /store/big-data" test.img' >> $OUT
-$DEBUGFS -R "extents /store/big-data" $TMPFILE 2>&1 | head -n 20 >> $OUT 2>&1
+$DEBUGFS -R "extents /store/big-data" $TMPFILE 2>&1 | tr / " " | tr -d - | awk '
+BEGIN {
+ expected_logical_start = 0;
+ expected_physical_start = 0;
+}
+{
+ if (NR != 1) {
+ level = $1;
+ total_levels = $2;
+
+ if (level == total_levels) {
+ logical_start=$5;
+ logical_end=$6;
+ physical_start=$7;
+ physical_end=$8;
+ len = $9;
+
+ if (logical_end + 1 - logical_start != len) {
+ print logical_end + 1 - logical_start, len;
+ print "UNEXPECTED LENGTH for extent", $0;
+ }
+ if (physical_end + 1 - physical_start != len) {
+ print physical_end + 1 - physical_start, len;
+ print "UNEXPECTED LENGTH for extent", $0;
+ }
+
+ if (logical_start != expected_logical_start) {
+ print "UNEXPECTED LOGICAL DISCONTINUITY between extents:";
+ print "\t", prev;
+ print "\t", $0;
+ }
+ if (physical_start != expected_physical_start &&
+ expected_logical_start != 0) {
+ print "PHYSICAL DISCONTINUITY between extents:";
+ print "\t", prev;
+ print "\t", $0;
+ }
+
+ expected_logical_start = logical_end + 1;
+ expected_physical_start = physical_end + 1;
+ }
+ }
+ prev=$0;
+}
+END {
+ print "Last logical block:", expected_logical_start-1;
+ print "Last physical block:", expected_physical_start-1;
+}
+' >> $OUT 2>&1
rm $TMPFILE
--
2.11.0.rc0.7.gbe5a750
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous
2017-06-23 0:00 [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous Theodore Ts'o
@ 2017-06-23 17:56 ` Tahsin Erdogan
2017-06-23 19:19 ` Eric Biggers
1 sibling, 0 replies; 4+ messages in thread
From: Tahsin Erdogan @ 2017-06-23 17:56 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
> + if (logical_end + 1 - logical_start != len) {
> + print logical_end + 1 - logical_start, len;
> + print "UNEXPECTED LENGTH for extent", $0;
Can you fix indentation of first print above?
> + }
> + if (physical_end + 1 - physical_start != len) {
> + print physical_end + 1 - physical_start, len;
> + print "UNEXPECTED LENGTH for extent", $0;
Same here.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous
2017-06-23 0:00 [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous Theodore Ts'o
2017-06-23 17:56 ` Tahsin Erdogan
@ 2017-06-23 19:19 ` Eric Biggers
2017-06-23 21:19 ` Theodore Ts'o
1 sibling, 1 reply; 4+ messages in thread
From: Eric Biggers @ 2017-06-23 19:19 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List
Hi Ted,
On Thu, Jun 22, 2017 at 08:00:09PM -0400, Theodore Ts'o wrote:
> .I make_hugefiles
> This boolean relation enables the creation of pre-allocated files as
> -part of formatting the file system.
> +part of formatting the file system. If the file system is configured so
> +that the block group descriptors are located at beginning file system
> +space (by using the packed_meta_blocks option), the data blocks of the
> +huge files will be contiguous, with the extent tree blocks allocated
> +near the beginning of the file system space.
It's not quite that simple. The presence of backup superblocks can also cause a
discontinuity. If I remove 'num_backup_sb = 0' from mke2fs.conf, I see:
PHYSICAL DISCONTINUITY between extents:
2 2 219 220 1073545216 1073577983 1073676288 1073709055 32768
2 2 220 220 1073577984 1073610398 1073709409 1073741823 32415
Also, for packed_meta_blocks to take effect, flex_bg must be enabled. This is
documented in mke2fs(8) but not in mke2fs.conf(5).
There could be other issues as well; those are just the ones I found.
Perhaps there should be an option hugefiles_contiguous which makes the mke2fs
command fail if the hugefiles can't be allocated contiguously?
Eric
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous
2017-06-23 19:19 ` Eric Biggers
@ 2017-06-23 21:19 ` Theodore Ts'o
0 siblings, 0 replies; 4+ messages in thread
From: Theodore Ts'o @ 2017-06-23 21:19 UTC (permalink / raw)
To: Eric Biggers; +Cc: Ext4 Developers List
On Fri, Jun 23, 2017 at 12:19:25PM -0700, Eric Biggers wrote:
> It's not quite that simple. The presence of backup superblocks can also cause a
> discontinuity. If I remove 'num_backup_sb = 0' from mke2fs.conf, I see:
>
> PHYSICAL DISCONTINUITY between extents:
> 2 2 219 220 1073545216 1073577983 1073676288 1073709055 32768
> 2 2 220 220 1073577984 1073610398 1073709409 1073741823 32415
>
> Also, for packed_meta_blocks to take effect, flex_bg must be enabled. This is
> documented in mke2fs(8) but not in mke2fs.conf(5).
Good point. I'll phrase it this way instead in the manpage.
This boolean relation enables the creation of pre-allocated files as
part of formatting the file system. The extent tree blocks for these
pre-allocated files will be placed near the beginning of the file
system, so that if all of the other metadata blocks are also configured
to be placed near the beginning of the file system (by disabling the
backup superblocks, using the packed_meta_blocks option, etc.), the data
blocks of the pre-allocated files will be contiguous.
This technically accurate, but admittedly doesn't go into a huge
amount of details about how to actually make it work if you really
want it.
My original thinking was that this was something fairly specialized
that very few people would want, and I didn't want to make the
documentation too verbose and ultimately more confusing.
> Perhaps there should be an option hugefiles_contiguous which makes the mke2fs
> command fail if the hugefiles can't be allocated contiguously?
I suspect the better thing to do is to somehow make it easy for users
to make the hugefiles be allocated contiguously, as opposed to giving
an option which tells the user that they didn't quite get it right.
That seems a bit too much like the old joke:
Ken Thompson has an automobile which he helped design. Unlike most
automobiles, it has neither speedometer, nor gas gauge, nor any of
the numerous idiot lights which plague the modern driver. Rather,
if the driver makes any mistake, a giant "?" lights up in the
center of the dashboard. "The experienced driver", he says, "will
usually know what's wrong.
So perhaps just simply adding an example usage to the mke2fs.conf man
page, e.g.
[fs_types]
hugefile = {
features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
cluster_size = 32768
reserved_ratio = 0.0
num_backup_sb = 0
packed_meta_blocks = 1
make_hugefiles = 1
inode_ratio = 4194304
hugefiles_dir = /huge-dir
hugefiles_name = huge-data
hugefiles_digits = 0
hugefiles_size = 0
hugefiles_align = 256M
num_hugefiles = 1
zero_hugefiles = false
flex_bg_size = 262144
}
I'm going to save making this change as a separate commit, though,
since I want to get the bug fix commit out sooner rather than later.
- Ted
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-06-23 21:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-23 0:00 [PATCH -v2] mke2fs: fix hugefile creation so the hugefile(s) are contiguous Theodore Ts'o
2017-06-23 17:56 ` Tahsin Erdogan
2017-06-23 19:19 ` Eric Biggers
2017-06-23 21:19 ` Theodore Ts'o
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.