* mkfs'ing a 48-bit fs... or not.
@ 2011-10-03 21:55 Eric Sandeen
  2011-10-04  4:00 ` Ted Ts'o
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Eric Sandeen @ 2011-10-03 21:55 UTC
  To: ext4 development

Has anyone tried mke2fs at its limits?  The latest git tree seems to fail in several ways.
(Richard Jones reported the initial failure)

# truncate --size 1152921504606846976 reallybigfile 
# mke2fs -t ext4 reallybigfile

first,

Warning: the fs_type huge is not defined in mke2fs.conf

(when types "big" and "huge" got added, they never got a mke2fs.conf update?)

Then, I got:

reallybigfile: Not enough space to build proposed filesystem while setting up superblock


because:

        fs->group_desc_count = (blk_t) ext2fs_div64_ceil(
                ext2fs_blocks_count(super) - super->s_first_data_block,
                EXT2_BLOCKS_PER_GROUP(super));
        if (fs->group_desc_count == 0) {
                retval = EXT2_ET_TOOSMALL;

The div64_ceil returns > 2^32 (2^33, actually), and the cast to blk_t
(which should be dgrp_t?) turns that into a 0.
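The arithmetic behind the 0 can be checked with a short C sketch. It assumes 4 KiB blocks and the default 32768 blocks per group (128 MiB per group). `div64_ceil_sketch()`, `groups_needed()` and `narrowed_to_32()` are hypothetical helpers written for illustration, not e2fsprogs functions.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division on 64-bit values, in the spirit of ext2fs_div64_ceil();
 * written as q + (r != 0) so it cannot overflow for large numerators. */
static uint64_t div64_ceil_sketch(uint64_t a, uint64_t b)
{
    assert(b != 0);
    return a / b + (a % b != 0);
}

/* Group count kept in full 64-bit precision. */
static uint64_t groups_needed(uint64_t blocks_count, uint64_t first_data_block,
                              uint64_t blocks_per_group)
{
    assert(blocks_count >= first_data_block);
    return div64_ceil_sketch(blocks_count - first_data_block, blocks_per_group);
}

/* What a 32-bit type such as blk_t keeps of that count: the low 32 bits. */
static uint32_t narrowed_to_32(uint64_t groups)
{
    return (uint32_t) groups;
}
```

A 2^60-byte image is 2^48 blocks, which needs 2^33 groups. 2^33 has no bits set in its low 32, so narrowing to any 32-bit type leaves exactly 0, and the `group_desc_count == 0` check reports EXT2_ET_TOOSMALL.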

Trying it with "-O bigalloc" (which should be automatic at this size,
I think?) just goes away for a very long time, I'm not sure what it's
thinking about, or if it's in a loop somewhere (looking now).

I also came across this in ext2fs_initialize() in the bigalloc case:

                if (super->s_clusters_per_group > EXT2_MAX_CLUSTERS_PER_GROUP(super))
                        super->s_blocks_per_group = EXT2_MAX_CLUSTERS_PER_GROUP(super);
                super->s_blocks_per_group = EXT2FS_C2B(fs,
                                       super->s_clusters_per_group);

which seems to be incorrect; I doubt that you meant to set s_blocks_per_group under
a conditional, and then unconditionally set it immediately after.  I assume
that should be super->s_clusters_per_group in the first case?  I'll send a patch,
assuming so.
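Assuming Eric's reading is correct (the clamp was meant for s_clusters_per_group), the intended order might look like the following C sketch. The struct, its `cluster_ratio_bits` field, and `c2b()` are hypothetical simplifications standing in for the real superblock, `EXT2_MAX_CLUSTERS_PER_GROUP()`, and `EXT2FS_C2B()`. This is not the patch Eric mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down superblock with only the fields involved. */
struct sb_sketch {
    uint32_t s_clusters_per_group;
    uint32_t s_blocks_per_group;
    uint32_t cluster_ratio_bits;    /* log2(cluster size / block size), assumed */
};

/* Assumed to behave like EXT2FS_C2B(): clusters -> blocks by shifting.
 * No overflow check; fine for the small values used here. */
static uint32_t c2b(const struct sb_sketch *sb, uint32_t clusters)
{
    return clusters << sb->cluster_ratio_bits;
}

/* Clamp the *cluster* count first, then derive the block count from it,
 * so the clamp is not thrown away by the unconditional assignment. */
static void set_group_size(struct sb_sketch *sb, uint32_t max_clusters_per_group)
{
    assert(max_clusters_per_group > 0);
    if (sb->s_clusters_per_group > max_clusters_per_group)
        sb->s_clusters_per_group = max_clusters_per_group;
    sb->s_blocks_per_group = c2b(sb, sb->s_clusters_per_group);
}
```

Clamping the cluster count first and deriving the block count from it keeps the two fields consistent; in the quoted code the clamped value is overwritten on the very next line.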

TBH I've kind of lost the thread on bigalloc, so just putting this out there for
now while I look into things a bit more.

-Eric

* Re: mkfs'ing a 48-bit fs... or not.
  2011-10-03 21:55 mkfs'ing a 48-bit fs... or not Eric Sandeen
@ 2011-10-04  4:00 ` Ted Ts'o
  2011-10-04  4:26   ` [PATCH 1/2] Add "big" and "huge" types to mke2fs.conf Theodore Ts'o
  2011-10-04  5:31   ` mkfs'ing a 48-bit fs... or not Andreas Dilger
  2011-10-04  4:03 ` Eric Sandeen
  2011-10-04  7:06 ` Richard W.M. Jones
  2 siblings, 2 replies; 11+ messages in thread
From: Ted Ts'o @ 2011-10-04  4:00 UTC
  To: Eric Sandeen; +Cc: ext4 development

On Mon, Oct 03, 2011 at 04:55:11PM -0500, Eric Sandeen wrote:
> Has anyone tried mke2fs at its limits?  The latest git tree seems to fail in several ways.
> (Richard Jones reported the initial failure)
> 
> # truncate --size 1152921504606846976 reallybigfile 
> # mke2fs -t ext4 reallybigfile
> 
> first,
> 
> Warning: the fs_type huge is not defined in mke2fs.conf
> 
> (when types "big" and "huge" got added, they never got a mke2fs.conf update?)

It used to be that an undefined file system type didn't flag an error.
It now does, so we should have definitions for them in mke2fs.conf.

> reallybigfile: Not enough space to build proposed filesystem while setting up superblock
> 
> because:
> 
>         fs->group_desc_count = (blk_t) ext2fs_div64_ceil(
>                 ext2fs_blocks_count(super) - super->s_first_data_block,
>                 EXT2_BLOCKS_PER_GROUP(super));
>         if (fs->group_desc_count == 0) {
>                 retval = EXT2_ET_TOOSMALL;
> 
> The div64_ceil returns > 2^32 (2^33, actually), and the cast to blk_t
> (which should be dgrp_t?) turns that into a 0.

Yep, that should be dgrp_t.  Oops.

> Trying it with "-O bigalloc" (which should be automatic at this size,
> I think?) just goes away for a very long time, I'm not sure what it's
> thinking about, or if it's in a loop somewhere (looking now).

Well, we probably do want to engage bigalloc automatically, at some
point (I want to wait until bigalloc is in commonly used kernels, at
least for community distros).  I'm not sure what the best cluster
size to pick by default should be, though.  16k?  64k?
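One way to weigh the cluster-size question is to count how many block groups a maximal 2^60-byte filesystem would need. This C sketch assumes 32768 clusters per group (one bitmap block's worth of bits at 4 KiB blocks) and a 32-bit group count; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each group tracks 32768 clusters (bits in one 4 KiB bitmap block). */
#define CLUSTERS_PER_GROUP_SKETCH 32768ULL

/* Bytes covered by one block group for a given cluster size. */
static uint64_t group_bytes(uint64_t cluster_size)
{
    return cluster_size * CLUSTERS_PER_GROUP_SKETCH;
}

/* Groups needed to cover fs_bytes, rounded up. */
static uint64_t groups_for(uint64_t fs_bytes, uint64_t cluster_size)
{
    uint64_t g = group_bytes(cluster_size);
    assert(g != 0);
    return fs_bytes / g + (fs_bytes % g != 0);
}

/* True if the count fits a 32-bit group number (at most 2^32 - 1). */
static bool fits_dgrp32(uint64_t groups)
{
    return groups <= UINT32_MAX;
}
```

Under those assumptions, 4 KiB clusters need 2^33 groups, 8 KiB clusters need exactly 2^32 (one more than a 32-bit count can hold), 16 KiB clusters need 2^31, and 64 KiB clusters need 2^29.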

     	     		       	   	    	  - Ted

* Re: mkfs'ing a 48-bit fs... or not.
  2011-10-03 21:55 mkfs'ing a 48-bit fs... or not Eric Sandeen
  2011-10-04  4:00 ` Ted Ts'o
@ 2011-10-04  4:03 ` Eric Sandeen
  2011-10-04  4:28   ` Ted Ts'o
  2011-10-04  7:06 ` Richard W.M. Jones
  2 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2011-10-04  4:03 UTC
  To: Eric Sandeen; +Cc: ext4 development

On 10/3/11 4:55 PM, Eric Sandeen wrote:
> Has anyone tried mke2fs at its limits?  The latest git tree seems to fail in several ways.
> (Richard Jones reported the initial failure)
> 
> # truncate --size 1152921504606846976 reallybigfile 
> # mke2fs -t ext4 reallybigfile

...

> Trying it with "-O bigalloc" (which should be automatic at this size,
> I think?) just goes away for a very long time, I'm not sure what it's
> thinking about, or if it's in a loop somewhere (looking now).

It comes up with too many inodes, then tries to reduce the count,
but the "waste not want not" logic bumps it back up... ipg eventually
goes "below" 0 but it's unsigned so it goes on in this loop forever.
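The runaway loop comes down to unsigned wraparound. A minimal C sketch (helper names hypothetical, not e2fsprogs functions) contrasts a plain subtraction, which wraps modulo 2^32 instead of going negative, with a checked one that a retry loop could use to stop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked: like decrementing ipg past zero, the result wraps modulo 2^32. */
static uint32_t sub_wrapping(uint32_t a, uint32_t b)
{
    return a - b;
}

/* Checked: reports underflow instead of wrapping, so a retry loop can bail out. */
static bool sub_checked(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    if (b > a)
        return false;   /* would go "below" 0; leave *out untouched */
    *out = a - b;
    return true;
}
```

After the wrap, the "negative" ipg reappears as a value near 2^32, so a loop waiting for it to drop below some bound can keep going indefinitely.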

Some of this is my fault... I put that retry logic in years ago.  :(

I'll see what I can do to fix it up.

-Eric

* [PATCH 1/2] Add "big" and "huge" types to mke2fs.conf
  2011-10-04  4:00 ` Ted Ts'o
@ 2011-10-04  4:26   ` Theodore Ts'o
  2011-10-04  4:27     ` [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB Theodore Ts'o
  2011-10-04  5:31   ` mkfs'ing a 48-bit fs... or not Andreas Dilger
  1 sibling, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2011-10-04  4:26 UTC
  To: Ext4 Developers List; +Cc: sandeen, Theodore Ts'o

mke2fs attempts to use the "big" and "huge" types, and now that mke2fs
will complain if there are file system types which are undefined,
let's add definitions for them.

Thanks to Richard Jones for reporting this problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 misc/mke2fs-hurd.conf |    6 ++++++
 misc/mke2fs.conf      |    6 ++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/misc/mke2fs-hurd.conf b/misc/mke2fs-hurd.conf
index 52ed7e5..4f0527d 100644
--- a/misc/mke2fs-hurd.conf
+++ b/misc/mke2fs-hurd.conf
@@ -21,6 +21,12 @@
 	floppy = {
 		inode_ratio = 8192
 	}
+	big = {
+		inode_ratio = 32768
+	}
+	huge = {
+		inode_ratio = 65536
+	}
 	news = {
 		inode_ratio = 4096
 	}
diff --git a/misc/mke2fs.conf b/misc/mke2fs.conf
index 775e046..0871f77 100644
--- a/misc/mke2fs.conf
+++ b/misc/mke2fs.conf
@@ -30,6 +30,12 @@
 		inode_size = 128
 		inode_ratio = 8192
 	}
+	big = {
+		inode_ratio = 32768
+	}
+	huge = {
+		inode_ratio = 65536
+	}
 	news = {
 		inode_ratio = 4096
 	}
-- 
1.7.4.1.22.gec8e1.dirty

* [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB
  2011-10-04  4:26   ` [PATCH 1/2] Add "big" and "huge" types to mke2fs.conf Theodore Ts'o
@ 2011-10-04  4:27     ` Theodore Ts'o
  2011-10-04 11:47       ` Eric Sandeen
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2011-10-04  4:27 UTC
  To: Ext4 Developers List; +Cc: sandeen, Theodore Ts'o

If the number of block groups exceeds 2**32, a bad cast would lead to
a bogus "Not enough space to build proposed filesystem while setting
up superblock" failure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 lib/ext2fs/initialize.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
index 2875f97..b050a0a 100644
--- a/lib/ext2fs/initialize.c
+++ b/lib/ext2fs/initialize.c
@@ -248,7 +248,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
 	}
 
 retry:
-	fs->group_desc_count = (blk_t) ext2fs_div64_ceil(
+	fs->group_desc_count = (dgrp_t) ext2fs_div64_ceil(
 		ext2fs_blocks_count(super) - super->s_first_data_block,
 		EXT2_BLOCKS_PER_GROUP(super));
 	if (fs->group_desc_count == 0) {
-- 
1.7.4.1.22.gec8e1.dirty

* Re: mkfs'ing a 48-bit fs... or not.
  2011-10-04  4:03 ` Eric Sandeen
@ 2011-10-04  4:28   ` Ted Ts'o
  0 siblings, 0 replies; 11+ messages in thread
From: Ted Ts'o @ 2011-10-04  4:28 UTC
  To: Eric Sandeen; +Cc: Eric Sandeen, ext4 development

On Mon, Oct 03, 2011 at 11:03:40PM -0500, Eric Sandeen wrote:
> It comes up with too many inodes, then tries to reduce the count,
> but the "waste not want not" logic bumps it back up... ipg eventually
> goes "below" 0 but it's unsigned so it goes on in this loop forever.

Oh, this is because of the fact that we can't have more than 2**32
inodes, right?  Doh!
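Rough numbers show how far past the inode limit a filesystem this large lands. This C sketch assumes the `huge` type's `inode_ratio = 65536` from the mke2fs.conf patch in this thread and a 32-bit inode count; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inodes requested for fs_bytes at one inode per inode_ratio bytes. */
static uint64_t inodes_for(uint64_t fs_bytes, uint64_t inode_ratio)
{
    assert(inode_ratio != 0);
    return fs_bytes / inode_ratio;
}

/* The inode count is assumed to be a 32-bit field, so at most 2^32 - 1. */
static bool inode_count_fits(uint64_t inodes)
{
    return inodes <= UINT32_MAX;
}
```

A 2^60-byte image would ask for 2^44 inodes, 2^12 times more than a 32-bit count can hold.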

> Some of this is my fault... I put that retry logic in years ago.  :(
> 
> I'll see what I can do to fix it up.

Many thanks.  I've fixed the other issues you've pointed out.  Check
out the next branch on github...

					- Ted

* Re: mkfs'ing a 48-bit fs... or not.
  2011-10-04  4:00 ` Ted Ts'o
  2011-10-04  4:26   ` [PATCH 1/2] Add "big" and "huge" types to mke2fs.conf Theodore Ts'o
@ 2011-10-04  5:31   ` Andreas Dilger
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2011-10-04  5:31 UTC
  To: Ted Ts'o; +Cc: Eric Sandeen, ext4 development

On 2011-10-03, at 10:00 PM, Ted Ts'o <tytso@mit.edu> wrote:
> On Mon, Oct 03, 2011 at 04:55:11PM -0500, Eric Sandeen wrote:
>> Has anyone tried mke2fs at its limits?  The latest git tree seems to fail in several ways.
>> (Richard Jones reported the initial failure)
>> 
>> # truncate --size 1152921504606846976 reallybigfile 
>> # mke2fs -t ext4 reallybigfile
>> 
>> first,
>> 
>> Warning: the fs_type huge is not defined in mke2fs.conf
>> 
>> (when types "big" and "huge" got added, they never got a mke2fs.conf update?)
> 
> It used to be that an undefined file system type didn't flag an error.
> It now does, so we should have definitions for them in mke2fs.conf.
> 
>> reallybigfile: Not enough space to build proposed filesystem while setting up superblock

Isn't there also a problem with the number of block group descriptor blocks in the first group, if meta_bg is not used?  With one 64-byte group descriptor per 128MB group, that is 1024 bytes of descriptors for 2GB of blocks, or 128MB of descriptors for 256TB of blocks.  At that point group 0 is full of primary group descriptors and group 1 is full of backup descriptors, and we are out of luck making a larger filesystem.

That is only 2^48 bytes, not 2^48 blocks (2^60 bytes), so it means meta_bg needs to get into more testing, and online resize with flex_bg needs to move forward. 
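Andreas's figures can be reproduced with a short C sketch that takes his numbers as its assumptions (64-byte descriptors, one per 128 MiB group); the constants and function name are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions from Andreas's message: 64-byte descriptors, 128 MiB groups. */
#define DESC_SIZE_SKETCH   64ULL
#define GROUP_BYTES_SKETCH (128ULL << 20)

/* Bytes of group descriptors needed to describe fs_bytes of space. */
static uint64_t desc_table_bytes(uint64_t fs_bytes)
{
    uint64_t groups = fs_bytes / GROUP_BYTES_SKETCH
                    + (fs_bytes % GROUP_BYTES_SKETCH != 0);
    return groups * DESC_SIZE_SKETCH;
}
```

At 256 TiB the primary descriptor table alone is as large as a whole 128 MiB group, which is why group 0 runs out of room without meta_bg.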

>> because:
>> 
>>        fs->group_desc_count = (blk_t) ext2fs_div64_ceil(
>>                ext2fs_blocks_count(super) - super->s_first_data_block,
>>                EXT2_BLOCKS_PER_GROUP(super));
>>        if (fs->group_desc_count == 0) {
>>                retval = EXT2_ET_TOOSMALL;
>> 
>> The div64_ceil returns > 2^32 (2^33, actually), and the cast to blk_t
>> (which should be dgrp_t?) turns that into a 0.
> 
> Yep, that should be dgrp_t.  Oops.
> 
>> Trying it with "-O bigalloc" (which should be automatic at this size,
>> I think?) just goes away for a very long time, I'm not sure what it's
>> thinking about, or if it's in a loop somewhere (looking now).
> 
> Well, we probably do want to engage bigalloc automatically, at some
> point (I want to wait until bigalloc is in commonly used kernels, at
> least for community distros).  I'm not sure what the best cluster
> size to pick by default should be, though.  16k?  64k?
> 
>                                                  - Ted

* Re: mkfs'ing a 48-bit fs... or not.
  2011-10-03 21:55 mkfs'ing a 48-bit fs... or not Eric Sandeen
  2011-10-04  4:00 ` Ted Ts'o
  2011-10-04  4:03 ` Eric Sandeen
@ 2011-10-04  7:06 ` Richard W.M. Jones
  2 siblings, 0 replies; 11+ messages in thread
From: Richard W.M. Jones @ 2011-10-04  7:06 UTC
  Cc: ext4 development


Thanks Eric.  Here is the original thread (see also the replies).

https://lists.fedoraproject.org/pipermail/devel/2011-October/157618.html

In theory I could test this up to ~ 2**63, but it requires a number of
bugfixes and changes in qemu.  Obviously that size is ridiculous :-)
but it may reveal bugs that wouldn't be found by ordinary testing.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

* Re: [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB
  2011-10-04  4:27     ` [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB Theodore Ts'o
@ 2011-10-04 11:47       ` Eric Sandeen
  2011-10-04 18:05         ` Ted Ts'o
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Sandeen @ 2011-10-04 11:47 UTC
  To: Theodore Ts'o; +Cc: Ext4 Developers List

On 10/3/11 11:27 PM, Theodore Ts'o wrote:
> If the number of block groups exceeds 2**32, a bad cast would lead to
> a bogus "Not enough space to build proposed filesystem while setting
> up superblock" failure.

It's the proper cast now, but I don't think it fixes the problem, since they
are both __u32...
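Eric's point can be shown in a few lines of C. If `blk_t` and `dgrp_t` are both 32-bit (redeclared here as an assumption rather than taken from the e2fsprogs headers), casting through either one discards the same high bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the e2fsprogs typedefs discussed here: both are __u32. */
typedef uint32_t blk_t;
typedef uint32_t dgrp_t;

static_assert(sizeof(blk_t) == 4 && sizeof(dgrp_t) == 4,
              "both types are 32 bits wide");

/* The cast before and after the patch, applied to a 64-bit group count. */
static uint32_t cast_before(uint64_t groups) { return (blk_t) groups; }
static uint32_t cast_after(uint64_t groups)  { return (dgrp_t) groups; }
```

Only the low 32 bits survive either cast, so the patch corrects the type name but not the truncation.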

But in any case, for the actual change at least:

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
>  lib/ext2fs/initialize.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/ext2fs/initialize.c b/lib/ext2fs/initialize.c
> index 2875f97..b050a0a 100644
> --- a/lib/ext2fs/initialize.c
> +++ b/lib/ext2fs/initialize.c
> @@ -248,7 +248,7 @@ errcode_t ext2fs_initialize(const char *name, int flags,
>  	}
>  
>  retry:
> -	fs->group_desc_count = (blk_t) ext2fs_div64_ceil(
> +	fs->group_desc_count = (dgrp_t) ext2fs_div64_ceil(
>  		ext2fs_blocks_count(super) - super->s_first_data_block,
>  		EXT2_BLOCKS_PER_GROUP(super));
>  	if (fs->group_desc_count == 0) {

* Re: [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB
  2011-10-04 11:47       ` Eric Sandeen
@ 2011-10-04 18:05         ` Ted Ts'o
  2011-10-04 18:15           ` Eric Sandeen
  0 siblings, 1 reply; 11+ messages in thread
From: Ted Ts'o @ 2011-10-04 18:05 UTC
  To: Eric Sandeen; +Cc: Richard W.M. Jones, Ext4 Developers List

On Tue, Oct 04, 2011 at 06:47:12AM -0500, Eric Sandeen wrote:
> On 10/3/11 11:27 PM, Theodore Ts'o wrote:
> > If the number of block groups exceeds 2**32, a bad cast would lead to
> > a bogus "Not enough space to build proposed filesystem while setting
> > up superblock" failure.
> 
> It's the proper cast now, but I don't think it fixes the problem, since they
> are both __u32...

Hmm, yes.

And to be quite honest I'm not sure it's worth fixing.  2**32 block
groups gets us up to 2**59 bytes assuming 4k blocks.  The theoretical
maximum given the current extent tree format is 2**60 assuming 4k
blocks.  So changing dgrp_t to be 64-bits just to get that last power
of two (i.e., from 512PB to a full EB) doesn't seem worth it.  Simply
using a bigalloc cluster size of 8k would make the problem go away
(and arguably we'd probably want a large cluster size if someone
wanted to create a file system that big anyway).
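Ted's size figures follow from a short calculation. This C sketch assumes 4 KiB blocks, 32768 blocks per group, and 48-bit physical block numbers in the extent tree; the constant and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions from Ted's message: 4 KiB blocks, 32768 blocks per group,
 * and 48-bit physical block numbers in the extent tree. */
#define BLOCK_SIZE_SKETCH        4096ULL
#define BLOCKS_PER_GROUP_SKETCH  32768ULL
#define EXTENT_BLOCK_BITS_SKETCH 48

static_assert(EXTENT_BLOCK_BITS_SKETCH < 64, "shift must fit in 64 bits");

/* Largest size reachable with a 32-bit group count (2^32 - 1 groups). */
static uint64_t max_bytes_by_groups(void)
{
    return (uint64_t) UINT32_MAX * BLOCKS_PER_GROUP_SKETCH * BLOCK_SIZE_SKETCH;
}

/* Largest size addressable with 48-bit block numbers. */
static uint64_t max_bytes_by_extents(void)
{
    return (1ULL << EXTENT_BLOCK_BITS_SKETCH) * BLOCK_SIZE_SKETCH;
}
```

A 32-bit group count stops one group short of 2^59 bytes (512 PiB), half of the 2^60-byte (1 EiB) extent-tree limit; that is the missing power of two.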

So maybe we should just check to see if the required number of block
groups is greater than 2**32, and if so, give an error.
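A check along the lines Ted suggests could compute the group count in 64 bits and reject it before narrowing. This C sketch is hypothetical: the function and status enum are invented for illustration, and a real version would return an errcode_t from ext2fs_initialize(). It rejects any count that does not fit in 32 bits, i.e. 2^32 or more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch (not e2fsprogs errcode_t values). */
enum group_count_status { GROUPS_OK = 0, GROUPS_TOO_SMALL, GROUPS_TOO_MANY };

/* Compute the group count in 64 bits and reject it before narrowing,
 * instead of letting a 32-bit cast turn 2^33 into 0. */
static enum group_count_status
compute_group_count(uint64_t blocks_count, uint64_t first_data_block,
                    uint64_t blocks_per_group, uint32_t *out)
{
    assert(blocks_per_group != 0 && out != NULL);
    if (blocks_count <= first_data_block)
        return GROUPS_TOO_SMALL;
    uint64_t span = blocks_count - first_data_block;
    uint64_t groups = span / blocks_per_group + (span % blocks_per_group != 0);
    if (groups > UINT32_MAX)
        return GROUPS_TOO_MANY;
    *out = (uint32_t) groups;
    return GROUPS_OK;
}
```

Returning a distinct "too many groups" status would replace the misleading "Not enough space" message seen at the start of the thread.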

       	  	       	      	  - Ted

* Re: [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB
  2011-10-04 18:05         ` Ted Ts'o
@ 2011-10-04 18:15           ` Eric Sandeen
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Sandeen @ 2011-10-04 18:15 UTC
  To: Ted Ts'o; +Cc: Richard W.M. Jones, Ext4 Developers List

On 10/4/11 1:05 PM, Ted Ts'o wrote:
> On Tue, Oct 04, 2011 at 06:47:12AM -0500, Eric Sandeen wrote:
>> On 10/3/11 11:27 PM, Theodore Ts'o wrote:
>>> If the number of block groups exceeds 2**32, a bad cast would lead to
>>> a bogus "Not enough space to build proposed filesystem while setting
>>> up superblock" failure.
>>
>> It's the proper cast now, but I don't think it fixes the problem, since they
>> are both __u32...
> 
> Hmm, yes.
> 
> And to be quite honest I'm not sure it's worth fixing.  2**32 block
> groups gets us up to 2**59 bytes assuming 4k blocks.  The theoretical
> maximum given the current extent tree format is 2**60 assuming 4k
> blocks.  So changing dgrp_t to be 64-bits just to get that last power
> of two (i.e., from 512PB to a full EB) doesn't seem worth it.  Simply
> using a bigalloc cluster size of 8k would make the problem go away
> (and arguably we'd probably want a large cluster size if someone
> wanted to create a file system that big anyway).
> 
> So maybe we should just check to see if the required number of block
> groups is greater than 2**32, and if so, give an error.
> 
>        	  	       	      	  - Ted
> 

As long as we have a consistent, predictable, well-designed and well-understood
maximum (theoretical) size for the fs, I'm all for documenting & enforcing it.

TBH I'm still trying to get all the moving parts together in my head, between
meta_bg & bigalloc & whatnot, at these sizes.

The initialization functions are looking pretty ad-hoc to me right now. :)

-Eric

Thread overview: 11+ messages
2011-10-03 21:55 mkfs'ing a 48-bit fs... or not Eric Sandeen
2011-10-04  4:00 ` Ted Ts'o
2011-10-04  4:26   ` [PATCH 1/2] Add "big" and "huge" types to mke2fs.conf Theodore Ts'o
2011-10-04  4:27     ` [PATCH 2/2] libext2fs: fix bad cast which causes problems for file systems > 512EB Theodore Ts'o
2011-10-04 11:47       ` Eric Sandeen
2011-10-04 18:05         ` Ted Ts'o
2011-10-04 18:15           ` Eric Sandeen
2011-10-04  5:31   ` mkfs'ing a 48-bit fs... or not Andreas Dilger
2011-10-04  4:03 ` Eric Sandeen
2011-10-04  4:28   ` Ted Ts'o
2011-10-04  7:06 ` Richard W.M. Jones