* ext4 compat flag assignments
@ 2006-09-22  9:15 Andreas Dilger
  2006-09-28  8:55 ` Alexandre Ratchov
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2006-09-22  9:15 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-ext4

Ted,
there are several COMPAT flag assignments that have been proposed in
the past:

- EXT4_FEATURE_INCOMPAT_64BIT (0x0080!) - support for 64-bit block count
  fields in the superblock (s_blocks_count_hi, s_free_blocks_count_hi),
  large group descriptors (s_desc_size), extents with high 16 bits
  (ee_start_hi, ei_leaf_hi), inode ACL (i_file_acl_hi).  May also grow
  to encompass the previously proposed BIG_BG.

- EXT4_FEATURE_RO_COMPAT_HUGE_FILE (0x0008) - change i_blocks to be
  in units of s_blocksize instead of 512-byte sectors, use
  l_i_frag and l_i_fsize as i_blocks_hi (could also be part of 64BIT).
  Also uses EXT4_HUGE_FILE_FL 0x40000 for i_flags.

- EXT4_FEATURE_RO_COMPAT_GDT_CSUM (0x0010?) - store a crc16 checksum in
  the group descriptor (s_uuid[16] | __u32 group | ext3_group_desc
  (excluding gd_checksum itself)).  This allows the kernel to more safely
  manage UNINIT groups.  Incomplete patch, e2fsck support mostly done.

- EXT4_FEATURE_RO_COMPAT_DIR_NLINK (0x0020?) - allow directories to have
  > 65000 subdirectories (i_nlinks) by setting i_nlinks = 1 for such
  directories.  RO_COMPAT protects old filesystems from unlinking such
  directories incorrectly and losing all files therein.  Needs RO_COMPAT
  flag handling, needs e2fsck support, but very heavily tested.

- EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE (0x0040?) - add s_min_extra_isize and
  s_want_extra_isize fields to superblock, which allow specifying
  the minimum and desired i_extra_isize fields in large inodes
  (for nsec+epoch timestamps, potential other uses).  Needs RO_COMPAT
  flag handling, needs e2fsck support, patch complete, little testing.


I'm not sure about the state of HUGE_FILE (it might be useful for ext[23]
to allow larger sparse files, but also fits quite well with INCOMPAT_64BIT),
but the others are definitely useful independent from INCOMPAT_64BIT.
There are patches in various states of completion for all of the features.
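
A minimal sketch of how the three flag classes listed above are usually
interpreted at mount time: an unknown INCOMPAT bit refuses the mount, an
unknown RO_COMPAT bit only permits a read-only mount.  The flag values are
the ones proposed in this mail (those marked '?' are tentative), and
decide_mount_mode() and enum mount_mode are hypothetical names used only
for illustration, not code from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values as proposed in this thread; the RO_COMPAT ones marked '?' are tentative. */
#define EXT4_FEATURE_INCOMPAT_64BIT        0x0080u
#define EXT4_FEATURE_RO_COMPAT_HUGE_FILE   0x0008u
#define EXT4_FEATURE_RO_COMPAT_GDT_CSUM    0x0010u
#define EXT4_FEATURE_RO_COMPAT_DIR_NLINK   0x0020u
#define EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE 0x0040u

enum mount_mode { MOUNT_REFUSE, MOUNT_READ_ONLY, MOUNT_READ_WRITE };

/*
 * Hypothetical helper: fs_* are the feature words read from the superblock,
 * known_* are the bits this implementation understands.
 */
static enum mount_mode
decide_mount_mode(uint32_t fs_incompat, uint32_t fs_ro_compat,
		  uint32_t known_incompat, uint32_t known_ro_compat)
{
	/* Any unknown INCOMPAT bit means the on-disk format cannot be parsed safely. */
	if (fs_incompat & ~known_incompat)
		return MOUNT_REFUSE;
	/* Unknown RO_COMPAT bits are safe to read but not to modify. */
	if (fs_ro_compat & ~known_ro_compat)
		return MOUNT_READ_ONLY;
	return MOUNT_READ_WRITE;
}
```

For example, an old kernel that knows none of these bits would get
MOUNT_READ_ONLY for a filesystem with DIR_NLINK set, which is the
protection against incorrect unlinking described above.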

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



* Re: ext4 compat flag assignments
  2006-09-22  9:15 ext4 compat flag assignments Andreas Dilger
@ 2006-09-28  8:55 ` Alexandre Ratchov
  2006-09-28 20:29   ` Andi Kleen
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Alexandre Ratchov @ 2006-09-28  8:55 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Theodore Ts'o, linux-ext4

On Fri, Sep 22, 2006 at 02:15:20AM -0700, Andreas Dilger wrote:
> Ted,
> there are several COMPAT flag assignments that have been proposed in
> the past:

As Ted suggested, I am posting here a list of all the fields we plan to use;
perhaps others could complete it?

> 
> - EXT4_FEATURE_INCOMPAT_64BIT (0x0080!) - support for 64-bit block count
>   fields in the superblock (s_blocks_count_hi, s_free_blocks_count_hi),
>   large group descriptors (s_desc_size), extents with high 16 bits
>   (ee_start_hi, ei_leaf_hi), inode ACL (i_file_acl_hi).  May also grow
>   to encompass the previously proposed BIG_BG.
>


Here is a list of the fields we plan to use for 64-bit support; they must be
zero on file systems without EXT4_FEATURE_INCOMPAT_64BIT.

struct ext4_super_block
{
	/* at offset 0xfe */
	__le32	s_desc_size;		/* Group descriptor size */
	/* at offset 0x150 */
	__le32	s_blocks_count_hi;	/* Blocks count */
	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
	__le32	s_free_blocks_count_hi;	/* Free blocks count */
	__le32	s_jnl_blocks_hi[17];	/* Backup of the journal inode */
};

struct ext4_group_desc
{
	/* at offset 0x20 */
	__le32	bg_block_bitmap;	/* Blocks bitmap block hi bits */
	__le32	bg_inode_bitmap;	/* Inodes bitmap block hi bits */
	__le32	bg_inode_table;		/* Inodes table block hi bits */
	__le16	bg_free_blocks_count;	/* Free blocks count hi bits */
	__le16	bg_free_inodes_count;	/* Free inodes count hi bits */
	__le16	bg_used_dirs_count;	/* Directories count hi bits */
};

Basically, we make all block numbers 64-bit and we double the size of every
xxx_count field in the block group descriptor.
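
A small sketch of how the split fields could be combined, assuming the
values have already been converted from little-endian to host order.
struct sb_counts and the three helpers are hypothetical names; only
s_blocks_count and s_blocks_count_hi come from the structures above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-order copy of the two superblock fields involved. */
struct sb_counts {
	uint32_t s_blocks_count;    /* existing low 32 bits */
	uint32_t s_blocks_count_hi; /* proposed high 32 bits */
};

/* Read the full count; the high half is ignored unless INCOMPAT_64BIT is set. */
static uint64_t sb_blocks_count(const struct sb_counts *sb, int has_64bit)
{
	uint64_t hi = has_64bit ? sb->s_blocks_count_hi : 0;

	return (hi << 32) | sb->s_blocks_count;
}

/* Store a 64-bit count back into the two 32-bit halves. */
static void sb_set_blocks_count(struct sb_counts *sb, uint64_t blocks)
{
	sb->s_blocks_count = (uint32_t)blocks;
	sb->s_blocks_count_hi = (uint32_t)(blocks >> 32);
}

/* Group descriptor counters are doubled from 16 to 32 bits the same way. */
static uint32_t gd_count32(uint16_t lo, uint16_t hi)
{
	return ((uint32_t)hi << 16) | lo;
}
```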

> - EXT4_FEATURE_RO_COMPAT_HUGE_FILE (0x0008) - change i_blocks to be
>   in units of s_blocksize units instead of 512-byte sectors, use
>   l_i_frag and l_i_fsize as i_blocks_hi (could also be part of 64BIT).
>   Also uses EXT4_HUGE_FILE_FL 0x40000 for i_flags.
> 
> - EXT4_FEATURE_RO_COMPAT_GDT_CSUM (0x0010?) - store a crc16 checksum in
>   the group descriptor (s_uuid[16] | __u32 group | ext3_group_desc
>   (excluding gd_checksum itself)).  This allows the kernel to more safely
>   manage UNINIT groups.  Incomplete patch, e2fsck support mostly done.
> 
> - EXT4_FEATURE_RO_COMPAT_DIR_NLINK (0x0020?) - allow directories to have >
>   65000 subdirectories (i_nlinks) by setting i_nlinks = 1 for such
>   directories.  RO_COMPAT protects old filesystems from unlinking such
>   directories incorrectly and losing all files therein.  Needs RO_COMPAT
>   flag handling, needs e2fsck support, but very heavily tested.
> 
> - EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE (0x0040?) - add s_min_extra_isize and
>   s_want_extra_isize fields to superblock, which allow specifying
>   the minimum and desired i_extra_isize fields in large inodes
>   (for nsec+epoch timestamps, potential other uses).  Needs RO_COMPAT
>   flag handling, needs e2fsck support, patch complete, little testing.
> 
> 
> I'm not sure about the state of HUGE_FILE (it might be useful for ext[23]
> to allow larger sparse files, but also fits quite well with INCOMPAT_64BIT),
> but the others are definitely useful independent from INCOMPAT_64BIT.
> There are patches in various states of completion for all of the features.
> 

I agree, this should go with INCOMPAT_64BIT.
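
For reference, a sketch of the HUGE_FILE accounting as described in the
quoted proposal: i_blocks gains 16 high bits (assuming the one-byte
l_i_frag and l_i_fsize fields together supply them) and is counted in
filesystem blocks when EXT4_HUGE_FILE_FL is set on the inode, otherwise in
512-byte sectors.  inode_allocated_bytes() is a hypothetical helper and
takes host-order values.

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_HUGE_FILE_FL 0x40000u /* i_flags bit named in the proposal */

/*
 * Hypothetical helper: bytes allocated to an inode.
 * i_blocks_hi stands for the 16 bits taken from l_i_frag and l_i_fsize.
 * The product can overflow 64 bits only for block sizes near 64 KiB
 * combined with counts near 2^48; that case is not handled here.
 */
static uint64_t inode_allocated_bytes(uint32_t i_blocks_lo, uint16_t i_blocks_hi,
				      uint32_t i_flags, uint32_t blocksize)
{
	uint64_t count = ((uint64_t)i_blocks_hi << 32) | i_blocks_lo;
	uint64_t unit = (i_flags & EXT4_HUGE_FILE_FL) ? blocksize : 512;

	return count * unit;
}
```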

There's also the change attribute patch; it currently uses the l_i_reserved2
field of the inode:

-                       __u32   l_i_reserved2;
+                       __le32  l_i_change_attribute;

-#define i_reserved2    osd2.linux2.l_i_reserved2
+#define i_chattr       osd2.linux2.l_i_change_attribute

It doesn't need a RO_COMPAT/INCOMPAT flag because there are no incompatibility
issues with kernels that do not support the change attribute but mount
file systems that have used it.  Also, it doesn't really need changes in fsck.

cheers,

-- Alexandre


* Re: ext4 compat flag assignments
  2006-09-28  8:55 ` Alexandre Ratchov
@ 2006-09-28 20:29   ` Andi Kleen
  2006-09-28 22:41     ` Andreas Dilger
  2006-09-28 23:06   ` Andreas Dilger
  2006-10-04 20:04   ` Theodore Tso
  2 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-09-28 20:29 UTC (permalink / raw)
  To: Alexandre Ratchov; +Cc: Theodore Ts'o, linux-ext4

Alexandre Ratchov <alexandre.ratchov@bull.net> writes:

> here is a list of fields we plan to use for the 64bit support, they must be
> zero on file systems without the EXT4_FEATURE_INCOMPAT_64BIT.
> 
> struct ext4_super_block
> {
> 	/* at offset 0xfe */
> 	__le32	s_desc_size;		/* Group descriptor size */
> 	/* at offset 0x150 */
> 	__le32	s_blocks_count_hi;	/* Blocks count */
> 	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
> 	__le32	s_free_blocks_count_hi;	/* Free blocks count */
> 	__le32	s_jnl_blocks_hi[17];	/* Backup of the journal inode */
> };
> 
> struct ext4_group_desc
> {
> 	/* at offset 0x20 */
> 	__le32	bg_block_bitmap;	/* Blocks bitmap block hi bits */
> 	__le32	bg_inode_bitmap;	/* Inodes bitmap block hi bits */
> 	__le32	bg_inode_table;		/* Inodes table block hi bits */
> 	__le16	bg_free_blocks_count;	/* Free blocks count hi bits */
> 	__le16	bg_free_inodes_count;	/* Free inodes count hi bits */
> 	__le16	bg_used_dirs_count;	/* Directories count hi bits */
> };
> 
> basically, we make 64bit all block numbers and we double the size of all
> xxx_count in the block group descriptor.

When you do this have you considered at least reserving fields in the 
new 64bit indirect blocks for checksums for each block? 

IMHO it would be a great advantage to checksum all metadata 
(as demonstrated by ZFS) and CPU cycles are cheap enough now that it is 
basically free.

The checksums could be different feature flags, but it would be useful
to reserve space in any new format. 16 bytes free in each block should be enough.

-Andi


* Re: ext4 compat flag assignments
  2006-09-28 20:29   ` Andi Kleen
@ 2006-09-28 22:41     ` Andreas Dilger
  2006-09-28 23:06       ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2006-09-28 22:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alexandre Ratchov, Theodore Ts'o, linux-ext4

On Sep 28, 2006  22:29 +0200, Andi Kleen wrote:
> Alexandre Ratchov <alexandre.ratchov@bull.net> writes:
> > struct ext4_group_desc
> > {
> > 	/* at offset 0x20 */
> > 	__le32	bg_block_bitmap;	/* Blocks bitmap block hi bits */
> > 	__le32	bg_inode_bitmap;	/* Inodes bitmap block hi bits */
> > 	__le32	bg_inode_table;		/* Inodes table block hi bits */
> > 	__le16	bg_free_blocks_count;	/* Free blocks count hi bits */
> > 	__le16	bg_free_inodes_count;	/* Free inodes count hi bits */
> > 	__le16	bg_used_dirs_count;	/* Directories count hi bits */
> > };
> > 
> > basically, we make 64bit all block numbers and we double the size of all
> > xxx_count in the block group descriptor.
> 
> When you do this have you considered at least reserving fields in the 
> new 64bit indirect blocks for checksums for each block? 
>
> IMHO it would be a great advantage to checksum all metadata 
> (as demonstrated by ZFS) and CPU cycles are cheap enough now that it is 
> basically free.

Actually, there are several plans afoot in that direction already.
Some of them need at least some help in the "finish up and get it
into the kernel" department, some of them are just ideas previously
discussed.

One of the reasons for Alexandre pushing the 64-bit inode/block counters
into the "large" descriptor is that the 64-bit filesystem is already
incompatible with a 32-bit filesystem, so there is no extra harm, and this
leaves space in the "original" group descriptor for checksums of the block
and inode bitmaps.  The bitmaps are a critical single point of failure,
and having checksums allows the kernel to avoid cascading filesystem
corruption even if it can't (yet) do anything about it.  Having the
checksums in the "original" group descriptor allows this feature to be
used on both 32-bit and 64-bit filesystems.

No work has been done on this yet.  Getting checksums to be efficient
depends on having a generic callback mechanism from the journal code
to avoid repeated checksums on a block while it is being modified.
The journal callback would do the checksum exactly once for each block
(or sub-structure therein) at checkpoint time.

A second change is to add checksums to the ext3 journal commit blocks
(per U. Wisconsin) to avoid the need for a 2-phase commit for transactions,
and to provide redundancy.  Patches for the kernel and e2fsck are
available for that already (not 100% sure if I posted them here).

A third change is checksums for the group descriptors themselves, to allow
mke2fs and the kernel to handle "uninitialized groups".  This means that
mke2fs doesn't need to zero the block/inode bitmaps and inode table, and the
kernel can selectively initialize the inode tables to avoid the need to
read all of them at e2fsck time.  The checksum is a safety check on
the group descriptor flags, as well as providing normal corruption detection.
Patches for the kernel and e2fsck are in early prototype and were posted
about a week ago.

Finally, the extents format has the capability (though no code is implemented
for this yet) to store a checksum in each index and extent block.  This
would be done by reducing the count of allowed entries in the block and
storing an ext3_extent_tail (checksum, inode+generation backpointer) as
the last entry in the block.  No work has been done on this, but I've
described the ext3_extent_tail a few times previously on this list.
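
To illustrate the space trade-off, here is a rough sketch assuming a
12-byte extent header, 12-byte index/extent entries, and a 12-byte tail
that takes the place of the last entry slot.  The tail layout below
(field names and widths) is a guess for illustration only and is not the
ext3_extent_tail definition posted earlier; the helper names are
hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_HEADER_SIZE 12u /* assumed size of the extent header */
#define EXTENT_ENTRY_SIZE  12u /* assumed size of one extent or index entry */

/* Hypothetical tail: checksum plus inode+generation backpointer. */
struct extent_tail_sketch {
	uint32_t et_checksum;
	uint32_t et_inode;
	uint32_t et_generation;
};

/* Entries that fit in a block, optionally giving up the last slot to the tail. */
static uint32_t extent_max_entries(uint32_t blocksize, int with_tail)
{
	uint32_t slots = (blocksize - EXTENT_HEADER_SIZE) / EXTENT_ENTRY_SIZE;

	return with_tail ? slots - 1 : slots;
}

/* Byte offset of the tail: directly after the reduced entry array. */
static uint32_t extent_tail_offset(uint32_t blocksize)
{
	return EXTENT_HEADER_SIZE +
	       extent_max_entries(blocksize, 1) * EXTENT_ENTRY_SIZE;
}
```

Under these assumptions a 4096-byte block drops from 340 to 339 entries,
with the tail occupying bytes 4080 to 4091.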

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



* Re: ext4 compat flag assignments
  2006-09-28 22:41     ` Andreas Dilger
@ 2006-09-28 23:06       ` Andi Kleen
  2006-10-02  4:34         ` Andreas Dilger
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2006-09-28 23:06 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Alexandre Ratchov, Theodore Ts'o, linux-ext4


> Actually, there are several plans afoot in that direction already.
> Some of them need at least some help in the "finish up and get it
> into the kernel" department, some of them are just ideas previously
> discussed..

The important part right now is to just keep enough space in all 
structures that are being changed anyways.

> 
> One of the reason for Alexandre pushing the 64-bit inode/block counters
> into the "large" descriptor is because the 64-bit filesystem is already
> incompatible with a 32-bit filesystem so there is no extra harm, and this
> leaves space in the "original" group descriptor for checksums of the block
> and inode bitmaps.  The bitmap checksums are a critical single-point-of-
> failure, and having checksums allows the kernel to avoid cascading
> filesystem corruption even if it can't (yet) do anything about it.
> Having the checksums in the "original" group descriptor allows this
> feature to be used on both 32-bit and 64-bit filesystems.

Ok.

> No work has been done on this yet.  Getting checksums to be efficient
> depends on having a generic callback mechanism from the journal code
> to avoid repeated checksums on a block while it is being modified.

You can just do incremental checksumming which is very cheap. 

Or did you mean the flushing to disk of the checksum? If it's always in the same
object that would be free, but that is not possible for bitmaps at least.
But I guess the checksum write in the block descriptor 
could be done very lazily at least, perhaps keeping track on disk if invalid
checksums are expected or not.

> Finally, the extents format has the capability (though no code is implemented
> for this yet) to store a checksum in each index and extent block.  This
> would be done by reducing the count of allowed entries in the block and
> storing an ext3_extent_tail (checksum, inode+generation backpointer) as
> the last entry in the block.  No work has been done on this, but I've
> described the ext3_extent_tail a few times previously on this list.

Old style indirect blocks will need them too. My thinking was
to use another block for those (so an indirect block would be two nearby
blocks).

Inodes need them too, but with the inode extension it will hopefully
not be a problem to keep a few bytes for this.

And directories, which should be relatively easy to extend with
the current format.

-Andi


* Re: ext4 compat flag assignments
  2006-09-28  8:55 ` Alexandre Ratchov
  2006-09-28 20:29   ` Andi Kleen
@ 2006-09-28 23:06   ` Andreas Dilger
  2006-10-04 20:04   ` Theodore Tso
  2 siblings, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2006-09-28 23:06 UTC (permalink / raw)
  To: Alexandre Ratchov; +Cc: Theodore Ts'o, linux-ext4

On Sep 28, 2006  10:55 +0200, Alexandre Ratchov wrote:
> here is a list of fields we plan to use for the 64bit support, they must be
> zero on file systems without the EXT4_FEATURE_INCOMPAT_64BIT.
> 
> struct ext4_super_block
> {
> 	/* at offset 0xfe */
> 	__le32	s_desc_size;		/* Group descriptor size */

I believe this is actually a __u16 and not __u32.  The group descriptor
can't be larger than a filesystem block anyways.  Formerly called
s_reserved_word_pad.

> > - EXT4_FEATURE_RO_COMPAT_GDT_CSUM (0x0010?) - store a crc16 checksum in
> >   the group descriptor (s_uuid[16] | __u32 group | ext3_group_desc
> >   (excluding gd_checksum itself)).  This allows the kernel to more safely
> >   manage UNINIT groups.  Incomplete patch, e2fsck support mostly done.

 struct ext3_group_desc
 {
        __le32  bg_block_bitmap;                /* Blocks bitmap block */
        __le32  bg_inode_bitmap;                /* Inodes bitmap block */
        __le32  bg_inode_table;         /* Inodes table block */
        __le16  bg_free_blocks_count;   /* Free blocks count */
        __le16  bg_free_inodes_count;   /* Free inodes count */
        __le16  bg_used_dirs_count;     /* Directories count */
-       __u16   bg_pad;
-       __le32  bg_reserved[3];
+       __le16  bg_flags;
+       __le32  bg_reserved[2];
+       __le16  bg_itable_unused;       /* Unused inodes count */
+       __le16  bg_checksum;            /* crc16(s_uuid+group_num+group_desc) */
 };
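
A sketch of the checksum over (s_uuid | group | descriptor up to
bg_checksum), under several assumptions that are not settled in this
thread: a bitwise CRC-16 with reflected polynomial 0xA001 seeded with
0xFFFF, the group number hashed as 4 little-endian bytes, and the
descriptor hashed as it sits in memory (so a little-endian host is assumed
for the result to match an on-disk value).  The struct and function names
are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order stand-in for the patched descriptor above (32 bytes, no padding). */
struct group_desc_sketch {
	uint32_t bg_block_bitmap;
	uint32_t bg_inode_bitmap;
	uint32_t bg_inode_table;
	uint16_t bg_free_blocks_count;
	uint16_t bg_free_inodes_count;
	uint16_t bg_used_dirs_count;
	uint16_t bg_flags;
	uint32_t bg_reserved[2];
	uint16_t bg_itable_unused;
	uint16_t bg_checksum;
};

/* Bitwise CRC-16, reflected polynomial 0xA001 (assumed variant). */
static uint16_t crc16_update(uint16_t crc, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (uint16_t)((crc & 1) ? (crc >> 1) ^ 0xA001 : crc >> 1);
	}
	return crc;
}

/* Checksum of uuid, group number, and the descriptor excluding bg_checksum. */
static uint16_t group_desc_csum(const uint8_t uuid[16], uint32_t group,
				const struct group_desc_sketch *gd)
{
	uint8_t le_group[4] = {
		(uint8_t)group, (uint8_t)(group >> 8),
		(uint8_t)(group >> 16), (uint8_t)(group >> 24),
	};
	uint16_t crc = crc16_update(0xFFFF, uuid, 16);

	crc = crc16_update(crc, le_group, sizeof(le_group));
	/* bg_checksum is the last field, so hashing up to its offset excludes it. */
	return crc16_update(crc, gd, offsetof(struct group_desc_sketch, bg_checksum));
}
```

Mixing in the UUID and group number means a descriptor copied from another
group or another filesystem fails verification even if it is internally
consistent.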

> > - EXT4_FEATURE_RO_COMPAT_DIR_NLINK (0x0020?) - allow directories to have >
> >   65000 subdirectories (i_nlinks) by setting i_nlinks = 1 for such
> >   directories.  RO_COMPAT protects old filesystems from unlinking such
> >   directories incorrectly and losing all files therein.  Needs RO_COMPAT
> >   flag handling, needs e2fsck support, but very heavily tested.

No extra fields needed, just compat.  Bumps EXT3_LINK_MAX to 65000.
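
A sketch of the link-count rule, assuming a directory link count of 1 is
the marker for "more subdirectories than can be counted" and that it
sticks once set.  dir_nlink_inc() and dir_nlink_dec() are hypothetical
helpers, not the functions in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EXT3_LINK_MAX 65000u

/* Called when a subdirectory is added: saturate to the marker value 1. */
static uint16_t dir_nlink_inc(uint16_t nlink)
{
	if (nlink == 1 || nlink >= EXT3_LINK_MAX)
		return 1;
	return (uint16_t)(nlink + 1);
}

/*
 * Called when a subdirectory is removed.  A directory that still has a
 * subdirectory has a count of at least 3, unless it already overflowed.
 */
static uint16_t dir_nlink_dec(uint16_t nlink)
{
	assert(nlink == 1 || nlink > 2);
	return nlink == 1 ? 1 : (uint16_t)(nlink - 1);
}
```

Because the count can no longer be trusted once it is 1, an emptiness
check before rmdir would have to scan the directory instead of relying on
the link count.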

> > - EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE (0x0040?) - add s_min_extra_isize and
> >   s_want_extra_isize fields to superblock, which allow specifying
> >   the minimum and desired i_extra_isize fields in large inodes
> >   (for nsec+epoch timestamps, potential other uses).  Needs RO_COMPAT
> >   flag handling, needs e2fsck support, patch complete, little testing.

No patch yet uses s_*_extra_isize; they can go in the next available slots.

struct ext3_inode {
        /* ... existing 128-byte inode fields ... */
        } osd2;                         /* OS dependent 2 */
        __le16  i_extra_isize;
        __le16  i_pad1;
        __le32  i_ctime_extra;  /* extra Change time      (nsec << 2 | epoch) */
        __le32  i_mtime_extra;  /* extra Modification time(nsec << 2 | epoch) */
        __le32  i_atime_extra;  /* extra Access time      (nsec << 2 | epoch) */
        __le32  i_extra_reserved1;
};
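
A sketch of the (nsec << 2 | epoch) packing from the comments above,
assuming "epoch" means bits 32-33 of an unsigned seconds count whose low
32 bits stay in the existing i_[cma]time field.  Timestamps before 1970
are not considered, and struct ts_sketch and both helpers are hypothetical
names.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

/* Hypothetical in-memory timestamp with 64-bit seconds. */
struct ts_sketch {
	uint64_t sec;
	uint32_t nsec;
};

/* Build the *_extra word; the low 32 bits of sec go in the old field. */
static uint32_t ts_encode_extra(struct ts_sketch ts)
{
	assert(ts.nsec < 1000000000u); /* fits in 30 bits */
	return (ts.nsec << EPOCH_BITS) |
	       ((uint32_t)(ts.sec >> 32) & EPOCH_MASK);
}

/* Rebuild the timestamp from the old 32-bit field and the *_extra word. */
static struct ts_sketch ts_decode(uint32_t sec_lo, uint32_t extra)
{
	struct ts_sketch ts;

	ts.sec = ((uint64_t)(extra & EPOCH_MASK) << 32) | sec_lo;
	ts.nsec = extra >> EPOCH_BITS;
	return ts;
}
```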

> There's also the change attribute patch; it currently uses the l_i_reserved2
> field of the inode:
> 
> -                       __u32   l_i_reserved2;
> +                       __le32  l_i_change_attribute;
> 
> -#define i_reserved2    osd2.linux2.l_i_reserved2
> +#define i_chattr       osd2.linux2.l_i_change_attribute
> 
> It doesn't need RO_COMPAT/INCOMPAT flag because there are no incompatibility
> issues with kernels that do not support the change attribute but that mount
> file systems that have used it. Also it doesn't really need changes in fsck.

Did we decide if l_i_change_attribute would also be the ctime nsec value?
That would affect the RO_COMPAT_EXTRA_ISIZE implementation above, putting
the i_ctime_extra in place of l_i_reserved2.  That doesn't change the
patch significantly, though it does need the "always increment" change.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



* Re: ext4 compat flag assignments
  2006-09-28 23:06       ` Andi Kleen
@ 2006-10-02  4:34         ` Andreas Dilger
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2006-10-02  4:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alexandre Ratchov, Theodore Ts'o, linux-ext4

On Sep 29, 2006  01:06 +0200, Andi Kleen wrote:
> Andreas Dilger wrote:
> > No work has been done on this yet.  Getting checksums to be efficient
> > depends on having a generic callback mechanism from the journal code
> > to avoid repeated checksums on a block while it is being modified.
> 
> You can just do incremental checksumming which is very cheap. 
> 
> Or did you mean the flushing to disk of the checksum?  If it's always in
> the same object that would be free, but that is not possible for bitmaps
> at least.  But I guess the checksum write in the block descriptor 
> could be done very lazily at least, perhaps keeping track on disk if invalid
> checksums are expected or not.

I'm not sure I understand what you mean.  My goal is that the ext4 code
modifies the block as many times as it wants during a transaction (this
may happen from multiple threads for a single block), then just before
the transaction is committed to disk the journal calls a callback for that
block (inode, group descriptor, bitmap, superblock, extent, index, etc) and 
computes the checksum only once for that block.  Then the block is flushed
to the filesystem.
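
Roughly, in code form (a toy model only: struct jblock, the pre_commit
hook and journal_commit() are hypothetical and are not the jbd
interfaces, and toy_sum() is a placeholder rather than a real CRC):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct jblock;
typedef void (*commit_cb)(struct jblock *blk);

/* Toy stand-in for a journaled metadata block. */
struct jblock {
	uint8_t   data[64];
	uint32_t  checksum;
	int       dirty;
	unsigned  csum_runs;  /* instrumentation: how often the checksum ran */
	commit_cb pre_commit; /* hook called once per block at commit */
};

/* Placeholder checksum; a real implementation would use a CRC. */
static uint32_t toy_sum(const uint8_t *p, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = s * 31u + p[i];
	return s;
}

/* Modifications only mark the block dirty; no checksum work happens here. */
static void block_modify(struct jblock *blk, size_t off, uint8_t val)
{
	assert(off < sizeof(blk->data));
	blk->data[off] = val;
	blk->dirty = 1;
}

/* The per-block callback: recompute the checksum over the final contents. */
static void csum_callback(struct jblock *blk)
{
	blk->checksum = toy_sum(blk->data, sizeof(blk->data));
	blk->csum_runs++;
}

/* Commit: run each dirty block's callback exactly once, then mark it clean. */
static void journal_commit(struct jblock **blocks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct jblock *blk = blocks[i];

		if (blk->dirty && blk->pre_commit)
			blk->pre_commit(blk);
		blk->dirty = 0;
	}
}
```

However many times block_modify() runs during the transaction, the
checksum is computed once per commit, and a commit with no intervening
changes does no checksum work at all.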

I'm not sure I like the idea of writing "this block doesn't have a valid
checksum" to disk, since there is some risk of that block being corrupted
during a crash and then we don't know if the block is valid or not.

> > Finally, the extents format has the capability (though no code is
> > implemented for this yet) to store a checksum in each index and extent
> > block... storing an ext3_extent_tail (checksum, inode+generation
> > backpointer) as the last entry in the block.
> 
> Old style indirect blocks will need them too. My thinking was
> to use another block for those (so a indirect block would be two nearby
> blocks) 

We couldn't do this for the existing indirect blocks easily, but what I'd
thought is that it is possible to either have e2fsck convert block-mapped
files to extent mapped (with extent tail of checksum + inode backpointer)
or have a new block-mapped extent (for fragmented files), which would also
have a header with magic (so that random garbage in a large filesystem
doesn't look like a valid [dt]indirect block) and also have the extent
tail to contain the checksum + inode backpointer.

> Inodes need them, but with the inode extension that will be hopefully
> not a problem to keep a few bytes for this.

Yes, it might even be valuable to put this into the "small" inode so
that it can be used for existing ext3 filesystems.

> And directories, which should be relatively easy to extend with
> the current format.

Haven't thought about that specifically for directories, but I do have
some ideas about enhancing the directory format to allow storing more
data into the dir_entries (e.g. 64-bit inode) and possibly using the
same code to store a tree of EAs in the same format as directories, so
the htree code can be used to do lookups if there are lots of EAs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



* Re: ext4 compat flag assignments
  2006-09-28  8:55 ` Alexandre Ratchov
  2006-09-28 20:29   ` Andi Kleen
  2006-09-28 23:06   ` Andreas Dilger
@ 2006-10-04 20:04   ` Theodore Tso
  2006-10-05  0:19     ` Andreas Dilger
  2006-10-06 13:33     ` Valerie Clement
  2 siblings, 2 replies; 11+ messages in thread
From: Theodore Tso @ 2006-10-04 20:04 UTC (permalink / raw)
  To: Alexandre Ratchov; +Cc: Andreas Dilger, linux-ext4

On Thu, Sep 28, 2006 at 10:55:15AM +0200, Alexandre Ratchov wrote:
> struct ext4_super_block
> {
> 	/* at offset 0xfe */
> 	__le32	s_desc_size;		/* Group descriptor size */
> 	/* at offset 0x150 */
> 	__le32	s_blocks_count_hi;	/* Blocks count */
> 	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
> 	__le32	s_free_blocks_count_hi;	/* Free blocks count */
> 	__le32	s_jnl_blocks_hi[17];	/* Backup of the journal inode */
> };

Why do we need to have the high blocks # of the journal inode?
s_jnl_blocks was just a backup of the i_block[] array.  But if we are
assuming that we will only support 64-bits using extents, we shouldn't
need s_jnl_blocks_hi[].  How specifically is this array being used in
the patches?


						- Ted



* Re: ext4 compat flag assignments
  2006-10-04 20:04   ` Theodore Tso
@ 2006-10-05  0:19     ` Andreas Dilger
  2006-10-05  2:02       ` Theodore Tso
  2006-10-06 13:33     ` Valerie Clement
  1 sibling, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2006-10-05  0:19 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Alexandre Ratchov, linux-ext4

On Oct 04, 2006  16:04 -0400, Theodore Tso wrote:
> On Thu, Sep 28, 2006 at 10:55:15AM +0200, Alexandre Ratchov wrote:
> > struct ext4_super_block
> > {
> > 	/* at offset 0xfe */
> > 	__le32	s_desc_size;		/* Group descriptor size */
> > 	/* at offset 0x150 */
> > 	__le32	s_blocks_count_hi;	/* Blocks count */
> > 	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
> > 	__le32	s_free_blocks_count_hi;	/* Free blocks count */
> > 	__le32	s_jnl_blocks_hi[17];	/* Backup of the journal inode */
> > };
> 
> Why do we need to have the high blocks # of the journal inode.
> s_jnl_blocks was just a backup of the i_blocks[] array.  But if we are
> assuming that we will only support 64-bits using extents, we shouldn't
> need s_jnl_blocks_hi[].  How specifically is this array being used in
> the patches?

Good question, I don't know that it is.  Even if the journal were extent
mapped (possible, but would need support in e2fsprogs for this) the
data would be stored in the same sized i_block array.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



* Re: ext4 compat flag assignments
  2006-10-05  0:19     ` Andreas Dilger
@ 2006-10-05  2:02       ` Theodore Tso
  0 siblings, 0 replies; 11+ messages in thread
From: Theodore Tso @ 2006-10-05  2:02 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Alexandre Ratchov, linux-ext4

On Wed, Oct 04, 2006 at 06:19:06PM -0600, Andreas Dilger wrote:
> Good question, I don't know that it is.  Even if the journal was extent
> mapped (possible, but would need support in e2fsprogs for this) the
> data would be stored in the same sized i_blocks array.

That won't be too hard.  The e2fsprogs code is designed to be
identical to the kernel code, so we just drop in a new version of
fs/ext3/recovery.c that understands extents into e2fsck/recovery.c,
and e2fsprogs will have support.  :-)

						- Ted


* Re: ext4 compat flag assignments
  2006-10-04 20:04   ` Theodore Tso
  2006-10-05  0:19     ` Andreas Dilger
@ 2006-10-06 13:33     ` Valerie Clement
  1 sibling, 0 replies; 11+ messages in thread
From: Valerie Clement @ 2006-10-06 13:33 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Andreas Dilger, linux-ext4

Theodore Tso wrote:
> On Thu, Sep 28, 2006 at 10:55:15AM +0200, Alexandre Ratchov wrote:
>> struct ext4_super_block
>> {
>> 	/* at offset 0xfe */
>> 	__le32	s_desc_size;		/* Group descriptor size */
>> 	/* at offset 0x150 */
>> 	__le32	s_blocks_count_hi;	/* Blocks count */
>> 	__le32	s_r_blocks_count_hi;	/* Reserved blocks count */
>> 	__le32	s_free_blocks_count_hi;	/* Free blocks count */
>> 	__le32	s_jnl_blocks_hi[17];	/* Backup of the journal inode */
>> };
> 
> Why do we need to have the high blocks # of the journal inode.
> s_jnl_blocks was just a backup of the i_blocks[] array.  But if we are
> assuming that we will only support 64-bits using extents, we shouldn't
> need s_jnl_blocks_hi[].  How specifically is this array being used in
> the patches?

The s_jnl_blocks_hi[] array is not used in the current patchset.
Alexandre wanted to reserve these fields for future use, for instance
to support larger inode sizes.
As we won't use them in the short term and still need to think
about that, you can remove this array.

Regards,
   Valérie
