linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Inline data with 128-byte inodes?
@ 2020-04-14  7:02 Josh Triplett
  2020-04-22 16:00 ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Josh Triplett @ 2020-04-14  7:02 UTC (permalink / raw)
  To: linux-ext4

Is there a fundamental reason that ext4 *can't* or *shouldn't* support
inline data with 128-byte inodes?

As far as I can tell, the kernel ext4 implementation only allows inline
data with 256-byte or larger inodes, because it requires the system.data
xattr to exist, even if the actual data requires 60 bytes or less. (The
implementation in debugfs, on the other hand, handles inline data in
128-byte inodes just fine. And it seems like it'd be fairly
straightforward to change the kernel implementation to support it as
well.)

For filesystems that don't need to store xattrs in general, and can live
with the other limitations of 128-byte inodes, using a 128-byte inode
can save substantial space compared to a 256-byte inode (many megabytes
worth of inode tables, versus 4k for each file between 61-160 bytes),
and many small files or small directories would still fit in 60 bytes.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inline data with 128-byte inodes?
  2020-04-14  7:02 Inline data with 128-byte inodes? Josh Triplett
@ 2020-04-22 16:00 ` Jan Kara
  2020-04-22 20:15   ` Andreas Dilger
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2020-04-22 16:00 UTC (permalink / raw)
  To: Josh Triplett; +Cc: linux-ext4

On Tue 14-04-20 00:02:07, Josh Triplett wrote:
> Is there a fundamental reason that ext4 *can't* or *shouldn't* support
> inline data with 128-byte inodes?

Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes
with 'osd2' union...

Or do you mean we should put inline data in an external xattr block?

								Honza

> As far as I can tell, the kernel ext4 implementation only allows inline
> data with 256-byte or larger inodes, because it requires the system.data
> xattr to exist, even if the actual data requires 60 bytes or less. (The
> implementation in debugfs, on the other hand, handles inline data in
> 128-byte inodes just fine. And it seems like it'd be fairly
> straightforward to change the kernel implementation to support it as
> well.)
> 
> For filesystems that don't need to store xattrs in general, and can live
> with the other limitations of 128-byte inodes, using a 128-byte inode
> can save substantial space compared to a 256-byte inode (many megabytes
> worth of inode tables, versus 4k for each file between 61-160 bytes),
> and many small files or small directories would still fit in 60 bytes.
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inline data with 128-byte inodes?
  2020-04-22 16:00 ` Jan Kara
@ 2020-04-22 20:15   ` Andreas Dilger
  2020-04-23  0:40     ` Josh Triplett
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Dilger @ 2020-04-22 20:15 UTC (permalink / raw)
  To: Jan Kara, Josh Triplett; +Cc: Ext4 Developers List

[-- Attachment #1: Type: text/plain, Size: 2463 bytes --]

On Apr 22, 2020, at 10:00 AM, Jan Kara <jack@suse.cz> wrote:
> 
> On Tue 14-04-20 00:02:07, Josh Triplett wrote:
>> Is there a fundamental reason that ext4 *can't* or *shouldn't* support
>> inline data with 128-byte inodes?
> 
> Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes
> with 'osd2' union...

There are 60 bytes in the "i_block" field that can be used by inline_data.

> Or do you mean we should put inline data in an external xattr block?

Using an 4KB xattr block would IMHO be worse than just using a regular
file block for such files, if they don't fit into the 60 i_block bytes.
That makes the data handling more complex (data copies each time in/out
of the xattr) and has performance impact (all writes essentially data
journal because they go via the setxattr code path.

The only time it _might_ be useful is if there are other xattrs that are
shared in the external block with inline_data.  However, at that point
I think you are better off to just create larger inodes to hold the
xattrs to avoid the seeking needed to load the external block...

Given the prevalence of xattrs today (SELinux springs to mind), I'd be
surprised whether this combination shows any improvement in real life,
but I don't have an _objection_ to allowing this combination (e.g. for
ultra-compact /etc or boot filesystem images.

Maybe there is a bigger win for small directories avoiding 4KB leaf blocks?

That said, I'd be happy to see some numbers to show this is a win, and
I'm definitely not _against_ allowing this to work if there is a use for it.

Cheers, Andreas

>> As far as I can tell, the kernel ext4 implementation only allows inline
>> data with 256-byte or larger inodes, because it requires the system.data
>> xattr to exist, even if the actual data requires 60 bytes or less. (The
>> implementation in debugfs, on the other hand, handles inline data in
>> 128-byte inodes just fine. And it seems like it'd be fairly
>> straightforward to change the kernel implementation to support it as
>> well.)
>> 
>> For filesystems that don't need to store xattrs in general, and can live
>> with the other limitations of 128-byte inodes, using a 128-byte inode
>> can save substantial space compared to a 256-byte inode (many megabytes
>> worth of inode tables, versus 4k for each file between 61-160 bytes),
>> and many small files or small directories would still fit in 60 bytes.
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inline data with 128-byte inodes?
  2020-04-22 20:15   ` Andreas Dilger
@ 2020-04-23  0:40     ` Josh Triplett
  2020-04-23 11:56       ` Jan Kara
  2020-05-04 22:52       ` Andreas Dilger
  0 siblings, 2 replies; 6+ messages in thread
From: Josh Triplett @ 2020-04-23  0:40 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Jan Kara, Ext4 Developers List

On Wed, Apr 22, 2020 at 02:15:28PM -0600, Andreas Dilger wrote:
> On Apr 22, 2020, at 10:00 AM, Jan Kara <jack@suse.cz> wrote:
> > On Tue 14-04-20 00:02:07, Josh Triplett wrote:
> >> Is there a fundamental reason that ext4 *can't* or *shouldn't* support
> >> inline data with 128-byte inodes?
> > 
> > Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes
> > with 'osd2' union...
> 
> There are 60 bytes in the "i_block" field that can be used by inline_data.

Exactly. But the Linux ext4 implementation doesn't accept inline data
unless the system.data xattr exists, even if the file's data fits in 60
bytes (in which case system.data must exist and have 0 length).

> > Or do you mean we should put inline data in an external xattr block?

Definitely not, no. That seems much more complex to deal with.

I'm only talking about the case of files or directories <= 60 bytes
fitting in the inode with 128-byte inodes. Effectively, this would mean
accepting inodes with the inline data flag set, whether they have the
system.data xattr or not.

> Maybe there is a bigger win for small directories avoiding 4KB leaf blocks?
>
> That said, I'd be happy to see some numbers to show this is a win, and
> I'm definitely not _against_ allowing this to work if there is a use for it.

Some statistics, for ext4 with 4k blocks and 128-byte inodes, if 60-byte
inline data worked with 128-byte inodes:

A filesystem containing the source code of the Linux kernel would
save about 1508 disk blocks, or around 6032k.

A filesystem containing only my /etc directory would save about 650
blocks, or 2600k, a substantial fraction of the entire directory (which
takes up 9004k total without inline data).

- Josh Triplett

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inline data with 128-byte inodes?
  2020-04-23  0:40     ` Josh Triplett
@ 2020-04-23 11:56       ` Jan Kara
  2020-05-04 22:52       ` Andreas Dilger
  1 sibling, 0 replies; 6+ messages in thread
From: Jan Kara @ 2020-04-23 11:56 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Andreas Dilger, Jan Kara, Ext4 Developers List

On Wed 22-04-20 17:40:33, Josh Triplett wrote:
> On Wed, Apr 22, 2020 at 02:15:28PM -0600, Andreas Dilger wrote:
> > On Apr 22, 2020, at 10:00 AM, Jan Kara <jack@suse.cz> wrote:
> > > On Tue 14-04-20 00:02:07, Josh Triplett wrote:
> > >> Is there a fundamental reason that ext4 *can't* or *shouldn't* support
> > >> inline data with 128-byte inodes?
> > > 
> > > Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes
> > > with 'osd2' union...
> > 
> > There are 60 bytes in the "i_block" field that can be used by inline_data.
> 
> Exactly. But the Linux ext4 implementation doesn't accept inline data
> unless the system.data xattr exists, even if the file's data fits in 60
> bytes (in which case system.data must exist and have 0 length).

I see now I understand what you meant. Thanks for explanation.

> > Maybe there is a bigger win for small directories avoiding 4KB leaf blocks?
> >
> > That said, I'd be happy to see some numbers to show this is a win, and
> > I'm definitely not _against_ allowing this to work if there is a use for it.
> 
> Some statistics, for ext4 with 4k blocks and 128-byte inodes, if 60-byte
> inline data worked with 128-byte inodes:
> 
> A filesystem containing the source code of the Linux kernel would
> save about 1508 disk blocks, or around 6032k.
> 
> A filesystem containing only my /etc directory would save about 650
> blocks, or 2600k, a substantial fraction of the entire directory (which
> takes up 9004k total without inline data).

I guess few people care about a few megabytes these days... For really
space sensitive applications, people don't pick ext4 as a base filesystem
I'd guess (I'd expect squashfs, erofs, or ubifs if you need write access).
So the benefit is relatively small, the question about the cost is - how
complicated it gets to support inline data without xattrs?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Inline data with 128-byte inodes?
  2020-04-23  0:40     ` Josh Triplett
  2020-04-23 11:56       ` Jan Kara
@ 2020-05-04 22:52       ` Andreas Dilger
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2020-05-04 22:52 UTC (permalink / raw)
  To: Josh Triplett; +Cc: Jan Kara, Ext4 Developers List

[-- Attachment #1: Type: text/plain, Size: 8992 bytes --]

On Apr 22, 2020, at 6:40 PM, Josh Triplett <josh@joshtriplett.org> wrote:
> 
> On Wed, Apr 22, 2020 at 02:15:28PM -0600, Andreas Dilger wrote:
>> On Apr 22, 2020, at 10:00 AM, Jan Kara <jack@suse.cz> wrote:
>>> On Tue 14-04-20 00:02:07, Josh Triplett wrote:
>>>> Is there a fundamental reason that ext4 *can't* or *shouldn't* support
>>>> inline data with 128-byte inodes?
>>> 
>>> Well, where would we put it on disk? ext4 on-disk inode fills 128-bytes
>>> with 'osd2' union...
>> 
>> There are 60 bytes in the "i_block" field that can be used by inline_data.
> 
> Exactly. But the Linux ext4 implementation doesn't accept inline data
> unless the system.data xattr exists, even if the file's data fits in 60
> bytes (in which case system.data must exist and have 0 length).
> 
>>> Or do you mean we should put inline data in an external xattr block?
> 
> Definitely not, no. That seems much more complex to deal with.
> 
> I'm only talking about the case of files or directories <= 60 bytes
> fitting in the inode with 128-byte inodes. Effectively, this would mean
> accepting inodes with the inline data flag set, whether they have the
> system.data xattr or not.
> 
>> Maybe there is a bigger win for small directories avoiding 4KB leaf blocks?
>> 
>> That said, I'd be happy to see some numbers to show this is a win, and
>> I'm definitely not _against_ allowing this to work if there is a use for it.
> 
> Some statistics, for ext4 with 4k blocks and 128-byte inodes, if 60-byte
> inline data worked with 128-byte inodes:
> 
> A filesystem containing the source code of the Linux kernel would
> save about 1508 disk blocks, or around 6032k.
> 
> A filesystem containing only my /etc directory would save about 650
> blocks, or 2600k, a substantial fraction of the entire directory (which
> takes up 9004k total without inline data).

Hi Josh,
I started looking into this, and got half-way through a patch before
getting pulled into something else.  If you are interested to finish
it off, I've included it here, but it definitely isn't ready yet.

Looking at the details, it isn't just an issue of changing a single
threshold to decide whether 128-byte inodes can handle inline data
or not.  The code is intermingled between "have system.data xattr"
as the defining factor and using "inline_off" (which points to the
byte offset of the system.data contents) to determine whether there
is inline data or not.

It _should_ be possible to change to using EXT4_STATE_MAY_INLINE_DATA
and EXT4_INLINE_DATA_FL to determine if there is any inline data in
the inode, and use "inline_off" and i_extra_isize only to detect if
any data is in the xattr or not.  It doesn't need to check the xattrs
if the inode size is EXT4_GOOD_OLD_INODE_SIZE, since we *don't* want
"inline" data in an external xattr block for sanity/efficiency reasons.

Hopefully this patch is useful for you as a starting point - I suspect
it is about 80% finished, but I wouldn't have time to work on it for a
few weeks at least, since 128-byte inodes is not a useful case for me (we
are using 1024-byte inodes on our metadata filesystems these days).  On
the flip side, I think inline_data is very interesting, and am happy for
it to gain more widespread usage.  I don't think the 128-byte inode support
for inline_data increases complexity, so I wouldn't be against landing it.

Cheers, Andreas
--

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 61b37a0..ec6f51a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3260,8 +3260,7 @@ extern int ext4_inline_data_fiemap(struct inode *inode,

 static inline int ext4_has_inline_data(struct inode *inode)
 {
-	return ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA) &&
-	       EXT4_I(inode)->i_inline_off;
+	return ext4_test_inode_flag(inode, EXT4_INODE_INLINE_DATA);
 }

 /* namei.c */
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index fad82d0..ea9596a 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -20,7 +20,7 @@

 static int ext4_get_inline_size(struct inode *inode)
 {
-	if (EXT4_I(inode)->i_inline_off)
+	if (ext4_get_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA))
 		return EXT4_I(inode)->i_inline_size;

 	return 0;
@@ -94,7 +94,7 @@ int ext4_get_max_inline_size(struct inode *inode)
 	struct ext4_iloc iloc;

 	if (EXT4_I(inode)->i_extra_isize == 0)
-		return 0;
+		return EXT4_MIN_INLINE_DATA_SIZE;

 	error = ext4_get_inode_loc(inode, &iloc);
 	if (error) {
@@ -133,6 +133,11 @@ int ext4_find_inline_data_nolock(struct inode *inode)
 	};
 	int error;

+	if (!ext4_has_feature_inline_data(inode))
+		return 0;
+
+	ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
+	EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE;
 	if (EXT4_I(inode)->i_extra_isize == 0)
 		return 0;

@@ -153,9 +158,8 @@ int ext4_find_inline_data_nolock(struct inode *inode)
 		}
 		EXT4_I(inode)->i_inline_off = (u16)((void *)is.s.here -
 					(void *)ext4_raw_inode(&is.iloc));
-		EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE +
+		EXT4_I(inode)->i_inline_size +=
 				le32_to_cpu(is.s.here->e_value_size);
-		ext4_set_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
 	}
 out:
 	brelse(is.iloc.bh);
@@ -188,6 +192,7 @@ static int ext4_read_inline_data(struct inode *inode, void *buffer,
 	if (!len)
 		goto out;

+	BUG_ON(EXT4_I(inode)->i_extra_isize == 0);
 	header = IHDR(inode, raw_inode);
 	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
 					    EXT4_I(inode)->i_inline_off);
@@ -219,7 +224,7 @@ static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return;

-	BUG_ON(!EXT4_I(inode)->i_inline_off);
+	BUG_ON(!ext4_get_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));
 	BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);

 	raw_inode = ext4_raw_inode(iloc);
@@ -238,6 +243,7 @@ static void ext4_write_inline_data(struct inode *inode, struct ext4_iloc *iloc,
 	if (!len)
 		return;

+	BUG_ON(EXT4_I(inode)->i_extra_isize == 0);
 	pos -= EXT4_MIN_INLINE_DATA_SIZE;
 	header = IHDR(inode, raw_inode);
 	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
@@ -269,6 +275,17 @@ static int ext4_create_inline_data(handle_t *handle,
 	if (error)
 		goto out;

+	EXT4_I(inode)->i_inline_size = EXT4_MIN_INLINE_DATA_SIZE;
+	memset((void *)ext4_raw_inode(&is.iloc)->i_block,
+		0, EXT4_MIN_INLINE_DATA_SIZE);
+	ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
+	ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
+
+	if (EXT4_I(inode)->i_extra_isize == 0) {
+		BUG_ON(len > EXT4_MIN_INLINE_DATA_SIZE);
+		goto out_dirty;
+	}
+
 	if (len > EXT4_MIN_INLINE_DATA_SIZE) {
 		value = EXT4_ZERO_XATTR_VALUE;
 		len -= EXT4_MIN_INLINE_DATA_SIZE;
@@ -295,14 +312,11 @@ static int ext4_create_inline_data(handle_t *handle,
 		goto out;
 	}

-	memset((void *)ext4_raw_inode(&is.iloc)->i_block,
-		0, EXT4_MIN_INLINE_DATA_SIZE);
-
 	EXT4_I(inode)->i_inline_off = (u16)((void *)is.s.here -
 				      (void *)ext4_raw_inode(&is.iloc));
-	EXT4_I(inode)->i_inline_size = len + EXT4_MIN_INLINE_DATA_SIZE;
-	ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
-	ext4_set_inode_flag(inode, EXT4_INODE_INLINE_DATA);
+	EXT4_I(inode)->i_inline_size += len;
+
+out_dirty:
 	get_bh(is.iloc.bh);
 	error = ext4_mark_iloc_dirty(handle, inode, &is.iloc);

@@ -804,6 +818,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 						 void **fsdata)
 {
 	int ret = 0, inline_size;
+	bool truncate = false;
 	struct page *page;

 	page = grab_cache_page_write_begin(mapping, 0, flags);
@@ -811,10 +826,8 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 		return -ENOMEM;

 	down_read(&EXT4_I(inode)->xattr_sem);
-	if (!ext4_has_inline_data(inode)) {
-		ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
-		goto out;
-	}
+	if (!ext4_has_inline_data(inode))
+		goto out_clear;

 	inline_size = ext4_get_inline_size(inode);

@@ -827,23 +840,25 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
 	ret = __block_write_begin(page, 0, inline_size,
 				  ext4_da_get_block_prep);
 	if (ret) {
-		up_read(&EXT4_I(inode)->xattr_sem);
-		unlock_page(page);
-		put_page(page);
-		ext4_truncate_failed_write(inode);
-		return ret;
+		truncate = true;
+		goto out;
 	}

 	SetPageDirty(page);
 	SetPageUptodate(page);
-	ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
 	*fsdata = (void *)CONVERT_INLINE_DATA;

+out_clear:
+	ext4_clear_inode_flag(inode, EXT4_INODE_INLINE_DATA);
+	ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
 out:
 	up_read(&EXT4_I(inode)->xattr_sem);
 	if (page) {
+		if (read)
 		unlock_page(page);
 		put_page(page);
+		if (truncate)
+			ext4_truncate_failed_write(inode);
 	}
 	return ret;
 }


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-04 22:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-14  7:02 Inline data with 128-byte inodes? Josh Triplett
2020-04-22 16:00 ` Jan Kara
2020-04-22 20:15   ` Andreas Dilger
2020-04-23  0:40     ` Josh Triplett
2020-04-23 11:56       ` Jan Kara
2020-05-04 22:52       ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).