* [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
@ 2019-07-03  7:55 ` Chao Yu
  0 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-03  7:55 UTC (permalink / raw)
  To: hch, darrick.wong
  Cc: linux-xfs, linux-fsdevel, andreas.gruenbacher, gaoxiang25, chao, Chao Yu

Some filesystems like erofs/reiserfs have the ability to pack tail
data into metadata, e.g.:
IOMAP_MAPPED [0, 8192]
IOMAP_INLINE [8192, 8200]

However, the current IOMAP_INLINE type assumes that:
- inline data is located in page #0;
- the inline size equals .i_size.
These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
so this patch relaxes them to make IOMAP_INLINE generic enough to cover
the tail-packing case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
---
 fs/iomap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 12654c2e78f8..d1c16b692d31 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -264,13 +264,12 @@ static void
 iomap_read_inline_data(struct inode *inode, struct page *page,
 		struct iomap *iomap)
 {
-	size_t size = i_size_read(inode);
+	size_t size = iomap->length;
 	void *addr;
 
 	if (PageUptodate(page))
 		return;
 
-	BUG_ON(page->index);
 	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
 
 	addr = kmap_atomic(page);
@@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	sector_t sector;
 
 	if (iomap->type == IOMAP_INLINE) {
-		WARN_ON_ONCE(pos);
 		iomap_read_inline_data(inode, page, iomap);
 		return PAGE_SIZE;
 	}
-- 
2.18.0.rc1


^ permalink raw reply related	[flat|nested] 26+ messages in thread


* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-03  7:55 ` Chao Yu
@ 2019-07-08  2:12   ` Chao Yu
  -1 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-08  2:12 UTC (permalink / raw)
  To: hch, darrick.wong
  Cc: linux-xfs, linux-fsdevel, andreas.gruenbacher, gaoxiang25, chao

Ping, any comments on this patch?

On 2019/7/3 15:55, Chao Yu wrote:
> Some filesystems like erofs/reiserfs have the ability to pack tail
> data into metadata, e.g.:
> IOMAP_MAPPED [0, 8192]
> IOMAP_INLINE [8192, 8200]
> 
> However, the current IOMAP_INLINE type assumes that:
> - inline data is located in page #0;
> - the inline size equals .i_size.
> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
> the tail-packing case.
> 
> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> ---
>  fs/iomap.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 12654c2e78f8..d1c16b692d31 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -264,13 +264,12 @@ static void
>  iomap_read_inline_data(struct inode *inode, struct page *page,
>  		struct iomap *iomap)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = iomap->length;
>  	void *addr;
>  
>  	if (PageUptodate(page))
>  		return;
>  
> -	BUG_ON(page->index);
>  	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>  
>  	addr = kmap_atomic(page);
> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	sector_t sector;
>  
>  	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
>  		iomap_read_inline_data(inode, page, iomap);
>  		return PAGE_SIZE;
>  	}
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-03  7:55 ` Chao Yu
  (?)
  (?)
@ 2019-07-08 16:03 ` Christoph Hellwig
  2019-07-09 13:52   ` Chao Yu
  -1 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-07-08 16:03 UTC (permalink / raw)
  To: Chao Yu
  Cc: hch, darrick.wong, linux-xfs, linux-fsdevel, andreas.gruenbacher,
	gaoxiang25, chao

On Wed, Jul 03, 2019 at 03:55:02PM +0800, Chao Yu wrote:
> Some filesystems like erofs/reiserfs have the ability to pack tail
> data into metadata, e.g.:
> IOMAP_MAPPED [0, 8192]
> IOMAP_INLINE [8192, 8200]
> 
> However, the current IOMAP_INLINE type assumes that:
> - inline data is located in page #0;
> - the inline size equals .i_size.
> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
> the tail-packing case.
> 
> Signed-off-by: Chao Yu <yuchao0@huawei.com>

This looks good to me, but I'd also like to see a review and gfs2
testing from Andreas.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-08 16:03 ` Christoph Hellwig
@ 2019-07-09 13:52   ` Chao Yu
  0 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-09 13:52 UTC (permalink / raw)
  To: andreas.gruenbacher
  Cc: Christoph Hellwig, Chao Yu, darrick.wong, linux-xfs,
	linux-fsdevel, gaoxiang25

On 2019-7-9 0:03, Christoph Hellwig wrote:
> On Wed, Jul 03, 2019 at 03:55:02PM +0800, Chao Yu wrote:
>> Some filesystems like erofs/reiserfs have the ability to pack tail
>> data into metadata, e.g.:
>> IOMAP_MAPPED [0, 8192]
>> IOMAP_INLINE [8192, 8200]
>>
>> However, the current IOMAP_INLINE type assumes that:
>> - inline data is located in page #0;
>> - the inline size equals .i_size.
>> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
>> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
>> the tail-packing case.
>>
>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> 
> This looks good to me, but I'd also like to see a review and gfs2
> testing from Andreas.

Thanks for your reply. :)

Well, so, Andreas, could you please take a look at this patch and do the
related testing on gfs2 if you have time?

Thanks,

> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-03  7:55 ` Chao Yu
                   ` (2 preceding siblings ...)
  (?)
@ 2019-07-09 23:32 ` Andreas Grünbacher
  2019-07-10 10:30     ` Chao Yu
  -1 siblings, 1 reply; 26+ messages in thread
From: Andreas Grünbacher @ 2019-07-09 23:32 UTC (permalink / raw)
  To: Chao Yu
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs,
	Linux FS-devel Mailing List, Gao Xiang, chao

Hi Chao,

Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
> Some filesystems like erofs/reiserfs have the ability to pack tail
> data into metadata, e.g.:
> IOMAP_MAPPED [0, 8192]
> IOMAP_INLINE [8192, 8200]
>
> However, the current IOMAP_INLINE type assumes that:
> - inline data is located in page #0;
> - the inline size equals .i_size.
> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
> the tail-packing case.

this patch suffers from the following problems:

* Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
iomap_fiemap on a filesystem with tail packing, so if we don't make
the same distinction in iomap, iomap_fiemap will silently produce the
wrong result. This means that IOMAP_TAIL should become a separate
mapping type.

* IOMAP_INLINE mappings always start at offset 0 and span an entire
page, so they are always page aligned. IOMAP_TAIL mappings only need
to be block aligned. If the block size is smaller than the page size,
a tail page can consist of more than one mapping. So
iomap_read_inline_data needs to be changed to only copy a single block
out of iomap->inline_data, and we need to adjust iomap_write_begin and
iomap_readpage_actor accordingly.

* Functions iomap_read_inline_data, iomap_write_end_inline, and
iomap_dio_inline_actor currently all assume that we are operating on
page 0, and that iomap->inline_data points at the data at offset 0.
That's no longer the case for IOMAP_TAIL mappings. We need to look
only at the offset within the page / block there.

* There are some asserts like the following in the code:

  BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));

Those should probably be tightened to check for block boundaries
instead of page boundaries, i.e. something like:

  BUG_ON(size > i_blocksize(inode) -
offset_in_block(iomap->inline_data, inode));

> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> ---
>  fs/iomap.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 12654c2e78f8..d1c16b692d31 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -264,13 +264,12 @@ static void
>  iomap_read_inline_data(struct inode *inode, struct page *page,
>                 struct iomap *iomap)
>  {
> -       size_t size = i_size_read(inode);
> +       size_t size = iomap->length;

Function iomap_read_inline_data fills the entire page here, not only
the iomap->length part, so this is wrong. But see my comment above about
changing iomap_read_inline_data to read a single block as well.

>         void *addr;
>
>         if (PageUptodate(page))
>                 return;
>
> -       BUG_ON(page->index);
>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>
>         addr = kmap_atomic(page);
> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>         sector_t sector;
>
>         if (iomap->type == IOMAP_INLINE) {
> -               WARN_ON_ONCE(pos);
>                 iomap_read_inline_data(inode, page, iomap);
>                 return PAGE_SIZE;
>         }

Those last two changes look right to me.

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-09 23:32 ` Andreas Grünbacher
@ 2019-07-10 10:30     ` Chao Yu
  0 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-10 10:30 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs,
	Linux FS-devel Mailing List, Gao Xiang, chao

Hi Andreas,

Thanks in advance for your review. :)

On 2019/7/10 7:32, Andreas Grünbacher wrote:
> Hi Chao,
> 
> Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
>> Some filesystems like erofs/reiserfs have the ability to pack tail
>> data into metadata, e.g.:
>> IOMAP_MAPPED [0, 8192]
>> IOMAP_INLINE [8192, 8200]
>>
>> However, the current IOMAP_INLINE type assumes that:
>> - inline data is located in page #0;
>> - the inline size equals .i_size.
>> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
>> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
>> the tail-packing case.
> 
> this patch suffers from the following problems:
> 
> * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
> FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
> iomap_fiemap on a filesystem with tail packing, so if we don't make
> the same distinction in iomap, iomap_fiemap will silently produce the
> wrong result. This means that IOMAP_TAIL should become a separate
> mapping type.

I'm a little confused; IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL have
different semantics:

- IOMAP_TAIL:
  we may introduce this flag to cover the tail-packing case, where we merge
small-sized tail data of a file into the free space of its metadata.
- FIEMAP_EXTENT_DATA_TAIL:
quoted from Documentation/filesystems/fiemap.txt:
"  This will also set FIEMAP_EXTENT_NOT_ALIGNED
Data is packed into a block with data from other files."
It looks like this flag indicates that data from different files is stored in
one common block.

I'm not familiar with the fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL
would better be renamed to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. Then
FIEMAP_EXTENT_DATA_INLINE would be enough to cover both the normal inline case
and the tail-packing case?

> 
> * IOMAP_INLINE mappings always start at offset 0 and span an entire
> page, so they are always page aligned. IOMAP_TAIL mappings only need

Why can't page #0 consist of more than one block/mapping? I didn't get the
difference here.

> to be block aligned. If the block size is smaller than the page size,

- reiserfs looks at the last page's content; if the size of that page's valid
content is smaller than a threshold (at least less than one block), reiserfs
can do the packing.

- erofs' block size is 4KB, which is the same as the page size.

Actually, for IOMAP_TAIL, there are no users who need to map more than one
block into metadata, so I'm not sure whether we should support that, or just
leave the code prepared for those potential users?

Thanks,

> a tail page can consist of more than one mapping. So
> iomap_read_inline_data needs to be changed to only copy a single block
> out of iomap->inline_data, and we need to adjust iomap_write_begin and
> iomap_readpage_actor accordingly.
> 
> * Functions iomap_read_inline_data, iomap_write_end_inline, and
> iomap_dio_inline_actor currently all assume that we are operating on
> page 0, and that iomap->inline_data points at the data at offset 0.
> That's no longer the case for IOMAP_TAIL mappings. We need to look
> only at the offset within the page / block there.
> 
> * There are some asserts like the following in the code:
> 
>   BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> 
> Those should probably be tightened to check for block boundaries
> instead of page boundaries, i.e. something like:
> 
>   BUG_ON(size > i_blocksize(inode) -
> offset_in_block(iomap->inline_data, inode));
> 
>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>> ---
>>  fs/iomap.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/fs/iomap.c b/fs/iomap.c
>> index 12654c2e78f8..d1c16b692d31 100644
>> --- a/fs/iomap.c
>> +++ b/fs/iomap.c
>> @@ -264,13 +264,12 @@ static void
>>  iomap_read_inline_data(struct inode *inode, struct page *page,
>>                 struct iomap *iomap)
>>  {
>> -       size_t size = i_size_read(inode);
>> +       size_t size = iomap->length;
> 
> Function iomap_read_inline_data fills the entire page here, not only
> the iomap->length part, so this is wrong. But see my comment above
> about changing iomap_read_inline_data to read a single block above as
> well.
> 
>>         void *addr;
>>
>>         if (PageUptodate(page))
>>                 return;
>>
>> -       BUG_ON(page->index);
>>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>
>>         addr = kmap_atomic(page);
>> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>         sector_t sector;
>>
>>         if (iomap->type == IOMAP_INLINE) {
>> -               WARN_ON_ONCE(pos);
>>                 iomap_read_inline_data(inode, page, iomap);
>>                 return PAGE_SIZE;
>>         }
> 
> Those last two changes look right to me.
> 
> Thanks,
> Andreas
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-10 10:30     ` Chao Yu
  (?)
@ 2019-07-10 21:50     ` Andreas Grünbacher
  2019-07-10 23:42       ` Gao Xiang
  2019-07-11 14:15       ` Chao Yu
  -1 siblings, 2 replies; 26+ messages in thread
From: Andreas Grünbacher @ 2019-07-10 21:50 UTC (permalink / raw)
  To: Chao Yu
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs,
	Linux FS-devel Mailing List, Gao Xiang, chao

Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
> Hi Andreas,
>
> Thanks for your review in advance. :)
>
> On 2019/7/10 7:32, Andreas Grünbacher wrote:
> > Hi Chao,
> >
> > Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
> >> Some filesystems like erofs/reiserfs have the ability to pack tail
> >> data into metadata, e.g.:
> >> IOMAP_MAPPED [0, 8192]
> >> IOMAP_INLINE [8192, 8200]
> >>
> >> However, the current IOMAP_INLINE type assumes that:
> >> - inline data is located in page #0;
> >> - the inline size equals .i_size.
> >> These restrictions prevent erofs from converting to iomap's IOMAP_INLINE,
> >> so this patch relaxes them to make IOMAP_INLINE generic enough to cover
> >> the tail-packing case.
> >
> > this patch suffers from the following problems:
> >
> > * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
> > FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
> > iomap_fiemap on a filesystem with tail packing, so if we don't make
> > the same distinction in iomap, iomap_fiemap will silently produce the
> > wrong result. This means that IOMAP_TAIL should become a separate
> > mapping type.
>
> I'm a little confused; IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL have
> different semantics:
>
> - IOMAP_TAIL:
>   we may introduce this flag to cover the tail-packing case, where we merge
> small-sized tail data of a file into the free space of its metadata.
> - FIEMAP_EXTENT_DATA_TAIL:
> quoted from Documentation/filesystems/fiemap.txt:
> "  This will also set FIEMAP_EXTENT_NOT_ALIGNED
> Data is packed into a block with data from other files."
> It looks like this flag indicates that data from different files is stored in
> one common block.

Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate
flag in your scenario. In that case, we should be fine on the fiemap
front.

> I'm not familiar with the fiemap flags' history, but maybe FIEMAP_EXTENT_DATA_TAIL
> would better be renamed to FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. Then
> FIEMAP_EXTENT_DATA_INLINE would be enough to cover both the normal inline case
> and the tail-packing case?

Those definitions are user-visible.

> > * IOMAP_INLINE mappings always start at offset 0 and span an entire
> > page, so they are always page aligned. IOMAP_TAIL mappings only need
>
> Why can't page #0 consist of more than one block/mapping? I didn't get the
> difference here.

Currently, iomap_write_begin will read the inline data into page #0
and mark that page up-to-date. There's a check for that in
iomap_write_end_inline. If a page contains more than one mapping, we
won't be able to mark the entire page up to date anymore; we'd need to
track which parts of the page are up to date and which are not. Iomap
supports two tracking mechanisms, buffer heads and iomap_page, and
we'd need to implement that tracking code for each of those cases.

> > to be block aligned. If the block size is smaller than the page size,
>
> - reiserfs looks at the last page's content; if the size of that page's valid
> content is smaller than a threshold (at least less than one block), reiserfs
> can do the packing.
>
> - erofs' block size is 4KB, which is the same as the page size.
>
>
> Actually, for IOMAP_TAIL, there are no users who need to map more than one
> block into metadata, so I'm not sure whether we should support that, or just
> leave the code prepared for those potential users?

How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs
doesn't require block size == PAGE_SIZE?

> Thanks,
>
> > a tail page can consist of more than one mapping. So
> > iomap_read_inline_data needs to be changed to only copy a single block
> > out of iomap->inline_data, and we need to adjust iomap_write_begin and
> > iomap_readpage_actor accordingly.
> >
> > * Functions iomap_read_inline_data, iomap_write_end_inline, and
> > iomap_dio_inline_actor currently all assume that we are operating on
> > page 0, and that iomap->inline_data points at the data at offset 0.
> > That's no longer the case for IOMAP_TAIL mappings. We need to look
> > only at the offset within the page / block there.
> >
> > * There are some asserts like the following in the code:
> >
> >   BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >
> > Those should probably be tightened to check for block boundaries
> > instead of page boundaries, i.e. something like:
> >
> >   BUG_ON(size > i_blocksize(inode) -
> > offset_in_block(iomap->inline_data, inode));
> >
> >> Signed-off-by: Chao Yu <yuchao0@huawei.com>
> >> ---
> >>  fs/iomap.c | 4 +---
> >>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>
> >> diff --git a/fs/iomap.c b/fs/iomap.c
> >> index 12654c2e78f8..d1c16b692d31 100644
> >> --- a/fs/iomap.c
> >> +++ b/fs/iomap.c
> >> @@ -264,13 +264,12 @@ static void
> >>  iomap_read_inline_data(struct inode *inode, struct page *page,
> >>                 struct iomap *iomap)
> >>  {
> >> -       size_t size = i_size_read(inode);
> >> +       size_t size = iomap->length;
> >
> > Function iomap_read_inline_data fills the entire page here, not only
> > the iomap->length part, so this is wrong. But see my comment above
> > about changing iomap_read_inline_data to read a single block above as
> > well.
> >
> >>         void *addr;
> >>
> >>         if (PageUptodate(page))
> >>                 return;
> >>
> >> -       BUG_ON(page->index);
> >>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> >>
> >>         addr = kmap_atomic(page);
> >> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >>         sector_t sector;
> >>
> >>         if (iomap->type == IOMAP_INLINE) {
> >> -               WARN_ON_ONCE(pos);
> >>                 iomap_read_inline_data(inode, page, iomap);
> >>                 return PAGE_SIZE;
> >>         }
> >
> > Those last two changes look right to me.
> >
> > Thanks,
> > Andreas
> > .
> >

At this point, can I ask how important this packing mechanism is to
you? I can see a point in implementing inline files, which help
because there tends to be a large number of very small files. But for
not-so-small files, is saving an extra block really worth the trouble,
especially given how cheap storage has become?

Thanks,
Andreas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-10 21:50     ` Andreas Grünbacher
@ 2019-07-10 23:42       ` Gao Xiang
  2019-07-11 13:06         ` Matthew Wilcox
  2019-07-11 14:15       ` Chao Yu
  1 sibling, 1 reply; 26+ messages in thread
From: Gao Xiang @ 2019-07-10 23:42 UTC (permalink / raw)
  To: Andreas Grünbacher
  Cc: Chao Yu, Christoph Hellwig, Darrick J. Wong, linux-xfs,
	Linux FS-devel Mailing List, Gao Xiang, chao


On 2019/7/11 5:50, Andreas Grünbacher wrote:
> At this point, can I ask how important this packing mechanism is to
> you? I can see a point in implementing inline files, which help
> because there tends to be a large number of very small files. But for
> not-so-small files, is saving an extra block really worth the trouble,
> especially given how cheap storage has become?

I'll try to answer the above. I think there are several advantages to using
tail-end packing inline:

1) It is more cache-friendly. Considering a file "A" accessed by the user now
   or recently, we tend to (1) get more data about "A" and (2) keep more data
   about "A" cached, according to an LRU-like assumption, because it is more
   likely to be used than the metadata of some other file "X"; this is
   especially true for files whose tail-end block is small enough (less than
   a threshold, e.g. < 100B just for example);

2) for directory files, tail-end packing will boost traversal performance;

3) I think tail-end packing is a more generic form of inline data; it saves
   I/Os for generic cases, not just storage space.

As for "is saving an extra block really worth the trouble", I don't understand
what exactly the trouble is...


Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-10 10:30     ` Chao Yu
  (?)
  (?)
@ 2019-07-11 12:28     ` Andreas Gruenbacher
  2019-07-12  9:31         ` Chao Yu
  2019-07-18 12:31       ` Christoph Hellwig
  -1 siblings, 2 replies; 26+ messages in thread
From: Andreas Gruenbacher @ 2019-07-11 12:28 UTC (permalink / raw)
  To: Chao Yu
  Cc: Andreas Gruenbacher, Christoph Hellwig, Darrick J. Wong,
	linux-xfs, linux-fsdevel, Gao Xiang, chao

Something along the lines of the attached, broken patch might work in
the end.

Andreas

---
 fs/buffer.c           | 10 ++++--
 fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
 include/linux/iomap.h |  3 ++
 3 files changed, 61 insertions(+), 26 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index e450c55f6434..8d8668e377ab 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
 EXPORT_SYMBOL(page_zero_new_buffers);
 
 static void
-iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
-		struct iomap *iomap)
+iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
+		struct buffer_head *bh, struct iomap *iomap)
 {
 	loff_t offset = block << inode->i_blkbits;
 
@@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
 				inode->i_blkbits;
 		set_buffer_mapped(bh);
 		break;
+	case IOMAP_INLINE:
+		__iomap_read_inline_data(inode, page, iomap);
+		set_buffer_uptodate(bh);
+		break;
 	}
 }
 
@@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
 				if (err)
 					break;
 			} else {
-				iomap_to_bh(inode, block, bh, iomap);
+				iomap_to_bh(inode, page, block, bh, iomap);
 			}
 
 			if (buffer_new(bh)) {
diff --git a/fs/iomap.c b/fs/iomap.c
index 45aa58e837b5..61188e95def2 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
 	struct list_head	*pages;
 };
 
-static void
-iomap_read_inline_data(struct inode *inode, struct page *page,
+#define offset_in_block(offset, inode) \
+	((unsigned long)(offset) & (i_blocksize(inode) - 1))
+
+static bool
+inline_data_within_block(struct inode *inode, struct iomap *iomap,
+		unsigned int size)
+{
+	unsigned int off = offset_in_block(iomap->inline_data, inode);
+
+	return size <= i_blocksize(inode) - off;
+}
+
+void
+__iomap_read_inline_data(struct inode *inode, struct page *page,
 		struct iomap *iomap)
 {
-	size_t size = i_size_read(inode);
+	size_t size = offset_in_block(i_size_read(inode), inode);
+	unsigned int poff = offset_in_page(iomap->offset);
+	unsigned int bsize = i_blocksize(inode);
 	void *addr;
 
 	if (PageUptodate(page))
 		return;
 
-	BUG_ON(page->index);
-	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	BUG_ON(!inline_data_within_block(inode, iomap, size));
 
 	addr = kmap_atomic(page);
-	memcpy(addr, iomap->inline_data, size);
-	memset(addr + size, 0, PAGE_SIZE - size);
+	memcpy(addr + poff, iomap->inline_data, size);
+	memset(addr + poff + size, 0, bsize - size);
 	kunmap_atomic(addr);
-	SetPageUptodate(page);
+}
+
+static void
+iomap_read_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap)
+{
+	unsigned int poff = offset_in_page(iomap->offset);
+	unsigned int bsize = i_blocksize(inode);
+
+	__iomap_read_inline_data(inode, page, iomap);
+	iomap_set_range_uptodate(page, poff, bsize);
 }
 
 static loff_t
@@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	unsigned poff, plen;
 	sector_t sector;
 
-	if (iomap->type == IOMAP_INLINE) {
-		WARN_ON_ONCE(pos);
+	if (iomap->type == IOMAP_INLINE)
 		iomap_read_inline_data(inode, page, iomap);
-		return PAGE_SIZE;
-	}
 
 	/* zero post-eof blocks as the page may be mapped */
 	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
@@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
 	if (PageUptodate(page))
 		return 0;
 
+	if (iomap->type == IOMAP_INLINE) {
+		iomap_read_inline_data(inode, page, iomap);
+		return 0;
+	}
+
 	do {
 		iomap_adjust_read_range(inode, iop, &block_start,
 				block_end - block_start, &poff, &plen);
@@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
 		goto out_no_page;
 	}
 
-	if (iomap->type == IOMAP_INLINE)
-		iomap_read_inline_data(inode, page, iomap);
-	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
+	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
 		status = __block_write_begin_int(page, pos, len, NULL, iomap);
 	else
 		status = __iomap_write_begin(inode, pos, len, page, iomap);
@@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
 {
 	void *addr;
 
-	WARN_ON_ONCE(!PageUptodate(page));
-	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
 
 	addr = kmap_atomic(page);
-	memcpy(iomap->inline_data + pos, addr + pos, copied);
+	memcpy(iomap->inline_data + offset_in_block(pos, inode),
+	       addr + offset_in_page(pos), copied);
 	kunmap_atomic(addr);
 
 	mark_inode_dirty(inode);
@@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
 		const struct iomap_ops *ops)
 {
 	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = offset_in_block(pos, inode);
 
 	/* Block boundary? Nothing to do */
 	if (!off)
@@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
 	struct iov_iter *iter = dio->submit.iter;
 	size_t copied;
 
-	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
+	BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
 		loff_t size = inode->i_size;
 
 		if (pos > size)
-			memset(iomap->inline_data + size, 0, pos - size);
-		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
+			memset(iomap->inline_data +
+			       offset_in_block(size, inode), 0, pos - size);
+		copied = copy_from_iter(iomap->inline_data +
+					offset_in_block(pos, inode),
+					length, iter);
 		if (copied) {
 			if (pos + copied > size)
 				i_size_write(inode, pos + copied);
 			mark_inode_dirty(inode);
 		}
 	} else {
-		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
+		copied = copy_to_iter(iomap->inline_data +
+				      offset_in_block(pos, inode),
+				      length, iter);
 	}
 	dio->size += copied;
 	return copied;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 2103b94cb1bf..a8a60dd2fdc0 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
 	return NULL;
 }
 
+void __iomap_read_inline_data(struct inode *inode, struct page *page,
+		struct iomap *iomap);
+
 ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
 		const struct iomap_ops *ops);
 int iomap_readpage(struct page *page, const struct iomap_ops *ops);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-10 23:42       ` Gao Xiang
@ 2019-07-11 13:06         ` Matthew Wilcox
  2019-07-11 13:54           ` Gao Xiang
  0 siblings, 1 reply; 26+ messages in thread
From: Matthew Wilcox @ 2019-07-11 13:06 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Andreas Grünbacher, Chao Yu, Christoph Hellwig,
	Darrick J. Wong, linux-xfs, Linux FS-devel Mailing List,
	Gao Xiang, chao

On Thu, Jul 11, 2019 at 07:42:20AM +0800, Gao Xiang wrote:
> 
> At 2019/7/11 5:50 PM, Andreas Grünbacher wrote:
> > At this point, can I ask how important this packing mechanism is to
> > you? I can see a point in implementing inline files, which help
> > because there tends to be a large number of very small files. But for
> > not-so-small files, is saving an extra block really worth the trouble,
> > especially given how cheap storage has become?
> 
> I would try to answer the above. I think there are several advantages to
> using tail-end packing inline:
> 1) It is more cache-friendly. Considering a file "A" accessed by the user
> now or recently, we tend to (1) get more data about "A" and (2) keep more
> data about "A" under an LRU-like assumption, because it is more likely to
> be used than the metadata of some other file "X", especially for files
> whose tail-end block is small enough (less than a threshold, e.g. < 100B,
> just for example);
> 
> 2) for directory files, tail-end packing will boost traversal performance;
> 
> 3) I think tail-end packing is a more generic form of inline; it saves
> I/Os for generic cases, not just to save storage space.
> 
> "is saving an extra block really worth the trouble" I don't understand
> what exactly the trouble is...

"the trouble" is adding code complexity and additional things to test.

I'm not sure you really understood Andreas' question.  He's saying that he
understands the performance and space gain from packing short files
(eg files less than 100 bytes).  But how many files are there between
4096 and 4196 bytes in size, let alone between 8192 and 8292, 12288 and
12388 ...

Is optimising for _those_ files worth it?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-11 13:06         ` Matthew Wilcox
@ 2019-07-11 13:54           ` Gao Xiang
  0 siblings, 0 replies; 26+ messages in thread
From: Gao Xiang @ 2019-07-11 13:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Gao Xiang, Andreas Grünbacher, Chao Yu, Christoph Hellwig,
	Darrick J. Wong, linux-xfs, Linux FS-devel Mailing List, chao



On 2019/7/11 21:06, Matthew Wilcox wrote:
> On Thu, Jul 11, 2019 at 07:42:20AM +0800, Gao Xiang wrote:
>>
>> At 2019/7/11 5:50 PM, Andreas Grünbacher wrote:
>>> At this point, can I ask how important this packing mechanism is to
>>> you? I can see a point in implementing inline files, which help
>>> because there tends to be a large number of very small files. But for
>>> not-so-small files, is saving an extra block really worth the trouble,
>>> especially given how cheap storage has become?
>>
>> I would try to answer the above. I think there are several advantages to
>> using tail-end packing inline:
>> 1) It is more cache-friendly. Considering a file "A" accessed by the user
>> now or recently, we tend to (1) get more data about "A" and (2) keep more
>> data about "A" under an LRU-like assumption, because it is more likely to
>> be used than the metadata of some other file "X", especially for files
>> whose tail-end block is small enough (less than a threshold, e.g. < 100B,
>> just for example);
>>
>> 2) for directory files, tail-end packing will boost traversal performance;
>>
>> 3) I think tail-end packing is a more generic form of inline; it saves
>> I/Os for generic cases, not just to save storage space.
>>
>> "is saving an extra block really worth the trouble" I don't understand
>> what exactly the trouble is...
> 
> "the trouble" is adding code complexity and additional things to test.
> 
> I'm not sure you really understood Andreas' question.  He's saying that he
> understands the performance and space gain from packing short files
> (eg files less than 100 bytes).  But how many files are there between
> 4096 and 4196 bytes in size, let alone between 8192 and 8292, 12288 and
> 12388 ...
> 
> Is optimising for _those_ files worth it?

Hi Willy,

Thanks for your kind explanation.. I get it :) I'll try to express my
thoughts in the following aspects...

1) In my opinion, Chao's first patch, which adds an additional type, could
   be better for now; maybe we can reduce duplicate code even further based
   on that patch. What EROFS needs is only read-only tail-end packing. For
   write support I think we actually need to think more carefully (which
   doesn't necessarily mean it is complex, but I haven't done research on
   this, so I have to stay silent...), and maybe we could leave that until
   a real fs user switches to iomap and mixes INLINE and TAIL at that time...

2) EROFS actually has an unfinished feature which supports tail-end packing
   of compressed data, which means the decompressed data may not be so
   small... and I know that is another matter... So the direct answer to
   the question is that it depends on the user data and the user. For EROFS,
   tail-end packing inline is easy to implement, and it is a per-file
   optional feature (not mandatory); the threshold (< 100B) is not a
   hardcoded limit either, it is configured by mkfs users and only helps
   mkfs decide whether a file should enable it or not. It should be useful
   for all directories at least, and I think it is more cache-friendly for
   regular files as well, so a large range of files configured by users
   (not just < 100B) can benefit from this...

Sorry about my English...

Thanks,
Gao Xiang




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-10 21:50     ` Andreas Grünbacher
  2019-07-10 23:42       ` Gao Xiang
@ 2019-07-11 14:15       ` Chao Yu
  1 sibling, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-11 14:15 UTC (permalink / raw)
  To: Andreas Grünbacher, Chao Yu
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs,
	Linux FS-devel Mailing List, Gao Xiang

On 2019-7-11 5:50, Andreas Grünbacher wrote:
> Am Mi., 10. Juli 2019 um 12:31 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
>> Hi Andreas,
>>
>> Thanks for your review in advance. :)
>>
>> On 2019/7/10 7:32, Andreas Grünbacher wrote:
>>> Hi Chao,
>>>
>>> Am Mi., 3. Juli 2019 um 09:55 Uhr schrieb Chao Yu <yuchao0@huawei.com>:
>>>> Some filesystems like erofs/reiserfs have the ability to pack tail
>>>> data into metadata, e.g.:
>>>> IOMAP_MAPPED [0, 8192]
>>>> IOMAP_INLINE [8192, 8200]
>>>>
>>>> However current IOMAP_INLINE type has assumption that:
>>>> - inline data should be locating at page #0.
>>>> - inline size should equal to .i_size
>>>> Those restriction fail to convert to use iomap IOMAP_INLINE in erofs,
>>>> so this patch tries to relieve above limits to make IOMAP_INLINE more
>>>> generic to cover tail-packing case.
>>>
>>> this patch suffers from the following problems:
>>>
>>> * Fiemap distinguishes between FIEMAP_EXTENT_DATA_INLINE and
>>> FIEMAP_EXTENT_DATA_TAIL. Someone will sooner or later inevitably use
>>> iomap_fiemap on a filesystem with tail packing, so if we don't make
>>> the same distinction in iomap, iomap_fiemap will silently produce the
>>> wrong result. This means that IOMAP_TAIL should become a separate
>>> mapping type.
>>
>> I'm a little confused; IIUC, IOMAP_TAIL and FIEMAP_EXTENT_DATA_TAIL have
>> different semantics:
>>
>> - IOMAP_TAIL:
>>   we may introduce this flag to cover the tail-packing case, where we merge
>> small-sized data at the tail of a file into the free space of its metadata.
>> - FIEMAP_EXTENT_DATA_TAIL:
>> quoted from Documentation/filesystems/fiemap.txt
>> "  This will also set FIEMAP_EXTENT_NOT_ALIGNED
>> Data is packed into a block with data from other files."
>> It looks like this flag indicates that data from different files is stored
>> in one common block.
> 
> Well, maybe FIEMAP_EXTENT_DATA_INLINE is indeed the more appropriate
> flag in your scenario. In that case, we should be fine on the fiemap
> front.

Yup.

> 
>> I'm not familiar with the fiemap flags' history, but maybe
>> FIEMAP_EXTENT_DATA_TAIL would be better renamed to
>> FIEMAP_EXTENT_DATA_PACKING to avoid ambiguity. And then
>> FIEMAP_EXTENT_DATA_INLINE would be enough to cover both the normal inline
>> case and the tail-packing case?
> 
> Those definitions are user-visible.
> 
>>> * IOMAP_INLINE mappings always start at offset 0 and span an entire
>>> page, so they are always page aligned. IOMAP_TAIL mappings only need
>>
>> Why can't page #0 consist of more than one block/mapping? I didn't get
>> what the difference is here.
> 
> Currently, iomap_write_begin will read the inline data into page #0
> and mark that page up-to-date. There's a check for that in
> iomap_write_end_inline. If a page contains more than one mapping, we
> won't be able to mark the entire page up to date anymore; we'd need to
> track which parts of the page are up to date and which are not. Iomap
> supports two tracking mechanisms, buffer heads and iomap_page, and
> we'd need to implement that tracking code for each of those cases.

Okay, I understand now: the reason is that the limitations (the BUG_ON and
WARN_ON checks in the iomap inline functions) mean an inline page may only
contain one mapping; to generalize this, we need to consider the unaligned
cases between block size and page size.

> 
>>> to be block aligned. If the block size is smaller than the page size,
>>
>> - reiserfs looks at the last page's content; if the size of that page's
>> valid content is smaller than a threshold (less than one block), reiserfs
>> can do the packing.
>>
>> - erofs' block size is 4kb which is the same as page size.
>>
>>
>> Actually, for IOMAP_TAIL, there are no users who need to map more than one
>> block into metadata, so I'm not sure whether we should support that, or
>> whether we just need to prepare the code for those potential users?
> 
> How about architectures with PAGE_SIZE > 4096? I'm assuming that erofs

We haven't considered the PAGE_SIZE > 4096 case yet.

> doesn't require block size == PAGE_SIZE?

At least now erofs block size is 4KB by default.

> 
>> Thanks,
>>
>>> a tail page can consist of more than one mapping. So
>>> iomap_read_inline_data needs to be changed to only copy a single block
>>> out of iomap->inline_data, and we need to adjust iomap_write_begin and
>>> iomap_readpage_actor accordingly.
>>>
>>> * Functions iomap_read_inline_data, iomap_write_end_inline, and
>>> iomap_dio_inline_actor currently all assume that we are operating on
>>> page 0, and that iomap->inline_data points at the data at offset 0.
>>> That's no longer the case for IOMAP_TAIL mappings. We need to look
>>> only at the offset within the page / block there.
>>>
>>> * There are some asserts like the following in the code:
>>>
>>>   BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>>
>>> Those should probably be tightened to check for block boundaries
>>> instead of page boundaries, i.e. something like:
>>>
>>>   BUG_ON(size > i_blocksize(inode) -
>>> offset_in_block(iomap->inline_data, inode));
>>>
>>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>>> ---
>>>>  fs/iomap.c | 4 +---
>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/iomap.c b/fs/iomap.c
>>>> index 12654c2e78f8..d1c16b692d31 100644
>>>> --- a/fs/iomap.c
>>>> +++ b/fs/iomap.c
>>>> @@ -264,13 +264,12 @@ static void
>>>>  iomap_read_inline_data(struct inode *inode, struct page *page,
>>>>                 struct iomap *iomap)
>>>>  {
>>>> -       size_t size = i_size_read(inode);
>>>> +       size_t size = iomap->length;
>>>
>>> Function iomap_read_inline_data fills the entire page here, not only
>>> the iomap->length part, so this is wrong. But see my comment above
>>> about changing iomap_read_inline_data to read a single block above as
>>> well.
>>>
>>>>         void *addr;
>>>>
>>>>         if (PageUptodate(page))
>>>>                 return;
>>>>
>>>> -       BUG_ON(page->index);
>>>>         BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>>>
>>>>         addr = kmap_atomic(page);
>>>> @@ -293,7 +292,6 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>>>         sector_t sector;
>>>>
>>>>         if (iomap->type == IOMAP_INLINE) {
>>>> -               WARN_ON_ONCE(pos);
>>>>                 iomap_read_inline_data(inode, page, iomap);
>>>>                 return PAGE_SIZE;
>>>>         }
>>>
>>> Those last two changes look right to me.
>>>
>>> Thanks,
>>> Andreas
>>> .
>>>
> 
> At this point, can I ask how important this packing mechanism is to
> you? I can see a point in implementing inline files, which help
> because there tends to be a large number of very small files. But for
> not-so-small files, is saving an extra block really worth the trouble,
> especially given how cheap storage has become?

- Erofs is a read-only filesystem, so we don't need to consider or design a
write path for the tail-packing feature, including the writeback and
inline-conversion cases. So code complexity doesn't trouble us.

- Files in real users' scenarios are rarely aligned to the page or block
size, so we can expect that a good fraction of files can have their tail
data packed; and erofs is used in consumer electronics now (cell phones),
where storage is not that large, so we think tail-packing can be one of the
compression/compaction techniques in erofs to save as much storage space as
possible.

Thanks,

> 
> Thanks,
> Andreas
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-11 12:28     ` Andreas Gruenbacher
@ 2019-07-12  9:31         ` Chao Yu
  2019-07-18 12:31       ` Christoph Hellwig
  1 sibling, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-12  9:31 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Gao Xiang, chao

On 2019/7/11 20:28, Andreas Gruenbacher wrote:
> Something along the lines of the attached, broken patch might work in
> the end.
> 
> Andreas
> 
> ---
>  fs/buffer.c           | 10 ++++--
>  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
>  include/linux/iomap.h |  3 ++
>  3 files changed, 61 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e450c55f6434..8d8668e377ab 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
>  EXPORT_SYMBOL(page_zero_new_buffers);
>  
>  static void
> -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> -		struct iomap *iomap)
> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
> +		struct buffer_head *bh, struct iomap *iomap)
>  {
>  	loff_t offset = block << inode->i_blkbits;
>  
> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>  				inode->i_blkbits;
>  		set_buffer_mapped(bh);
>  		break;
> +	case IOMAP_INLINE:
> +		__iomap_read_inline_data(inode, page, iomap);
> +		set_buffer_uptodate(bh);
> +		break;
>  	}
>  }
>  
> @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  				if (err)
>  					break;
>  			} else {
> -				iomap_to_bh(inode, block, bh, iomap);
> +				iomap_to_bh(inode, page, block, bh, iomap);
>  			}
>  
>  			if (buffer_new(bh)) {
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 45aa58e837b5..61188e95def2 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
>  	struct list_head	*pages;
>  };
>  
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> +#define offset_in_block(offset, inode) \
> +	((unsigned long)(offset) & (i_blocksize(inode) - 1))
> +
> +static bool
> +inline_data_within_block(struct inode *inode, struct iomap *iomap,
> +		unsigned int size)
> +{
> +	unsigned int off = offset_in_block(iomap->inline_data, inode);
> +
> +	return size <= i_blocksize(inode) - off;
> +}
> +
> +void
> +__iomap_read_inline_data(struct inode *inode, struct page *page,
>  		struct iomap *iomap)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = offset_in_block(i_size_read(inode), inode);
> +	unsigned int poff = offset_in_page(iomap->offset);
> +	unsigned int bsize = i_blocksize(inode);
>  	void *addr;
>  
>  	if (PageUptodate(page))
>  		return;
>  
> -	BUG_ON(page->index);
> -	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, size));
>  
>  	addr = kmap_atomic(page);
> -	memcpy(addr, iomap->inline_data, size);
> -	memset(addr + size, 0, PAGE_SIZE - size);
> +	memcpy(addr + poff, iomap->inline_data, size);
> +	memset(addr + poff + size, 0, bsize - size);
>  	kunmap_atomic(addr);
> -	SetPageUptodate(page);
> +}
> +
> +static void
> +iomap_read_inline_data(struct inode *inode, struct page *page,
> +		struct iomap *iomap)
> +{
> +	unsigned int poff = offset_in_page(iomap->offset);
> +	unsigned int bsize = i_blocksize(inode);
> +
> +	__iomap_read_inline_data(inode, page, iomap);
> +	iomap_set_range_uptodate(page, poff, bsize);
>  }
>  
>  static loff_t
> @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	unsigned poff, plen;
>  	sector_t sector;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
> +	if (iomap->type == IOMAP_INLINE)
>  		iomap_read_inline_data(inode, page, iomap);
> -		return PAGE_SIZE;

Hi Andreas,

Thanks for your patch.

In my erofs test case, the filled inline data gets zeroed out because we
fall back to the following flow:

	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
		zero_user(page, poff, plen);

Should we return before this condition check?

Thanks,

> -	}
>  
>  	/* zero post-eof blocks as the page may be mapped */
>  	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
> @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  	if (PageUptodate(page))
>  		return 0;
>  
> +	if (iomap->type == IOMAP_INLINE) {
> +		iomap_read_inline_data(inode, page, iomap);
> +		return 0;
> +	}
> +
>  	do {
>  		iomap_adjust_read_range(inode, iop, &block_start,
>  				block_end - block_start, &poff, &plen);
> @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		goto out_no_page;
>  	}
>  
> -	if (iomap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, iomap);
> -	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  {
>  	void *addr;
>  
> -	WARN_ON_ONCE(!PageUptodate(page));
> -	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
>  
>  	addr = kmap_atomic(page);
> -	memcpy(iomap->inline_data + pos, addr + pos, copied);
> +	memcpy(iomap->inline_data + offset_in_block(pos, inode),
> +	       addr + offset_in_page(pos), copied);
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);
> @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  		const struct iomap_ops *ops)
>  {
>  	unsigned int blocksize = i_blocksize(inode);
> -	unsigned int off = pos & (blocksize - 1);
> +	unsigned int off = offset_in_block(pos, inode);
>  
>  	/* Block boundary? Nothing to do */
>  	if (!off)
> @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  	struct iov_iter *iter = dio->submit.iter;
>  	size_t copied;
>  
> -	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
>  
>  	if (dio->flags & IOMAP_DIO_WRITE) {
>  		loff_t size = inode->i_size;
>  
>  		if (pos > size)
> -			memset(iomap->inline_data + size, 0, pos - size);
> -		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> +			memset(iomap->inline_data +
> +			       offset_in_block(size, inode), 0, pos - size);
> +		copied = copy_from_iter(iomap->inline_data +
> +					offset_in_block(pos, inode),
> +					length, iter);
>  		if (copied) {
>  			if (pos + copied > size)
>  				i_size_write(inode, pos + copied);
>  			mark_inode_dirty(inode);
>  		}
>  	} else {
> -		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
> +		copied = copy_to_iter(iomap->inline_data +
> +				      offset_in_block(pos, inode),
> +				      length, iter);
>  	}
>  	dio->size += copied;
>  	return copied;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 2103b94cb1bf..a8a60dd2fdc0 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>  	return NULL;
>  }
>  
> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
> +		struct iomap *iomap);
> +
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
@ 2019-07-12  9:31         ` Chao Yu
  0 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-12  9:31 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Gao Xiang, chao

On 2019/7/11 20:28, Andreas Gruenbacher wrote:
> Something along the lines of the attached, broken patch might work in
> the end.
> 
> Andreas
> 
> ---
>  fs/buffer.c           | 10 ++++--
>  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
>  include/linux/iomap.h |  3 ++
>  3 files changed, 61 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index e450c55f6434..8d8668e377ab 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
>  EXPORT_SYMBOL(page_zero_new_buffers);
>  
>  static void
> -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> -		struct iomap *iomap)
> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
> +		struct buffer_head *bh, struct iomap *iomap)
>  {
>  	loff_t offset = block << inode->i_blkbits;
>  
> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>  				inode->i_blkbits;
>  		set_buffer_mapped(bh);
>  		break;
> +	case IOMAP_INLINE:
> +		__iomap_read_inline_data(inode, page, iomap);
> +		set_buffer_uptodate(bh);
> +		break;
>  	}
>  }
>  
> @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>  				if (err)
>  					break;
>  			} else {
> -				iomap_to_bh(inode, block, bh, iomap);
> +				iomap_to_bh(inode, page, block, bh, iomap);
>  			}
>  
>  			if (buffer_new(bh)) {
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 45aa58e837b5..61188e95def2 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
>  	struct list_head	*pages;
>  };
>  
> -static void
> -iomap_read_inline_data(struct inode *inode, struct page *page,
> +#define offset_in_block(offset, inode) \
> +	((unsigned long)(offset) & (i_blocksize(inode) - 1))
> +
> +static bool
> +inline_data_within_block(struct inode *inode, struct iomap *iomap,
> +		unsigned int size)
> +{
> +	unsigned int off = offset_in_block(iomap->inline_data, inode);
> +
> +	return size <= i_blocksize(inode) - off;
> +}
> +
> +void
> +__iomap_read_inline_data(struct inode *inode, struct page *page,
>  		struct iomap *iomap)
>  {
> -	size_t size = i_size_read(inode);
> +	size_t size = offset_in_block(i_size_read(inode), inode);
> +	unsigned int poff = offset_in_page(iomap->offset);
> +	unsigned int bsize = i_blocksize(inode);
>  	void *addr;
>  
>  	if (PageUptodate(page))
>  		return;
>  
> -	BUG_ON(page->index);
> -	BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, size));
>  
>  	addr = kmap_atomic(page);
> -	memcpy(addr, iomap->inline_data, size);
> -	memset(addr + size, 0, PAGE_SIZE - size);
> +	memcpy(addr + poff, iomap->inline_data, size);
> +	memset(addr + poff + size, 0, bsize - size);
>  	kunmap_atomic(addr);
> -	SetPageUptodate(page);
> +}
> +
> +static void
> +iomap_read_inline_data(struct inode *inode, struct page *page,
> +		struct iomap *iomap)
> +{
> +	unsigned int poff = offset_in_page(iomap->offset);
> +	unsigned int bsize = i_blocksize(inode);
> +
> +	__iomap_read_inline_data(inode, page, iomap);
> +	iomap_set_range_uptodate(page, poff, bsize);
>  }
>  
>  static loff_t
> @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	unsigned poff, plen;
>  	sector_t sector;
>  
> -	if (iomap->type == IOMAP_INLINE) {
> -		WARN_ON_ONCE(pos);
> +	if (iomap->type == IOMAP_INLINE)
>  		iomap_read_inline_data(inode, page, iomap);
> -		return PAGE_SIZE;

Hi Andreas,

Thanks for your patch.

In my erofs test case, the filled inline data gets zeroed out because we
fall back to the following flow:

	if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
		zero_user(page, poff, plen);

Should we return before this condition check?

Thanks,

> -	}
>  
>  	/* zero post-eof blocks as the page may be mapped */
>  	iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
> @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>  	if (PageUptodate(page))
>  		return 0;
>  
> +	if (iomap->type == IOMAP_INLINE) {
> +		iomap_read_inline_data(inode, page, iomap);
> +		return 0;
> +	}
> +
>  	do {
>  		iomap_adjust_read_range(inode, iop, &block_start,
>  				block_end - block_start, &poff, &plen);
> @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>  		goto out_no_page;
>  	}
>  
> -	if (iomap->type == IOMAP_INLINE)
> -		iomap_read_inline_data(inode, page, iomap);
> -	else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> +	if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>  		status = __block_write_begin_int(page, pos, len, NULL, iomap);
>  	else
>  		status = __iomap_write_begin(inode, pos, len, page, iomap);
> @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>  {
>  	void *addr;
>  
> -	WARN_ON_ONCE(!PageUptodate(page));
> -	BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
>  
>  	addr = kmap_atomic(page);
> -	memcpy(iomap->inline_data + pos, addr + pos, copied);
> +	memcpy(iomap->inline_data + offset_in_block(pos, inode),
> +	       addr + offset_in_page(pos), copied);
>  	kunmap_atomic(addr);
>  
>  	mark_inode_dirty(inode);
> @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>  		const struct iomap_ops *ops)
>  {
>  	unsigned int blocksize = i_blocksize(inode);
> -	unsigned int off = pos & (blocksize - 1);
> +	unsigned int off = offset_in_block(pos, inode);
>  
>  	/* Block boundary? Nothing to do */
>  	if (!off)
> @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>  	struct iov_iter *iter = dio->submit.iter;
>  	size_t copied;
>  
> -	BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> +	BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
>  
>  	if (dio->flags & IOMAP_DIO_WRITE) {
>  		loff_t size = inode->i_size;
>  
>  		if (pos > size)
> -			memset(iomap->inline_data + size, 0, pos - size);
> -		copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> +			memset(iomap->inline_data +
> +			       offset_in_block(size, inode), 0, pos - size);
> +		copied = copy_from_iter(iomap->inline_data +
> +					offset_in_block(pos, inode),
> +					length, iter);
>  		if (copied) {
>  			if (pos + copied > size)
>  				i_size_write(inode, pos + copied);
>  			mark_inode_dirty(inode);
>  		}
>  	} else {
> -		copied = copy_to_iter(iomap->inline_data + pos, length, iter);
> +		copied = copy_to_iter(iomap->inline_data +
> +				      offset_in_block(pos, inode),
> +				      length, iter);
>  	}
>  	dio->size += copied;
>  	return copied;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 2103b94cb1bf..a8a60dd2fdc0 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>  	return NULL;
>  }
>  
> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
> +		struct iomap *iomap);
> +
>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>  		const struct iomap_ops *ops);
>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-12  9:31         ` Chao Yu
  (?)
@ 2019-07-12 11:54         ` Andreas Gruenbacher
  2019-07-15  9:26             ` Chao Yu
  2019-07-18 12:33           ` Christoph Hellwig
  -1 siblings, 2 replies; 26+ messages in thread
From: Andreas Gruenbacher @ 2019-07-12 11:54 UTC (permalink / raw)
  To: Chao Yu
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Gao Xiang, chao

On Fri, 12 Jul 2019 at 11:31, Chao Yu <yuchao0@huawei.com> wrote:
> On 2019/7/11 20:28, Andreas Gruenbacher wrote:
> > Something along the lines of the attached, broken patch might work in
> > the end.
> >
> > Andreas
> >
> > ---
> >  fs/buffer.c           | 10 ++++--
> >  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
> >  include/linux/iomap.h |  3 ++
> >  3 files changed, 61 insertions(+), 26 deletions(-)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index e450c55f6434..8d8668e377ab 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
> >  EXPORT_SYMBOL(page_zero_new_buffers);
> >
> >  static void
> > -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> > -             struct iomap *iomap)
> > +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
> > +             struct buffer_head *bh, struct iomap *iomap)
> >  {
> >       loff_t offset = block << inode->i_blkbits;
> >
> > @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
> >                               inode->i_blkbits;
> >               set_buffer_mapped(bh);
> >               break;
> > +     case IOMAP_INLINE:
> > +             __iomap_read_inline_data(inode, page, iomap);
> > +             set_buffer_uptodate(bh);
> > +             break;
> >       }
> >  }
> >
> > @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
> >                               if (err)
> >                                       break;
> >                       } else {
> > -                             iomap_to_bh(inode, block, bh, iomap);
> > +                             iomap_to_bh(inode, page, block, bh, iomap);
> >                       }
> >
> >                       if (buffer_new(bh)) {
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 45aa58e837b5..61188e95def2 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
> >       struct list_head        *pages;
> >  };
> >
> > -static void
> > -iomap_read_inline_data(struct inode *inode, struct page *page,
> > +#define offset_in_block(offset, inode) \
> > +     ((unsigned long)(offset) & (i_blocksize(inode) - 1))
> > +
> > +static bool
> > +inline_data_within_block(struct inode *inode, struct iomap *iomap,
> > +             unsigned int size)
> > +{
> > +     unsigned int off = offset_in_block(iomap->inline_data, inode);
> > +
> > +     return size <= i_blocksize(inode) - off;
> > +}
> > +
> > +void
> > +__iomap_read_inline_data(struct inode *inode, struct page *page,
> >               struct iomap *iomap)
> >  {
> > -     size_t size = i_size_read(inode);
> > +     size_t size = offset_in_block(i_size_read(inode), inode);
> > +     unsigned int poff = offset_in_page(iomap->offset);
> > +     unsigned int bsize = i_blocksize(inode);
> >       void *addr;
> >
> >       if (PageUptodate(page))
> >               return;
> >
> > -     BUG_ON(page->index);
> > -     BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     BUG_ON(!inline_data_within_block(inode, iomap, size));
> >
> >       addr = kmap_atomic(page);
> > -     memcpy(addr, iomap->inline_data, size);
> > -     memset(addr + size, 0, PAGE_SIZE - size);
> > +     memcpy(addr + poff, iomap->inline_data, size);
> > +     memset(addr + poff + size, 0, bsize - size);
> >       kunmap_atomic(addr);
> > -     SetPageUptodate(page);
> > +}
> > +
> > +static void
> > +iomap_read_inline_data(struct inode *inode, struct page *page,
> > +             struct iomap *iomap)
> > +{
> > +     unsigned int poff = offset_in_page(iomap->offset);
> > +     unsigned int bsize = i_blocksize(inode);
> > +
> > +     __iomap_read_inline_data(inode, page, iomap);
> > +     iomap_set_range_uptodate(page, poff, bsize);
> >  }
> >
> >  static loff_t
> > @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >       unsigned poff, plen;
> >       sector_t sector;
> >
> > -     if (iomap->type == IOMAP_INLINE) {
> > -             WARN_ON_ONCE(pos);
> > +     if (iomap->type == IOMAP_INLINE)
> >               iomap_read_inline_data(inode, page, iomap);
> > -             return PAGE_SIZE;
>
> Hi Andreas,
>
> Thanks for your patch.
>
> In my erofs test case, the filled inline data gets zeroed out because we
> fall back to the following flow:
>
>         if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
>                 zero_user(page, poff, plen);
>
> Should we return before this condition check?

Yes, probably by returning i_blocksize(inode) after
iomap_read_inline_data, but that alone isn't enough to make the patch
work completely. This really needs a review from Christoph and careful
testing of all the code paths.

Andreas

> Thanks,
>
> > -     }
> >
> >       /* zero post-eof blocks as the page may be mapped */
> >       iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
> > @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> >       if (PageUptodate(page))
> >               return 0;
> >
> > +     if (iomap->type == IOMAP_INLINE) {
> > +             iomap_read_inline_data(inode, page, iomap);
> > +             return 0;
> > +     }
> > +
> >       do {
> >               iomap_adjust_read_range(inode, iop, &block_start,
> >                               block_end - block_start, &poff, &plen);
> > @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> >               goto out_no_page;
> >       }
> >
> > -     if (iomap->type == IOMAP_INLINE)
> > -             iomap_read_inline_data(inode, page, iomap);
> > -     else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> > +     if (iomap->flags & IOMAP_F_BUFFER_HEAD)
> >               status = __block_write_begin_int(page, pos, len, NULL, iomap);
> >       else
> >               status = __iomap_write_begin(inode, pos, len, page, iomap);
> > @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
> >  {
> >       void *addr;
> >
> > -     WARN_ON_ONCE(!PageUptodate(page));
> > -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
> >
> >       addr = kmap_atomic(page);
> > -     memcpy(iomap->inline_data + pos, addr + pos, copied);
> > +     memcpy(iomap->inline_data + offset_in_block(pos, inode),
> > +            addr + offset_in_page(pos), copied);
> >       kunmap_atomic(addr);
> >
> >       mark_inode_dirty(inode);
> > @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> >               const struct iomap_ops *ops)
> >  {
> >       unsigned int blocksize = i_blocksize(inode);
> > -     unsigned int off = pos & (blocksize - 1);
> > +     unsigned int off = offset_in_block(pos, inode);
> >
> >       /* Block boundary? Nothing to do */
> >       if (!off)
> > @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
> >       struct iov_iter *iter = dio->submit.iter;
> >       size_t copied;
> >
> > -     BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
> > +     BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
> >
> >       if (dio->flags & IOMAP_DIO_WRITE) {
> >               loff_t size = inode->i_size;
> >
> >               if (pos > size)
> > -                     memset(iomap->inline_data + size, 0, pos - size);
> > -             copied = copy_from_iter(iomap->inline_data + pos, length, iter);
> > +                     memset(iomap->inline_data +
> > +                            offset_in_block(size, inode), 0, pos - size);
> > +             copied = copy_from_iter(iomap->inline_data +
> > +                                     offset_in_block(pos, inode),
> > +                                     length, iter);
> >               if (copied) {
> >                       if (pos + copied > size)
> >                               i_size_write(inode, pos + copied);
> >                       mark_inode_dirty(inode);
> >               }
> >       } else {
> > -             copied = copy_to_iter(iomap->inline_data + pos, length, iter);
> > +             copied = copy_to_iter(iomap->inline_data +
> > +                                   offset_in_block(pos, inode),
> > +                                   length, iter);
> >       }
> >       dio->size += copied;
> >       return copied;
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index 2103b94cb1bf..a8a60dd2fdc0 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
> >       return NULL;
> >  }
> >
> > +void __iomap_read_inline_data(struct inode *inode, struct page *page,
> > +             struct iomap *iomap);
> > +
> >  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> >               const struct iomap_ops *ops);
> >  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-12 11:54         ` Andreas Gruenbacher
@ 2019-07-15  9:26             ` Chao Yu
  2019-07-18 12:33           ` Christoph Hellwig
  1 sibling, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-15  9:26 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Gao Xiang, chao

On 2019/7/12 19:54, Andreas Gruenbacher wrote:
> On Fri, 12 Jul 2019 at 11:31, Chao Yu <yuchao0@huawei.com> wrote:
>> On 2019/7/11 20:28, Andreas Gruenbacher wrote:
>>> Something along the lines of the attached, broken patch might work in
>>> the end.
>>>
>>> Andreas
>>>
>>> ---
>>>  fs/buffer.c           | 10 ++++--
>>>  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
>>>  include/linux/iomap.h |  3 ++
>>>  3 files changed, 61 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/fs/buffer.c b/fs/buffer.c
>>> index e450c55f6434..8d8668e377ab 100644
>>> --- a/fs/buffer.c
>>> +++ b/fs/buffer.c
>>> @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
>>>  EXPORT_SYMBOL(page_zero_new_buffers);
>>>
>>>  static void
>>> -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>> -             struct iomap *iomap)
>>> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
>>> +             struct buffer_head *bh, struct iomap *iomap)
>>>  {
>>>       loff_t offset = block << inode->i_blkbits;
>>>
>>> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>                               inode->i_blkbits;
>>>               set_buffer_mapped(bh);
>>>               break;
>>> +     case IOMAP_INLINE:
>>> +             __iomap_read_inline_data(inode, page, iomap);
>>> +             set_buffer_uptodate(bh);
>>> +             break;
>>>       }
>>>  }
>>>
>>> @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>>>                               if (err)
>>>                                       break;
>>>                       } else {
>>> -                             iomap_to_bh(inode, block, bh, iomap);
>>> +                             iomap_to_bh(inode, page, block, bh, iomap);
>>>                       }
>>>
>>>                       if (buffer_new(bh)) {
>>> diff --git a/fs/iomap.c b/fs/iomap.c
>>> index 45aa58e837b5..61188e95def2 100644
>>> --- a/fs/iomap.c
>>> +++ b/fs/iomap.c
>>> @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
>>>       struct list_head        *pages;
>>>  };
>>>
>>> -static void
>>> -iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +#define offset_in_block(offset, inode) \
>>> +     ((unsigned long)(offset) & (i_blocksize(inode) - 1))
>>> +
>>> +static bool
>>> +inline_data_within_block(struct inode *inode, struct iomap *iomap,
>>> +             unsigned int size)
>>> +{
>>> +     unsigned int off = offset_in_block(iomap->inline_data, inode);
>>> +
>>> +     return size <= i_blocksize(inode) - off;
>>> +}
>>> +
>>> +void
>>> +__iomap_read_inline_data(struct inode *inode, struct page *page,
>>>               struct iomap *iomap)
>>>  {
>>> -     size_t size = i_size_read(inode);
>>> +     size_t size = offset_in_block(i_size_read(inode), inode);
>>> +     unsigned int poff = offset_in_page(iomap->offset);
>>> +     unsigned int bsize = i_blocksize(inode);
>>>       void *addr;
>>>
>>>       if (PageUptodate(page))
>>>               return;
>>>
>>> -     BUG_ON(page->index);
>>> -     BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +     BUG_ON(!inline_data_within_block(inode, iomap, size));
>>>
>>>       addr = kmap_atomic(page);
>>> -     memcpy(addr, iomap->inline_data, size);
>>> -     memset(addr + size, 0, PAGE_SIZE - size);
>>> +     memcpy(addr + poff, iomap->inline_data, size);
>>> +     memset(addr + poff + size, 0, bsize - size);
>>>       kunmap_atomic(addr);
>>> -     SetPageUptodate(page);
>>> +}
>>> +
>>> +static void
>>> +iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +             struct iomap *iomap)
>>> +{
>>> +     unsigned int poff = offset_in_page(iomap->offset);
>>> +     unsigned int bsize = i_blocksize(inode);
>>> +
>>> +     __iomap_read_inline_data(inode, page, iomap);
>>> +     iomap_set_range_uptodate(page, poff, bsize);
>>>  }
>>>
>>>  static loff_t
>>> @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>>       unsigned poff, plen;
>>>       sector_t sector;
>>>
>>> -     if (iomap->type == IOMAP_INLINE) {
>>> -             WARN_ON_ONCE(pos);
>>> +     if (iomap->type == IOMAP_INLINE)
>>>               iomap_read_inline_data(inode, page, iomap);
>>> -             return PAGE_SIZE;
>>
>> Hi Andreas,
>>
>> Thanks for your patch.
>>
>> In my erofs test case, the filled inline data gets zeroed out because we
>> fall back to the following flow:
>>
>>         if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
>>                 zero_user(page, poff, plen);
>>
>> Should we return before this condition check?
> 
> Yes, probably by returning i_blocksize(inode) after
> iomap_read_inline_data, but that alone isn't enough to make the patch

Could you please resend this diff as a new patch with the above fix?

> work completely. This really needs a review from Christoph and careful
> testing of all the code paths.

Later, I can do the test on the read path.

Thanks,

> 
> Andreas
> 
>> Thanks,
>>
>>> -     }
>>>
>>>       /* zero post-eof blocks as the page may be mapped */
>>>       iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
>>> @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>>>       if (PageUptodate(page))
>>>               return 0;
>>>
>>> +     if (iomap->type == IOMAP_INLINE) {
>>> +             iomap_read_inline_data(inode, page, iomap);
>>> +             return 0;
>>> +     }
>>> +
>>>       do {
>>>               iomap_adjust_read_range(inode, iop, &block_start,
>>>                               block_end - block_start, &poff, &plen);
>>> @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>>>               goto out_no_page;
>>>       }
>>>
>>> -     if (iomap->type == IOMAP_INLINE)
>>> -             iomap_read_inline_data(inode, page, iomap);
>>> -     else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>> +     if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>>               status = __block_write_begin_int(page, pos, len, NULL, iomap);
>>>       else
>>>               status = __iomap_write_begin(inode, pos, len, page, iomap);
>>> @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>>>  {
>>>       void *addr;
>>>
>>> -     WARN_ON_ONCE(!PageUptodate(page));
>>> -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +     BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
>>>
>>>       addr = kmap_atomic(page);
>>> -     memcpy(iomap->inline_data + pos, addr + pos, copied);
>>> +     memcpy(iomap->inline_data + offset_in_block(pos, inode),
>>> +            addr + offset_in_page(pos), copied);
>>>       kunmap_atomic(addr);
>>>
>>>       mark_inode_dirty(inode);
>>> @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>>>               const struct iomap_ops *ops)
>>>  {
>>>       unsigned int blocksize = i_blocksize(inode);
>>> -     unsigned int off = pos & (blocksize - 1);
>>> +     unsigned int off = offset_in_block(pos, inode);
>>>
>>>       /* Block boundary? Nothing to do */
>>>       if (!off)
>>> @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>>>       struct iov_iter *iter = dio->submit.iter;
>>>       size_t copied;
>>>
>>> -     BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>> +     BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
>>>
>>>       if (dio->flags & IOMAP_DIO_WRITE) {
>>>               loff_t size = inode->i_size;
>>>
>>>               if (pos > size)
>>> -                     memset(iomap->inline_data + size, 0, pos - size);
>>> -             copied = copy_from_iter(iomap->inline_data + pos, length, iter);
>>> +                     memset(iomap->inline_data +
>>> +                            offset_in_block(size, inode), 0, pos - size);
>>> +             copied = copy_from_iter(iomap->inline_data +
>>> +                                     offset_in_block(pos, inode),
>>> +                                     length, iter);
>>>               if (copied) {
>>>                       if (pos + copied > size)
>>>                               i_size_write(inode, pos + copied);
>>>                       mark_inode_dirty(inode);
>>>               }
>>>       } else {
>>> -             copied = copy_to_iter(iomap->inline_data + pos, length, iter);
>>> +             copied = copy_to_iter(iomap->inline_data +
>>> +                                   offset_in_block(pos, inode),
>>> +                                   length, iter);
>>>       }
>>>       dio->size += copied;
>>>       return copied;
>>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>>> index 2103b94cb1bf..a8a60dd2fdc0 100644
>>> --- a/include/linux/iomap.h
>>> +++ b/include/linux/iomap.h
>>> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>>>       return NULL;
>>>  }
>>>
>>> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +             struct iomap *iomap);
>>> +
>>>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>>>               const struct iomap_ops *ops);
>>>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>>>
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

>>>               }
>>>       } else {
>>> -             copied = copy_to_iter(iomap->inline_data + pos, length, iter);
>>> +             copied = copy_to_iter(iomap->inline_data +
>>> +                                   offset_in_block(pos, inode),
>>> +                                   length, iter);
>>>       }
>>>       dio->size += copied;
>>>       return copied;
>>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>>> index 2103b94cb1bf..a8a60dd2fdc0 100644
>>> --- a/include/linux/iomap.h
>>> +++ b/include/linux/iomap.h
>>> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>>>       return NULL;
>>>  }
>>>
>>> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
>>> +             struct iomap *iomap);
>>> +
>>>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>>>               const struct iomap_ops *ops);
>>>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>>>
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-15  9:26             ` Chao Yu
@ 2019-07-17  2:58               ` Chao Yu
  -1 siblings, 0 replies; 26+ messages in thread
From: Chao Yu @ 2019-07-17  2:58 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, linux-fsdevel,
	Gao Xiang, chao

Ping, Andreas. :P

On 2019/7/15 17:26, Chao Yu wrote:
> On 2019/7/12 19:54, Andreas Gruenbacher wrote:
>> On Fri, 12 Jul 2019 at 11:31, Chao Yu <yuchao0@huawei.com> wrote:
>>> On 2019/7/11 20:28, Andreas Gruenbacher wrote:
>>>> Something along the lines of the attached, broken patch might work in
>>>> the end.
>>>>
>>>> Andreas
>>>>
>>>> ---
>>>>  fs/buffer.c           | 10 ++++--
>>>>  fs/iomap.c            | 74 +++++++++++++++++++++++++++++--------------
>>>>  include/linux/iomap.h |  3 ++
>>>>  3 files changed, 61 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/fs/buffer.c b/fs/buffer.c
>>>> index e450c55f6434..8d8668e377ab 100644
>>>> --- a/fs/buffer.c
>>>> +++ b/fs/buffer.c
>>>> @@ -1873,8 +1873,8 @@ void page_zero_new_buffers(struct page *page, unsigned from, unsigned to)
>>>>  EXPORT_SYMBOL(page_zero_new_buffers);
>>>>
>>>>  static void
>>>> -iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>> -             struct iomap *iomap)
>>>> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
>>>> +             struct buffer_head *bh, struct iomap *iomap)
>>>>  {
>>>>       loff_t offset = block << inode->i_blkbits;
>>>>
>>>> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>>>>                               inode->i_blkbits;
>>>>               set_buffer_mapped(bh);
>>>>               break;
>>>> +     case IOMAP_INLINE:
>>>> +             __iomap_read_inline_data(inode, page, iomap);
>>>> +             set_buffer_uptodate(bh);
>>>> +             break;
>>>>       }
>>>>  }
>>>>
>>>> @@ -1969,7 +1973,7 @@ int __block_write_begin_int(struct page *page, loff_t pos, unsigned len,
>>>>                               if (err)
>>>>                                       break;
>>>>                       } else {
>>>> -                             iomap_to_bh(inode, block, bh, iomap);
>>>> +                             iomap_to_bh(inode, page, block, bh, iomap);
>>>>                       }
>>>>
>>>>                       if (buffer_new(bh)) {
>>>> diff --git a/fs/iomap.c b/fs/iomap.c
>>>> index 45aa58e837b5..61188e95def2 100644
>>>> --- a/fs/iomap.c
>>>> +++ b/fs/iomap.c
>>>> @@ -260,24 +260,47 @@ struct iomap_readpage_ctx {
>>>>       struct list_head        *pages;
>>>>  };
>>>>
>>>> -static void
>>>> -iomap_read_inline_data(struct inode *inode, struct page *page,
>>>> +#define offset_in_block(offset, inode) \
>>>> +     ((unsigned long)(offset) & (i_blocksize(inode) - 1))
>>>> +
>>>> +static bool
>>>> +inline_data_within_block(struct inode *inode, struct iomap *iomap,
>>>> +             unsigned int size)
>>>> +{
>>>> +     unsigned int off = offset_in_block(iomap->inline_data, inode);
>>>> +
>>>> +     return size <= i_blocksize(inode) - off;
>>>> +}
>>>> +
>>>> +void
>>>> +__iomap_read_inline_data(struct inode *inode, struct page *page,
>>>>               struct iomap *iomap)
>>>>  {
>>>> -     size_t size = i_size_read(inode);
>>>> +     size_t size = offset_in_block(i_size_read(inode), inode);
>>>> +     unsigned int poff = offset_in_page(iomap->offset);
>>>> +     unsigned int bsize = i_blocksize(inode);
>>>>       void *addr;
>>>>
>>>>       if (PageUptodate(page))
>>>>               return;
>>>>
>>>> -     BUG_ON(page->index);
>>>> -     BUG_ON(size > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>>> +     BUG_ON(!inline_data_within_block(inode, iomap, size));
>>>>
>>>>       addr = kmap_atomic(page);
>>>> -     memcpy(addr, iomap->inline_data, size);
>>>> -     memset(addr + size, 0, PAGE_SIZE - size);
>>>> +     memcpy(addr + poff, iomap->inline_data, size);
>>>> +     memset(addr + poff + size, 0, bsize - size);
>>>>       kunmap_atomic(addr);
>>>> -     SetPageUptodate(page);
>>>> +}
>>>> +
>>>> +static void
>>>> +iomap_read_inline_data(struct inode *inode, struct page *page,
>>>> +             struct iomap *iomap)
>>>> +{
>>>> +     unsigned int poff = offset_in_page(iomap->offset);
>>>> +     unsigned int bsize = i_blocksize(inode);
>>>> +
>>>> +     __iomap_read_inline_data(inode, page, iomap);
>>>> +     iomap_set_range_uptodate(page, poff, bsize);
>>>>  }
>>>>
>>>>  static loff_t
>>>> @@ -292,11 +315,8 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>>>>       unsigned poff, plen;
>>>>       sector_t sector;
>>>>
>>>> -     if (iomap->type == IOMAP_INLINE) {
>>>> -             WARN_ON_ONCE(pos);
>>>> +     if (iomap->type == IOMAP_INLINE)
>>>>               iomap_read_inline_data(inode, page, iomap);
>>>> -             return PAGE_SIZE;
>>>
>>> Hi Andreas,
>>>
>>> Thanks for your patch.
>>>
>>> In my erofs test case, the filled inline data gets zeroed out because we
>>> fall back to the following flow:
>>>
>>>         if (iomap->type != IOMAP_MAPPED || pos >= i_size_read(inode)) {
>>>                 zero_user(page, poff, plen);
>>>
>>> Should we return before this condition check?
>>
>> Yes, probably by returning i_blocksize(inode) after
>> iomap_read_inline_data, but that alone isn't enough to make the patch
> 
> Could you please resend this diff as a new patch with the above fix?
> 
>> work completely. This really needs a review from Christoph and careful
>> testing of all the code paths.
> 
> Later, I can run the tests on the read path.
> 
> Thanks,
> 
>>
>> Andreas
>>
>>> Thanks,
>>>
>>>> -     }
>>>>
>>>>       /* zero post-eof blocks as the page may be mapped */
>>>>       iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen);
>>>> @@ -637,6 +657,11 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
>>>>       if (PageUptodate(page))
>>>>               return 0;
>>>>
>>>> +     if (iomap->type == IOMAP_INLINE) {
>>>> +             iomap_read_inline_data(inode, page, iomap);
>>>> +             return 0;
>>>> +     }
>>>> +
>>>>       do {
>>>>               iomap_adjust_read_range(inode, iop, &block_start,
>>>>                               block_end - block_start, &poff, &plen);
>>>> @@ -682,9 +707,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
>>>>               goto out_no_page;
>>>>       }
>>>>
>>>> -     if (iomap->type == IOMAP_INLINE)
>>>> -             iomap_read_inline_data(inode, page, iomap);
>>>> -     else if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>>> +     if (iomap->flags & IOMAP_F_BUFFER_HEAD)
>>>>               status = __block_write_begin_int(page, pos, len, NULL, iomap);
>>>>       else
>>>>               status = __iomap_write_begin(inode, pos, len, page, iomap);
>>>> @@ -761,11 +784,11 @@ iomap_write_end_inline(struct inode *inode, struct page *page,
>>>>  {
>>>>       void *addr;
>>>>
>>>> -     WARN_ON_ONCE(!PageUptodate(page));
>>>> -     BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>>> +     BUG_ON(!inline_data_within_block(inode, iomap, pos + copied));
>>>>
>>>>       addr = kmap_atomic(page);
>>>> -     memcpy(iomap->inline_data + pos, addr + pos, copied);
>>>> +     memcpy(iomap->inline_data + offset_in_block(pos, inode),
>>>> +            addr + offset_in_page(pos), copied);
>>>>       kunmap_atomic(addr);
>>>>
>>>>       mark_inode_dirty(inode);
>>>> @@ -1064,7 +1087,7 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>>>>               const struct iomap_ops *ops)
>>>>  {
>>>>       unsigned int blocksize = i_blocksize(inode);
>>>> -     unsigned int off = pos & (blocksize - 1);
>>>> +     unsigned int off = offset_in_block(pos, inode);
>>>>
>>>>       /* Block boundary? Nothing to do */
>>>>       if (!off)
>>>> @@ -1772,21 +1795,26 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length,
>>>>       struct iov_iter *iter = dio->submit.iter;
>>>>       size_t copied;
>>>>
>>>> -     BUG_ON(pos + length > PAGE_SIZE - offset_in_page(iomap->inline_data));
>>>> +     BUG_ON(!inline_data_within_block(inode, iomap, pos + length));
>>>>
>>>>       if (dio->flags & IOMAP_DIO_WRITE) {
>>>>               loff_t size = inode->i_size;
>>>>
>>>>               if (pos > size)
>>>> -                     memset(iomap->inline_data + size, 0, pos - size);
>>>> -             copied = copy_from_iter(iomap->inline_data + pos, length, iter);
>>>> +                     memset(iomap->inline_data +
>>>> +                            offset_in_block(size, inode), 0, pos - size);
>>>> +             copied = copy_from_iter(iomap->inline_data +
>>>> +                                     offset_in_block(pos, inode),
>>>> +                                     length, iter);
>>>>               if (copied) {
>>>>                       if (pos + copied > size)
>>>>                               i_size_write(inode, pos + copied);
>>>>                       mark_inode_dirty(inode);
>>>>               }
>>>>       } else {
>>>> -             copied = copy_to_iter(iomap->inline_data + pos, length, iter);
>>>> +             copied = copy_to_iter(iomap->inline_data +
>>>> +                                   offset_in_block(pos, inode),
>>>> +                                   length, iter);
>>>>       }
>>>>       dio->size += copied;
>>>>       return copied;
>>>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>>>> index 2103b94cb1bf..a8a60dd2fdc0 100644
>>>> --- a/include/linux/iomap.h
>>>> +++ b/include/linux/iomap.h
>>>> @@ -131,6 +131,9 @@ static inline struct iomap_page *to_iomap_page(struct page *page)
>>>>       return NULL;
>>>>  }
>>>>
>>>> +void __iomap_read_inline_data(struct inode *inode, struct page *page,
>>>> +             struct iomap *iomap);
>>>> +
>>>>  ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
>>>>               const struct iomap_ops *ops);
>>>>  int iomap_readpage(struct page *page, const struct iomap_ops *ops);
>>>>
>> .
>>
> .
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-11 12:28     ` Andreas Gruenbacher
  2019-07-12  9:31         ` Chao Yu
@ 2019-07-18 12:31       ` Christoph Hellwig
  2019-07-18 13:27           ` Gao Xiang
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2019-07-18 12:31 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Chao Yu, Christoph Hellwig, Darrick J. Wong, linux-xfs,
	linux-fsdevel, Gao Xiang, chao

> +iomap_to_bh(struct inode *inode, struct page *page, sector_t block,
> +		struct buffer_head *bh, struct iomap *iomap)
>  {
>  	loff_t offset = block << inode->i_blkbits;
>  
> @@ -1924,6 +1924,10 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
>  				inode->i_blkbits;
>  		set_buffer_mapped(bh);
>  		break;
> +	case IOMAP_INLINE:
> +		__iomap_read_inline_data(inode, page, iomap);
> +		set_buffer_uptodate(bh);
> +		break;

I have to say I hate pushing this into the legacy buffer_head code.
My hope was that we could get rid of this code rather than adding to it.

The other issue is that this now calls iomap functions from buffer.c,
which I'd also really avoid.

That being said, until the tail-packing fs (erofs?) actually uses
buffer_heads, we should not need this hunk, and I don't think erofs
should have any reason to use buffer_heads.

> +#define offset_in_block(offset, inode) \
> +	((unsigned long)(offset) & (i_blocksize(inode) - 1))

Make this an inline function, please.  I think we also have a few
other places that could make use of this helper; it might
even go into fs.h.

Otherwise this looks sensible to me.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-12 11:54         ` Andreas Gruenbacher
  2019-07-15  9:26             ` Chao Yu
@ 2019-07-18 12:33           ` Christoph Hellwig
  1 sibling, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2019-07-18 12:33 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Chao Yu, Christoph Hellwig, Darrick J. Wong, linux-xfs,
	linux-fsdevel, Gao Xiang, chao

On Fri, Jul 12, 2019 at 01:54:07PM +0200, Andreas Gruenbacher wrote:
> Yes, probably by returning i_blocksize(inode) after
> iomap_read_inline_data, but that alone isn't enough to make the patch
> work completely. This really needs a review from Christoph and careful
> testing of all the code paths.

Well, let's start with the testing.  Can we get a coherent series that
includes the erofs bits?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case
  2019-07-18 12:31       ` Christoph Hellwig
@ 2019-07-18 13:27           ` Gao Xiang
  0 siblings, 0 replies; 26+ messages in thread
From: Gao Xiang @ 2019-07-18 13:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andreas Gruenbacher, Chao Yu, Darrick J. Wong, linux-xfs,
	linux-fsdevel, chao



On 2019/7/18 20:31, Christoph Hellwig wrote:
> That being said until the tail packing fs (erofs?) actually uses
> buffer_heads we should not need this hunk, and I don't think erofs
> should have any reason to use buffer_heads.

Yes, erofs will not use buffer_heads to support sub-page block sizes
in the long term; they are too heavyweight and unnecessary...

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-07-18 13:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-03  7:55 [RFC PATCH] iomap: generalize IOMAP_INLINE to cover tail-packing case Chao Yu
2019-07-03  7:55 ` Chao Yu
2019-07-08  2:12 ` Chao Yu
2019-07-08  2:12   ` Chao Yu
2019-07-08 16:03 ` Christoph Hellwig
2019-07-09 13:52   ` Chao Yu
2019-07-09 23:32 ` Andreas Grünbacher
2019-07-10 10:30   ` Chao Yu
2019-07-10 10:30     ` Chao Yu
2019-07-10 21:50     ` Andreas Grünbacher
2019-07-10 23:42       ` Gao Xiang
2019-07-11 13:06         ` Matthew Wilcox
2019-07-11 13:54           ` Gao Xiang
2019-07-11 14:15       ` Chao Yu
2019-07-11 12:28     ` Andreas Gruenbacher
2019-07-12  9:31       ` Chao Yu
2019-07-12  9:31         ` Chao Yu
2019-07-12 11:54         ` Andreas Gruenbacher
2019-07-15  9:26           ` Chao Yu
2019-07-15  9:26             ` Chao Yu
2019-07-17  2:58             ` Chao Yu
2019-07-17  2:58               ` Chao Yu
2019-07-18 12:33           ` Christoph Hellwig
2019-07-18 12:31       ` Christoph Hellwig
2019-07-18 13:27         ` Gao Xiang
2019-07-18 13:27           ` Gao Xiang
