linux-erofs.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
@ 2021-08-17  4:06 Yue Hu
  2021-08-17 12:18 ` Gao Xiang
  2021-08-17 13:13 ` Gao Xiang
  0 siblings, 2 replies; 7+ messages in thread
From: Yue Hu @ 2021-08-17  4:06 UTC (permalink / raw)
  To: linux-erofs, xiang; +Cc: huyue2, zbestahu

From: Yue Hu <huyue2@yulong.com>

No need to reset clusterofs to 0 if it's already 0. Acturally, we also
observed that case frequently. Let's avoid it.

Signed-off-by: Yue Hu <huyue2@yulong.com>
---
 lib/compress.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/compress.c b/lib/compress.c
index 40723a1..a8ebbc1 100644
--- a/lib/compress.c
+++ b/lib/compress.c
@@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
 	unsigned int count;
 
 	/* reset clusterofs to 0 if permitted */
-	if (!erofs_sb_has_lz4_0padding() &&
+	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&
 	    ctx->head >= ctx->clusterofs) {
 		ctx->head -= ctx->clusterofs;
 		*len += ctx->clusterofs;
-- 
1.9.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-17  4:06 [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent() Yue Hu
@ 2021-08-17 12:18 ` Gao Xiang
  2021-08-17 13:13 ` Gao Xiang
  1 sibling, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2021-08-17 12:18 UTC (permalink / raw)
  To: Yue Hu; +Cc: xiang, linux-erofs, huyue2, zbestahu

On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:
> From: Yue Hu <huyue2@yulong.com>
> 
> No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> observed that case frequently. Let's avoid it.
> 
> Signed-off-by: Yue Hu <huyue2@yulong.com>

Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>

(btw, also, if compression performance is important for some cases, I'd
 suggest if someone could take some time on multi-threaded mkfs...
 That includes:
  - paralleled compression for different files;
  - paralleled segment compression for one file (the basic principle is
    to deal with a file with sub-files with the same segment size, such
    as 16M, like what I described in the following thread:

 https://lore.kernel.org/r/20200705182049.GA20632@hsiangkao-HP-ZHAN-66-Pro-G1

 That may be quite also useful for LZMA compression later.)
Thanks,
Gao Xiang


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-17  4:06 [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent() Yue Hu
  2021-08-17 12:18 ` Gao Xiang
@ 2021-08-17 13:13 ` Gao Xiang
  2021-08-17 14:10   ` Yue Hu
  1 sibling, 1 reply; 7+ messages in thread
From: Gao Xiang @ 2021-08-17 13:13 UTC (permalink / raw)
  To: Yue Hu; +Cc: xiang, linux-erofs, huyue2, zbestahu

Hi Yue,

On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:
> From: Yue Hu <huyue2@yulong.com>
> 
> No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> observed that case frequently. Let's avoid it.
> 
> Signed-off-by: Yue Hu <huyue2@yulong.com>
> ---
>  lib/compress.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/compress.c b/lib/compress.c
> index 40723a1..a8ebbc1 100644
> --- a/lib/compress.c
> +++ b/lib/compress.c
> @@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
>  	unsigned int count;
>  
>  	/* reset clusterofs to 0 if permitted */
> -	if (!erofs_sb_has_lz4_0padding() &&
> +	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&

Also out of curiosity, which means erofs is used without lz4 0padding?
That way is not recommended now anyway, since it forbids erofs in-place
decompression (only inplace I/O works instead.)

Actually we could make clusterofs aligned with 0 for 0padding cases as
well, but in that cases, we need to recompress the previous pcluster
with new trimmed size rather than just like this.

Thanks,
Gao Xiang

>  	    ctx->head >= ctx->clusterofs) {
>  		ctx->head -= ctx->clusterofs;
>  		*len += ctx->clusterofs;
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-17 13:13 ` Gao Xiang
@ 2021-08-17 14:10   ` Yue Hu
  2021-08-17 14:38     ` Gao Xiang
  0 siblings, 1 reply; 7+ messages in thread
From: Yue Hu @ 2021-08-17 14:10 UTC (permalink / raw)
  To: Gao Xiang; +Cc: xiang, linux-erofs, huyue2, zbestahu

Hi Xiang,

On Tue, 17 Aug 2021 21:13:14 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> Hi Yue,
> 
> On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:
> > From: Yue Hu <huyue2@yulong.com>
> > 
> > No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> > observed that case frequently. Let's avoid it.
> > 
> > Signed-off-by: Yue Hu <huyue2@yulong.com>
> > ---
> >  lib/compress.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/lib/compress.c b/lib/compress.c
> > index 40723a1..a8ebbc1 100644
> > --- a/lib/compress.c
> > +++ b/lib/compress.c
> > @@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
> >  	unsigned int count;
> >  
> >  	/* reset clusterofs to 0 if permitted */
> > -	if (!erofs_sb_has_lz4_0padding() &&
> > +	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&  
> 
> Also out of curiosity, which means erofs is used without lz4 0padding?

Yes. We are using legacy compression layout now.

> That way is not recommended now anyway, since it forbids erofs in-place
> decompression (only inplace I/O works instead.)

Thanks for the reminding.

> 
> Actually we could make clusterofs aligned with 0 for 0padding cases as
> well, but in that cases, we need to recompress the previous pcluster
> with new trimmed size rather than just like this.

Got it. 
BTW, I think the information should be useful for me to understand the code further.

Thanks.

> 
> Thanks,
> Gao Xiang
> 
> >  	    ctx->head >= ctx->clusterofs) {
> >  		ctx->head -= ctx->clusterofs;
> >  		*len += ctx->clusterofs;
> > -- 
> > 1.9.1  


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-17 14:10   ` Yue Hu
@ 2021-08-17 14:38     ` Gao Xiang
  2021-08-18  2:21       ` Yue Hu
  0 siblings, 1 reply; 7+ messages in thread
From: Gao Xiang @ 2021-08-17 14:38 UTC (permalink / raw)
  To: Yue Hu; +Cc: xiang, linux-erofs, huyue2, zbestahu

On Tue, Aug 17, 2021 at 10:10:46PM +0800, Yue Hu wrote:
> Hi Xiang,
> 
> On Tue, 17 Aug 2021 21:13:14 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> 
> > Hi Yue,
> > 
> > On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:
> > > From: Yue Hu <huyue2@yulong.com>
> > > 
> > > No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> > > observed that case frequently. Let's avoid it.
> > > 
> > > Signed-off-by: Yue Hu <huyue2@yulong.com>
> > > ---
> > >  lib/compress.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/lib/compress.c b/lib/compress.c
> > > index 40723a1..a8ebbc1 100644
> > > --- a/lib/compress.c
> > > +++ b/lib/compress.c
> > > @@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
> > >  	unsigned int count;
> > >  
> > >  	/* reset clusterofs to 0 if permitted */
> > > -	if (!erofs_sb_has_lz4_0padding() &&
> > > +	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&  
> > 
> > Also out of curiosity, which means erofs is used without lz4 0padding?
> 
> Yes. We are using legacy compression layout now.

Ok, I think sometime I will seperate it into (non)compact compress
indexes, and (non)0padding. It was once named as legacy just before
Linux < 5.3 didn't support them.

By default, the preferred option now I think is compact and 0padding.

0padding will enable in-place decompression (and LZMA will always
enable it.)
compact will reduce metadata even further as long as the compressed
data is consecutive (e.g. 2 bytes each lcluster for compact indexes
rather than 8 bytes each lcluster for noncompact indexes, but anyway,
either (non)compact metadata can support effective data random access.)

Ok, if it's somewhat kernel 4.19 based code, I'd suggest to upgrade
the codebase at least to 5.4, not because 4.19 doesn't work, but
really for some performance concern....

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-17 14:38     ` Gao Xiang
@ 2021-08-18  2:21       ` Yue Hu
  2021-08-18 14:06         ` Gao Xiang
  0 siblings, 1 reply; 7+ messages in thread
From: Yue Hu @ 2021-08-18  2:21 UTC (permalink / raw)
  To: Gao Xiang; +Cc: xiang, linux-erofs, huyue2, zbestahu

Hi Xiang,

On Tue, 17 Aug 2021 22:38:04 +0800
Gao Xiang <hsiangkao@linux.alibaba.com> wrote:

> On Tue, Aug 17, 2021 at 10:10:46PM +0800, Yue Hu wrote:
> > Hi Xiang,
> > 
> > On Tue, 17 Aug 2021 21:13:14 +0800
> > Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> >   
> > > Hi Yue,
> > > 
> > > On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:  
> > > > From: Yue Hu <huyue2@yulong.com>
> > > > 
> > > > No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> > > > observed that case frequently. Let's avoid it.
> > > > 
> > > > Signed-off-by: Yue Hu <huyue2@yulong.com>
> > > > ---
> > > >  lib/compress.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/lib/compress.c b/lib/compress.c
> > > > index 40723a1..a8ebbc1 100644
> > > > --- a/lib/compress.c
> > > > +++ b/lib/compress.c
> > > > @@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
> > > >  	unsigned int count;
> > > >  
> > > >  	/* reset clusterofs to 0 if permitted */
> > > > -	if (!erofs_sb_has_lz4_0padding() &&
> > > > +	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&    
> > > 
> > > Also out of curiosity, which means erofs is used without lz4 0padding?  

Moreover, seems 'ctx->head' will always >= 'ctx->clusterofs' here since there
are only 2 sites to increase it. One is 'ctx->head += count' after compression,
another is 'ctx->head = qh_after' ( which should be same as ->clusterofs, rt?).

I understand there are part of duplication data when writing uncompressed data
to disk due to 'ctx->head' decreased?

Thanks.

> > 
> > Yes. We are using legacy compression layout now.  
> 
> Ok, I think sometime I will seperate it into (non)compact compress
> indexes, and (non)0padding. It was once named as legacy just before
> Linux < 5.3 didn't support them.
> 
> By default, the preferred option now I think is compact and 0padding.
> 
> 0padding will enable in-place decompression (and LZMA will always
> enable it.)
> compact will reduce metadata even further as long as the compressed
> data is consecutive (e.g. 2 bytes each lcluster for compact indexes
> rather than 8 bytes each lcluster for noncompact indexes, but anyway,
> either (non)compact metadata can support effective data random access.)
> 
> Ok, if it's somewhat kernel 4.19 based code, I'd suggest to upgrade
> the codebase at least to 5.4, not because 4.19 doesn't work, but
> really for some performance concern....
> 
> Thanks,
> Gao Xiang


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent()
  2021-08-18  2:21       ` Yue Hu
@ 2021-08-18 14:06         ` Gao Xiang
  0 siblings, 0 replies; 7+ messages in thread
From: Gao Xiang @ 2021-08-18 14:06 UTC (permalink / raw)
  To: Yue Hu; +Cc: xiang, linux-erofs, huyue2, zbestahu

Hi Yue,

On Wed, Aug 18, 2021 at 10:21:21AM +0800, Yue Hu wrote:
> Hi Xiang,
> 
> On Tue, 17 Aug 2021 22:38:04 +0800
> Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> 
> > On Tue, Aug 17, 2021 at 10:10:46PM +0800, Yue Hu wrote:
> > > Hi Xiang,
> > > 
> > > On Tue, 17 Aug 2021 21:13:14 +0800
> > > Gao Xiang <hsiangkao@linux.alibaba.com> wrote:
> > >   
> > > > Hi Yue,
> > > > 
> > > > On Tue, Aug 17, 2021 at 12:06:04PM +0800, Yue Hu wrote:  
> > > > > From: Yue Hu <huyue2@yulong.com>
> > > > > 
> > > > > No need to reset clusterofs to 0 if it's already 0. Acturally, we also
> > > > > observed that case frequently. Let's avoid it.
> > > > > 
> > > > > Signed-off-by: Yue Hu <huyue2@yulong.com>
> > > > > ---
> > > > >  lib/compress.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/lib/compress.c b/lib/compress.c
> > > > > index 40723a1..a8ebbc1 100644
> > > > > --- a/lib/compress.c
> > > > > +++ b/lib/compress.c
> > > > > @@ -130,7 +130,7 @@ static int write_uncompressed_extent(struct z_erofs_vle_compress_ctx *ctx,
> > > > >  	unsigned int count;
> > > > >  
> > > > >  	/* reset clusterofs to 0 if permitted */
> > > > > -	if (!erofs_sb_has_lz4_0padding() &&
> > > > > +	if (!erofs_sb_has_lz4_0padding() && ctx->clusterofs &&    
> > > > 
> > > > Also out of curiosity, which means erofs is used without lz4 0padding?  
> 
> Moreover, seems 'ctx->head' will always >= 'ctx->clusterofs' here since there
> are only 2 sites to increase it. One is 'ctx->head += count' after compression,
> another is 'ctx->head = qh_after' ( which should be same as ->clusterofs, rt?).

Sorry for some delay (since I shouldn't take working time about this.)

The design of this part is that

Due to most compression algorithms can only accept a one-shot consecutive
buffer for each lz4 block and we limit the maximal input (uncompressed)
data for each pcluster is EROFS_CONFIG_COMPR_MAX_SZ, so EROFS allocates
a buffer with the size of EROFS_CONFIG_COMPR_MAX_SZ * 2, as below:

                     2 * EROFS_CONFIG_COMPR_MAX_SZ
  _____________________________________________________________
 |_____________________________________________________________|
      ^                                     ^
      head                                  tail

head is the pending data for compression, and tail is for the next
read. As you can see, if head < EROFS_CONFIG_COMPR_MAX_SZ, we can
always get `EROFS_CONFIG_COMPR_MAX_SZ consecutive data' in this buffer.

So after head >= EROFS_CONFIG_COMPR_MAX_SZ, we needs to move head ~ tail
to the start of this buffer to keep it work.

> 
> I understand there are part of duplication data when writing uncompressed data
> to disk due to 'ctx->head' decreased?

Yeah, qh_aligned is mainly used for !0padding alignment of uncompressed
data. It's not quite useful for 0padding, since head pointer won't be
decreased.

But I don't think I like omiting ">= 'ctx->clusterofs'" (and I suspect
if it's correct or just because the special EROFS_CONFIG_COMPR_MAX_SZ)
since it involves many assumption and deep condition, and could cause
maintain burden if we change logic like this.

Thanks,
Gao Xiang

> 
> Thanks.
> 
> > > 
> > > Yes. We are using legacy compression layout now.  
> > 
> > Ok, I think sometime I will seperate it into (non)compact compress
> > indexes, and (non)0padding. It was once named as legacy just before
> > Linux < 5.3 didn't support them.
> > 
> > By default, the preferred option now I think is compact and 0padding.
> > 
> > 0padding will enable in-place decompression (and LZMA will always
> > enable it.)
> > compact will reduce metadata even further as long as the compressed
> > data is consecutive (e.g. 2 bytes each lcluster for compact indexes
> > rather than 8 bytes each lcluster for noncompact indexes, but anyway,
> > either (non)compact metadata can support effective data random access.)
> > 
> > Ok, if it's somewhat kernel 4.19 based code, I'd suggest to upgrade
> > the codebase at least to 5.4, not because 4.19 doesn't work, but
> > really for some performance concern....
> > 
> > Thanks,
> > Gao Xiang

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-18 14:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-17  4:06 [PATCH] erofs-utils: add clusterofs zero check to write_uncompressed_extent() Yue Hu
2021-08-17 12:18 ` Gao Xiang
2021-08-17 13:13 ` Gao Xiang
2021-08-17 14:10   ` Yue Hu
2021-08-17 14:38     ` Gao Xiang
2021-08-18  2:21       ` Yue Hu
2021-08-18 14:06         ` Gao Xiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).