* [PATCH V2] archiver: Configurable tarball compression
@ 2021-09-21  6:15 Ian Ray
  2021-09-21 12:18 ` [OE-core] " Richard Purdie
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Ray @ 2021-09-21  6:15 UTC (permalink / raw)
  To: openembedded-core; +Cc: ian.ray

In order to be more efficient, allow the compression method used to
create GPL source archives to be configured (for example, xz). The
default remains gz.

Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.com>
[V1 was https://patchwork.openembedded.org/patch/155985/]
[Rebased]
Signed-off-by: Ian Ray <ian.ray@ge.com>
---
 meta/classes/archiver.bbclass | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/meta/classes/archiver.bbclass b/meta/classes/archiver.bbclass
index dd31dc0..411d459 100644
--- a/meta/classes/archiver.bbclass
+++ b/meta/classes/archiver.bbclass
@@ -51,6 +51,7 @@ ARCHIVER_MODE[diff-exclude] ?= ".pc autom4te.cache patches"
 ARCHIVER_MODE[dumpdata] ?= "0"
 ARCHIVER_MODE[recipe] ?= "0"
 ARCHIVER_MODE[mirror] ?= "split"
+ARCHIVER_MODE[compression] ?= "gz"
 
 DEPLOY_DIR_SRC ?= "${DEPLOY_DIR}/sources"
 ARCHIVER_TOPDIR ?= "${WORKDIR}/archiver-sources"
@@ -409,15 +410,16 @@ def create_tarball(d, srcdir, suffix, ar_outdir):
     # that we archive the actual directory and not just the link.
     srcdir = os.path.realpath(srcdir)
 
+    compression_method = d.getVarFlag('ARCHIVER_MODE', 'compression')
     bb.utils.mkdirhier(ar_outdir)
     if suffix:
-        filename = '%s-%s.tar.gz' % (d.getVar('PF'), suffix)
+        filename = '%s-%s.tar.%s' % (d.getVar('PF'), suffix, compression_method)
     else:
-        filename = '%s.tar.gz' % d.getVar('PF')
+        filename = '%s.tar.%s' % (d.getVar('PF'), compression_method)
     tarname = os.path.join(ar_outdir, filename)
 
     bb.note('Creating %s' % tarname)
-    tar = tarfile.open(tarname, 'w:gz')
+    tar = tarfile.open(tarname, 'w:%s' % compression_method)
     tar.add(srcdir, arcname=os.path.basename(srcdir), filter=exclude_useless_paths)
     tar.close()
 
-- 
2.10.1
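
A minimal standalone sketch of the patched create_tarball() logic, for
trying the idea outside BitBake. It is not the archiver.bbclass code:
the function name create_tarball_sketch, its plain-string parameters and
the SUPPORTED_METHODS check are assumptions made for illustration, and
the exclude_useless_paths filter is omitted. Python's tarfile module
documents the write modes 'w:gz', 'w:bz2' and 'w:xz', so the sketch
rejects any other value before tarfile.open() is reached.

```python
import os
import tarfile
import tempfile

# Suffixes that map directly onto tarfile's documented 'w:<method>' modes.
# Assumption: anything else (zstd included) needs separate handling and is
# deliberately rejected here.
SUPPORTED_METHODS = ("gz", "bz2", "xz")


def create_tarball_sketch(srcdir, pf, suffix, ar_outdir, compression_method="gz"):
    """Hypothetical standalone variant of create_tarball() without the datastore."""
    if compression_method not in SUPPORTED_METHODS:
        raise ValueError("unsupported compression method: %r" % compression_method)

    # Archive the real directory and not just a symlink to it.
    srcdir = os.path.realpath(srcdir)
    os.makedirs(ar_outdir, exist_ok=True)

    if suffix:
        filename = "%s-%s.tar.%s" % (pf, suffix, compression_method)
    else:
        filename = "%s.tar.%s" % (pf, compression_method)
    tarname = os.path.join(ar_outdir, filename)

    # The context manager closes the archive even if tar.add() raises.
    with tarfile.open(tarname, "w:%s" % compression_method) as tar:
        tar.add(srcdir, arcname=os.path.basename(srcdir))
    return tarname


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        with open(os.path.join(src, "README"), "w") as handle:
            handle.write("example\n")
        print(create_tarball_sketch(src, "demo-1.0-r0", "patched",
                                    os.path.join(tmp, "out"), "xz"))
```

Validating the flag up front means a typo in ARCHIVER_MODE[compression]
would fail with a clear message rather than with a tarfile error raised
partway through the task.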


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [OE-core] [PATCH V2] archiver: Configurable tarball compression
  2021-09-21  6:15 [PATCH V2] archiver: Configurable tarball compression Ian Ray
@ 2021-09-21 12:18 ` Richard Purdie
  2021-09-21 13:20   ` Michael Opdenacker
       [not found]   ` <16A6D8EA388563F3.1316@lists.openembedded.org>
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Purdie @ 2021-09-21 12:18 UTC (permalink / raw)
  To: Ian Ray, openembedded-core

On Tue, 2021-09-21 at 09:15 +0300, Ian Ray wrote:
> In order to be more efficient, we use xz as compression method
> to create GPL sources archives.
> 
> Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.com>
> [V1 was https://patchwork.openembedded.org/patch/155985/]
> [Rebased]
> Signed-off-by: Ian Ray <ian.ray@ge.com>
> ---
>  meta/classes/archiver.bbclass | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)

Would it be better just to move to zstd and, rather than making it configurable,
just switch to the better compression format?

Configurability is good but where there is a clear good choice, it may be better
just to do that rather than giving too many options if they aren't really
needed?

Cheers,

Richard


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [OE-core] [PATCH V2] archiver: Configurable tarball compression
  2021-09-21 12:18 ` [OE-core] " Richard Purdie
@ 2021-09-21 13:20   ` Michael Opdenacker
       [not found]   ` <16A6D8EA388563F3.1316@lists.openembedded.org>
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Opdenacker @ 2021-09-21 13:20 UTC (permalink / raw)
  To: Richard Purdie, Ian Ray, openembedded-core


On 9/21/21 2:18 PM, Richard Purdie wrote:
> On Tue, 2021-09-21 at 09:15 +0300, Ian Ray wrote:
>> In order to be more efficient, we use xz as compression method
>> to create GPL sources archives.
>>
>> Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.com>
>> [V1 was https://patchwork.openembedded.org/patch/155985/]
>> [Rebased]
>> Signed-off-by: Ian Ray <ian.ray@ge.com>
>> ---
>>  meta/classes/archiver.bbclass | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
> Would it be better just to move to zstd and, rather than making it configurable,
> just switch to the better compression format?
>
> Configurability is good but where there is a clear good choice, it may be better
> just to do that rather than giving too many options if they aren't really
> needed?


I agree. We shouldn't add unnecessary complexity to the manuals ;-)

By the way, zstd seems to be marginally worse (+1%) than xz in terms of
compressed size, but is orders of magnitude faster (see
https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/).

I vote for zstd.

Thanks for starting the discussion.

Cheers,
Michael.

-- 
Michael Opdenacker, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [OE-core] [PATCH V2] archiver: Configurable tarball compression
       [not found]   ` <16A6D8EA388563F3.1316@lists.openembedded.org>
@ 2021-09-21 13:48     ` Michael Opdenacker
  2021-09-22  8:29       ` EXT: " Ian Ray
  2021-10-27 11:31       ` Martyn Welch
  0 siblings, 2 replies; 7+ messages in thread
From: Michael Opdenacker @ 2021-09-21 13:48 UTC (permalink / raw)
  To: openembedded-core


On 9/21/21 3:20 PM, Michael Opdenacker wrote:
> By the way, zstd seems to be marginally worse (+1%) than xz in terms of
> compressed size, but is orders of magnitude faster (see
> https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/).


Actually, this article only mentions decompression speed, but that's
also true for compression speed.

Here are my own tests:

mike@mike-laptop:~/tmp$ time gzip linux-5.15-rc2.tar

real    0m29.293s
user    0m28.712s
sys    0m0.553s

mike@mike-laptop:~/tmp$ time xz linux-5.15-rc2.tar

real    7m2.658s
user    7m1.096s
sys    0m1.280s

mike@mike-laptop:~/tmp$ time zstd linux-5.15-rc2.tar
linux-5.15-rc2.tar   : 16.29%   (1136803840 => 185233271 bytes, linux-5.15-rc2.tar.zst)

real    0m5.476s
user    0m5.530s
sys    0m0.864s

mike@mike-laptop:~/tmp$ ls -la linux-5.15*
-rw-rw-r-- 1 mike mike 1136803840 Sep 21 15:31 linux-5.15-rc2.tar
-rw-rw-r-- 1 mike mike  198135832 Sep 21 15:24 linux-5.15-rc2.tar.gz
-rw-rw-r-- 1 mike mike  125980548 Sep 21 15:26 linux-5.15-rc2.tar.xz
-rw-rw-r-- 1 mike mike  185233271 Sep 21 15:31 linux-5.15-rc2.tar.zst

So, here the claim that zstd (with default options) is almost as good as
xz in compressed size is not confirmed. However, zstd is a clear winner
in terms of compression speed, and its output is still smaller than
gzip's. That makes it worth switching.
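
The comparison above can be reduced to a few ratios with a short script.
This is only a sketch: the names RUNS, parse_real and summarize are
made up for illustration, and the figures are simply the "real" times
and file sizes copied from the runs above.

```python
import re

# Figures copied from the gzip/xz/zstd runs on linux-5.15-rc2.tar above.
ORIGINAL_SIZE = 1136803840
RUNS = {
    "gzip": ("0m29.293s", 198135832),
    "xz": ("7m2.658s", 125980548),
    "zstd": ("0m5.476s", 185233271),
}


def parse_real(value):
    """Convert a `time` field such as '7m2.658s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError("unexpected time format: %r" % value)
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(runs, original_size, baseline="zstd"):
    """Return seconds, size as % of the original, and time relative to baseline."""
    base_seconds = parse_real(runs[baseline][0])
    summary = {}
    for tool, (real, size) in runs.items():
        seconds = parse_real(real)
        summary[tool] = {
            "seconds": seconds,
            "size_pct": 100.0 * size / original_size,
            "time_vs_baseline": seconds / base_seconds,
        }
    return summary


if __name__ == "__main__":
    for tool, row in summarize(RUNS, ORIGINAL_SIZE).items():
        print("%-5s %8.3fs  %5.2f%% of original  %6.2fx zstd time"
              % (tool, row["seconds"], row["size_pct"], row["time_vs_baseline"]))
```

On these figures xz takes roughly 77 times as long as zstd, and gzip
roughly 5 times as long.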

Cheers

Michael

-- 
Michael Opdenacker, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: EXT: Re: [OE-core] [PATCH V2] archiver: Configurable tarball compression
  2021-09-21 13:48     ` Michael Opdenacker
@ 2021-09-22  8:29       ` Ian Ray
  2021-10-27 11:31       ` Martyn Welch
  1 sibling, 0 replies; 7+ messages in thread
From: Ian Ray @ 2021-09-22  8:29 UTC (permalink / raw)
  To: Michael Opdenacker; +Cc: openembedded-core

On Tue, Sep 21, 2021 at 03:48:29PM +0200, Michael Opdenacker wrote:
> 
> On 9/21/21 3:20 PM, Michael Opdenacker wrote:
> > By the way, zstd seems to be marginally worse (+1%) than xz in terms of
> > compressed size, but is orders of magnitude faster (see
> > https://archlinux.org/news/now-using-zstandard-instead-of-xz-for-package-compression/).
> 
> 
> Actually, this article only mentions decompression speed, but that's
> also true for compression speed.
> 
> Here are my own tests:
> 
> mike@mike-laptop:~/tmp$ time gzip linux-5.15-rc2.tar
> 
> real    0m29.293s
> user    0m28.712s
> sys    0m0.553s
> 
> mike@mike-laptop:~/tmp$ time xz linux-5.15-rc2.tar
> 
> real    7m2.658s
> user    7m1.096s
> sys    0m1.280s
> 
> mike@mike-laptop:~/tmp$ time zstd linux-5.15-rc2.tar
> linux-5.15-rc2.tar   : 16.29%   (1136803840 => 185233271 bytes, linux-5.15-rc2.tar.zst)
> 
> real    0m5.476s
> user    0m5.530s
> sys    0m0.864s
> 
> mike@mike-laptop:~/tmp$ ls -la linux-5.15*
> -rw-rw-r-- 1 mike mike 1136803840 Sep 21 15:31 linux-5.15-rc2.tar
> -rw-rw-r-- 1 mike mike  198135832 Sep 21 15:24 linux-5.15-rc2.tar.gz
> -rw-rw-r-- 1 mike mike  125980548 Sep 21 15:26 linux-5.15-rc2.tar.xz
> -rw-rw-r-- 1 mike mike  185233271 Sep 21 15:31 linux-5.15-rc2.tar.zst
> 
> So, here the claim that zstd (with default options) is almost as good as
> xz in compressed size is not confirmed. However, zstd is a clear winner
> in terms of compression speed, and anyway better than gzip. This is
> worth switching.

Thank you for measuring this!  

I will re-submit the patch when we update to a more recent Yocto
version.


> 
> Cheers
> 
> Michael
> 
> -- 
> Michael Opdenacker, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V2] archiver: Configurable tarball compression
  2021-09-21 13:48     ` Michael Opdenacker
  2021-09-22  8:29       ` EXT: " Ian Ray
@ 2021-10-27 11:31       ` Martyn Welch
  1 sibling, 0 replies; 7+ messages in thread
From: Martyn Welch @ 2021-10-27 11:31 UTC (permalink / raw)
  To: openembedded-core


> 
> So, here the claim that zstd (with default options) is almost as good as
> xz in compressed size is not confirmed. However, zstd is a clear winner
> in terms of compression speed, and anyway better than gzip. This is
> worth switching.
> 

That claim doesn't seem to be confirmed by any of the (admittedly small) selection of archives I tried: zstd output is approx. 21 to 69% larger than xz output, though zstd is still the best in terms of compression and decompression speed.

However, I think this neatly highlights why it may make sense to make this configurable, as which algorithm is "best" is going to depend on whether you're optimising for (de)compression speed or size.

Testing results below,

Martyn

---

$ time gzip -k linux-5.14.tar

real 0m26.807s
user 0m26.392s
sys 0m0.368s
$ time xz -k linux-5.14.tar

real 6m42.494s
user 6m40.167s
sys 0m1.757s
$ time zstd -k linux-5.14.tar
linux-5.14.tar       : 16.28%   (1126737920 => 183398470 bytes, linux-5.14.tar.zst)

real 0m3.531s
user 0m3.631s
sys 0m0.509s
$ ls -la  *
-rw-r--r-- 1 martyn martyn 1126737920 Oct 27 10:54 linux-5.14.tar
-rw-r--r-- 1 martyn martyn  196107916 Oct 27 10:54 linux-5.14.tar.gz
-rw-r--r-- 1 martyn martyn  124724612 Oct 27 10:54 linux-5.14.tar.xz
-rw-r--r-- 1 martyn martyn  183398470 Oct 27 10:54 linux-5.14.tar.zst
$ time gunzip linux-5.14.tar.gz

real 0m5.141s
user 0m4.462s
sys 0m0.613s
$ time xz -d linux-5.14.tar.xz

real 0m8.571s
user 0m7.739s
sys 0m0.820s
$ time zstd -d linux-5.14.tar.zst
linux-5.14.tar.zst  : 1126737920 bytes

real 0m1.906s
user 0m1.185s
sys 0m0.710s

$ time gzip -k coreutils-9.0.tar

real 0m1.685s
user 0m1.669s
sys 0m0.016s
$ time xz -k coreutils-9.0.tar

real 0m14.891s
user 0m14.795s
sys 0m0.060s
$ time zstd -k coreutils-9.0.tar
coreutils-9.0.tar    : 19.21%   (54394880 => 10447053 bytes, coreutils-9.0.tar.zst)

real 0m0.207s
user 0m0.215s
sys 0m0.029s
$ ls -la coreutils-9.0.tar*
-rw-r--r-- 1 martyn martyn 54394880 Oct 27 11:16 coreutils-9.0.tar
-rw-r--r-- 1 martyn martyn 13595007 Oct 27 11:16 coreutils-9.0.tar.gz
-rw-r--r-- 1 martyn martyn  6177372 Oct 27 11:16 coreutils-9.0.tar.xz
-rw-r--r-- 1 martyn martyn 10447053 Oct 27 11:16 coreutils-9.0.tar.zst
$ time gzip -d coreutils-9.0.tar.gz

real 0m0.362s
user 0m0.280s
sys 0m0.048s
$ time xz -d coreutils-9.0.tar.xz

real 0m0.444s
user 0m0.424s
sys 0m0.020s
$ time zstd -d coreutils-9.0.tar.zst
coreutils-9.0.tar.zst: 54394880 bytes

real 0m0.095s
user 0m0.044s
sys 0m0.052s

$ time gzip -k tcp_wrappers_7.6.tar

real 0m0.033s
user 0m0.033s
sys 0m0.000s
$ time xz -k tcp_wrappers_7.6.tar

real 0m0.116s
user 0m0.104s
sys 0m0.012s
$ time zstd -k tcp_wrappers_7.6.tar
tcp_wrappers_7.6.tar : 26.57%   (360448 =>  95772 bytes, tcp_wrappers_7.6.tar.zst)

real 0m0.006s
user 0m0.003s
sys 0m0.003s
$ ls -la tcp_wrappers_7.6.tar*
-rw-r--r-- 1 martyn martyn 360448 Oct 27 11:15 tcp_wrappers_7.6.tar
-rw-r--r-- 1 martyn martyn  99459 Oct 27 11:15 tcp_wrappers_7.6.tar.gz
-rw-r--r-- 1 martyn martyn  79316 Oct 27 11:15 tcp_wrappers_7.6.tar.xz
-rw-r--r-- 1 martyn martyn  95772 Oct 27 11:15 tcp_wrappers_7.6.tar.zst
$ time gzip -d tcp_wrappers_7.6.tar.gz

real 0m0.008s
user 0m0.004s
sys 0m0.004s
$ time xz -d tcp_wrappers_7.6.tar.xz

real 0m0.019s
user 0m0.015s
sys 0m0.004s
$ time zstd -d tcp_wrappers_7.6.tar.zst
tcp_wrappers_7.6.tar.zst: 360448 bytes

real 0m0.005s
user 0m0.000s
sys 0m0.005s
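
The 21 to 69% range quoted at the top of this message can be reproduced
from the `ls -la` listings above. A small sketch, with the helper names
overhead_pct and zstd_vs_xz chosen here purely for illustration:

```python
# Compressed sizes in bytes, copied from the `ls -la` listings above.
SIZES = {
    "linux-5.14": {"gz": 196107916, "xz": 124724612, "zst": 183398470},
    "coreutils-9.0": {"gz": 13595007, "xz": 6177372, "zst": 10447053},
    "tcp_wrappers_7.6": {"gz": 99459, "xz": 79316, "zst": 95772},
}


def overhead_pct(candidate, reference):
    """How much larger `candidate` is than `reference`, in percent."""
    return 100.0 * (candidate - reference) / reference


def zstd_vs_xz(sizes):
    """Map each archive name to the size overhead of zstd relative to xz."""
    return {name: overhead_pct(s["zst"], s["xz"]) for name, s in sizes.items()}


if __name__ == "__main__":
    for name, pct in zstd_vs_xz(SIZES).items():
        print("%-18s zstd is %.1f%% larger than xz" % (name, pct))
```

The same overhead_pct() helper can be pointed at the gz column to compare
gzip against either of the other two formats.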


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH V2] archiver: Configurable tarball compression
@ 2021-09-20 10:25 Ian Ray
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Ray @ 2021-09-20 10:25 UTC (permalink / raw)
  To: openembedded-core; +Cc: ian.ray

In order to be more efficient, we use xz as compression method
to create GPL sources archives.

Signed-off-by: Fabien Lahoudere <fabien.lahoudere@collabora.com>
[V1 was https://patchwork.openembedded.org/patch/155985/]
[Rebased]
Signed-off-by: Ian Ray <ian.ray@ge.com>
---
 meta/classes/archiver.bbclass | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/meta/classes/archiver.bbclass b/meta/classes/archiver.bbclass
index dd31dc0..411d459 100644
--- a/meta/classes/archiver.bbclass
+++ b/meta/classes/archiver.bbclass
@@ -51,6 +51,7 @@ ARCHIVER_MODE[diff-exclude] ?= ".pc autom4te.cache patches"
 ARCHIVER_MODE[dumpdata] ?= "0"
 ARCHIVER_MODE[recipe] ?= "0"
 ARCHIVER_MODE[mirror] ?= "split"
+ARCHIVER_MODE[compression] ?= "gz"
 
 DEPLOY_DIR_SRC ?= "${DEPLOY_DIR}/sources"
 ARCHIVER_TOPDIR ?= "${WORKDIR}/archiver-sources"
@@ -409,15 +410,16 @@ def create_tarball(d, srcdir, suffix, ar_outdir):
     # that we archive the actual directory and not just the link.
     srcdir = os.path.realpath(srcdir)
 
+    compression_method = d.getVarFlag('ARCHIVER_MODE', 'compression')
     bb.utils.mkdirhier(ar_outdir)
     if suffix:
-        filename = '%s-%s.tar.gz' % (d.getVar('PF'), suffix)
+        filename = '%s-%s.tar.%s' % (d.getVar('PF'), suffix, compression_method)
     else:
-        filename = '%s.tar.gz' % d.getVar('PF')
+        filename = '%s.tar.%s' % (d.getVar('PF'), compression_method)
     tarname = os.path.join(ar_outdir, filename)
 
     bb.note('Creating %s' % tarname)
-    tar = tarfile.open(tarname, 'w:gz')
+    tar = tarfile.open(tarname, 'w:%s' % compression_method)
     tar.add(srcdir, arcname=os.path.basename(srcdir), filter=exclude_useless_paths)
     tar.close()
 
-- 
2.10.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread
