* [PATCH 1/1] sstate: Switch to Zstandard compressor
@ 2021-10-01 17:31 Henry Kleynhans
  2021-10-01 17:31 ` [PATCH] sstate: Switch to ZStandard compressor support Henry Kleynhans
  2021-10-02  1:17 ` [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor Joshua Watt
  0 siblings, 2 replies; 7+ messages in thread
From: Henry Kleynhans @ 2021-10-01 17:31 UTC (permalink / raw)
  To: poky; +Cc: hkleynhans, rmikey

Patch revision, changes from the last version:
* Switch to Zstandard rather than adding it as an option.
* Specify 'zstd' threading and compression as arguments instead of
  using environment variables
* Remove debug output
* Non-posix if statements were removed as a result of replacing 'gzip'
  with 'zstd'

I am not sure if we need support for 'pzstd'.  If this is a requirement
I can add it in a subsequent revision.
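
As a rough illustration of passing the compression level and thread count
as arguments rather than through environment variables, the sketch below
builds the compressor command line and hands it to tar via -I.  The helper
names (zstd_cmd, pack_dir, unpack_archive) and the values 8 and 4 are
made up for this example and are not part of the patch; GNU tar is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): build the compressor command
# line from an explicit level and thread count.
zstd_cmd() {
    level="$1"
    threads="$2"
    printf 'zstd -%s -T%s' "$level" "$threads"
}

# Pack the contents of directory $1 into archive $2.
# Assumes GNU tar, whose -I option accepts a command with arguments.
pack_dir() {
    tar -C "$1" -I "$(zstd_cmd 8 4)" -cf "$2" .
}

# Unpack archive $1 into directory $2.
# On extraction GNU tar appends -d to the compressor command itself.
unpack_archive() {
    mkdir -p "$2" && tar -C "$2" -I "$(zstd_cmd 8 4)" -xf "$1"
}

# Show the command line tar would be given.
echo "compressor: $(zstd_cmd 8 4)"
```

With zstd installed, "pack_dir some/dir out.tar.zst" followed by
"unpack_archive out.tar.zst restored" should reproduce the directory.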



* [PATCH] sstate: Switch to ZStandard compressor support
  2021-10-01 17:31 [PATCH 1/1] sstate: Switch to Zstandard compressor Henry Kleynhans
@ 2021-10-01 17:31 ` Henry Kleynhans
  2021-10-02  9:58   ` [poky] " Alexander Kanavin
  2021-10-02 13:34   ` Peter Kjellerstedt
  2021-10-02  1:17 ` [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor Joshua Watt
  1 sibling, 2 replies; 7+ messages in thread
From: Henry Kleynhans @ 2021-10-01 17:31 UTC (permalink / raw)
  To: poky; +Cc: hkleynhans, rmikey

This patch switches the compressor from gzip to Zstandard for sstate cache
files.

Zstandard compression provides a significant improvement in
decompression speed as well as improvement in compression speed and disk
usage over the 'tgz' format in use.  Furthermore, its configurable
compression level offers a trade-off between time spent compressing
sstate cache files and disk space used by those files.  The reduced disk
usage also contributes to saving network traffic for those sharing their
sstate cache with others.

Zstandard should therefore be a good choice when:
* disk space is at a premium
* network speed / resources are limited
* the CI server can create sstate packages at a high compression level
* less CPU on the build server should be used for sstate decompression

Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
---
 meta/classes/sstate.bbclass        | 22 ++++++++--------
 scripts/sstate-cache-management.sh | 40 +++++++++++++++---------------
 2 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 92a73114bb..0068557927 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -1,17 +1,19 @@
 SSTATE_VERSION = "3"
 
+SSTATE_ZSTD_CLEVEL = "8"
+
 SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
 SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"
 
 def generate_sstatefn(spec, hash, taskname, siginfo, d):
     if taskname is None:
        return ""
-    extension = ".tgz"
+    extension = ".tar.zst"
     # 8 chars reserved for siginfo
     limit = 254 - 8
     if siginfo:
         limit = 254
-        extension = ".tgz.siginfo"
+        extension = ".tar.zst.siginfo"
     if not hash:
         hash = "INVALID"
     fn = spec + hash + "_" + taskname + extension
@@ -37,7 +39,7 @@ SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PK
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
-SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tgz*"
+SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tar.zst*"
 
 # explicitly make PV to depend on evaluated value of PV variable
 PV[vardepvalue] = "${PV}"
@@ -825,23 +827,20 @@ sstate_create_package () {
 	mkdir --mode=0775 -p `dirname ${SSTATE_PKG}`
 	TFILE=`mktemp ${SSTATE_PKG}.XXXXXXXX`
 
-	# Use pigz if available
-	OPT="-czS"
-	if [ -x "$(command -v pigz)" ]; then
-		OPT="-I pigz -cS"
-	fi
+	OPT="-cS"
+	ZSTD="zstd -${SSTATE_ZSTD_CLEVEL} -T${BB_NUMBER_THREADS}"
 
 	# Need to handle empty directories
 	if [ "$(ls -A)" ]; then
 		set +e
-		tar $OPT -f $TFILE *
+		tar -I "${ZSTD}" $OPT -f $TFILE *
 		ret=$?
 		if [ $ret -ne 0 ] && [ $ret -ne 1 ]; then
 			exit 1
 		fi
 		set -e
 	else
-		tar $OPT --file=$TFILE --files-from=/dev/null
+		tar -I "${ZSTD}" $OPT --file=$TFILE --files-from=/dev/null
 	fi
 	chmod 0664 $TFILE
 	# Skip if it was already created by some other process
@@ -880,7 +879,8 @@ python sstate_report_unihash() {
 # Will be run from within SSTATE_INSTDIR.
 #
 sstate_unpack_package () {
-	tar -xvzf ${SSTATE_PKG}
+	ZSTD="zstd -T${BB_NUMBER_THREADS}"
+	tar -I "${ZSTD}" -xvf ${SSTATE_PKG}
 	# update .siginfo atime on local/NFS mirror
 	[ -O ${SSTATE_PKG}.siginfo ] && [ -w ${SSTATE_PKG}.siginfo ] && [ -h ${SSTATE_PKG}.siginfo ] && touch -a ${SSTATE_PKG}.siginfo
 	# Use "! -w ||" to return true for read only files
diff --git a/scripts/sstate-cache-management.sh b/scripts/sstate-cache-management.sh
index f1706a2229..d39671f7c6 100755
--- a/scripts/sstate-cache-management.sh
+++ b/scripts/sstate-cache-management.sh
@@ -114,7 +114,7 @@ echo_error () {
 # * Add .done/.siginfo to the remove list
 # * Add destination of symlink to the remove list
 #
-# $1: output file, others: sstate cache file (.tgz)
+# $1: output file, others: sstate cache file (.tar.zst)
 gen_rmlist (){
   local rmlist_file="$1"
   shift
@@ -131,13 +131,13 @@ gen_rmlist (){
               dest="`readlink -e $i`"
               if [ -n "$dest" ]; then
                   echo $dest >> $rmlist_file
-                  # Remove the .siginfo when .tgz is removed
+                  # Remove the .siginfo when .tar.zst is removed
                   if [ -f "$dest.siginfo" ]; then
                       echo $dest.siginfo >> $rmlist_file
                   fi
               fi
           fi
-          # Add the ".tgz.done" and ".siginfo.done" (may exist in the future)
+          # Add the ".tar.zst.done" and ".siginfo.done" (may exist in the future)
           base_fn="${i##/*/}"
           t_fn="$base_fn.done"
           s_fn="$base_fn.siginfo.done"
@@ -188,10 +188,10 @@ remove_duplicated () {
   total_files=`find $cache_dir -name 'sstate*' | wc -l`
   # Save all the sstate files in a file
   sstate_files_list=`mktemp` || exit 1
-  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' >$sstate_files_list
+  find $cache_dir -iname 'sstate:*:*:*:*:*:*:*.tar.zst*' >$sstate_files_list
 
   echo "Figuring out the suffixes in the sstate cache dir ... "
-  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $sstate_files_list | sort -u`"
+  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tar\.zst.*%\1%g' $sstate_files_list | sort -u`"
   echo "Done"
   echo "The following suffixes have been found in the cache dir:"
   echo $sstate_suffixes
@@ -200,10 +200,10 @@ remove_duplicated () {
   # Using this SSTATE_PKGSPEC definition it's 6th colon separated field
   # SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
   for arch in $all_archs; do
-      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
+      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tar\.zst$" $sstate_files_list
       [ $? -eq 0 ] && ava_archs="$ava_archs $arch"
       # ${builder_arch}_$arch used by toolchain sstate
-      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
+      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tar\.zst$" $sstate_files_list
       [ $? -eq 0 ] && ava_archs="$ava_archs ${builder_arch}_$arch"
   done
   echo "Done"
@@ -219,13 +219,13 @@ remove_duplicated () {
           continue
       fi
       # Total number of files including .siginfo and .done files
-      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list | wc -l 2>/dev/null`
-      total_tgz_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz$" $sstate_files_list | wc -l 2>/dev/null`
+      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst.*" $sstate_files_list | wc -l 2>/dev/null`
+      total_archives_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst$" $sstate_files_list | wc -l 2>/dev/null`
       # Save the file list to a file, some suffix's file may not exist
-      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list >$list_suffix 2>/dev/null
-      local deleted_tgz=0
+      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst.*" $sstate_files_list >$list_suffix 2>/dev/null
+      local deleted_archives=0
       local deleted_files=0
-      for ext in tgz tgz.siginfo tgz.done; do
+      for ext in tar.zst tar.zst.siginfo tar.zst.done; do
           echo "Figuring out the sstate:xxx_$suffix.$ext ... "
           # Uniq BPNs
           file_names=`for arch in $ava_archs ""; do
@@ -268,19 +268,19 @@ remove_duplicated () {
               done
           done
       done
-      deleted_tgz=`cat $rm_list.* 2>/dev/null | grep ".tgz$" | wc -l`
+      deleted_archives=`cat $rm_list.* 2>/dev/null | grep "\.tar\.zst$" | wc -l`
       deleted_files=`cat $rm_list.* 2>/dev/null | wc -l`
       [ "$deleted_files" -gt 0 -a $debug -gt 0 ] && cat $rm_list.*
-      echo "($deleted_tgz out of $total_tgz_suffix .tgz files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
+      echo "($deleted_archives out of $total_archives_suffix .tar.zst files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
       let total_deleted=$total_deleted+$deleted_files
   done
-  deleted_tgz=0
+  deleted_archives=0
   rm_old_list=$remove_listdir/sstate-old-filenames
-  find $cache_dir -name 'sstate-*.tgz' >$rm_old_list
-  [ -s "$rm_old_list" ] && deleted_tgz=`cat $rm_old_list | grep ".tgz$" | wc -l`
+  find $cache_dir -name 'sstate-*.tar.zst' >$rm_old_list
+  [ -s "$rm_old_list" ] && deleted_archives=`cat $rm_old_list | grep "\.tar\.zst$" | wc -l`
   [ -s "$rm_old_list" ] && deleted_files=`cat $rm_old_list | wc -l`
   [ -s "$rm_old_list" -a $debug -gt 0 ] && cat $rm_old_list
-  echo "($deleted_tgz .tgz files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
+  echo "($deleted_archives .tar.zst files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
   let total_deleted=$total_deleted+$deleted_files
 
   rm -f $list_suffix
@@ -289,7 +289,7 @@ remove_duplicated () {
       read_confirm
       if [ "$confirm" = "y" -o "$confirm" = "Y" ]; then
           for list in `ls $remove_listdir/`; do
-              echo "Removing $list.tgz (`cat $remove_listdir/$list | wc -w` files) ... "
+              echo "Removing $list.tar.zst archive (`cat $remove_listdir/$list | wc -w` files) ... "
               # Remove them one by one to avoid the argument list too long error
               for i in `cat $remove_listdir/$list`; do
                   rm -f $verbose $i
@@ -322,7 +322,7 @@ rm_by_stamps (){
   find $cache_dir -type f -name 'sstate*' | sort -u -o $cache_list
 
   echo "Figuring out the suffixes in the sstate cache dir ... "
-  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $cache_list | sort -u`"
+  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tar\.zst.*%\1%g' $cache_list | sort -u`"
   echo "Done"
   echo "The following suffixes have been found in the cache dir:"
   echo $sstate_suffixes
-- 
2.30.2



* Re: [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor
  2021-10-01 17:31 [PATCH 1/1] sstate: Switch to Zstandard compressor Henry Kleynhans
  2021-10-01 17:31 ` [PATCH] sstate: Switch to ZStandard compressor support Henry Kleynhans
@ 2021-10-02  1:17 ` Joshua Watt
  2021-10-02 14:13   ` Richard Purdie
  1 sibling, 1 reply; 7+ messages in thread
From: Joshua Watt @ 2021-10-02  1:17 UTC (permalink / raw)
  To: hkleynhans; +Cc: poky, rmikey

On Fri, Oct 1, 2021 at 12:32 PM Henry Kleynhans via
lists.yoctoproject.org <hkleynhans=fb.com@lists.yoctoproject.org>
wrote:
>
> Patch revision, changes from the last version:
> * Switch to Zstandard rather than adding it as an option.
> * Specify 'zstd' threading and compression as arguments instead of
>   using environment variables
> * Remove debug output
> * Non-posix if statements were removed as a result of replacing 'gzip'
>   with 'zstd'
>
> I am not sure if we need support for 'pzstd'.  If this is a requirement
> I can add it in a subsequent revision.

Thanks, this looks pretty good. I was curious about the performance
difference between pzstd and "zstd -T", so I ran a few numbers;
specifically, I ran these on Ubuntu 18.04 which is our oldest
supported distro that uses the host zstd.

It appears that pzstd is about 20% faster to decompress data than
"zstd -T"; I think this is because zstd never does threaded
decompression (at least on Ubuntu 18.04). Would you mind switching to
using pzstd?
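
One way the request above could be met without making pzstd mandatory is
to select the compressor at run time.  This is only a sketch: pick_zstd is
a hypothetical helper, the level 8 and thread count 4 are placeholders, and
the flags are the ones used in the timing runs in this mail ("-T" for
zstd, "-p" for pzstd).

```shell
#!/bin/sh
# Hypothetical helper: prefer pzstd when the host provides it,
# otherwise fall back to multi-threaded zstd.
pick_zstd() {
    level="$1"
    threads="$2"
    if command -v pzstd >/dev/null 2>&1; then
        printf 'pzstd -%s -p%s' "$level" "$threads"
    else
        printf 'zstd -%s -T%s' "$level" "$threads"
    fi
}

ZSTD=$(pick_zstd 8 4)
echo "compressor: $ZSTD"
```

The resulting string can be passed to tar with -I in the same way as the
plain zstd command line.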

---

root@0df785908a53:/bin# time seq 1000000000 | zstd -10 -T36 | zstd -d -T36 > /dev/null

real    0m35.503s
user    11m34.695s
sys     0m9.405s
root@0df785908a53:/bin# time seq 1000000000 | zstd -10 -T36 | pzstd -d -p36 > /dev/null

real    0m27.830s
user    15m18.733s
sys     0m13.895s
root@0df785908a53:/bin# time seq 1000000000 | pzstd -10 -p36 | zstd -d -T36 > /dev/null

real    0m28.134s
user    14m42.322s
sys     0m12.886s
root@0df785908a53:/bin# time seq 1000000000 | pzstd -10 -p36 | pzstd -d -p36 > /dev/null

real    0m27.911s
user    15m28.182s
sys     0m17.036s
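
For reference, the "about 20%" figure can be checked from the first two
runs above (zstd -d at 35.503s real versus pzstd -d at 27.830s real, both
fed by zstd -10 -T36).  The helper name saving_pct is made up for this
sketch, and note that each pipeline times compression and decompression
together, so the figure is only indicative.

```shell
#!/bin/sh
# Hypothetical helper: percentage of wall-clock time saved when going
# from a baseline run ($1 seconds) to a faster run ($2 seconds).
saving_pct() {
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f", (base - new) / base * 100 }'
}

# "real" times copied from the first two runs above.
echo "pzstd -d saves $(saving_pct 35.503 27.830)% wall time over zstd -d"
```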




* Re: [poky] [PATCH] sstate: Switch to ZStandard compressor support
  2021-10-01 17:31 ` [PATCH] sstate: Switch to ZStandard compressor support Henry Kleynhans
@ 2021-10-02  9:58   ` Alexander Kanavin
  2021-10-02 13:34   ` Peter Kjellerstedt
  1 sibling, 0 replies; 7+ messages in thread
From: Alexander Kanavin @ 2021-10-02  9:58 UTC (permalink / raw)
  To: hkleynhans; +Cc: poky, rmikey


I think it might be better to use ZSTD_THREADS? BB_NUMBER_THREADS is for
controlling bitbake tasks, and ZSTD_THREADS is already defined in
kirkstone-next.

Alex
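
A minimal sketch of that suggestion, written as plain shell rather than
bbclass code: the thread count comes from ZSTD_THREADS when it is set.
The zstd_threads helper and the fallback to BB_NUMBER_THREADS (and then
to 1) are assumptions made for this example only; inside sstate.bbclass
the value would come from the BitBake variable instead.

```shell
#!/bin/sh
# Hypothetical helper: choose the zstd thread count.
# Prefers ZSTD_THREADS; the fallbacks are illustrative assumptions.
zstd_threads() {
    if [ -n "${ZSTD_THREADS:-}" ]; then
        echo "$ZSTD_THREADS"
    else
        echo "${BB_NUMBER_THREADS:-1}"
    fi
}

# Compression level 8 matches SSTATE_ZSTD_CLEVEL in the patch.
ZSTD="zstd -8 -T$(zstd_threads)"
echo "compressor: $ZSTD"
```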




* Re: [poky] [PATCH] sstate: Switch to ZStandard compressor support
  2021-10-01 17:31 ` [PATCH] sstate: Switch to ZStandard compressor support Henry Kleynhans
  2021-10-02  9:58   ` [poky] " Alexander Kanavin
@ 2021-10-02 13:34   ` Peter Kjellerstedt
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Kjellerstedt @ 2021-10-02 13:34 UTC (permalink / raw)
  To: hkleynhans, poky; +Cc: rmikey

This patch should be sent to openembedded-core@lists.openembedded.org 
instead.

Some further comments below.

> -----Original Message-----
> From: poky@lists.yoctoproject.org <poky@lists.yoctoproject.org> On Behalf
> Of Henry Kleynhans via lists.yoctoproject.org
> Sent: den 1 oktober 2021 19:32
> To: poky@lists.yoctoproject.org
> Cc: hkleynhans@fb.com; rmikey@fb.com
> Subject: [poky] [PATCH] sstate: Switch to ZStandard compressor support
> 
> This patch switches the compressor from gzip to Zstandard for sstate cache
> files.
> 
> Zstandard compression provides a significant improvement in
> decompression speed as well as improvement in compression speed and disk
> usage over the 'tgz' format in use.  Furthermore, its configurable
> compression level offers a trade-off between time spent compressing
> sstate cache files and disk space used by those files.  The reduced disk
> usage also contributes to saving network traffic for those sharing their
> sstate cache with others.
> 
> Zstandard should therefore be a good choice when:
> * disk space is at a premium
> * network speed / resources are limited
> * the CI server can create sstate packages at a high compression level
> * less CPU on the build server should be used for sstate decompression
> 
> Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> ---
>  meta/classes/sstate.bbclass        | 22 ++++++++--------
>  scripts/sstate-cache-management.sh | 40 +++++++++++++++---------------
>  2 files changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 92a73114bb..0068557927 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -1,17 +1,19 @@
>  SSTATE_VERSION = "3"
> 
> +SSTATE_ZSTD_CLEVEL = "8"
> +
>  SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
>  SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"
> 
>  def generate_sstatefn(spec, hash, taskname, siginfo, d):
>      if taskname is None:
>         return ""
> -    extension = ".tgz"
> +    extension = ".tar.zst"
>      # 8 chars reserved for siginfo
>      limit = 254 - 8
>      if siginfo:
>          limit = 254
> -        extension = ".tgz.siginfo"
> +        extension = ".tar.zst.siginfo"
>      if not hash:
>          hash = "INVALID"
>      fn = spec + hash + "_" + taskname + extension
> @@ -37,7 +39,7 @@ SSTATE_PKGNAME    =
> "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PK
>  SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
>  SSTATE_EXTRAPATH   = ""
>  SSTATE_EXTRAPATHWILDCARD = ""
> -SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tgz*"
> +SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tar.zst*"
> 
>  # explicitly make PV to depend on evaluated value of PV variable
>  PV[vardepvalue] = "${PV}"
> @@ -825,23 +827,20 @@ sstate_create_package () {
>  	mkdir --mode=0775 -p `dirname ${SSTATE_PKG}`
>  	TFILE=`mktemp ${SSTATE_PKG}.XXXXXXXX`
> 
> -	# Use pigz if available
> -	OPT="-czS"
> -	if [ -x "$(command -v pigz)" ]; then
> -		OPT="-I pigz -cS"
> -	fi
> +	OPT="-cS"
> +	ZSTD="zstd -${SSTATE_ZSTD_CLEVEL} -T${BB_NUMBER_THREADS}"
> 
>  	# Need to handle empty directories
>  	if [ "$(ls -A)" ]; then
>  		set +e
> -		tar $OPT -f $TFILE *
> +		tar -I "${ZSTD}" $OPT -f $TFILE *

Use $ZSTD rather than ${ZSTD} as it is a shell variable.

>  		ret=$?
>  		if [ $ret -ne 0 ] && [ $ret -ne 1 ]; then
>  			exit 1
>  		fi
>  		set -e
>  	else
> -		tar $OPT --file=$TFILE --files-from=/dev/null
> +		tar -I "${ZSTD}" $OPT --file=$TFILE --files-from=/dev/null

Same here.

>  	fi
>  	chmod 0664 $TFILE
>  	# Skip if it was already created by some other process
> @@ -880,7 +879,8 @@ python sstate_report_unihash() {
>  # Will be run from within SSTATE_INSTDIR.
>  #
>  sstate_unpack_package () {
> -	tar -xvzf ${SSTATE_PKG}
> +	ZSTD="zstd -T${BB_NUMBER_THREADS}"
> +	tar -I "${ZSTD}" -xvf ${SSTATE_PKG}

And here.

>  	# update .siginfo atime on local/NFS mirror
>  	[ -O ${SSTATE_PKG}.siginfo ] && [ -w ${SSTATE_PKG}.siginfo ] && [ -h ${SSTATE_PKG}.siginfo ] && touch -a ${SSTATE_PKG}.siginfo
>  	# Use "! -w ||" to return true for read only files
> diff --git a/scripts/sstate-cache-management.sh b/scripts/sstate-cache-management.sh
> index f1706a2229..d39671f7c6 100755
> --- a/scripts/sstate-cache-management.sh
> +++ b/scripts/sstate-cache-management.sh
> @@ -114,7 +114,7 @@ echo_error () {
>  # * Add .done/.siginfo to the remove list
>  # * Add destination of symlink to the remove list
>  #
> -# $1: output file, others: sstate cache file (.tgz)
> +# $1: output file, others: sstate cache file (.tar.zst)
>  gen_rmlist (){
>    local rmlist_file="$1"
>    shift
> @@ -131,13 +131,13 @@ gen_rmlist (){
>                dest="`readlink -e $i`"
>                if [ -n "$dest" ]; then
>                    echo $dest >> $rmlist_file
> -                  # Remove the .siginfo when .tgz is removed
> +                  # Remove the .siginfo when .tar.zst is removed
>                    if [ -f "$dest.siginfo" ]; then
>                        echo $dest.siginfo >> $rmlist_file
>                    fi
>                fi
>            fi
> -          # Add the ".tgz.done" and ".siginfo.done" (may exist in the future)
> +          # Add the ".tar.zst.done" and ".siginfo.done" (may exist in the future)
>            base_fn="${i##/*/}"
>            t_fn="$base_fn.done"
>            s_fn="$base_fn.siginfo.done"
> @@ -188,10 +188,10 @@ remove_duplicated () {
>    total_files=`find $cache_dir -name 'sstate*' | wc -l`
>    # Save all the sstate files in a file
>    sstate_files_list=`mktemp` || exit 1
> -  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' >$sstate_files_list
> +  find $cache_dir -iname 'sstate:*:*:*:*:*:*:*.tar.zst*' >$sstate_files_list
> 
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $sstate_files_list | sort -u`"
> +  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tar\.zst.*%\1%g' $sstate_files_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> @@ -200,10 +200,10 @@ remove_duplicated () {
>    # Using this SSTATE_PKGSPEC definition it's 6th colon separated field
>    # SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>    for arch in $all_archs; do
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tar\.zst$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs $arch"
>        # ${builder_arch}_$arch used by toolchain sstate
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tgz$ " $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tar\.zst$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs ${builder_arch}_$arch"
>    done
>    echo "Done"
> @@ -219,13 +219,13 @@ remove_duplicated () {
>            continue
>        fi
>        # Total number of files including .siginfo and .done files
> -      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list | wc -l 2>/dev/null`
> -      total_tgz_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz$" $sstate_files_list | wc -l 2>/dev/null`
> +      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst.*" $sstate_files_list | wc -l 2>/dev/null`
> +      total_archives_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst$" $sstate_files_list | wc -l 2>/dev/null`
>        # Save the file list to a file, some suffix's file may not exist
> -      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list >$list_suffix 2>/dev/null
> -      local deleted_tgz=0
> +      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tar\.zst.*" $sstate_files_list >$list_suffix 2>/dev/null
> +      local deleted_archives=0
>        local deleted_files=0
> -      for ext in tgz tgz.siginfo tgz.done; do
> +      for ext in tar.zst tar.zst.siginfo tar.zst.done; do
>            echo "Figuring out the sstate:xxx_$suffix.$ext ... "
>            # Uniq BPNs
>            file_names=`for arch in $ava_archs ""; do
> @@ -268,19 +268,19 @@ remove_duplicated () {
>                done
>            done
>        done
> -      deleted_tgz=`cat $rm_list.* 2>/dev/null | grep ".tgz$" | wc -l`
> +      deleted_archives=`cat $rm_list.* 2>/dev/null | grep "\.tar\.zst$" | wc -l`
>        deleted_files=`cat $rm_list.* 2>/dev/null | wc -l`
>        [ "$deleted_files" -gt 0 -a $debug -gt 0 ] && cat $rm_list.*
> -      echo "($deleted_tgz out of $total_tgz_suffix .tgz files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
> +      echo "($deleted_archives out of $total_archives_suffix .tar.zst files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
>        let total_deleted=$total_deleted+$deleted_files
>    done
> -  deleted_tgz=0
> +  deleted_archives=0
>    rm_old_list=$remove_listdir/sstate-old-filenames
> -  find $cache_dir -name 'sstate-*.tgz' >$rm_old_list
> -  [ -s "$rm_old_list" ] && deleted_tgz=`cat $rm_old_list | grep ".tgz$" | wc -l`
> +  find $cache_dir -name 'sstate-*.tar.zst' >$rm_old_list
> +  [ -s "$rm_old_list" ] && deleted_archives=`cat $rm_old_list | grep "\.tar\.zst$" | wc -l`
>    [ -s "$rm_old_list" ] && deleted_files=`cat $rm_old_list | wc -l`
>    [ -s "$rm_old_list" -a $debug -gt 0 ] && cat $rm_old_list
> -  echo "($deleted_tgz .tgz files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
> +  echo "($deleted_archives .tar.zst files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
>    let total_deleted=$total_deleted+$deleted_files
> 
>    rm -f $list_suffix
> @@ -289,7 +289,7 @@ remove_duplicated () {
>        read_confirm
>        if [ "$confirm" = "y" -o "$confirm" = "Y" ]; then
>            for list in `ls $remove_listdir/`; do
> -              echo "Removing $list.tgz (`cat $remove_listdir/$list | wc -w` files) ... "
> +              echo "Removing $list.tar.zst archive (`cat $remove_listdir/$list | wc -w` files) ... "
>                # Remove them one by one to avoid the argument list too long error
>                for i in `cat $remove_listdir/$list`; do
>                    rm -f $verbose $i
> @@ -322,7 +322,7 @@ rm_by_stamps (){
>    find $cache_dir -type f -name 'sstate*' | sort -u -o $cache_list
> 
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $cache_list | sort -u`"
> +  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tar\.zst.*%\1%g' $cache_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> --
> 2.30.2
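
As a quick check of the updated suffix pattern, here is a small sketch that runs the new sed expression from remove_duplicated over a few file names. The paths, recipe name and hashes are made up for illustration; only the sed expression is taken from the patch.

```shell
# Hypothetical cache entries; real names follow SSTATE_PKGSPEC.
sstate_files_list=$(mktemp) || exit 1
cat >"$sstate_files_list" <<'EOF'
/cache/ab/sstate:zlib:core2-64-poky-linux:1.2.11:r0:core2-64:3:abcdef_populate_sysroot.tar.zst
/cache/ab/sstate:zlib:core2-64-poky-linux:1.2.11:r0:core2-64:3:abcdef_populate_sysroot.tar.zst.siginfo
/cache/cd/sstate:zlib:core2-64-poky-linux:1.2.11:r0:core2-64:3:123456_package.tar.zst
EOF

# Same expression as in the patch: keep only the task suffix of each entry.
sstate_suffixes=$(sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tar\.zst.*%\1%g' "$sstate_files_list" | sort -u)
echo "$sstate_suffixes"
rm -f "$sstate_files_list"
```

The archive and its .siginfo companion collapse to the same suffix, so the list contains each task name once.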

//Peter


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor
  2021-10-02  1:17 ` [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor Joshua Watt
@ 2021-10-02 14:13   ` Richard Purdie
  2021-10-04  8:24     ` Henry Kleynhans
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2021-10-02 14:13 UTC (permalink / raw)
  To: Joshua Watt, hkleynhans; +Cc: poky, rmikey

On Fri, 2021-10-01 at 20:17 -0500, Joshua Watt wrote:
> On Fri, Oct 1, 2021 at 12:32 PM Henry Kleynhans via
> lists.yoctoproject.org <hkleynhans=fb.com@lists.yoctoproject.org>
> wrote:
> > 
> > Patch revision, changes from the last version:
> > * Switch to Zstandard rather than adding it as an option.
> > * Specify 'zstd' threading and compression as arguments instead of
> >   using environment variables
> > * Remove debug output
> > * Non-posix if statements were removed as a result of replacing 'gzip'
> >   with 'zstd'
> > 
> > I am not sure if we need support for 'pzstd'.  If this is a requirement
> > I can add it in a subsequent revision.
> 
> Thanks, this looks pretty good. I was curious about the performance
> difference between pzstd and "zstd -T", so I ran a few numbers;
> specifically, I ran these on Ubuntu 18.04 which is our oldest
> supported distro that uses the host zstd.
> 
> It appears that pzstd is about 20% faster to decompress data than
> "zstd -T"; I think this is because zstd never does threaded
> decompression (at least on Ubuntu 18.04). Would you mind switching to
> using pzstd?
> 
> ---
> 
> root@0df785908a53:/bin# time seq 1000000000 | zstd -10 -T36 | zstd -d -T36 > /dev/null
> 
> real    0m35.503s
> user    11m34.695s
> sys     0m9.405s
> root@0df785908a53:/bin# time seq 1000000000 | zstd -10 -T36 | pzstd -d -p36 > /dev/null
> 
> real    0m27.830s
> user    15m18.733s
> sys     0m13.895s
> root@0df785908a53:/bin# time seq 1000000000 | pzstd -10 -p36 | zstd -d -T36 > /dev/null
> 
> real    0m28.134s
> user    14m42.322s
> sys     0m12.886s
> root@0df785908a53:/bin# time seq 1000000000 | pzstd -10 -p36 | pzstd -d -p36 > /dev/null
> 
> real    0m27.911s
> user    15m28.182s
> sys     0m17.036s
> 
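
For reference, the "about 20%" figure can be reproduced from the first two runs quoted above, which differ only in the decompressor. This is just arithmetic on the reported wall-clock times for the whole pipeline, not a new measurement; the variable names are mine.

```shell
# Wall-clock ("real") times in seconds: zstd -10 -T36 piped into each decompressor.
zstd_decompress=35.503
pzstd_decompress=27.830

# Relative reduction in wall-clock time when pzstd -d replaces zstd -d.
saving=$(awk -v a="$zstd_decompress" -v b="$pzstd_decompress" \
    'BEGIN { printf "%.1f", (a - b) / a * 100 }')
echo "pzstd -d pipeline took ${saving}% less wall-clock time"
```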

I agree, I think we should try to use pzstd where it is available. The code
can be similar to the pigz code we have right now, which was written for a
similar fallback reason.
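
Something along these lines might work, modelled on the existing pigz check. It is only a rough sketch: the helper name select_zstd is hypothetical, and the flags are the ones used in the timings above (-T for zstd, -p for pzstd).

```shell
# Hypothetical helper: pick the compressor command to hand to tar -I.
# $1: thread count (BB_NUMBER_THREADS in the patch)
# $2: compression level (SSTATE_ZSTD_CLEVEL in the patch)
select_zstd () {
    threads="$1"
    level="$2"
    # Prefer pzstd if it is available, otherwise fall back to zstd.
    if [ -x "$(command -v pzstd)" ]; then
        echo "pzstd -$level -p$threads"
    else
        echo "zstd -$level -T$threads"
    fi
}

ZSTD="$(select_zstd 4 8)"
echo "compressor: $ZSTD"
```

The result would then be passed as tar -I "$ZSTD" in sstate_create_package; the unpack side could use the same check without the compression level.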

As Peter mentions, the patch needs to go to the openembedded-core mailing list
too.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor
  2021-10-02 14:13   ` Richard Purdie
@ 2021-10-04  8:24     ` Henry Kleynhans
  0 siblings, 0 replies; 7+ messages in thread
From: Henry Kleynhans @ 2021-10-04  8:24 UTC (permalink / raw)
  To: Richard Purdie, Joshua Watt; +Cc: poky, Michael van der Westhuizen

Thanks for the feedback!  I will add support for pzstd and also send the patch to the openembedded-core mailing list.

Kind regards, Henry

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-10-04  8:24 UTC | newest]

Thread overview: 7+ messages
2021-10-01 17:31 [PATCH 1/1] sstate: Switch to Zstandard compressor Henry Kleynhans
2021-10-01 17:31 ` [PATCH] sstate: Switch to ZStandard compressor support Henry Kleynhans
2021-10-02  9:58   ` [poky] " Alexander Kanavin
2021-10-02 13:34   ` Peter Kjellerstedt
2021-10-02  1:17 ` [poky] [PATCH 1/1] sstate: Switch to Zstandard compressor Joshua Watt
2021-10-02 14:13   ` Richard Purdie
2021-10-04  8:24     ` Henry Kleynhans
