All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sstate: Add ZStandard compressor support
@ 2021-10-01 10:11 hkleynhans
  2021-10-01 13:13 ` [poky] " Peter Kjellerstedt
  2021-10-01 13:52 ` Joshua Watt
  0 siblings, 2 replies; 6+ messages in thread
From: hkleynhans @ 2021-10-01 10:11 UTC (permalink / raw)
  To: poky; +Cc: hkleynhans, rmikey

This patch adds support to optionally use the Zstandard compressor for
ssate cache files.

Zstandard compression provides a significant improvement in
decompression speed as well as improvement in compression speed and disk
usage over the 'tgz' format in use.  Furthermore, its configurable
compression level offers a trade-off between time spent compressing
sstate cache files and disk space used by those files.  The reduced disk
usage also contributes to saving network traffic for those sharing their
sstate cache with others.

Zstandard should therefore be a good choice when:
* disk space is at a premium
* network speed / resources are limited
* the CI server can sstate packages can be created at high compression
* less CPU on the build server should be used for sstate decompression

Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
---
 meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
 scripts/sstate-cache-management.sh | 40 ++++++++++++------------
 2 files changed, 58 insertions(+), 31 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 92a73114bb..a73d631679 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -1,17 +1,30 @@
 SSTATE_VERSION = "3"
 
+SSTATE_USE_ZSTD ?= "0"
+SSTATE_ZSTD_CLEVEL ?= "3"
+SSTATE_ZSTD_NTHREADS ?= "0"
+
 SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
 SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"
 
-def generate_sstatefn(spec, hash, taskname, siginfo, d):
+def generate_sstate_ext(use_zstd, d):
+    if use_zstd == "1":
+        return "tar.zst"
+    return "tgz"
+
+def generate_sstatefn(spec, hash, taskname, siginfo, use_zstd, d):
     if taskname is None:
        return ""
     extension = ".tgz"
+    if use_zstd == "1":
+       extension = ".tar.zst"
     # 8 chars reserved for siginfo
     limit = 254 - 8
     if siginfo:
         limit = 254
         extension = ".tgz.siginfo"
+        if use_zstd == "1":
+           extension = ".tar.zst.siginfo"
     if not hash:
         hash = "INVALID"
     fn = spec + hash + "_" + taskname + extension
@@ -33,11 +46,12 @@ def generate_sstatefn(spec, hash, taskname, siginfo, d):
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d)}"
+SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d.getVar('SSTATE_USE_ZSTD'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
-SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tgz*"
+SSTATE_PKG_EXT    = "${@generate_sstate_ext(d.getVar('SSTATE_USE_ZSTD'), d)}"
+SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.${SSTATE_PKG_EXT}*"
 
 # explicitly make PV to depend on evaluated value of PV variable
 PV[vardepvalue] = "${PV}"
@@ -825,12 +839,20 @@ sstate_create_package () {
 	mkdir --mode=0775 -p `dirname ${SSTATE_PKG}`
 	TFILE=`mktemp ${SSTATE_PKG}.XXXXXXXX`
 
-	# Use pigz if available
-	OPT="-czS"
-	if [ -x "$(command -v pigz)" ]; then
-		OPT="-I pigz -cS"
+	if [ x"${SSTATE_USE_ZSTD}" != x"0" ]; then
+		export ZSTD_CLEVEL="${SSTATE_ZSTD_CLEVEL}"
+		export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
+		OPT="-I zstd -cS"
+	else
+		# Use pigz if available
+		OPT="-czS"
+		if [ -x "$(command -v pigz)" ]; then
+			OPT="-I pigz -cS"
+		fi
 	fi
 
+  echo "OPTS=${OPTS}"
+
 	# Need to handle empty directories
 	if [ "$(ls -A)" ]; then
 		set +e
@@ -880,7 +902,12 @@ python sstate_report_unihash() {
 # Will be run from within SSTATE_INSTDIR.
 #
 sstate_unpack_package () {
-	tar -xvzf ${SSTATE_PKG}
+	if [[ "${SSTATE_PKG}" == *.tar.zst ]]; then
+		export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
+		tar -I zstd -xvf ${SSTATE_PKG}
+	else
+		tar -xvzf ${SSTATE_PKG}
+	fi
 	# update .siginfo atime on local/NFS mirror
 	[ -O ${SSTATE_PKG}.siginfo ] && [ -w ${SSTATE_PKG}.siginfo ] && [ -h ${SSTATE_PKG}.siginfo ] && touch -a ${SSTATE_PKG}.siginfo
 	# Use "! -w ||" to return true for read only files
@@ -922,7 +949,7 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
 
         spec, extrapath, tname = getpathcomponents(tid, d)
 
-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d))
+        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d.getVar('SSTATE_USE_ZSTD'), d))
 
         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -1016,11 +1043,11 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
         evdata = {'missed': [], 'found': []};
         for tid in missed:
             spec, extrapath, tname = getpathcomponents(tid, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
             evdata['missed'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
         for tid in found:
             spec, extrapath, tname = getpathcomponents(tid, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
             evdata['found'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
diff --git a/scripts/sstate-cache-management.sh b/scripts/sstate-cache-management.sh
index f1706a2229..61c7f9f763 100755
--- a/scripts/sstate-cache-management.sh
+++ b/scripts/sstate-cache-management.sh
@@ -114,7 +114,7 @@ echo_error () {
 # * Add .done/.siginfo to the remove list
 # * Add destination of symlink to the remove list
 #
-# $1: output file, others: sstate cache file (.tgz)
+# $1: output file, others: sstate cache file (.tgz or .tar.zstd)
 gen_rmlist (){
   local rmlist_file="$1"
   shift
@@ -131,13 +131,13 @@ gen_rmlist (){
               dest="`readlink -e $i`"
               if [ -n "$dest" ]; then
                   echo $dest >> $rmlist_file
-                  # Remove the .siginfo when .tgz is removed
+                  # Remove the .siginfo when .tgz or .tar.zst is removed
                   if [ -f "$dest.siginfo" ]; then
                       echo $dest.siginfo >> $rmlist_file
                   fi
               fi
           fi
-          # Add the ".tgz.done" and ".siginfo.done" (may exist in the future)
+          # Add the ".tgz.done" or ".tar.zst.done" and ".siginfo.done" (may exist in the future)
           base_fn="${i##/*/}"
           t_fn="$base_fn.done"
           s_fn="$base_fn.siginfo.done"
@@ -188,10 +188,10 @@ remove_duplicated () {
   total_files=`find $cache_dir -name 'sstate*' | wc -l`
   # Save all the sstate files in a file
   sstate_files_list=`mktemp` || exit 1
-  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' >$sstate_files_list
+  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' -o -iname 'sstate:*:*:*:*:*:*:*.tar.zst*' >$sstate_files_list
 
   echo "Figuring out the suffixes in the sstate cache dir ... "
-  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $sstate_files_list | sort -u`"
+  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.\zst\).*%\1%g' $sstate_files_list | sort -u`"
   echo "Done"
   echo "The following suffixes have been found in the cache dir:"
   echo $sstate_suffixes
@@ -200,10 +200,10 @@ remove_duplicated () {
   # Using this SSTATE_PKGSPEC definition it's 6th colon separated field
   # SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
   for arch in $all_archs; do
-      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
+      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.\(tgz\|tar\.\zst\)$" $sstate_files_list
       [ $? -eq 0 ] && ava_archs="$ava_archs $arch"
       # ${builder_arch}_$arch used by toolchain sstate
-      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
+      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.\(tgz\|tar\.zst\)$" $sstate_files_list
       [ $? -eq 0 ] && ava_archs="$ava_archs ${builder_arch}_$arch"
   done
   echo "Done"
@@ -219,13 +219,13 @@ remove_duplicated () {
           continue
       fi
       # Total number of files including .siginfo and .done files
-      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list | wc -l 2>/dev/null`
-      total_tgz_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz$" $sstate_files_list | wc -l 2>/dev/null`
+      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list | wc -l 2>/dev/null`
+      total_archive_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\)$" $sstate_files_list | wc -l 2>/dev/null`
       # Save the file list to a file, some suffix's file may not exist
-      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list >$list_suffix 2>/dev/null
-      local deleted_tgz=0
+      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list >$list_suffix 2>/dev/null
+      local deleted_archives=0
       local deleted_files=0
-      for ext in tgz tgz.siginfo tgz.done; do
+      for ext in tgz tgz.siginfo tgz.done tar.zst tar.zst.siginfo tar.zst.done; do
           echo "Figuring out the sstate:xxx_$suffix.$ext ... "
           # Uniq BPNs
           file_names=`for arch in $ava_archs ""; do
@@ -268,19 +268,19 @@ remove_duplicated () {
               done
           done
       done
-      deleted_tgz=`cat $rm_list.* 2>/dev/null | grep ".tgz$" | wc -l`
+      deleted_archives=`cat $rm_list.* 2>/dev/null | grep ".\(tgz\|tar\.zst\)$" | wc -l`
       deleted_files=`cat $rm_list.* 2>/dev/null | wc -l`
       [ "$deleted_files" -gt 0 -a $debug -gt 0 ] && cat $rm_list.*
-      echo "($deleted_tgz out of $total_tgz_suffix .tgz files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
+      echo "($deleted_archives out of $total_archives_suffix .tgz or .tar.zst files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
       let total_deleted=$total_deleted+$deleted_files
   done
-  deleted_tgz=0
+  deleted_archives=0
   rm_old_list=$remove_listdir/sstate-old-filenames
-  find $cache_dir -name 'sstate-*.tgz' >$rm_old_list
-  [ -s "$rm_old_list" ] && deleted_tgz=`cat $rm_old_list | grep ".tgz$" | wc -l`
+  find $cache_dir -name 'sstate-*.tgz' -o -name 'sstate-*.tar.zst' >$rm_old_list
+  [ -s "$rm_old_list" ] && deleted_archives=`cat $rm_old_list | grep ".\(tgz\|tar\.zst\)$" | wc -l`
   [ -s "$rm_old_list" ] && deleted_files=`cat $rm_old_list | wc -l`
   [ -s "$rm_old_list" -a $debug -gt 0 ] && cat $rm_old_list
-  echo "($deleted_tgz .tgz files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
+  echo "($deleted_archives .tgz or .tar.zst files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
   let total_deleted=$total_deleted+$deleted_files
 
   rm -f $list_suffix
@@ -289,7 +289,7 @@ remove_duplicated () {
       read_confirm
       if [ "$confirm" = "y" -o "$confirm" = "Y" ]; then
           for list in `ls $remove_listdir/`; do
-              echo "Removing $list.tgz (`cat $remove_listdir/$list | wc -w` files) ... "
+              echo "Removing $list archive (`cat $remove_listdir/$list | wc -w` files) ... "
               # Remove them one by one to avoid the argument list too long error
               for i in `cat $remove_listdir/$list`; do
                   rm -f $verbose $i
@@ -322,7 +322,7 @@ rm_by_stamps (){
   find $cache_dir -type f -name 'sstate*' | sort -u -o $cache_list
 
   echo "Figuring out the suffixes in the sstate cache dir ... "
-  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $cache_list | sort -u`"
+  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.zst\).*%\1%g' $cache_list | sort -u`"
   echo "Done"
   echo "The following suffixes have been found in the cache dir:"
   echo $sstate_suffixes
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [poky] [PATCH] sstate: Add ZStandard compressor support
  2021-10-01 10:11 [PATCH] sstate: Add ZStandard compressor support hkleynhans
@ 2021-10-01 13:13 ` Peter Kjellerstedt
  2021-10-01 13:16   ` Richard Purdie
  2021-10-01 13:52 ` Joshua Watt
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Kjellerstedt @ 2021-10-01 13:13 UTC (permalink / raw)
  To: hkleynhans, poky; +Cc: rmikey

> -----Original Message-----
> From: poky@lists.yoctoproject.org <poky@lists.yoctoproject.org> On Behalf
> Of Henry Kleynhans via lists.yoctoproject.org
> Sent: den 1 oktober 2021 12:11
> To: poky@lists.yoctoproject.org
> Cc: hkleynhans@fb.com; rmikey@fb.com
> Subject: [poky] [PATCH] sstate: Add ZStandard compressor support
> 
> This patch adds support to optionally use the Zstandard compressor for
> ssate cache files.
> 
> Zstandard compression provides a significant improvement in
> decompression speed as well as improvement in compression speed and disk
> usage over the 'tgz' format in use.  Furthermore, its configurable
> compression level offers a trade-off between time spent compressing
> sstate cache files and disk space used by those files.  The reduced disk
> usage also contributes to saving network traffic for those sharing their
> sstate cache with others.
> 
> Zstandard should therefore be a good choice when:
> * disk space is at a premium
> * network speed / resources are limited
> * the CI server can sstate packages can be created at high compression
> * less CPU on the build server should be used for sstate decompression
> 
> Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> ---
>  meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
>  scripts/sstate-cache-management.sh | 40 ++++++++++++------------
>  2 files changed, 58 insertions(+), 31 deletions(-)
> 
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 92a73114bb..a73d631679 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -1,17 +1,30 @@
>  SSTATE_VERSION = "3"
> 
> +SSTATE_USE_ZSTD ?= "0"
> +SSTATE_ZSTD_CLEVEL ?= "3"
> +SSTATE_ZSTD_NTHREADS ?= "0"

Do we really need to make this configurable? Can't we just decide 
to use zstd and be done with it?

> +
>  SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
>  SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"
> 
> -def generate_sstatefn(spec, hash, taskname, siginfo, d):
> +def generate_sstate_ext(use_zstd, d):
> +    if use_zstd == "1":
> +        return "tar.zst"
> +    return "tgz"
> +
> +def generate_sstatefn(spec, hash, taskname, siginfo, use_zstd, d):
>      if taskname is None:
>         return ""
>      extension = ".tgz"
> +    if use_zstd == "1":
> +       extension = ".tar.zst"
>      # 8 chars reserved for siginfo
>      limit = 254 - 8
>      if siginfo:
>          limit = 254
>          extension = ".tgz.siginfo"
> +        if use_zstd == "1":
> +           extension = ".tar.zst.siginfo"
>      if not hash:
>          hash = "INVALID"
>      fn = spec + hash + "_" + taskname + extension
> @@ -33,11 +46,12 @@ def generate_sstatefn(spec, hash, taskname, siginfo, d):
>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d)}"
> +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d.getVar('SSTATE_USE_ZSTD'), d)}"
>  SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
>  SSTATE_EXTRAPATH   = ""
>  SSTATE_EXTRAPATHWILDCARD = ""
> -SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tgz*"
> +SSTATE_PKG_EXT    = "${@generate_sstate_ext(d.getVar('SSTATE_USE_ZSTD'), d)}"
> +SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.${SSTATE_PKG_EXT}*"
> 
>  # explicitly make PV to depend on evaluated value of PV variable
>  PV[vardepvalue] = "${PV}"
> @@ -825,12 +839,20 @@ sstate_create_package () {
>  	mkdir --mode=0775 -p `dirname ${SSTATE_PKG}`
>  	TFILE=`mktemp ${SSTATE_PKG}.XXXXXXXX`
> 
> -	# Use pigz if available
> -	OPT="-czS"
> -	if [ -x "$(command -v pigz)" ]; then
> -		OPT="-I pigz -cS"
> +	if [ x"${SSTATE_USE_ZSTD}" != x"0" ]; then
> +		export ZSTD_CLEVEL="${SSTATE_ZSTD_CLEVEL}"
> +		export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
> +		OPT="-I zstd -cS"
> +	else
> +		# Use pigz if available
> +		OPT="-czS"
> +		if [ -x "$(command -v pigz)" ]; then
> +			OPT="-I pigz -cS"
> +		fi
>  	fi
> 
> +  echo "OPTS=${OPTS}"
> +

Remove debug output again.

>  	# Need to handle empty directories
>  	if [ "$(ls -A)" ]; then
>  		set +e
> @@ -880,7 +902,12 @@ python sstate_report_unihash() {
>  # Will be run from within SSTATE_INSTDIR.
>  #
>  sstate_unpack_package () {
> -	tar -xvzf ${SSTATE_PKG}
> +	if [[ "${SSTATE_PKG}" == *.tar.zst ]]; then

Don't use [[ ]]. It is a bashism. Instead you can use:

	case ${SSTATE_PKG} in
		*.tar.zst)
			export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
			tar -I zstd -xvf ${SSTATE_PKG}
			;;
		*)
			tar -xvzf ${SSTATE_PKG}
			;;
	esac

However, tar should be able to figure out the decompressor itself, 
so it should be possible to just do:

	export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
	tar -xvf ${SSTATE_PKG}

> +		export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
> +		tar -I zstd -xvf ${SSTATE_PKG}
> +	else
> +		tar -xvzf ${SSTATE_PKG}
> +	fi
>  	# update .siginfo atime on local/NFS mirror
>  	[ -O ${SSTATE_PKG}.siginfo ] && [ -w ${SSTATE_PKG}.siginfo ] && [ -h ${SSTATE_PKG}.siginfo ] && touch -a ${SSTATE_PKG}.siginfo
>  	# Use "! -w ||" to return true for read only files
> @@ -922,7 +949,7 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
> 
>          spec, extrapath, tname = getpathcomponents(tid, d)
> 
> -        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d))
> +        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d.getVar('SSTATE_USE_ZSTD'), d))
> 
>          if os.path.exists(sstatefile):
>              bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
> @@ -1016,11 +1043,11 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
>          evdata = {'missed': [], 'found': []};
>          for tid in missed:
>              spec, extrapath, tname = getpathcomponents(tid, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
>              evdata['missed'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
>          for tid in found:
>              spec, extrapath, tname = getpathcomponents(tid, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
>              evdata['found'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
>          bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
> 
> diff --git a/scripts/sstate-cache-management.sh b/scripts/sstate-cache-management.sh
> index f1706a2229..61c7f9f763 100755
> --- a/scripts/sstate-cache-management.sh
> +++ b/scripts/sstate-cache-management.sh
> @@ -114,7 +114,7 @@ echo_error () {
>  # * Add .done/.siginfo to the remove list
>  # * Add destination of symlink to the remove list
>  #
> -# $1: output file, others: sstate cache file (.tgz)
> +# $1: output file, others: sstate cache file (.tgz or .tar.zstd)
>  gen_rmlist (){
>    local rmlist_file="$1"
>    shift
> @@ -131,13 +131,13 @@ gen_rmlist (){
>                dest="`readlink -e $i`"
>                if [ -n "$dest" ]; then
>                    echo $dest >> $rmlist_file
> -                  # Remove the .siginfo when .tgz is removed
> +                  # Remove the .siginfo when .tgz or .tar.zst is removed
>                    if [ -f "$dest.siginfo" ]; then
>                        echo $dest.siginfo >> $rmlist_file
>                    fi
>                fi
>            fi
> -          # Add the ".tgz.done" and ".siginfo.done" (may exist in the future)
> +          # Add the ".tgz.done" or ".tar.zst.done" and ".siginfo.done" (may exist in the future)
>            base_fn="${i##/*/}"
>            t_fn="$base_fn.done"
>            s_fn="$base_fn.siginfo.done"
> @@ -188,10 +188,10 @@ remove_duplicated () {
>    total_files=`find $cache_dir -name 'sstate*' | wc -l`
>    # Save all the sstate files in a file
>    sstate_files_list=`mktemp` || exit 1
> -  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' >$sstate_files_list
> +  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' -o -iname 'sstate:*:*:*:*:*:*:*.tar.zst*' >$sstate_files_list
> 
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $sstate_files_list | sort -u`"
> +  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.\zst\).*%\1%g' $sstate_files_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> @@ -200,10 +200,10 @@ remove_duplicated () {
>    # Using this SSTATE_PKGSPEC definition it's 6th colon separated field
>    # SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>    for arch in $all_archs; do
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.\(tgz\|tar\.\zst\)$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs $arch"
>        # ${builder_arch}_$arch used by toolchain sstate
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tgz$ " $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.\(tgz\|tar\.zst\)$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs ${builder_arch}_$arch"
>    done
>    echo "Done"
> @@ -219,13 +219,13 @@ remove_duplicated () {
>            continue
>        fi
>        # Total number of files including .siginfo and .done files
> -      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list | wc -l 2>/dev/null`
> -      total_tgz_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz$" $sstate_files_list | wc -l 2>/dev/null`
> +      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list | wc -l 2>/dev/null`
> +      total_archive_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\)$" $sstate_files_list | wc -l 2>/dev/null`
>        # Save the file list to a file, some suffix's file may not exist
> -      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list >$list_suffix 2>/dev/null
> -      local deleted_tgz=0
> +      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list >$list_suffix 2>/dev/null
> +      local deleted_archives=0
>        local deleted_files=0
> -      for ext in tgz tgz.siginfo tgz.done; do
> +      for ext in tgz tgz.siginfo tgz.done tar.zst tar.zst.siginfo tar.zst.done; do
>            echo "Figuring out the sstate:xxx_$suffix.$ext ... "
>            # Uniq BPNs
>            file_names=`for arch in $ava_archs ""; do
> @@ -268,19 +268,19 @@ remove_duplicated () {
>                done
>            done
>        done
> -      deleted_tgz=`cat $rm_list.* 2>/dev/null | grep ".tgz$" | wc -l`
> +      deleted_archives=`cat $rm_list.* 2>/dev/null | grep ".\(tgz\|tar\.zst\)$" | wc -l`
>        deleted_files=`cat $rm_list.* 2>/dev/null | wc -l`
>        [ "$deleted_files" -gt 0 -a $debug -gt 0 ] && cat $rm_list.*
> -      echo "($deleted_tgz out of $total_tgz_suffix .tgz files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
> +      echo "($deleted_archives out of $total_archives_suffix .tgz or .tar.zst files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
>        let total_deleted=$total_deleted+$deleted_files
>    done
> -  deleted_tgz=0
> +  deleted_archives=0
>    rm_old_list=$remove_listdir/sstate-old-filenames
> -  find $cache_dir -name 'sstate-*.tgz' >$rm_old_list
> -  [ -s "$rm_old_list" ] && deleted_tgz=`cat $rm_old_list | grep ".tgz$" | wc -l`
> +  find $cache_dir -name 'sstate-*.tgz' -o -name 'sstate-*.tar.zst' >$rm_old_list
> +  [ -s "$rm_old_list" ] && deleted_archives=`cat $rm_old_list | grep ".\(tgz\|tar\.zst\)$" | wc -l`
>    [ -s "$rm_old_list" ] && deleted_files=`cat $rm_old_list | wc -l`
>    [ -s "$rm_old_list" -a $debug -gt 0 ] && cat $rm_old_list
> -  echo "($deleted_tgz .tgz files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
> +  echo "($deleted_archives .tgz or .tar.zst files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
>    let total_deleted=$total_deleted+$deleted_files
> 
>    rm -f $list_suffix
> @@ -289,7 +289,7 @@ remove_duplicated () {
>        read_confirm
>        if [ "$confirm" = "y" -o "$confirm" = "Y" ]; then
>            for list in `ls $remove_listdir/`; do
> -              echo "Removing $list.tgz (`cat $remove_listdir/$list | wc -w` files) ... "
> +              echo "Removing $list archive (`cat $remove_listdir/$list | wc -w` files) ... "
>                # Remove them one by one to avoid the argument list too long error
>                for i in `cat $remove_listdir/$list`; do
>                    rm -f $verbose $i
> @@ -322,7 +322,7 @@ rm_by_stamps (){
>    find $cache_dir -type f -name 'sstate*' | sort -u -o $cache_list
> 
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $cache_list | sort -u`"
> +  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.zst\).*%\1%g' $cache_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> --
> 2.30.2

//Peter


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [poky] [PATCH] sstate: Add ZStandard compressor support
  2021-10-01 13:13 ` [poky] " Peter Kjellerstedt
@ 2021-10-01 13:16   ` Richard Purdie
  2021-10-01 13:44     ` Joshua Watt
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Purdie @ 2021-10-01 13:16 UTC (permalink / raw)
  To: Peter Kjellerstedt, hkleynhans, poky; +Cc: rmikey

On Fri, 2021-10-01 at 13:13 +0000, Peter Kjellerstedt wrote:
> > -----Original Message-----
> > From: poky@lists.yoctoproject.org <poky@lists.yoctoproject.org> On Behalf
> > Of Henry Kleynhans via lists.yoctoproject.org
> > Sent: den 1 oktober 2021 12:11
> > To: poky@lists.yoctoproject.org
> > Cc: hkleynhans@fb.com; rmikey@fb.com
> > Subject: [poky] [PATCH] sstate: Add ZStandard compressor support
> > 
> > This patch adds support to optionally use the Zstandard compressor for
> > ssate cache files.
> > 
> > Zstandard compression provides a significant improvement in
> > decompression speed as well as improvement in compression speed and disk
> > usage over the 'tgz' format in use.  Furthermore, its configurable
> > compression level offers a trade-off between time spent compressing
> > sstate cache files and disk space used by those files.  The reduced disk
> > usage also contributes to saving network traffic for those sharing their
> > sstate cache with others.
> > 
> > Zstandard should therefore be a good choice when:
> > * disk space is at a premium
> > * network speed / resources are limited
> > * the CI server can sstate packages can be created at high compression
> > * less CPU on the build server should be used for sstate decompression
> > 
> > Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> > ---
> >  meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
> >  scripts/sstate-cache-management.sh | 40 ++++++++++++------------
> >  2 files changed, 58 insertions(+), 31 deletions(-)
> > 
> > diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> > index 92a73114bb..a73d631679 100644
> > --- a/meta/classes/sstate.bbclass
> > +++ b/meta/classes/sstate.bbclass
> > @@ -1,17 +1,30 @@
> >  SSTATE_VERSION = "3"
> > 
> > +SSTATE_USE_ZSTD ?= "0"
> > +SSTATE_ZSTD_CLEVEL ?= "3"
> > +SSTATE_ZSTD_NTHREADS ?= "0"
> 
> Do we really need to make this configurable? Can't we just decide 
> to use zstd and be done with it?

I have to admit I was also wondering that. We started requiring zstd on the host
system so it may just be better to switch over?

Cheers,

Richard




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [poky] [PATCH] sstate: Add ZStandard compressor support
  2021-10-01 13:16   ` Richard Purdie
@ 2021-10-01 13:44     ` Joshua Watt
  2021-10-01 14:19       ` Henry Kleynhans
  0 siblings, 1 reply; 6+ messages in thread
From: Joshua Watt @ 2021-10-01 13:44 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Peter Kjellerstedt, hkleynhans, poky, rmikey

On Fri, Oct 1, 2021 at 8:16 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Fri, 2021-10-01 at 13:13 +0000, Peter Kjellerstedt wrote:
> > > -----Original Message-----
> > > From: poky@lists.yoctoproject.org <poky@lists.yoctoproject.org> On Behalf
> > > Of Henry Kleynhans via lists.yoctoproject.org
> > > Sent: den 1 oktober 2021 12:11
> > > To: poky@lists.yoctoproject.org
> > > Cc: hkleynhans@fb.com; rmikey@fb.com
> > > Subject: [poky] [PATCH] sstate: Add ZStandard compressor support
> > >
> > > This patch adds support to optionally use the Zstandard compressor for
> > > ssate cache files.
> > >
> > > Zstandard compression provides a significant improvement in
> > > decompression speed as well as improvement in compression speed and disk
> > > usage over the 'tgz' format in use.  Furthermore, its configurable
> > > compression level offers a trade-off between time spent compressing
> > > sstate cache files and disk space used by those files.  The reduced disk
> > > usage also contributes to saving network traffic for those sharing their
> > > sstate cache with others.
> > >
> > > Zstandard should therefore be a good choice when:
> > > * disk space is at a premium
> > > * network speed / resources are limited
> > > * the CI server can sstate packages can be created at high compression
> > > * less CPU on the build server should be used for sstate decompression
> > >
> > > Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> > > ---
> > >  meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
> > >  scripts/sstate-cache-management.sh | 40 ++++++++++++------------
> > >  2 files changed, 58 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> > > index 92a73114bb..a73d631679 100644
> > > --- a/meta/classes/sstate.bbclass
> > > +++ b/meta/classes/sstate.bbclass
> > > @@ -1,17 +1,30 @@
> > >  SSTATE_VERSION = "3"
> > >
> > > +SSTATE_USE_ZSTD ?= "0"
> > > +SSTATE_ZSTD_CLEVEL ?= "3"
> > > +SSTATE_ZSTD_NTHREADS ?= "0"
> >
> > Do we really need to make this configurable? Can't we just decide
> > to use zstd and be done with it?
>
> I have to admit I was also wondering that. We started requiring zstd on the host
> system so it may just be better to switch over?

Since zstd is required and a clear winner in compression/decompression
time, I think requiring it and simplifying the sstate code to only use
zstd would be fine.

>
> Cheers,
>
> Richard
>
>
>
>
> 
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [poky] [PATCH] sstate: Add ZStandard compressor support
  2021-10-01 10:11 [PATCH] sstate: Add ZStandard compressor support hkleynhans
  2021-10-01 13:13 ` [poky] " Peter Kjellerstedt
@ 2021-10-01 13:52 ` Joshua Watt
  1 sibling, 0 replies; 6+ messages in thread
From: Joshua Watt @ 2021-10-01 13:52 UTC (permalink / raw)
  To: hkleynhans; +Cc: poky, rmikey

On Fri, Oct 1, 2021 at 5:11 AM Henry Kleynhans via
lists.yoctoproject.org <hkleynhans=fb.com@lists.yoctoproject.org>
wrote:
>
> This patch adds support to optionally use the Zstandard compressor for
> ssate cache files.
>
> Zstandard compression provides a significant improvement in
> decompression speed as well as improvement in compression speed and disk
> usage over the 'tgz' format in use.  Furthermore, its configurable
> compression level offers a trade-off between time spent compressing
> sstate cache files and disk space used by those files.  The reduced disk
> usage also contributes to saving network traffic for those sharing their
> sstate cache with others.
>
> Zstandard should therefore be a good choice when:
> * disk space is at a premium
> * network speed / resources are limited
> * the CI server can sstate packages can be created at high compression
> * less CPU on the build server should be used for sstate decompression
>
> Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> ---
>  meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
>  scripts/sstate-cache-management.sh | 40 ++++++++++++------------
>  2 files changed, 58 insertions(+), 31 deletions(-)
>
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 92a73114bb..a73d631679 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -1,17 +1,30 @@
>  SSTATE_VERSION = "3"
>
> +SSTATE_USE_ZSTD ?= "0"
> +SSTATE_ZSTD_CLEVEL ?= "3"
> +SSTATE_ZSTD_NTHREADS ?= "0"
> +
>  SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
>  SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"
>
> -def generate_sstatefn(spec, hash, taskname, siginfo, d):
> +def generate_sstate_ext(use_zstd, d):
> +    if use_zstd == "1":
> +        return "tar.zst"
> +    return "tgz"
> +
> +def generate_sstatefn(spec, hash, taskname, siginfo, use_zstd, d):
>      if taskname is None:
>         return ""
>      extension = ".tgz"
> +    if use_zstd == "1":
> +       extension = ".tar.zst"
>      # 8 chars reserved for siginfo
>      limit = 254 - 8
>      if siginfo:
>          limit = 254
>          extension = ".tgz.siginfo"
> +        if use_zstd == "1":
> +           extension = ".tar.zst.siginfo"
>      if not hash:
>          hash = "INVALID"
>      fn = spec + hash + "_" + taskname + extension
> @@ -33,11 +46,12 @@ def generate_sstatefn(spec, hash, taskname, siginfo, d):
>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> -SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d)}"
> +SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d.getVar('SSTATE_CURRTASK'), False, d.getVar('SSTATE_USE_ZSTD'), d)}"
>  SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
>  SSTATE_EXTRAPATH   = ""
>  SSTATE_EXTRAPATHWILDCARD = ""
> -SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.tgz*"
> +SSTATE_PKG_EXT    = "${@generate_sstate_ext(d.getVar('SSTATE_USE_ZSTD'), d)}"
> +SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/*/${SSTATE_PKGSPEC}*_${SSTATE_PATH_CURRTASK}.${SSTATE_PKG_EXT}*"
>
>  # explicitly make PV to depend on evaluated value of PV variable
>  PV[vardepvalue] = "${PV}"
> @@ -825,12 +839,20 @@ sstate_create_package () {
>         mkdir --mode=0775 -p `dirname ${SSTATE_PKG}`
>         TFILE=`mktemp ${SSTATE_PKG}.XXXXXXXX`
>
> -       # Use pigz if available
> -       OPT="-czS"
> -       if [ -x "$(command -v pigz)" ]; then
> -               OPT="-I pigz -cS"
> +       if [ x"${SSTATE_USE_ZSTD}" != x"0" ]; then
> +               export ZSTD_CLEVEL="${SSTATE_ZSTD_CLEVEL}"
> +               export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
> +               OPT="-I zstd -cS"

I'm not sure this is going to work in parallel like we want. I can't
see any reference to ZSTD_NBTHREADS in my zstd man page, and by
default zstd is only going to use one core. I think that we probably
want something like:

  OPT="-I 'zstd -T${BB_NUMBER_THREADS}' -cS"

Also, depending on host compatibility, we may need to use pzstd instead

  OPT="-I 'pzstd -p${BB_NUMBER_THREADS}' -cS"

> +       else
> +               # Use pigz if available
> +               OPT="-czS"
> +               if [ -x "$(command -v pigz)" ]; then
> +                       OPT="-I pigz -cS"
> +               fi
>         fi
>
> +  echo "OPTS=${OPTS}"
> +
>         # Need to handle empty directories
>         if [ "$(ls -A)" ]; then
>                 set +e
> @@ -880,7 +902,12 @@ python sstate_report_unihash() {
>  # Will be run from within SSTATE_INSTDIR.
>  #
>  sstate_unpack_package () {
> -       tar -xvzf ${SSTATE_PKG}
> +       if [[ "${SSTATE_PKG}" == *.tar.zst ]]; then
> +               export ZSTD_NBTHREADS="${SSTATE_ZSTD_NTHREADS}"
> +               tar -I zstd -xvf ${SSTATE_PKG}

Make sure to specify the number of threads here also

> +       else
> +               tar -xvzf ${SSTATE_PKG}
> +       fi
>         # update .siginfo atime on local/NFS mirror
>         [ -O ${SSTATE_PKG}.siginfo ] && [ -w ${SSTATE_PKG}.siginfo ] && [ -h ${SSTATE_PKG}.siginfo ] && touch -a ${SSTATE_PKG}.siginfo
>         # Use "! -w ||" to return true for read only files
> @@ -922,7 +949,7 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
>
>          spec, extrapath, tname = getpathcomponents(tid, d)
>
> -        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d))
> +        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, d.getVar('SSTATE_USE_ZSTD'), d))
>
>          if os.path.exists(sstatefile):
>              bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
> @@ -1016,11 +1043,11 @@ def sstate_checkhashes(sq_data, d, siginfo=False, currentcount=0, summary=True,
>          evdata = {'missed': [], 'found': []};
>          for tid in missed:
>              spec, extrapath, tname = getpathcomponents(tid, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
>              evdata['missed'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
>          for tid in found:
>              spec, extrapath, tname = getpathcomponents(tid, d)
> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, False, d))
> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(tid), tname, siginfo, False, d))
>              evdata['found'].append((bb.runqueue.fn_from_tid(tid), bb.runqueue.taskname_from_tid(tid), gethash(tid), sstatefile ) )
>          bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
>
> diff --git a/scripts/sstate-cache-management.sh b/scripts/sstate-cache-management.sh
> index f1706a2229..61c7f9f763 100755
> --- a/scripts/sstate-cache-management.sh
> +++ b/scripts/sstate-cache-management.sh
> @@ -114,7 +114,7 @@ echo_error () {
>  # * Add .done/.siginfo to the remove list
>  # * Add destination of symlink to the remove list
>  #
> -# $1: output file, others: sstate cache file (.tgz)
> +# $1: output file, others: sstate cache file (.tgz or .tar.zstd)
>  gen_rmlist (){
>    local rmlist_file="$1"
>    shift
> @@ -131,13 +131,13 @@ gen_rmlist (){
>                dest="`readlink -e $i`"
>                if [ -n "$dest" ]; then
>                    echo $dest >> $rmlist_file
> -                  # Remove the .siginfo when .tgz is removed
> +                  # Remove the .siginfo when .tgz or .tar.zst is removed
>                    if [ -f "$dest.siginfo" ]; then
>                        echo $dest.siginfo >> $rmlist_file
>                    fi
>                fi
>            fi
> -          # Add the ".tgz.done" and ".siginfo.done" (may exist in the future)
> +          # Add the ".tgz.done" or ".tar.zst.done" and ".siginfo.done" (may exist in the future)
>            base_fn="${i##/*/}"
>            t_fn="$base_fn.done"
>            s_fn="$base_fn.siginfo.done"
> @@ -188,10 +188,10 @@ remove_duplicated () {
>    total_files=`find $cache_dir -name 'sstate*' | wc -l`
>    # Save all the sstate files in a file
>    sstate_files_list=`mktemp` || exit 1
> -  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' >$sstate_files_list
> +  find $cache_dir -name 'sstate:*:*:*:*:*:*:*.tgz*' -o -iname 'sstate:*:*:*:*:*:*:*.tar.zst*' >$sstate_files_list
>
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $sstate_files_list | sort -u`"
> +  sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.\zst\).*%\1%g' $sstate_files_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> @@ -200,10 +200,10 @@ remove_duplicated () {
>    # Using this SSTATE_PKGSPEC definition it's 6th colon separated field
>    # SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>    for arch in $all_archs; do
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:$arch:[^:]*:[^:]*\.\(tgz\|tar\.\zst\)$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs $arch"
>        # ${builder_arch}_$arch used by toolchain sstate
> -      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.tgz$" $sstate_files_list
> +      grep -q ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:${builder_arch}_$arch:[^:]*:[^:]*\.\(tgz\|tar\.zst\)$" $sstate_files_list
>        [ $? -eq 0 ] && ava_archs="$ava_archs ${builder_arch}_$arch"
>    done
>    echo "Done"
> @@ -219,13 +219,13 @@ remove_duplicated () {
>            continue
>        fi
>        # Total number of files including .siginfo and .done files
> -      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list | wc -l 2>/dev/null`
> -      total_tgz_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz$" $sstate_files_list | wc -l 2>/dev/null`
> +      total_files_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list | wc -l 2>/dev/null`
> +      total_archive_suffix=`grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\)$" $sstate_files_list | wc -l 2>/dev/null`
>        # Save the file list to a file, some suffix's file may not exist
> -      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.tgz.*" $sstate_files_list >$list_suffix 2>/dev/null
> -      local deleted_tgz=0
> +      grep ".*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:_]*_$suffix\.\(tgz\|tar\.zst\).*" $sstate_files_list >$list_suffix 2>/dev/null
> +      local deleted_archives=0
>        local deleted_files=0
> -      for ext in tgz tgz.siginfo tgz.done; do
> +      for ext in tgz tgz.siginfo tgz.done tar.zst tar.zst.siginfo tar.zst.done; do
>            echo "Figuring out the sstate:xxx_$suffix.$ext ... "
>            # Uniq BPNs
>            file_names=`for arch in $ava_archs ""; do
> @@ -268,19 +268,19 @@ remove_duplicated () {
>                done
>            done
>        done
> -      deleted_tgz=`cat $rm_list.* 2>/dev/null | grep ".tgz$" | wc -l`
> +      deleted_archives=`cat $rm_list.* 2>/dev/null | grep ".\(tgz\|tar\.zst\)$" | wc -l`
>        deleted_files=`cat $rm_list.* 2>/dev/null | wc -l`
>        [ "$deleted_files" -gt 0 -a $debug -gt 0 ] && cat $rm_list.*
> -      echo "($deleted_tgz out of $total_tgz_suffix .tgz files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
> +      echo "($deleted_archives out of $total_archives_suffix .tgz or .tar.zst files for $suffix suffix will be removed or $deleted_files out of $total_files_suffix when counting also .siginfo and .done files)"
>        let total_deleted=$total_deleted+$deleted_files
>    done
> -  deleted_tgz=0
> +  deleted_archives=0
>    rm_old_list=$remove_listdir/sstate-old-filenames
> -  find $cache_dir -name 'sstate-*.tgz' >$rm_old_list
> -  [ -s "$rm_old_list" ] && deleted_tgz=`cat $rm_old_list | grep ".tgz$" | wc -l`
> +  find $cache_dir -name 'sstate-*.tgz' -o -name 'sstate-*.tar.zst' >$rm_old_list
> +  [ -s "$rm_old_list" ] && deleted_archives=`cat $rm_old_list | grep ".\(tgz\|tar\.zst\)$" | wc -l`
>    [ -s "$rm_old_list" ] && deleted_files=`cat $rm_old_list | wc -l`
>    [ -s "$rm_old_list" -a $debug -gt 0 ] && cat $rm_old_list
> -  echo "($deleted_tgz .tgz files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
> +  echo "($deleted_archives .tgz or .tar.zst files with old sstate-* filenames will be removed or $deleted_files when counting also .siginfo and .done files)"
>    let total_deleted=$total_deleted+$deleted_files
>
>    rm -f $list_suffix
> @@ -289,7 +289,7 @@ remove_duplicated () {
>        read_confirm
>        if [ "$confirm" = "y" -o "$confirm" = "Y" ]; then
>            for list in `ls $remove_listdir/`; do
> -              echo "Removing $list.tgz (`cat $remove_listdir/$list | wc -w` files) ... "
> +              echo "Removing $list archive (`cat $remove_listdir/$list | wc -w` files) ... "
>                # Remove them one by one to avoid the argument list too long error
>                for i in `cat $remove_listdir/$list`; do
>                    rm -f $verbose $i
> @@ -322,7 +322,7 @@ rm_by_stamps (){
>    find $cache_dir -type f -name 'sstate*' | sort -u -o $cache_list
>
>    echo "Figuring out the suffixes in the sstate cache dir ... "
> -  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.tgz.*%\1%g' $cache_list | sort -u`"
> +  local sstate_suffixes="`sed 's%.*/sstate:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^:]*:[^_]*_\([^:]*\)\.\(tgz\|tar\.zst\).*%\1%g' $cache_list | sort -u`"
>    echo "Done"
>    echo "The following suffixes have been found in the cache dir:"
>    echo $sstate_suffixes
> --
> 2.30.2
>
>
> 
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [poky] [PATCH] sstate: Add ZStandard compressor support
  2021-10-01 13:44     ` Joshua Watt
@ 2021-10-01 14:19       ` Henry Kleynhans
  0 siblings, 0 replies; 6+ messages in thread
From: Henry Kleynhans @ 2021-10-01 14:19 UTC (permalink / raw)
  To: Joshua Watt, Richard Purdie
  Cc: Peter Kjellerstedt, poky, Michael van der Westhuizen

[-- Attachment #1: Type: text/plain, Size: 3142 bytes --]

Thank you all for the feedback!  I like the idea of switching completely to zstd.  I will update the patch accordingly and resubmit.

Kind regards, Henry
________________________________
From: Joshua Watt <jpewhacker@gmail.com>
Sent: 01 October 2021 14:44
To: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Henry Kleynhans <hkleynhans@fb.com>; poky@lists.yoctoproject.org <poky@lists.yoctoproject.org>; Michael van der Westhuizen <rmikey@fb.com>
Subject: Re: [poky] [PATCH] sstate: Add ZStandard compressor support

On Fri, Oct 1, 2021 at 8:16 AM Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
>
> On Fri, 2021-10-01 at 13:13 +0000, Peter Kjellerstedt wrote:
> > > -----Original Message-----
> > > From: poky@lists.yoctoproject.org <poky@lists.yoctoproject.org> On Behalf
> > > Of Henry Kleynhans via lists.yoctoproject.org
> > > Sent: den 1 oktober 2021 12:11
> > > To: poky@lists.yoctoproject.org
> > > Cc: hkleynhans@fb.com; rmikey@fb.com
> > > Subject: [poky] [PATCH] sstate: Add ZStandard compressor support
> > >
> > > This patch adds support to optionally use the Zstandard compressor for
> > > ssate cache files.
> > >
> > > Zstandard compression provides a significant improvement in
> > > decompression speed as well as improvement in compression speed and disk
> > > usage over the 'tgz' format in use.  Furthermore, its configurable
> > > compression level offers a trade-off between time spent compressing
> > > sstate cache files and disk space used by those files.  The reduced disk
> > > usage also contributes to saving network traffic for those sharing their
> > > sstate cache with others.
> > >
> > > Zstandard should therefore be a good choice when:
> > > * disk space is at a premium
> > > * network speed / resources are limited
> > > * the CI server can sstate packages can be created at high compression
> > > * less CPU on the build server should be used for sstate decompression
> > >
> > > Signed-off-by: Henry Kleynhans <hkleynhans@fb.com>
> > > ---
> > >  meta/classes/sstate.bbclass        | 49 +++++++++++++++++++++++-------
> > >  scripts/sstate-cache-management.sh | 40 ++++++++++++------------
> > >  2 files changed, 58 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> > > index 92a73114bb..a73d631679 100644
> > > --- a/meta/classes/sstate.bbclass
> > > +++ b/meta/classes/sstate.bbclass
> > > @@ -1,17 +1,30 @@
> > >  SSTATE_VERSION = "3"
> > >
> > > +SSTATE_USE_ZSTD ?= "0"
> > > +SSTATE_ZSTD_CLEVEL ?= "3"
> > > +SSTATE_ZSTD_NTHREADS ?= "0"
> >
> > Do we really need to make this configurable? Can't we just decide
> > to use zstd and be done with it?
>
> I have to admit I was also wondering that. We started requiring zstd on the host
> system so it may just be better to switch over?

Since zstd is required and a clear winner in compression/decompression
time, I think requiring it and simplifying the sstate code to only use
zstd would be fine.

>
> Cheers,
>
> Richard
>
>
>
>
> 
>

[-- Attachment #2: Type: text/html, Size: 4720 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-10-01 14:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-01 10:11 [PATCH] sstate: Add ZStandard compressor support hkleynhans
2021-10-01 13:13 ` [poky] " Peter Kjellerstedt
2021-10-01 13:16   ` Richard Purdie
2021-10-01 13:44     ` Joshua Watt
2021-10-01 14:19       ` Henry Kleynhans
2021-10-01 13:52 ` Joshua Watt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.