* [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
From: Richard Purdie @ 2016-01-09 16:42 UTC
  To: openembedded-core

xz achieves a better compression ratio than gz at similar speed for
compression and decompression. It therefore makes sense to switch the
sstate objects over to it.

As an example, the gcc-cross populate_sysroot object shrinks from
79,509,871 to 53,031,752 bytes, a reduction of roughly a third.
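
For scale, the size reduction implied by the two byte counts above (a plain calculation; no data beyond those two numbers is assumed):

```python
# Byte counts for the gcc-cross populate_sysroot sstate object, as quoted above.
TGZ_BYTES = 79_509_871
TXZ_BYTES = 53_031_752


def size_reduction(before: int, after: int) -> float:
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


saved = TGZ_BYTES - TXZ_BYTES
print(f"saved {saved:,} bytes ({size_reduction(TGZ_BYTES, TXZ_BYTES):.1f}% smaller)")
# prints "saved 26,478,119 bytes (33.3% smaller)"
```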

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>

diff --git a/meta/classes/buildhistory.bbclass b/meta/classes/buildhistory.bbclass
index 4153e58..734303c 100644
--- a/meta/classes/buildhistory.bbclass
+++ b/meta/classes/buildhistory.bbclass
@@ -537,7 +537,7 @@ python buildhistory_get_extra_sdkinfo() {
         filesizes = {}
         for root, _, files in os.walk('${SDK_OUTPUT}/${SDKPATH}/sstate-cache'):
             for fn in files:
-                if fn.endswith('.tgz'):
+                if fn.endswith('.tar.xz'):
                     fsize = int(math.ceil(float(os.path.getsize(os.path.join(root, fn))) / 1024))
                     task = fn.rsplit(':', 1)[1].split('_', 1)[1].split('.')[0]
                     origtotal = tasksizes.get(task, 0)
diff --git a/meta/classes/populate_sdk_ext.bbclass b/meta/classes/populate_sdk_ext.bbclass
index 3a65c07..4ff5e9e 100644
--- a/meta/classes/populate_sdk_ext.bbclass
+++ b/meta/classes/populate_sdk_ext.bbclass
@@ -189,7 +189,7 @@ python copy_buildsystem () {
     # We don't need sstate do_package files
     for root, dirs, files in os.walk(sstate_out):
         for name in files:
-            if name.endswith("_package.tgz"):
+            if name.endswith("_package.tar.xz"):
                 f = os.path.join(root, name)
                 os.remove(f)
 }
diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 9bef212..d9adf01 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -294,8 +294,8 @@ def sstate_installpkg(ss, d):
         oe.path.remove(dir)
 
     sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
-    sstatefetch = d.getVar('SSTATE_PKGNAME', True) + '_' + ss['task'] + ".tgz"
-    sstatepkg = d.getVar('SSTATE_PKG', True) + '_' + ss['task'] + ".tgz"
+    sstatefetch = d.getVar('SSTATE_PKGNAME', True) + '_' + ss['task'] + ".tar.xz"
+    sstatepkg = d.getVar('SSTATE_PKG', True) + '_' + ss['task'] + ".tar.xz"
 
     if not os.path.exists(sstatepkg):
         pstaging_fetch(sstatefetch, sstatepkg, d)
@@ -372,7 +372,7 @@ python sstate_hardcode_path_unpack () {
 def sstate_clean_cachefile(ss, d):
     import oe.path
 
-    sstatepkgfile = d.getVar('SSTATE_PATHSPEC', True) + "*_" + ss['task'] + ".tgz*"
+    sstatepkgfile = d.getVar('SSTATE_PATHSPEC', True) + "*_" + ss['task'] + ".tar.xz*"
     bb.note("Removing %s" % sstatepkgfile)
     oe.path.remove(sstatepkgfile)
 
@@ -555,7 +555,7 @@ def sstate_package(ss, d):
     tmpdir = d.getVar('TMPDIR', True)
 
     sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
-    sstatepkg = d.getVar('SSTATE_PKG', True) + '_'+ ss['task'] + ".tgz"
+    sstatepkg = d.getVar('SSTATE_PKG', True) + '_'+ ss['task'] + ".tar.xz"
     bb.utils.remove(sstatebuild, recurse=True)
     bb.utils.mkdirhier(sstatebuild)
     bb.utils.mkdirhier(os.path.dirname(sstatepkg))
@@ -677,14 +677,14 @@ sstate_create_package () {
 	# Need to handle empty directories
 	if [ "$(ls -A)" ]; then
 		set +e
-		tar -czf $TFILE *
+		tar -cJf $TFILE *
 		ret=$?
 		if [ $ret -ne 0 ] && [ $ret -ne 1 ]; then
 			exit 1
 		fi
 		set -e
 	else
-		tar -cz --file=$TFILE --files-from=/dev/null
+		tar -cJ --file=$TFILE --files-from=/dev/null
 	fi
 	chmod 0664 $TFILE
 	mv -f $TFILE ${SSTATE_PKG}
@@ -703,7 +703,7 @@ sstate_create_package () {
 # Will be run from within SSTATE_INSTDIR.
 #
 sstate_unpack_package () {
-	tar -xmvzf ${SSTATE_PKG}
+	tar -xmvJf ${SSTATE_PKG}
 	# Use "! -w ||" to return true for read only files
 	[ ! -w ${SSTATE_PKG} ] || touch --no-dereference ${SSTATE_PKG}
 	[ ! -w ${SSTATE_PKG}.sig ] || [ ! -e ${SSTATE_PKG}.sig ] || touch --no-dereference ${SSTATE_PKG}.sig
@@ -716,7 +716,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
 
     ret = []
     missed = []
-    extension = ".tgz"
+    extension = ".tar.xz"
     if siginfo:
         extension = extension + ".siginfo"
 
@@ -821,11 +821,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tar.xz")
             evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
+            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tar.xz")
             evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
 
@@ -914,7 +914,7 @@ python sstate_eventhandler() {
     d = e.data
     # When we write an sstate package we rewrite the SSTATE_PKG
     spkg = d.getVar('SSTATE_PKG', True)
-    if not spkg.endswith(".tgz"):
+    if not spkg.endswith(".tar.xz"):
         taskname = d.getVar("BB_RUNTASK", True)[3:]
         spec = d.getVar('SSTATE_PKGSPEC', True)
         swspec = d.getVar('SSTATE_SWSPEC', True)
@@ -922,7 +922,7 @@ python sstate_eventhandler() {
             d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
             d.setVar("SSTATE_EXTRAPATH", "")
         sstatepkg = d.getVar('SSTATE_PKG', True)
-        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
+        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tar.xz" ".siginfo", d)
 }
 
 SSTATE_PRUNE_OBSOLETEWORKDIR = "1"
diff --git a/meta/lib/oeqa/selftest/signing.py b/meta/lib/oeqa/selftest/signing.py
index c33662b..4d545ad 100644
--- a/meta/lib/oeqa/selftest/signing.py
+++ b/meta/lib/oeqa/selftest/signing.py
@@ -111,13 +111,13 @@ class Signing(oeSelfTest):
         bitbake('-c cleansstate %s' % test_recipe)
         bitbake(test_recipe)
 
-        recipe_sig = glob.glob(sstatedir + '/*/*:ed:*_package.tgz.sig')
-        recipe_tgz = glob.glob(sstatedir + '/*/*:ed:*_package.tgz')
+        recipe_sig = glob.glob(sstatedir + '/*/*:ed:*_package.tar.xz.sig')
+        recipe_txz = glob.glob(sstatedir + '/*/*:ed:*_package.tar.xz')
 
         self.assertEqual(len(recipe_sig), 1, 'Failed to find .sig file.')
-        self.assertEqual(len(recipe_tgz), 1, 'Failed to find .tgz file.')
+        self.assertEqual(len(recipe_txz), 1, 'Failed to find .tar.xz file.')
 
-        ret = runCmd('gpg --homedir %s --verify %s %s' % (self.gpg_dir, recipe_sig[0], recipe_tgz[0]))
+        ret = runCmd('gpg --homedir %s --verify %s %s' % (self.gpg_dir, recipe_sig[0], recipe_txz[0]))
         # gpg: Signature made Thu 22 Oct 2015 01:45:09 PM EEST using RSA key ID 61EEFB30
         # gpg: Good signature from "testuser (nocomment) <testuser@email.com>"
         self.assertIn('gpg: Good signature from', ret.output, 'Package signed incorrectly.')
diff --git a/meta/lib/oeqa/selftest/sstatetests.py b/meta/lib/oeqa/selftest/sstatetests.py
index 512cb4f..73e5132 100644
--- a/meta/lib/oeqa/selftest/sstatetests.py
+++ b/meta/lib/oeqa/selftest/sstatetests.py
@@ -55,15 +55,15 @@ class SStateTests(SStateBase):
         bitbake(['-ccleansstate'] + targets)
 
         bitbake(targets)
-        tgz_created = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific, distro_nonspecific)
-        self.assertTrue(tgz_created, msg="Could not find sstate .tgz files for: %s" % ', '.join(map(str, targets)))
+        txz_created = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific, distro_nonspecific)
+        self.assertTrue(txz_created, msg="Could not find sstate .tar.xz files for: %s" % ', '.join(map(str, targets)))
 
         siginfo_created = self.search_sstate('|'.join(map(str, [s + '.*?\.siginfo$' for s in targets])), distro_specific, distro_nonspecific)
         self.assertTrue(siginfo_created, msg="Could not find sstate .siginfo files for: %s" % ', '.join(map(str, targets)))
 
         bitbake(['-ccleansstate'] + targets)
-        tgz_removed = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific, distro_nonspecific)
-        self.assertTrue(not tgz_removed, msg="do_cleansstate didn't remove .tgz sstate files for: %s" % ', '.join(map(str, targets)))
+        txz_removed = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific, distro_nonspecific)
+        self.assertTrue(not txz_removed, msg="do_cleansstate didn't remove .tar.xz sstate files for: %s" % ', '.join(map(str, targets)))
 
     @testcase(977)
     def test_cleansstate_task_distro_specific_nonspecific(self):
@@ -87,8 +87,8 @@ class SStateTests(SStateBase):
         bitbake(['-ccleansstate'] + targets)
 
         bitbake(targets)
-        self.assertTrue(self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=False, distro_nonspecific=True) == [], msg="Found distro non-specific sstate for: %s" % ', '.join(map(str, targets)))
-        file_tracker_1 = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
+        self.assertTrue(self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=False, distro_nonspecific=True) == [], msg="Found distro non-specific sstate for: %s" % ', '.join(map(str, targets)))
+        file_tracker_1 = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
         self.assertTrue(len(file_tracker_1) >= len(targets), msg = "Not all sstate files ware created for: %s" % ', '.join(map(str, targets)))
 
         self.track_for_cleanup(self.distro_specific_sstate + "_old")
@@ -97,7 +97,7 @@ class SStateTests(SStateBase):
 
         bitbake(['-cclean'] + targets)
         bitbake(targets)
-        file_tracker_2 = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
+        file_tracker_2 = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
         self.assertTrue(len(file_tracker_2) >= len(targets), msg = "Not all sstate files ware created for: %s" % ', '.join(map(str, targets)))
 
         not_recreated = [x for x in file_tracker_1 if x not in file_tracker_2]
@@ -146,18 +146,18 @@ class SStateTests(SStateBase):
             if not sstate_arch in sstate_archs_list:
                 sstate_archs_list.append(sstate_arch)
             if target_config[idx] == target_config[-1]:
-                target_sstate_before_build = self.search_sstate(target + '.*?\.tgz$')
+                target_sstate_before_build = self.search_sstate(target + '.*?\.tar.xz$')
             bitbake("-cclean %s" % target)
             result = bitbake(target, ignore_status=True)
             if target_config[idx] == target_config[-1]:
-                target_sstate_after_build = self.search_sstate(target + '.*?\.tgz$')
+                target_sstate_after_build = self.search_sstate(target + '.*?\.tar.xz$')
                 expected_remaining_sstate += [x for x in target_sstate_after_build if x not in target_sstate_before_build if not any(pattern in x for pattern in ignore_patterns)]
             self.remove_config(global_config[idx])
             self.remove_recipeinc(target, target_config[idx])
             self.assertEqual(result.status, 0, msg = "build of %s failed with %s" % (target, result.output))
 
         runCmd("sstate-cache-management.sh -y --cache-dir=%s --remove-duplicated --extra-archs=%s" % (self.sstate_path, ','.join(map(str, sstate_archs_list))))
-        actual_remaining_sstate = [x for x in self.search_sstate(target + '.*?\.tgz$') if not any(pattern in x for pattern in ignore_patterns)]
+        actual_remaining_sstate = [x for x in self.search_sstate(target + '.*?\.tar.xz$') if not any(pattern in x for pattern in ignore_patterns)]
 
         actual_not_expected = [x for x in actual_remaining_sstate if x not in expected_remaining_sstate]
         self.assertFalse(actual_not_expected, msg="Files should have been removed but ware not: %s" % ', '.join(map(str, actual_not_expected)))
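
Because the patch turns every `.tgz` literal into `.tar.xz`, code that parses sstate file names now sees a double extension. The standalone sketch below sanity-checks two such places: the task-name expression copied from buildhistory_get_extra_sdkinfo(), and selftest-style regular expressions. The sample file name is hypothetical (real names come from generate_sstatefn()), and `loose`/`strict` are illustrative names, not part of the patch.

```python
import re


def task_from_sstate_name(fn: str) -> str:
    # Same expression as in buildhistory_get_extra_sdkinfo(): keep the text after
    # the last ':', drop the hash up to the first '_', then cut at the first '.'.
    return fn.rsplit(':', 1)[1].split('_', 1)[1].split('.')[0]


# Hypothetical name; real ones are produced by generate_sstatefn().
name = "sstate:gcc-cross-arm:x86_64-linux:5.3.0:r0:x86_64-linux:3:0123abcd_populate_sysroot.tar.xz"
print(task_from_sstate_name(name))  # populate_sysroot

# Selftest-style pattern: the '.' between 'tar' and 'xz' is not escaped.
loose = re.compile(r'gcc-cross.*?\.tar.xz$')
# Escaping the whole suffix makes it match literally.
strict = re.compile(r'gcc-cross.*?' + re.escape('.tar.xz') + r'$')

print(bool(loose.search("gcc-cross_x.tar-xz")))   # True: '.' also matches '-'
print(bool(strict.search("gcc-cross_x.tar-xz")))  # False
print(bool(strict.search("gcc-cross_x.tar.xz")))  # True
```

The first check suggests the buildhistory change is sufficient on its own, since `split('.')[0]` stops at the first dot whatever the extension is. The second shows that the unescaped dot in the selftest patterns matches any character. This is unlikely to matter for real sstate names, but escaping it keeps the match exact. Using raw strings also helps, because newer Python versions flag `'\.'` in an ordinary string literal as an invalid escape sequence.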


* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
From: Andre McCurdy @ 2016-01-11 19:05 UTC
  To: Richard Purdie; +Cc: openembedded-core

On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> xz compresses with a better compression ratio than gz with similar speed
> for compression and decompression.

When you measured compression speed to be similar, was that with
parallel compression? If so, with how many CPU cores?

A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on my
laptop seems to indicate that xz is _significantly_ slower:

$ time tar -czf /tmp/jjj.tgz \
    tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git

 real    0m4.708s
 user    0m4.682s
 sys    0m0.477s

$ time tar -cJf /tmp/jjj.tar.xz \
    tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git

 real    0m56.491s
 user    0m56.489s
 sys    0m0.744s
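
Andre's single-threaded comparison can be approximated without the tar binary. The sketch below uses only the Python standard library. It assumes that the tree fits in memory and that Python's zlib and liblzma bindings at level 6 (the default for both the gzip and xz tools that `tar -z` and `tar -J` invoke) are a fair stand-in for the command-line runs. `tar_bytes` and `time_single_threaded` are hypothetical helper names.

```python
import gzip
import io
import lzma
import tarfile
import time


def tar_bytes(src_dir: str) -> bytes:
    """Build an uncompressed tar archive of src_dir in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        tf.add(src_dir, arcname=".")
    return buf.getvalue()


def time_single_threaded(data: bytes) -> dict:
    """Compress `data` once per codec on one thread; return {name: (seconds, size)}."""
    codecs = {
        "gzip -6": lambda b: gzip.compress(b, compresslevel=6),
        "xz -6": lambda b: lzma.compress(b, preset=6),
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        results[name] = (time.perf_counter() - start, len(out))
    return results
```

For example, `time_single_threaded(tar_bytes("tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git"))` compresses the same tree used above once with each codec. Absolute times will differ from the tar runs, but the ratio between the two codecs is the interesting number.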


> It therefore makes sense to switch
> to it for the sstate objects.
>
> As an example, the gcc-cross populate_sysroot object goes from
> 79,509,871 to 53,031,752 bytes which is a significant improvement.
>
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core


* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
From: Khem Raj @ 2016-01-11 19:52 UTC
  To: Andre McCurdy; +Cc: openembedded-core


> On Jan 11, 2016, at 11:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
> 
> On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
>> xz compresses with a better compression ratio than gz with similar speed
>> for compression and decompression.
> 
> When you measured compression speed to be similar, was that with
> parallel compression? If so, with how many CPU cores?
> 
> A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on my
> laptop seems to indicate that xz is _significantly_ slower:
> 
> $ time tar -czf /tmp/jjj.tgz
> tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
> 
> real    0m4.708s
> user    0m4.682s
> sys    0m0.477s
> 
> $ time tar -cJf /tmp/jjj.tar.xz
> tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
> 
> real    0m56.491s
> user    0m56.489s
> sys    0m0.744s


On an 8-core machine, pixz recovers some of the lost speed, but xz is still slow. Results for a small load:


tar -cJf /tmp/xx.tar.xz   21.14s user 0.36s system 102% cpu 21.061 total

tar -czf /tmp/xx.tar.gz   2.35s user 0.19s system 109% cpu 2.320 total

tar -Ipixz -cf /tmp/xx.tar.xz   27.14s user 0.88s system 490% cpu 5.708 total
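
The pixz run suggests a possible middle ground: use a parallel compressor when one is installed and fall back to tar's built-in xz support otherwise. The sketch below shows one way to express that choice. It assumes that pixz's output stays readable by the plain `tar -xmvJf` in sstate_unpack_package. `xz_tar_cmd` is a hypothetical helper, not part of sstate.bbclass.

```python
import shutil
from typing import Callable, List, Optional


def xz_tar_cmd(tfile: str, members: List[str],
               which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Build an argv that archives `members` into the xz-compressed `tfile`."""
    if which("pixz"):
        # Multi-threaded xz through tar's external-compressor option,
        # as in the 'tar -Ipixz -cf' measurement above.
        return ["tar", "-Ipixz", "-cf", tfile, *members]
    # Single-threaded fallback: tar's own -J (xz) support, as used in the patch.
    return ["tar", "-cJf", tfile, *members]
```

The returned list can be passed to `subprocess.run(..., check=True)`. The `which` parameter exists only so that both branches can be exercised without pixz installed.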

Lowering the compression level to -3 makes it a bit faster:

pixz -3 /tmp/xx.tar /tmp/xx.tar.xz  17.58s user 0.18s system 606% cpu 2.927 total
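
The -3 result reflects the usual xz trade-off: preset 3 uses a smaller dictionary than the default preset 6 and finishes sooner, at some cost in ratio. The sketch below measures that trade-off on a given tarball with Python's lzma module. It assumes that liblzma presets are equivalent to the xz/pixz -N levels. `compare_xz_presets` is a hypothetical helper.

```python
import lzma
import time


def compare_xz_presets(data: bytes, presets=(3, 6)) -> dict:
    """Compress `data` at each xz preset; return {preset: (seconds, size)}."""
    results = {}
    for preset in presets:
        start = time.perf_counter()
        out = lzma.compress(data, preset=preset)
        elapsed = time.perf_counter() - start
        # Every preset produces a standard .xz stream, so the unpack side is unaffected.
        if lzma.decompress(out) != data:
            raise RuntimeError(f"round trip failed at preset {preset}")
        results[preset] = (elapsed, len(out))
    return results
```

Reading `/tmp/xx.tar` in binary mode and passing its contents to `compare_xz_presets` gives the time and size at each level side by side. Because decompression does not depend on the preset, lowering the level would only change the packaging side of sstate.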

> 
> 
>> It therefore makes sense to switch
>> to it for the sstate objects.
>> 
>> As an example, the gcc-cross populate_sysroot object goes from
>> 79,509,871 to 53,031,752 bytes which is a significant improvement.
>> 
>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>> 
>> diff --git a/meta/classes/buildhistory.bbclass b/meta/classes/buildhistory.bbclass
>> index 4153e58..734303c 100644
>> --- a/meta/classes/buildhistory.bbclass
>> +++ b/meta/classes/buildhistory.bbclass
>> @@ -537,7 +537,7 @@ python buildhistory_get_extra_sdkinfo() {
>>         filesizes = {}
>>         for root, _, files in os.walk('${SDK_OUTPUT}/${SDKPATH}/sstate-cache'):
>>             for fn in files:
>> -                if fn.endswith('.tgz'):
>> +                if fn.endswith('.tar.xz'):
>>                     fsize = int(math.ceil(float(os.path.getsize(os.path.join(root, fn))) / 1024))
>>                     task = fn.rsplit(':', 1)[1].split('_', 1)[1].split('.')[0]
>>                     origtotal = tasksizes.get(task, 0)
>> diff --git a/meta/classes/populate_sdk_ext.bbclass b/meta/classes/populate_sdk_ext.bbclass
>> index 3a65c07..4ff5e9e 100644
>> --- a/meta/classes/populate_sdk_ext.bbclass
>> +++ b/meta/classes/populate_sdk_ext.bbclass
>> @@ -189,7 +189,7 @@ python copy_buildsystem () {
>>     # We don't need sstate do_package files
>>     for root, dirs, files in os.walk(sstate_out):
>>         for name in files:
>> -            if name.endswith("_package.tgz"):
>> +            if name.endswith("_package.tar.xz"):
>>                 f = os.path.join(root, name)
>>                 os.remove(f)
>> }
>> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
>> index 9bef212..d9adf01 100644
>> --- a/meta/classes/sstate.bbclass
>> +++ b/meta/classes/sstate.bbclass
>> @@ -294,8 +294,8 @@ def sstate_installpkg(ss, d):
>>         oe.path.remove(dir)
>> 
>>     sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
>> -    sstatefetch = d.getVar('SSTATE_PKGNAME', True) + '_' + ss['task'] + ".tgz"
>> -    sstatepkg = d.getVar('SSTATE_PKG', True) + '_' + ss['task'] + ".tgz"
>> +    sstatefetch = d.getVar('SSTATE_PKGNAME', True) + '_' + ss['task'] + ".tar.xz"
>> +    sstatepkg = d.getVar('SSTATE_PKG', True) + '_' + ss['task'] + ".tar.xz"
>> 
>>     if not os.path.exists(sstatepkg):
>>         pstaging_fetch(sstatefetch, sstatepkg, d)
>> @@ -372,7 +372,7 @@ python sstate_hardcode_path_unpack () {
>> def sstate_clean_cachefile(ss, d):
>>     import oe.path
>> 
>> -    sstatepkgfile = d.getVar('SSTATE_PATHSPEC', True) + "*_" + ss['task'] + ".tgz*"
>> +    sstatepkgfile = d.getVar('SSTATE_PATHSPEC', True) + "*_" + ss['task'] + ".tar.xz*"
>>     bb.note("Removing %s" % sstatepkgfile)
>>     oe.path.remove(sstatepkgfile)
>> 
>> @@ -555,7 +555,7 @@ def sstate_package(ss, d):
>>     tmpdir = d.getVar('TMPDIR', True)
>> 
>>     sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
>> -    sstatepkg = d.getVar('SSTATE_PKG', True) + '_'+ ss['task'] + ".tgz"
>> +    sstatepkg = d.getVar('SSTATE_PKG', True) + '_'+ ss['task'] + ".tar.xz"
>>     bb.utils.remove(sstatebuild, recurse=True)
>>     bb.utils.mkdirhier(sstatebuild)
>>     bb.utils.mkdirhier(os.path.dirname(sstatepkg))
>> @@ -677,14 +677,14 @@ sstate_create_package () {
>>        # Need to handle empty directories
>>        if [ "$(ls -A)" ]; then
>>                set +e
>> -               tar -czf $TFILE *
>> +               tar -cJf $TFILE *
>>                ret=$?
>>                if [ $ret -ne 0 ] && [ $ret -ne 1 ]; then
>>                        exit 1
>>                fi
>>                set -e
>>        else
>> -               tar -cz --file=$TFILE --files-from=/dev/null
>> +               tar -cJ --file=$TFILE --files-from=/dev/null
>>        fi
>>        chmod 0664 $TFILE
>>        mv -f $TFILE ${SSTATE_PKG}
>> @@ -703,7 +703,7 @@ sstate_create_package () {
>> # Will be run from within SSTATE_INSTDIR.
>> #
>> sstate_unpack_package () {
>> -       tar -xmvzf ${SSTATE_PKG}
>> +       tar -xmvJf ${SSTATE_PKG}
>>        # Use "! -w ||" to return true for read only files
>>        [ ! -w ${SSTATE_PKG} ] || touch --no-dereference ${SSTATE_PKG}
>>        [ ! -w ${SSTATE_PKG}.sig ] || [ ! -e ${SSTATE_PKG}.sig ] || touch --no-dereference ${SSTATE_PKG}.sig
>> @@ -716,7 +716,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
>> 
>>     ret = []
>>     missed = []
>> -    extension = ".tgz"
>> +    extension = ".tar.xz"
>>     if siginfo:
>>         extension = extension + ".siginfo"
>> 
>> @@ -821,11 +821,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False):
>>         evdata = {'missed': [], 'found': []};
>>         for task in missed:
>>             spec, extrapath, tname = getpathcomponents(task, d)
>> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
>> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tar.xz")
>>             evdata['missed'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
>>         for task in ret:
>>             spec, extrapath, tname = getpathcomponents(task, d)
>> -            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tgz")
>> +            sstatefile = d.expand(extrapath + generate_sstatefn(spec, sq_hash[task], d) + "_" + tname + ".tar.xz")
>>             evdata['found'].append( (sq_fn[task], sq_task[task], sq_hash[task], sstatefile ) )
>>         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)
>> 
>> @@ -914,7 +914,7 @@ python sstate_eventhandler() {
>>     d = e.data
>>     # When we write an sstate package we rewrite the SSTATE_PKG
>>     spkg = d.getVar('SSTATE_PKG', True)
>> -    if not spkg.endswith(".tgz"):
>> +    if not spkg.endswith(".tar.xz"):
>>         taskname = d.getVar("BB_RUNTASK", True)[3:]
>>         spec = d.getVar('SSTATE_PKGSPEC', True)
>>         swspec = d.getVar('SSTATE_SWSPEC', True)
>> @@ -922,7 +922,7 @@ python sstate_eventhandler() {
>>             d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
>>             d.setVar("SSTATE_EXTRAPATH", "")
>>         sstatepkg = d.getVar('SSTATE_PKG', True)
>> -        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
>> +        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tar.xz" ".siginfo", d)
>> }
>> 
>> SSTATE_PRUNE_OBSOLETEWORKDIR = "1"
>> diff --git a/meta/lib/oeqa/selftest/signing.py b/meta/lib/oeqa/selftest/signing.py
>> index c33662b..4d545ad 100644
>> --- a/meta/lib/oeqa/selftest/signing.py
>> +++ b/meta/lib/oeqa/selftest/signing.py
>> @@ -111,13 +111,13 @@ class Signing(oeSelfTest):
>>         bitbake('-c cleansstate %s' % test_recipe)
>>         bitbake(test_recipe)
>> 
>> -        recipe_sig = glob.glob(sstatedir + '/*/*:ed:*_package.tgz.sig')
>> -        recipe_tgz = glob.glob(sstatedir + '/*/*:ed:*_package.tgz')
>> +        recipe_sig = glob.glob(sstatedir + '/*/*:ed:*_package.tar.xz.sig')
>> +        recipe_txz = glob.glob(sstatedir + '/*/*:ed:*_package.tar.xz')
>> 
>>         self.assertEqual(len(recipe_sig), 1, 'Failed to find .sig file.')
>> -        self.assertEqual(len(recipe_tgz), 1, 'Failed to find .tgz file.')
>> +        self.assertEqual(len(recipe_txz), 1, 'Failed to find .tar.xz file.')
>> 
>> -        ret = runCmd('gpg --homedir %s --verify %s %s' % (self.gpg_dir, recipe_sig[0], recipe_tgz[0]))
>> +        ret = runCmd('gpg --homedir %s --verify %s %s' % (self.gpg_dir, recipe_sig[0], recipe_txz[0]))
>>         # gpg: Signature made Thu 22 Oct 2015 01:45:09 PM EEST using RSA key ID 61EEFB30
>>         # gpg: Good signature from "testuser (nocomment) <testuser@email.com>"
>>         self.assertIn('gpg: Good signature from', ret.output, 'Package signed incorrectly.')
>> diff --git a/meta/lib/oeqa/selftest/sstatetests.py b/meta/lib/oeqa/selftest/sstatetests.py
>> index 512cb4f..73e5132 100644
>> --- a/meta/lib/oeqa/selftest/sstatetests.py
>> +++ b/meta/lib/oeqa/selftest/sstatetests.py
>> @@ -55,15 +55,15 @@ class SStateTests(SStateBase):
>>         bitbake(['-ccleansstate'] + targets)
>> 
>>         bitbake(targets)
>> -        tgz_created = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific, distro_nonspecific)
>> -        self.assertTrue(tgz_created, msg="Could not find sstate .tgz files for: %s" % ', '.join(map(str, targets)))
>> +        txz_created = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific, distro_nonspecific)
>> +        self.assertTrue(txz_created, msg="Could not find sstate .tar.xz files for: %s" % ', '.join(map(str, targets)))
>> 
>>         siginfo_created = self.search_sstate('|'.join(map(str, [s + '.*?\.siginfo$' for s in targets])), distro_specific, distro_nonspecific)
>>         self.assertTrue(siginfo_created, msg="Could not find sstate .siginfo files for: %s" % ', '.join(map(str, targets)))
>> 
>>         bitbake(['-ccleansstate'] + targets)
>> -        tgz_removed = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific, distro_nonspecific)
>> -        self.assertTrue(not tgz_removed, msg="do_cleansstate didn't remove .tgz sstate files for: %s" % ', '.join(map(str, targets)))
>> +        txz_removed = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific, distro_nonspecific)
>> +        self.assertTrue(not txz_removed, msg="do_cleansstate didn't remove .tar.xz sstate files for: %s" % ', '.join(map(str, targets)))
>> 
>>     @testcase(977)
>>     def test_cleansstate_task_distro_specific_nonspecific(self):
>> @@ -87,8 +87,8 @@ class SStateTests(SStateBase):
>>         bitbake(['-ccleansstate'] + targets)
>> 
>>         bitbake(targets)
>> -        self.assertTrue(self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=False, distro_nonspecific=True) == [], msg="Found distro non-specific sstate for: %s" % ', '.join(map(str, targets)))
>> -        file_tracker_1 = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
>> +        self.assertTrue(self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=False, distro_nonspecific=True) == [], msg="Found distro non-specific sstate for: %s" % ', '.join(map(str, targets)))
>> +        file_tracker_1 = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
>>         self.assertTrue(len(file_tracker_1) >= len(targets), msg = "Not all sstate files ware created for: %s" % ', '.join(map(str, targets)))
>> 
>>         self.track_for_cleanup(self.distro_specific_sstate + "_old")
>> @@ -97,7 +97,7 @@ class SStateTests(SStateBase):
>> 
>>         bitbake(['-cclean'] + targets)
>>         bitbake(targets)
>> -        file_tracker_2 = self.search_sstate('|'.join(map(str, [s + '.*?\.tgz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
>> +        file_tracker_2 = self.search_sstate('|'.join(map(str, [s + '.*?\.tar.xz$' for s in targets])), distro_specific=True, distro_nonspecific=False)
>>         self.assertTrue(len(file_tracker_2) >= len(targets), msg = "Not all sstate files ware created for: %s" % ', '.join(map(str, targets)))
>> 
>>         not_recreated = [x for x in file_tracker_1 if x not in file_tracker_2]
>> @@ -146,18 +146,18 @@ class SStateTests(SStateBase):
>>             if not sstate_arch in sstate_archs_list:
>>                 sstate_archs_list.append(sstate_arch)
>>             if target_config[idx] == target_config[-1]:
>> -                target_sstate_before_build = self.search_sstate(target + '.*?\.tgz$')
>> +                target_sstate_before_build = self.search_sstate(target + '.*?\.tar.xz$')
>>             bitbake("-cclean %s" % target)
>>             result = bitbake(target, ignore_status=True)
>>             if target_config[idx] == target_config[-1]:
>> -                target_sstate_after_build = self.search_sstate(target + '.*?\.tgz$')
>> +                target_sstate_after_build = self.search_sstate(target + '.*?\.tar.xz$')
>>                 expected_remaining_sstate += [x for x in target_sstate_after_build if x not in target_sstate_before_build if not any(pattern in x for pattern in ignore_patterns)]
>>             self.remove_config(global_config[idx])
>>             self.remove_recipeinc(target, target_config[idx])
>>             self.assertEqual(result.status, 0, msg = "build of %s failed with %s" % (target, result.output))
>> 
>>         runCmd("sstate-cache-management.sh -y --cache-dir=%s --remove-duplicated --extra-archs=%s" % (self.sstate_path, ','.join(map(str, sstate_archs_list))))
>> -        actual_remaining_sstate = [x for x in self.search_sstate(target + '.*?\.tgz$') if not any(pattern in x for pattern in ignore_patterns)]
>> +        actual_remaining_sstate = [x for x in self.search_sstate(target + '.*?\.tar.xz$') if not any(pattern in x for pattern in ignore_patterns)]
>> 
>>         actual_not_expected = [x for x in actual_remaining_sstate if x not in expected_remaining_sstate]
>>         self.assertFalse(actual_not_expected, msg="Files should have been removed but ware not: %s" % ', '.join(map(str, actual_not_expected)))
>> 
>> 
>> --
>> _______________________________________________
>> Openembedded-core mailing list
>> Openembedded-core@lists.openembedded.org
>> http://lists.openembedded.org/mailman/listinfo/openembedded-core


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 211 bytes --]
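The sstate_create_package change in the quoted patch only swaps gzip (-z) for xz (-J), and sstate_unpack_package makes the same swap on extraction. Below is a rough stdlib-Python equivalent of the new archive format. It is an illustration only, with hypothetical function names, and is not how sstate.bbclass actually does it:

```python
import os
import tarfile


def create_sstate_archive(srcdir, tfile):
    """Roughly 'cd srcdir && tar -cJf tfile *' (an xz-compressed tar).

    An empty srcdir still produces a valid, empty archive, like the
    '--files-from=/dev/null' branch of sstate_create_package.
    """
    with tarfile.open(tfile, "w:xz") as tar:
        for name in sorted(os.listdir(srcdir)):
            # tar.add() recurses into directories by default.
            tar.add(os.path.join(srcdir, name), arcname=name)


def unpack_sstate_archive(tfile, destdir):
    """Roughly 'tar -xJf tfile' run from inside destdir."""
    # Use the safer 'data' extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tfile, "r:xz") as tar:
        tar.extractall(destdir, **kwargs)
```

The sketch differs from the shell version in two ways. os.listdir() also picks up dot-files, which the shell glob `*` skips. tarfile also restores modification times, whereas `tar -m` deliberately does not.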


* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
  2016-01-11 19:52   ` Khem Raj
@ 2016-01-11 20:00     ` Andre McCurdy
  2016-01-11 20:12       ` Khem Raj
  2016-01-11 22:32       ` Richard Purdie
  0 siblings, 2 replies; 7+ messages in thread
From: Andre McCurdy @ 2016-01-11 20:00 UTC (permalink / raw)
  To: Khem Raj; +Cc: openembedded-core

On Mon, Jan 11, 2016 at 11:52 AM, Khem Raj <raj.khem@gmail.com> wrote:
>
>> On Jan 11, 2016, at 11:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
>>
>> On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
>> <richard.purdie@linuxfoundation.org> wrote:
>>> xz compresses with a better compression ratio than gz with similar speed
>>> for compression and decompression.
>>
>> When you measured compression speed to be similar, was that with
>> parallel compression? If so, with how many CPU cores?
>>
>> A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on my
>> laptop seems to indicate that xz is _significantly_ slower:
>>
>> $ time tar -czf /tmp/jjj.tgz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>
>> real    0m4.708s
>> user    0m4.682s
>> sys    0m0.477s
>>
>> $ time tar -cJf /tmp/jjj.tar.xz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>
>> real    0m56.491s
>> user    0m56.489s
>> sys    0m0.744s
>
>
> On an 8-core machine with pixz, some of the speed is recovered, but it is still slow. I tried a small load:
>
>
> tar -cJf /tmp/xx.tar.xz   21.14s user 0.36s system 102% cpu 21.061 total
>
> tar -czf /tmp/xx.tar.gz   2.35s user 0.19s system 109% cpu 2.320 total
>
> tar -Ipixz -cf /tmp/xx.tar.xz   27.14s user 0.88s system 490% cpu 5.708 total
>
> When changing the compression level to -3, it gets a bit faster:
>
> pixz -3 /tmp/xx.tar /tmp/xx.tar.xz  17.58s user 0.18s system 606% cpu 2.927 total
>

For a fair comparison, we should probably be testing parallel gzip
against parallel xz.
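A parallel-against-parallel comparison could be approximated with a block-parallel sketch like the one below. It splits the input into independent blocks and compresses them on a thread pool with both gzip and xz. This is a hypothetical helper, not pigz or pixz. It assumes CPython's zlib and lzma modules release the GIL while compressing so the threads can overlap, and the 4 MiB block size is an arbitrary choice.

```python
import gzip
import lzma
import os
import time
from concurrent.futures import ThreadPoolExecutor


def _blocks(data, block_size):
    """Split data into fixed-size blocks (at least one, even if empty)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]


def parallel_compress(data, algo, level, workers=None, block_size=4 << 20):
    """Compress independent blocks concurrently and concatenate the output.

    Concatenated gzip members and concatenated .xz streams are both valid,
    so gzip.decompress() / lzma.decompress() read the result in one call.
    """
    if algo == "gz":
        compress = lambda block: gzip.compress(block, compresslevel=level)
    elif algo == "xz":
        compress = lambda block: lzma.compress(block, preset=level)
    else:
        raise ValueError("unknown algorithm: %r" % algo)
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return b"".join(pool.map(compress, _blocks(data, block_size)))


def compare(data, gz_level=6, xz_level=6, workers=None):
    """Return {algo: (compressed_bytes, wall_seconds)} for parallel gz and xz."""
    results = {}
    for algo, level in (("gz", gz_level), ("xz", xz_level)):
        start = time.perf_counter()
        blob = parallel_compress(data, algo, level, workers)
        results[algo] = (len(blob), time.perf_counter() - start)
    return results
```

Each block starts with an empty dictionary here, while the real tools write a single stream and are more careful at block boundaries. The sizes and timings from this sketch are therefore only indicative.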

In general, I'm not really convinced about this change though. Disk
space is cheap and always getting cheaper, but builds can never be
fast enough. Is it really worthwhile to trade off build performance
for a reduction in sstate disk usage?
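To make that trade-off concrete, the timings already quoted in this thread can be turned into slowdown factors relative to gzip. This is just arithmetic on those numbers, and the dictionary labels are mine. Note that pixz buys back wall-clock time by spending more total CPU, which may matter on a build host that is already running many tasks in parallel.

```python
# Timings copied from the measurements quoted in this thread:
# (wall-clock seconds, CPU seconds = user + sys).
KHEM_8CORE = {
    "tar -czf": (2.320, 2.35 + 0.19),
    "tar -cJf": (21.061, 21.14 + 0.36),
    "tar -Ipixz -cf": (5.708, 27.14 + 0.88),
    "pixz -3 (existing .tar)": (2.927, 17.58 + 0.18),
}
ANDRE_LAPTOP = {
    "tar -czf": (4.708, 4.682 + 0.477),
    "tar -cJf": (56.491, 56.489 + 0.744),
}


def slowdown(timings, label, baseline="tar -czf"):
    """Return the (wall, cpu) cost of `label` as multiples of the gzip baseline."""
    base_wall, base_cpu = timings[baseline]
    wall, cpu = timings[label]
    return wall / base_wall, cpu / base_cpu


if __name__ == "__main__":
    for name, timings in (("8-core", KHEM_8CORE), ("laptop", ANDRE_LAPTOP)):
        for label in timings:
            if label != "tar -czf":
                wall, cpu = slowdown(timings, label)
                print("%-7s %-24s wall %5.2fx  cpu %5.2fx" % (name, label, wall, cpu))
```

For the 8-core numbers this gives about 9.08x wall / 8.46x CPU for plain `tar -cJf`, 2.46x / 11.03x for pixz, and 1.26x / 6.99x for `pixz -3`. The `pixz -3` figure slightly flatters xz, because that run compressed an existing tar and skipped the tar step. The laptop run comes out at about 12.00x / 11.09x.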

Perhaps the sstate compression algorithm should be configurable so
that people low on disk space can opt into slower builds?
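One way to make the algorithm configurable is to derive both the tar option and the file extension from a single table, instead of hard-coding ".tar.xz" in each place the patch touches. The sketch below assumes a hypothetical SSTATE_COMPRESSION-style setting; no such variable exists in OE-Core, and all the helper names are illustrative.

```python
# Hypothetical: maps a compression name (e.g. from an SSTATE_COMPRESSION-style
# setting, which does not exist in OE-Core) to (tar option, file extension).
SSTATE_COMPRESSORS = {
    "gz": ("-z", ".tgz"),
    "xz": ("-J", ".tar.xz"),
}


def sstate_compressor(algo):
    """Return (tar_option, extension) for algo, or raise for unknown names."""
    try:
        return SSTATE_COMPRESSORS[algo]
    except KeyError:
        raise ValueError("unsupported sstate compression %r (choose from %s)"
                         % (algo, ", ".join(sorted(SSTATE_COMPRESSORS))))


def sstate_pkg_path(sstate_pkg, task, algo):
    """Build the archive name the way the patch does: <SSTATE_PKG>_<task><ext>."""
    _, ext = sstate_compressor(algo)
    return sstate_pkg + "_" + task + ext


def tar_create_args(tfile, algo):
    """argv mirroring 'tar -czf' / 'tar -cJf'."""
    option, _ = sstate_compressor(algo)
    return ["tar", "-c", option, "-f", tfile]


def tar_extract_args(tfile, algo):
    """argv mirroring 'tar -xmzf' / 'tar -xmJf' (without -v)."""
    option, _ = sstate_compressor(algo)
    return ["tar", "-x", "-m", option, "-f", tfile]
```

The extension is part of the sstate file name that sstate_checkhashes looks for. A mirror populated with one setting would therefore not be found by builds using another.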





* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
  2016-01-11 20:00     ` Andre McCurdy
@ 2016-01-11 20:12       ` Khem Raj
  2016-01-11 20:34         ` Andre McCurdy
  2016-01-11 22:32       ` Richard Purdie
  1 sibling, 1 reply; 7+ messages in thread
From: Khem Raj @ 2016-01-11 20:12 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: openembedded-core

[-- Attachment #1: Type: text/plain, Size: 17039 bytes --]


> On Jan 11, 2016, at 12:00 PM, Andre McCurdy <armccurdy@gmail.com> wrote:
> 
> On Mon, Jan 11, 2016 at 11:52 AM, Khem Raj <raj.khem@gmail.com> wrote:
>> 
>>> On Jan 11, 2016, at 11:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
>>> 
>>> On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
>>> <richard.purdie@linuxfoundation.org> wrote:
>>>> xz compresses with a better compression ratio than gz with similar speed
>>>> for compression and decompression.
>>> 
>>> When you measured compression speed to be similar, was that with
>>> parallel compression? If so, with how many CPU cores?
>>> 
>>> A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on my
>>> laptop seems to indicate that xz is _significantly_ slower:
>>> 
>>> $ time tar -czf /tmp/jjj.tgz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>> 
>>> real    0m4.708s
>>> user    0m4.682s
>>> sys    0m0.477s
>>> 
>>> $ time tar -cJf /tmp/jjj.tar.xz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>> 
>>> real    0m56.491s
>>> user    0m56.489s
>>> sys    0m0.744s
>> 
>> 
>> On an 8-core machine with pixz, some of the speed is recovered, but it is still slow. I tried a small load:
>> 
>> 
>> tar -cJf /tmp/xx.tar.xz   21.14s user 0.36s system 102% cpu 21.061 total
>> 
>> tar -czf /tmp/xx.tar.gz   2.35s user 0.19s system 109% cpu 2.320 total
>> 
>> tar -Ipixz -cf /tmp/xx.tar.xz   27.14s user 0.88s system 490% cpu 5.708 total
>> 
>> When changing the compression level to -3, it gets a bit faster:
>> 
>> pixz -3 /tmp/xx.tar /tmp/xx.tar.xz  17.58s user 0.18s system 606% cpu 2.927 total
>> 
> 
> For a fair comparison, we should probably be testing parallel gzip
> against parallel xz.

It's not about speed; all this additional tooling is there to achieve more compression in the same time.
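"More compression in the same time" can be expressed as a small selection loop. Time gzip once, then walk up the xz presets and keep the highest one that still finishes within that budget. This is a single-threaded sketch with hypothetical names. The slack factor and the preset cap are assumptions, and the result depends heavily on the input and the machine.

```python
import gzip
import lzma
import time


def _timed(compress, data):
    """Return (compressed_size, seconds) for one call of compress(data)."""
    start = time.perf_counter()
    size = len(compress(data))
    return size, time.perf_counter() - start


def best_xz_preset_within(data, gz_level=6, slack=1.0, max_preset=6):
    """Highest xz preset whose compression time stays within slack x gzip's.

    Returns (preset, xz_size, gz_size), or None if even preset 0 is slower.
    Stops at the first preset over budget, assuming higher presets are not
    faster (usually, but not strictly, true).
    """
    gz_size, gz_time = _timed(lambda b: gzip.compress(b, compresslevel=gz_level), data)
    best = None
    for preset in range(max_preset + 1):
        xz_size, xz_time = _timed(lambda b: lzma.compress(b, preset=preset), data)
        if xz_time > gz_time * slack:
            break
        best = (preset, xz_size, gz_size)
    return best
```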

> 
> In general, I'm not really convinced about this change though. Disk
> space is cheap and always getting cheaper, but builds can never be
> fast enough. Is it really worthwhile to trade off build performance
> for a reduction in sstate disk usage?

If you use sstate mirrors over the network, then it is not just disk space but also network load that is involved.
But if build speed is impacted by slower compression/decompression, then it needs to be optional rather than the default, since it will not be
a one-time cost but will affect every build that a developer or user does.
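The network argument can be made concrete with a back-of-the-envelope calculation. For a mirror fetch, xz only wins if the download time it saves exceeds its extra decompression time. The sizes below are the gcc-cross populate_sysroot figures from the patch description. The bandwidth and decompression overhead are placeholder assumptions, not measurements, and the one-off compression cost paid when populating the mirror is left out.

```python
def fetch_seconds_saved(gz_bytes, xz_bytes, bandwidth_bytes_per_s, extra_unpack_s):
    """Net seconds saved per sstate mirror fetch by using xz instead of gzip.

    Positive: the smaller download outweighs the slower decompression.
    Negative: xz costs time on this link.
    """
    transfer_saved = (gz_bytes - xz_bytes) / bandwidth_bytes_per_s
    return transfer_saved - extra_unpack_s


def break_even_bandwidth(gz_bytes, xz_bytes, extra_unpack_s):
    """Link speed (bytes/s) above which xz stops paying off for a fetch."""
    if extra_unpack_s <= 0:
        return float("inf")  # xz is never slower to unpack, so it always wins
    return (gz_bytes - xz_bytes) / extra_unpack_s


GZ, XZ = 79509871, 53031752  # gcc-cross populate_sysroot, from the patch

# Placeholder assumptions: a 10 MB/s link and 1 s of extra xz unpack time.
saved = fetch_seconds_saved(GZ, XZ, 10e6, 1.0)
```

With these placeholders, the 26,478,119 bytes saved take about 2.65 s to transfer, so xz comes out roughly 1.65 s ahead per fetch. On a faster link, or if unpacking costs more, the sign flips.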

> 
> Perhaps the sstate compression algorithm should be configurable so
> that people low on disk space can opt into slower builds?
> 
> 
>>> 
>>>> It therefore makes sense to switch
>>>> to it for the sstate objects.
>>>> 
>>>> As an example, the gcc-cross populate_sysroot object goes from
>>>> 79,509,871 to 53,031,752 bytes which is a significant improvement.
>>>> 
>>>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>>>> 
>> 


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 211 bytes --]


* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
  2016-01-11 20:12       ` Khem Raj
@ 2016-01-11 20:34         ` Andre McCurdy
  0 siblings, 0 replies; 7+ messages in thread
From: Andre McCurdy @ 2016-01-11 20:34 UTC
  To: Khem Raj; +Cc: openembedded-core

On Mon, Jan 11, 2016 at 12:12 PM, Khem Raj <raj.khem@gmail.com> wrote:
>
>> On Jan 11, 2016, at 12:00 PM, Andre McCurdy <armccurdy@gmail.com> wrote:
>>
>> On Mon, Jan 11, 2016 at 11:52 AM, Khem Raj <raj.khem@gmail.com> wrote:
>>>
>>>> On Jan 11, 2016, at 11:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
>>>>
>>>> On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
>>>> <richard.purdie@linuxfoundation.org> wrote:
>>>>> xz compresses with a better compression ratio than gz with similar speed
>>>>> for compression and decompression.
>>>>
>>>> When you measured compression speed to be similar, was that with
>>>> parallel compression? If so, with how many CPU cores?
>>>>
>>>> A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on my
>>>> laptop seems to indicate that xz is _significantly_ slower:
>>>>
>>>> $ time tar -czf /tmp/jjj.tgz
>>>> tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>>>
>>>> real    0m4.708s
>>>> user    0m4.682s
>>>> sys    0m0.477s
>>>>
>>>> $ time tar -cJf /tmp/jjj.tar.xz
>>>> tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
>>>>
>>>> real    0m56.491s
>>>> user    0m56.489s
>>>> sys    0m0.744s
>>>
>>>
>>> On an 8-core machine with pixz, it recovers a bit but is still slow. I tried a small load:
>>>
>>>
>>> tar -cJf /tmp/xx.tar.xz   21.14s user 0.36s system 102% cpu 21.061 total
>>>
>>> tar -czf /tmp/xx.tar.gz   2.35s user 0.19s system 109% cpu 2.320 total
>>>
>>> tar -Ipixz -cf /tmp/xx.tar.xz   27.14s user 0.88s system 490% cpu 5.708 total
>>>
>>> When changing the compression level to -3, it gets a bit faster:
>>>
>>> pixz -3 /tmp/xx.tar /tmp/xx.tar.xz  17.58s user 0.18s system 606% cpu 2.927 total
>>>
>>
>> For a fair comparison, we should probably be testing parallel gzip
>> against parallel xz.
>
> It's not about speed; all this additional tooling is there to achieve more compression in the same time.

Yes, you're right. Creating sstate tar files is going to happen in
parallel with other bitbake tasks, so overall CPU time is what matters,
not wall-clock speed.
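
To make that distinction measurable, here is a minimal Python sketch that records CPU time alongside wall-clock time for each compressor. It uses the standard-library gzip and lzma modules as in-process stand-ins for the gzip and xz command-line tools, so its timings will not exactly match tar -cz or tar -cJ. The presets (gzip level 6, xz presets 3 and 6) are assumptions chosen to match the tools' defaults and the pixz -3 run quoted above.

```python
import gzip
import lzma
import sys
import time

# In-process stand-ins for the CLI tools. gzip level 6 and xz preset 6 match
# the command-line defaults; xz preset 3 mirrors the pixz -3 run.
COMPRESSORS = [
    ("gzip -6", lambda b: gzip.compress(b, compresslevel=6)),
    ("xz -3", lambda b: lzma.compress(b, preset=3)),
    ("xz -6", lambda b: lzma.compress(b, preset=6)),
]


def measure(compress, data):
    """Return (compressed size, CPU seconds, wall-clock seconds) for one run."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    size = len(compress(data))
    return size, time.process_time() - cpu_start, time.perf_counter() - wall_start


def compare(data):
    """Measure every compressor in COMPRESSORS against the same input."""
    return [(name,) + measure(fn, data) for name, fn in COMPRESSORS]


if __name__ == "__main__":
    # Usage: python3 compare.py some-uncompressed.tar
    with open(sys.argv[1], "rb") as f:
        payload = f.read()
    for name, size, cpu, wall in compare(payload):
        print(f"{name:8} {size:>12,} bytes  cpu {cpu:7.2f}s  wall {wall:7.2f}s")
```

time.process_time() counts CPU used by all threads of the current process, so for a parallel compressor it exposes the extra total work that a wall-clock figure hides. That extra work is exactly what competes with other bitbake tasks.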

>> In general, I'm not really convinced about this change though. Disk
>> space is cheap and always getting cheaper, but builds can never be
>> fast enough. Is it really worthwhile to trade off build performance
>> for a reduction in sstate disk usage?
>
> If you use sstate mirrors over the network, then it's not just disk but also network load that is involved.
> But if build speed is impacted by slower compression/decompression, then it needs to be optional rather than the default, since it will not be
> a one-time cost but will affect every build that a developer/user does.
>
>>
>> Perhaps the sstate compression algorithm should be configurable so
>> that people low on disk space can opt into slower builds?
>>
>>
>>>>
>>>>> It therefore makes sense to switch
>>>>> to it for the sstate objects.
>>>>>
>>>>> As an example, the gcc-cross populate_sysroot object goes from
>>>>> 79,509,871 to 53,031,752 bytes which is a significant improvement.
>>>>>
>>>>> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
>>>>>
>>>
>



* Re: [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate
  2016-01-11 20:00     ` Andre McCurdy
  2016-01-11 20:12       ` Khem Raj
@ 2016-01-11 22:32       ` Richard Purdie
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Purdie @ 2016-01-11 22:32 UTC
  To: Andre McCurdy, Khem Raj; +Cc: openembedded-core

On Mon, 2016-01-11 at 12:00 -0800, Andre McCurdy wrote:
> On Mon, Jan 11, 2016 at 11:52 AM, Khem Raj <raj.khem@gmail.com> wrote:
> > 
> > > On Jan 11, 2016, at 11:05 AM, Andre McCurdy <armccurdy@gmail.com> wrote:
> > > 
> > > On Sat, Jan 9, 2016 at 8:42 AM, Richard Purdie
> > > <richard.purdie@linuxfoundation.org> wrote:
> > > > xz compresses with a better compression ratio than gz with
> > > > similar speed for compression and decompression.
> > > 
> > > When you measured compression speed to be similar, was that with
> > > parallel compression? If so, with how many CPU cores?
> > > 
> > > A quick test of plain single threaded "tar -cz" -vs- "tar -cJ" on
> > > my laptop seems to indicate that xz is _significantly_ slower:
> > > 
> > > $ time tar -czf /tmp/jjj.tgz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
> > > 
> > > real    0m4.708s
> > > user    0m4.682s
> > > sys    0m0.477s
> > > 
> > > $ time tar -cJf /tmp/jjj.tar.xz tmp/work/cortexa15hf-neon-rdk-linux-gnueabi/glibc/2.22-r0/git
> > > 
> > > real    0m56.491s
> > > user    0m56.489s
> > > sys    0m0.744s
> > 
> > 
> > On an 8-core machine with pixz it recovers a bit, but it is still slow.
> > I tried a small load:
> > 
> > 
> > tar -cJf /tmp/xx.tar.xz   21.14s user 0.36s system 102% cpu 21.061 total
> > 
> > tar -czf /tmp/xx.tar.gz   2.35s user 0.19s system 109% cpu 2.320 total
> > 
> > tar -Ipixz -cf /tmp/xx.tar.xz   27.14s user 0.88s system 490% cpu 5.708 total
> > 
> > When changing the compression level to -3 (it gets a bit faster):
> > 
> > pixz -3 /tmp/xx.tar /tmp/xx.tar.xz  17.58s user 0.18s system 606% cpu 2.927 total
> > 
> 
> For a fair comparison, we should probably be testing parallel gzip
> against parallel xz.
> 
> In general, I'm not really convinced about this change though. Disk
> space is cheap and always getting cheaper, but builds can never be
> fast enough. Is it really worthwhile to trade off build performance
> for a reduction in sstate disk usage?

I think I've been getting confused by the various comparisons I've
been doing recently: whilst my comment does stand for bzip2, it
doesn't stand for gz. I clearly got confused, sorry :(

Rather than relying on my own benchmarks,
http://tukaani.org/lzma/benchmarks.html tells the story. It is admittedly
from a while ago, but the numbers are likely still representative of the
algorithms.
http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
is a more recent comparison which includes xz directly too. Note that the
size of the data being compressed can make a big difference, which is why
I include the first link.
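For anyone wanting to see the trade-off on their own data rather than rely on the linked pages, a rough single-threaded comparison can be sketched with Python's standard gzip and lzma modules. This is an illustrative sketch, not the benchmark used in this thread: the sample payload is hypothetical and should be swapped for real sstate content, since input size changes the results. The presets mirror the -1/-3/-6 levels discussed here; -9 is left out because xz's top preset needs several hundred MiB of memory to compress.

```python
import gzip
import lzma
import time


def compare_codecs(payload: bytes, levels=(1, 3, 6)):
    """Return (codec, level, compressed_size, seconds) rows for gzip and xz.

    Single-threaded and in-memory only: a rough local comparison, not a
    replacement for the published benchmarks.
    """
    rows = []
    for level in levels:
        start = time.perf_counter()
        gz = gzip.compress(payload, compresslevel=level)
        rows.append(("gzip", level, len(gz), time.perf_counter() - start))

        start = time.perf_counter()
        xz = lzma.compress(payload, format=lzma.FORMAT_XZ, preset=level)
        rows.append(("xz", level, len(xz), time.perf_counter() - start))
    return rows


if __name__ == "__main__":
    # Hypothetical input: replace with the bytes of a real sstate tarball.
    sample = b"".join(b"line %d of some fairly repetitive build output\n" % i for i in range(20000))
    for codec, level, size, secs in compare_codecs(sample):
        print(f"{codec:4} -{level}  {size:>10,} bytes  {secs:7.3f} s")
```

Running it on a few differently sized inputs is the quickest way to see how much the data size shifts the ratio and speed numbers.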

Part of the reason for looking at this is less about the disk space in
a given build itself and more about how the sstate artefacts are used. In
usage modes like the extensible SDK, or even a public sstate mirror,
network transfer time is an issue, and that scales with the size of
the sstate artefacts or the size of the SDK. Lower disk usage in builds
has often translated directly into better build speed too (less IO to
contend with).
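To put the transfer-time argument into numbers, here is a small worked example using the gcc-cross populate_sysroot sizes quoted in the original patch description. The 100 Mbit/s link speed is an assumption, and the calculation deliberately ignores latency, protocol overhead and the extra CPU time xz needs, which is the other side of the trade-off raised in this thread.

```python
def transfer_seconds(size_bytes: int, link_mbit_per_s: float) -> float:
    """Idealized transfer time: ignores latency, protocol overhead and decompression."""
    return size_bytes * 8 / (link_mbit_per_s * 1_000_000)


# Sizes quoted in the patch description for the gcc-cross populate_sysroot object.
TGZ_BYTES = 79_509_871
TAR_XZ_BYTES = 53_031_752

# Assumed link speed; substitute whatever matches your mirror or SDK download path.
LINK_MBIT = 100

saving = 1 - TAR_XZ_BYTES / TGZ_BYTES
print(f"size reduction: {saving:.1%}")
print(f"tgz:    {transfer_seconds(TGZ_BYTES, LINK_MBIT):.2f} s")
print(f"tar.xz: {transfer_seconds(TAR_XZ_BYTES, LINK_MBIT):.2f} s")
```

With these inputs the object is roughly a third smaller, about 6.4 s versus 4.2 s on the assumed link; whether that wins overall depends on how much slower compression and decompression are on the machines involved.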

> Perhaps the sstate compression algorithm should be configurable so
> that people low on disk space can opt into slower builds?

I've put off starting this discussion in the past as I am really not
sure that making this configurable is in our best interests. My worry
is we'd end up with people who want to do things like create tarballs
as the build proceeds and then compress them out-of-band, so the
artefacts can change. People might also want to support sstate feeds
with multiple types of objects in them, so rather than one URL to check,
we have a list. This would complicate a part of the system which I
believe wouldn't cope well with such complications. It is all software
and we can in theory do anything, but should we?
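To make the "list of URLs" concern concrete: once a mirror may hold more than one object format, every sstate lookup turns into one probe per format. The sketch below is hypothetical; the function name, mirror URL and object name are illustrative only, and real sstate object naming is more involved.

```python
from typing import List, Sequence


def candidate_urls(mirror: str, object_name: str, suffixes: Sequence[str]) -> List[str]:
    """Every URL a fetcher would have to try for one sstate object.

    With a single fixed format, `suffixes` has one entry and this is one
    lookup. Each extra permitted format adds another probe per object,
    which matters most on cache misses where every candidate is tried.
    """
    base = mirror.rstrip("/")
    return [f"{base}/{object_name}{suffix}" for suffix in suffixes]


if __name__ == "__main__":
    # Hypothetical mirror and object name, for illustration only.
    for url in candidate_urls("http://sstate.example.com/", "hypothetical-object", [".tgz", ".tar.xz"]):
        print(url)
```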

All that said, for performance what we really care about is wall
clock task speed. I suspect using any parallel algorithm will help
this. The question is: when we make this switch, do we at the same time
optimise the space usage and non-core end user workflows a bit as well?
I tend to take this approach with parsing: when we have a speed gain, I
do occasionally trade off some of it for things like better debugging,
or new features like having sstate checksums at all originally.
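Why parallel compression helps wall clock time can be sketched in a simplified form: split the input into chunks, compress each chunk on its own thread, and concatenate the results. xz -T and pixz also split input into independently compressed blocks, but keep them inside a single .xz stream; this sketch instead emits one complete .xz stream per chunk, which lzma.decompress (and the xz tool) accept as concatenated streams. The chunk size and worker count are arbitrary assumptions, not tuned values.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def parallel_xz(data: bytes, chunk_size: int = 1 << 20, workers: int = 4, preset: int = 6) -> bytes:
    """Compress fixed-size chunks concurrently and concatenate the .xz streams.

    CPython's lzma module releases the GIL while liblzma compresses, so the
    threads can run on separate cores. Matches cannot cross chunk boundaries,
    so the ratio is somewhat worse than compressing the whole input at once.
    """
    # Always compress at least one (possibly empty) chunk so the output is valid .xz.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams are concatenated in order.
        parts = pool.map(lambda c: lzma.compress(c, format=lzma.FORMAT_XZ, preset=preset), chunks)
        return b"".join(parts)
```

Smaller chunks give more parallelism at the cost of ratio, which is the same space-versus-speed dial the thread is discussing, just at a finer grain.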

I'd also note that sstate occupies a tricky part of the system. We
can comparatively easily switch to xz, but it does mean we
ASSUME_PROVIDED xz-native. If we want parallel compression support, we
have more of a problem though: whilst gzip and xz are present on most
distros out of the box, xz -T support (parallel threads) isn't yet, nor
are pbzip2, pigz, pixz, or pxz. If we could depend on one as an install
prerequisite, great. If not, we need to teach sstate to start out with
"plain" compression, then switch once we've built the parallel
compressor. With xz, -T will be available out of the box by default
once people move to 5.2.0; there are no such plans for gzip.
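The "start plain, then switch" idea could look roughly like the sketch below, assuming the decision is made purely on the version an xz binary reports. The function names and the policy are hypothetical and not anything sstate.bbclass does; the only facts relied on are that `xz --version` prints a line like "xz (XZ Utils) 5.2.2" and that -T0 (one thread per CPU core) arrived with xz 5.2.0.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_xz_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from `xz --version` output, e.g. 'xz (XZ Utils) 5.2.2'."""
    m = re.search(r"xz \(XZ Utils\) (\d+)\.(\d+)\.(\d+)", banner)
    return tuple(int(g) for g in m.groups()) if m else None


def choose_xz_command(version: Optional[Tuple[int, ...]]) -> List[str]:
    """Hypothetical policy: threaded xz when supported, plain xz otherwise."""
    if version is not None and version >= (5, 2, 0):
        return ["xz", "-T0"]  # -T0: one thread per CPU core (xz >= 5.2.0)
    return ["xz"]


def detect_xz_command(xz: str = "xz") -> Optional[List[str]]:
    """Probe an xz binary (host-provided or freshly built); None if not found."""
    path = shutil.which(xz)
    if path is None:
        return None
    out = subprocess.run([path, "--version"], capture_output=True, text=True, check=False).stdout
    cmd = choose_xz_command(parse_xz_version(out))
    return [path] + cmd[1:]
```

Calling `detect_xz_command()` early would give the plain host command, and calling it again with the path of a newly built xz would pick up -T once it is available.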

Where this leaves us, I don't know :/

Cheers,

Richard




end of thread

Thread overview: 7+ messages
2016-01-09 16:42 [PATCH RFC] sstate: Switch from tgz to tar.xz for sstate Richard Purdie
2016-01-11 19:05 ` Andre McCurdy
2016-01-11 19:52   ` Khem Raj
2016-01-11 20:00     ` Andre McCurdy
2016-01-11 20:12       ` Khem Raj
2016-01-11 20:34         ` Andre McCurdy
2016-01-11 22:32       ` Richard Purdie
