All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] sstate: Truncate PV in sstate filenames that are too long
@ 2019-07-30 11:01 Mike Crowe
  2019-07-30 13:25 ` Mark Hatle
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Crowe @ 2019-07-30 11:01 UTC (permalink / raw)
  To: openembedded-core; +Cc: Mike Crowe

sstate filenames are generated by concatenating a variety of bits of
package metadata. Some of these parts could be long, which could cause
the filename to be longer than the 255 character maximum for ext4.

So, let's try to detect this situation and truncate the PV part of the
filename so that it will fit. If this happens, an ellipsis is added to
make it clear that the version number is incomplete.

SSTATE_PKG needs to be consistent for all tasks so that the hash
remains stable. This means that we need to make an assumption for the
maximum length of the task name. In this implementation, the task name
is limited to 27 characters.

This change also results in a sensible error message being emitted if
the resulting filename is still too long.

Signed-off-by: Mike Crowe <mac@mcrowe.com>

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 3342c5ef50..6313b1c538 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
         hash = "INVALID"
     return hash[:2] + "/" + spec + hash
 
+def sstate_path(taskname, d):
+    max_filename_len = 245 # leave some room for ".siginfo"
+    max_addendum_len = 32 # '_' + taskname + '.tgz'
+    sstate_prefix = d.getVar('SSTATE_PKG')
+    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
+    if excess > 0:
+        pv = d.getVar('PV')
+        if len(pv) >= excess and len(pv) >= 3:
+            short_pv = d.getVar('PV')[:-excess-3] + '...'
+            d2 = d.createCopy()
+            d2.setVar('PV', short_pv)
+            sstate_prefix = d2.getVar('SSTATE_PKG')
+
+    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
+    if len(os.path.basename(sstatepkg)) > max_filename_len:
+        bb.error('Failed to shorten sstate filename')
+    return sstatepkg
+
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
@@ -323,7 +341,7 @@ def sstate_installpkg(ss, d):
 
     sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
     sstatefetch = d.getVar('SSTATE_PKGNAME') + '_' + ss['task'] + ".tgz"
-    sstatepkg = d.getVar('SSTATE_PKG') + '_' + ss['task'] + ".tgz"
+    sstatepkg = sstate_path(ss['task'], d)
 
     if not os.path.exists(sstatepkg):
         pstaging_fetch(sstatefetch, d)
@@ -617,7 +635,7 @@ def sstate_package(ss, d):
     tmpdir = d.getVar('TMPDIR')
 
     sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
-    sstatepkg = d.getVar('SSTATE_PKG') + '_'+ ss['task'] + ".tgz"
+    sstatepkg = sstate_path(ss['task'], d)
     bb.utils.remove(sstatebuild, recurse=True)
     bb.utils.mkdirhier(sstatebuild)
     bb.utils.mkdirhier(os.path.dirname(sstatepkg))
@@ -1084,8 +1102,8 @@ python sstate_eventhandler() {
         if taskname in ["fetch", "unpack", "patch", "populate_lic", "preconfigure"] and swspec:
             d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
             d.setVar("SSTATE_EXTRAPATH", "")
-        sstatepkg = d.getVar('SSTATE_PKG')
-        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
+        sstateinfo = sstate_path(taskname, d) + ".siginfo"
+        bb.siggen.dump_this_task(sstateinfo, d)
 }
 
 SSTATE_PRUNE_OBSOLETEWORKDIR ?= "1"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] sstate: Truncate PV in sstate filenames that are too long
  2019-07-30 11:01 [PATCH] sstate: Truncate PV in sstate filenames that are too long Mike Crowe
@ 2019-07-30 13:25 ` Mark Hatle
  2019-07-30 13:49   ` Mike Crowe
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Hatle @ 2019-07-30 13:25 UTC (permalink / raw)
  To: Mike Crowe, openembedded-core

On 7/30/19 6:01 AM, Mike Crowe wrote:
> sstate filenames are generated by concatenating a variety of bits of
> package metadata. Some of these parts could be long, which could cause
> the filename to be longer than the 255 character maximum for ext4.
> 
> So, let's try to detect this situation and truncate the PV part of the
> filename so that it will fit. If this happens, an ellipsis is added to
> make it clear that the version number is incomplete.
> 
> SSTATE_PKG needs to be consistent for all tasks so that the hash
> remains stable. This means that we need to make an assumption for the
> maximum length of the task name. In this implementation, the task name
> is limited to 27 characters.
> 
> This change also results in a sensible error message being emitted if
> the resulting filename is still too long.
> 
> Signed-off-by: Mike Crowe <mac@mcrowe.com>
> 
> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> index 3342c5ef50..6313b1c538 100644
> --- a/meta/classes/sstate.bbclass
> +++ b/meta/classes/sstate.bbclass
> @@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
>          hash = "INVALID"
>      return hash[:2] + "/" + spec + hash
>  
> +def sstate_path(taskname, d):
> +    max_filename_len = 245 # leave some room for ".siginfo"
> +    max_addendum_len = 32 # '_' + taskname + '.tgz'

Since the task name is variable, is there really a 32 character limit here?

It may make sense to do:

# '_' + taskname + '.tgz', reserving a minimum of 32 for taskname
max_addendum_len = len(taskname) + 5 if len(taskname) + 5 > 32 else 32

Always reserve a minimum of 32 for consistency, but if we go over account for it.

> +    sstate_prefix = d.getVar('SSTATE_PKG')
> +    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
> +    if excess > 0:
> +        pv = d.getVar('PV')
> +        if len(pv) >= excess and len(pv) >= 3:
> +            short_pv = d.getVar('PV')[:-excess-3] + '...'

Is truncating the PV enough?  In a discussion on the bitbake list, I suggested
possibly changing the order of the entries in the SSTATE_PKGSPEC to allow us to
prune things prior to the hash w/o affecting the hash.  Maybe this is simply not
needed.. but it's a possibility if this proves to not be effective.

> +            d2 = d.createCopy()
> +            d2.setVar('PV', short_pv)
> +            sstate_prefix = d2.getVar('SSTATE_PKG')
> +
> +    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
> +    if len(os.path.basename(sstatepkg)) > max_filename_len:
> +        bb.error('Failed to shorten sstate filename')
> +    return sstatepkg
> +
>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"

There is something else I noticed..  "SSTATE_PKGNAME" defined as:

SSTATE_PKGNAME    =
"${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'),
d.getVar('BB_UNIHASH'), d)}"

From what I can tell, this really should be using the new sstate_path function
in someway.

Would it make more sense to define SSTATE_PKGNAME in such a way that it always
resulted in something "short" enough, and in the right format, that it would
always work?

Adjusting or rewriting "generate_sstatefn" could still accomplish the PV change,
but the max length of the string would need further shrinking to accommodate an
unknown task length (which goes back to my previous comment).  If the 32 default
is long enough then that shouldn't be a problem -- and may also resolve my
concerns that something outside of sstate class could try to use that various
and without the new magic function get the wrong results.

Anyway, my immediate 2 cents...

--Mark

> @@ -323,7 +341,7 @@ def sstate_installpkg(ss, d):
>  
>      sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
>      sstatefetch = d.getVar('SSTATE_PKGNAME') + '_' + ss['task'] + ".tgz"
> -    sstatepkg = d.getVar('SSTATE_PKG') + '_' + ss['task'] + ".tgz"
> +    sstatepkg = sstate_path(ss['task'], d)
>  
>      if not os.path.exists(sstatepkg):
>          pstaging_fetch(sstatefetch, d)
> @@ -617,7 +635,7 @@ def sstate_package(ss, d):
>      tmpdir = d.getVar('TMPDIR')
>  
>      sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
> -    sstatepkg = d.getVar('SSTATE_PKG') + '_'+ ss['task'] + ".tgz"
> +    sstatepkg = sstate_path(ss['task'], d)
>      bb.utils.remove(sstatebuild, recurse=True)
>      bb.utils.mkdirhier(sstatebuild)
>      bb.utils.mkdirhier(os.path.dirname(sstatepkg))
> @@ -1084,8 +1102,8 @@ python sstate_eventhandler() {
>          if taskname in ["fetch", "unpack", "patch", "populate_lic", "preconfigure"] and swspec:
>              d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
>              d.setVar("SSTATE_EXTRAPATH", "")
> -        sstatepkg = d.getVar('SSTATE_PKG')
> -        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
> +        sstateinfo = sstate_path(taskname, d) + ".siginfo"
> +        bb.siggen.dump_this_task(sstateinfo, d)
>  }
>  
>  SSTATE_PRUNE_OBSOLETEWORKDIR ?= "1"
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sstate: Truncate PV in sstate filenames that are too long
  2019-07-30 13:25 ` Mark Hatle
@ 2019-07-30 13:49   ` Mike Crowe
  2019-07-30 14:14     ` Mark Hatle
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Crowe @ 2019-07-30 13:49 UTC (permalink / raw)
  To: Mark Hatle; +Cc: openembedded-core

On Tuesday 30 July 2019 at 08:25:52 -0500, Mark Hatle wrote:
> On 7/30/19 6:01 AM, Mike Crowe wrote:
> > sstate filenames are generated by concatenating a variety of bits of
> > package metadata. Some of these parts could be long, which could cause
> > the filename to be longer than the 255 character maximum for ext4.
> > 
> > So, let's try to detect this situation and truncate the PV part of the
> > filename so that it will fit. If this happens, an ellipsis is added to
> > make it clear that the version number is incomplete.
> > 
> > SSTATE_PKG needs to be consistent for all tasks so that the hash
> > remains stable. This means that we need to make an assumption for the
> > maximum length of the task name. In this implementation, the task name
> > is limited to 27 characters.
> > 
> > This change also results in a sensible error message being emitted if
> > the resulting filename is still too long.
> > 
> > Signed-off-by: Mike Crowe <mac@mcrowe.com>
> > 
> > diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> > index 3342c5ef50..6313b1c538 100644
> > --- a/meta/classes/sstate.bbclass
> > +++ b/meta/classes/sstate.bbclass
> > @@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
> >          hash = "INVALID"
> >      return hash[:2] + "/" + spec + hash
> >  
> > +def sstate_path(taskname, d):
> > +    max_filename_len = 245 # leave some room for ".siginfo"
> > +    max_addendum_len = 32 # '_' + taskname + '.tgz'
> 
> Since the task name is variable, is there really a 32 character limit here?
> 
> It may make sense to do:
> 
> # '_' + taskname + '.tgz', reserving a minimum of 32 for taskname
> max_addendum_len = len(taskname) + 5 if len(taskname) + 5 > 32 else 32
> 
> Always reserve a minimum of 32 for consistency, but if we go over account
> for it.

I think that would just cause task hash mismatches (see third paragraph of
commit message.)

It probably does make sense to detect such long task names in this
situation and generate errors though.

> > +    sstate_prefix = d.getVar('SSTATE_PKG')
> > +    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
> > +    if excess > 0:
> > +        pv = d.getVar('PV')
> > +        if len(pv) >= excess and len(pv) >= 3:
> > +            short_pv = d.getVar('PV')[:-excess-3] + '...'
> 
> Is truncating the PV enough?  In a discussion on the bitbake list, I suggested
> possibly changing the order of the entries in the SSTATE_PKGSPEC to allow us to
> prune things prior to the hash w/o affecting the hash.  Maybe this is simply not
> needed.. but it's a possibility if this proves to not be effective.

Truncating PV solves the problem I was having. The other fields don't
really tend to be very long, so there's less to be gained by shortening
them.

> > +            d2 = d.createCopy()
> > +            d2.setVar('PV', short_pv)
> > +            sstate_prefix = d2.getVar('SSTATE_PKG')
> > +
> > +    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
> > +    if len(os.path.basename(sstatepkg)) > max_filename_len:
> > +        bb.error('Failed to shorten sstate filename')
> > +    return sstatepkg
> > +
> >  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> >  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> >  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> 
> There is something else I noticed..  "SSTATE_PKGNAME" defined as:
> 
> SSTATE_PKGNAME    =
> "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'),
> d.getVar('BB_UNIHASH'), d)}"
> 
> From what I can tell, this really should be using the new sstate_path function
> in someway.

Hmm, you're right. I can't have covered that in my testing. :(

> Would it make more sense to define SSTATE_PKGNAME in such a way that it always
> resulted in something "short" enough, and in the right format, that it would
> always work?

I'd considered doing it that way. If SSTATE_OKGSPEC contained ${SSTATE_PV}
and that either had the value of ${PV} or the a truncated version of ${PV}
then the rest of the file could remain the same. However, truncating PV
without access to the rest of the spec would mean just picking some
arbitrary maximum PV length which is likely to be more conservative than
necessary.

> Adjusting or rewriting "generate_sstatefn" could still accomplish the PV change,
> but the max length of the string would need further shrinking to accommodate an
> unknown task length (which goes back to my previous comment).  If the 32 default
> is long enough then that shouldn't be a problem -- and may also resolve my
> concerns that something outside of sstate class could try to use that various
> and without the new magic function get the wrong results.

I wonder whether I can get away with applying per-task PV truncation in
generate_sstatefn without causing hash mismatches? That's worth a try.

Thanks for your comments. They've been very helpful.

Mike.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sstate: Truncate PV in sstate filenames that are too long
  2019-07-30 13:49   ` Mike Crowe
@ 2019-07-30 14:14     ` Mark Hatle
  2019-07-31 11:13       ` Mike Crowe
  2019-07-31 17:41       ` Mike Crowe
  0 siblings, 2 replies; 6+ messages in thread
From: Mark Hatle @ 2019-07-30 14:14 UTC (permalink / raw)
  To: Mike Crowe; +Cc: openembedded-core

On 7/30/19 8:49 AM, Mike Crowe wrote:
> On Tuesday 30 July 2019 at 08:25:52 -0500, Mark Hatle wrote:
>> On 7/30/19 6:01 AM, Mike Crowe wrote:
>>> sstate filenames are generated by concatenating a variety of bits of
>>> package metadata. Some of these parts could be long, which could cause
>>> the filename to be longer than the 255 character maximum for ext4.
>>>
>>> So, let's try to detect this situation and truncate the PV part of the
>>> filename so that it will fit. If this happens, an ellipsis is added to
>>> make it clear that the version number is incomplete.
>>>
>>> SSTATE_PKG needs to be consistent for all tasks so that the hash
>>> remains stable. This means that we need to make an assumption for the
>>> maximum length of the task name. In this implementation, the task name
>>> is limited to 27 characters.
>>>
>>> This change also results in a sensible error message being emitted if
>>> the resulting filename is still too long.
>>>
>>> Signed-off-by: Mike Crowe <mac@mcrowe.com>
>>>
>>> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
>>> index 3342c5ef50..6313b1c538 100644
>>> --- a/meta/classes/sstate.bbclass
>>> +++ b/meta/classes/sstate.bbclass
>>> @@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
>>>          hash = "INVALID"
>>>      return hash[:2] + "/" + spec + hash
>>>  
>>> +def sstate_path(taskname, d):
>>> +    max_filename_len = 245 # leave some room for ".siginfo"
>>> +    max_addendum_len = 32 # '_' + taskname + '.tgz'
>>
>> Since the task name is variable, is there really a 32 character limit here?
>>
>> It may make sense to do:
>>
>> # '_' + taskname + '.tgz', reserving a minimum of 32 for taskname
>> max_addendum_len = len(taskname) + 5 if len(taskname) + 5 > 32 else 32
>>
>> Always reserve a minimum of 32 for consistency, but if we go over account
>> for it.
> 
> I think that would just cause task hash mismatches (see third paragraph of
> commit message.)
> 
> It probably does make sense to detect such long task names in this
> situation and generate errors though.
> 
>>> +    sstate_prefix = d.getVar('SSTATE_PKG')
>>> +    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
>>> +    if excess > 0:
>>> +        pv = d.getVar('PV')
>>> +        if len(pv) >= excess and len(pv) >= 3:
>>> +            short_pv = d.getVar('PV')[:-excess-3] + '...'
>>
>> Is truncating the PV enough?  In a discussion on the bitbake list, I suggested
>> possibly changing the order of the entries in the SSTATE_PKGSPEC to allow us to
>> prune things prior to the hash w/o affecting the hash.  Maybe this is simply not
>> needed.. but it's a possibility if this proves to not be effective.
> 
> Truncating PV solves the problem I was having. The other fields don't
> really tend to be very long, so there's less to be gained by shortening
> them.
> 
>>> +            d2 = d.createCopy()
>>> +            d2.setVar('PV', short_pv)
>>> +            sstate_prefix = d2.getVar('SSTATE_PKG')
>>> +
>>> +    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
>>> +    if len(os.path.basename(sstatepkg)) > max_filename_len:
>>> +        bb.error('Failed to shorten sstate filename')
>>> +    return sstatepkg
>>> +
>>>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
>>>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
>>>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
>>
>> There is something else I noticed..  "SSTATE_PKGNAME" defined as:
>>
>> SSTATE_PKGNAME    =
>> "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'),
>> d.getVar('BB_UNIHASH'), d)}"
>>
>> From what I can tell, this really should be using the new sstate_path function
>> in someway.
> 
> Hmm, you're right. I can't have covered that in my testing. :(
> 
>> Would it make more sense to define SSTATE_PKGNAME in such a way that it always
>> resulted in something "short" enough, and in the right format, that it would
>> always work?
> 
> I'd considered doing it that way. If SSTATE_OKGSPEC contained ${SSTATE_PV}
> and that either had the value of ${PV} or the a truncated version of ${PV}
> then the rest of the file could remain the same. However, truncating PV
> without access to the rest of the spec would mean just picking some
> arbitrary maximum PV length which is likely to be more conservative than
> necessary.
> 
>> Adjusting or rewriting "generate_sstatefn" could still accomplish the PV change,
>> but the max length of the string would need further shrinking to accommodate an
>> unknown task length (which goes back to my previous comment).  If the 32 default
>> is long enough then that shouldn't be a problem -- and may also resolve my
>> concerns that something outside of sstate class could try to use that various
>> and without the new magic function get the wrong results.
> 
> I wonder whether I can get away with applying per-task PV truncation in
> generate_sstatefn without causing hash mismatches? That's worth a try.

The only cause of a task hash mismatch (actual hash, not filename) would be in
SSTATE_PKGNAME is part of the hash itself.  I'd contend if it is, then we should
exclude it.

-All- of the components of the name itself are already included in the hash.  So
why should we (or the system) care that SSTATE_PKGNAME has a specific value or not?

The important items in the sstate hash name, as far as I'm concerned, are:

sstate (literal), sstate version, the hash, and the 'type'. (siginfo, tgz, etc)

Everything else there is for the user to be able to look at the filename and
determine what it is.  It could be used to avoid collisions if we have two items
with the same hash, but otherwise different contents -- but with sha256 this
should be almost impossible.

Richard mentioned in the bitbake-devel thread that there may be external tools
using some of the components.  I'm not sure how to even identify what those
tools are at this time, but a new sstate version entry may be enough to start to
deal with them.

--Mark

> Thanks for your comments. They've been very helpful.
> 
> Mike.
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sstate: Truncate PV in sstate filenames that are too long
  2019-07-30 14:14     ` Mark Hatle
@ 2019-07-31 11:13       ` Mike Crowe
  2019-07-31 17:41       ` Mike Crowe
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Crowe @ 2019-07-31 11:13 UTC (permalink / raw)
  To: openembedded-core

On Tuesday 30 July 2019 at 09:14:01 -0500, Mark Hatle wrote:
> Richard mentioned in the bitbake-devel thread that there may be external tools
> using some of the components.  I'm not sure how to even identify what those
> tools are at this time, but a new sstate version entry may be enough to start to
> deal with them.

We have a couple of tools that know the format of sstate cache files:

The first tool[1] processes locked-sigs.inc to update the atime of any
files that would be necessary for the build. (We discovered that relying on
actually building to update atime was insufficient since it skipped sstate
files that were only necessary in order to build others.)

The second tool is simpler and just expires sstate files that haven't been
accessed in a while. We have a handful of different expiry policies by PN,
so we need to understand where that is in the filename. This tool needs to
cope with a cache populated by a mixture of oe-core versions, so any change
in format risks complicating it.

Mike.

[1] http://lists.openembedded.org/pipermail/openembedded-core/2016-April/120227.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] sstate: Truncate PV in sstate filenames that are too long
  2019-07-30 14:14     ` Mark Hatle
  2019-07-31 11:13       ` Mike Crowe
@ 2019-07-31 17:41       ` Mike Crowe
  1 sibling, 0 replies; 6+ messages in thread
From: Mike Crowe @ 2019-07-31 17:41 UTC (permalink / raw)
  To: Mark Hatle, openembedded-core

On Tuesday 30 July 2019 at 09:14:01 -0500, Mark Hatle wrote:
> On 7/30/19 8:49 AM, Mike Crowe wrote:
> > On Tuesday 30 July 2019 at 08:25:52 -0500, Mark Hatle wrote:
> >> On 7/30/19 6:01 AM, Mike Crowe wrote:
> >>> sstate filenames are generated by concatenating a variety of bits of
> >>> package metadata. Some of these parts could be long, which could cause
> >>> the filename to be longer than the 255 character maximum for ext4.
> >>>
> >>> So, let's try to detect this situation and truncate the PV part of the
> >>> filename so that it will fit. If this happens, an ellipsis is added to
> >>> make it clear that the version number is incomplete.
> >>>
> >>> SSTATE_PKG needs to be consistent for all tasks so that the hash
> >>> remains stable. This means that we need to make an assumption for the
> >>> maximum length of the task name. In this implementation, the task name
> >>> is limited to 27 characters.
> >>>
> >>> This change also results in a sensible error message being emitted if
> >>> the resulting filename is still too long.
> >>>
> >>> Signed-off-by: Mike Crowe <mac@mcrowe.com>
> >>>
> >>> diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
> >>> index 3342c5ef50..6313b1c538 100644
> >>> --- a/meta/classes/sstate.bbclass
> >>> +++ b/meta/classes/sstate.bbclass
> >>> @@ -8,6 +8,24 @@ def generate_sstatefn(spec, hash, d):
> >>>          hash = "INVALID"
> >>>      return hash[:2] + "/" + spec + hash
> >>>  
> >>> +def sstate_path(taskname, d):
> >>> +    max_filename_len = 245 # leave some room for ".siginfo"
> >>> +    max_addendum_len = 32 # '_' + taskname + '.tgz'
> >>
> >> Since the task name is variable, is there really a 32 character limit here?
> >>
> >> It may make sense to do:
> >>
> >> # '_' + taskname + '.tgz', reserving a minimum of 32 for taskname
> >> max_addendum_len = len(taskname) + 5 if len(taskname) + 5 > 32 else 32
> >>
> >> Always reserve a minimum of 32 for consistency, but if we go over account
> >> for it.
> > 
> > I think that would just cause task hash mismatches (see third paragraph of
> > commit message.)
> > 
> > It probably does make sense to detect such long task names in this
> > situation and generate errors though.
> > 
> >>> +    sstate_prefix = d.getVar('SSTATE_PKG')
> >>> +    excess = len(os.path.basename(sstate_prefix)) - (max_filename_len - max_addendum_len)
> >>> +    if excess > 0:
> >>> +        pv = d.getVar('PV')
> >>> +        if len(pv) >= excess and len(pv) >= 3:
> >>> +            short_pv = d.getVar('PV')[:-excess-3] + '...'
> >>
> >> Is truncating the PV enough?  In a discussion on the bitbake list, I suggested
> >> possibly changing the order of the entries in the SSTATE_PKGSPEC to allow us to
> >> prune things prior to the hash w/o affecting the hash.  Maybe this is simply not
> >> needed.. but it's a possibility if this proves to not be effective.
> > 
> > Truncating PV solves the problem I was having. The other fields don't
> > really tend to be very long, so there's less to be gained by shortening
> > them.

Here's a change that just always truncates PV. It has the benefit of being
simple but it will virtually always truncate too much and risks not
truncating enough when other parts of the path are too long.

However, if I can't get my head around enough of this to come up with an
acceptable solution, we'll probably end up just using this change locally:

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 3342c5ef50..e7003ea93f 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -8,17 +8,26 @@ def generate_sstatefn(spec, hash, d):
         hash = "INVALID"
     return hash[:2] + "/" + spec + hash

+def shortened_pv(d):
+    pv = d.getVar('PV')
+    if len(pv) <= 64:
+        return pv
+    else:
+        bb.note("PV shortened: " + pv)
+        return pv[:61] + '...'
+
+SSTATE_PV = "${@shortened_pv(d)}"
 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
-SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
-SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
+SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${SSTATE_PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
+SSTATE_SWSPEC     = "sstate:${PN}::${SSTATE_PV}:${PR}::${SSTATE_VERSION}:"
 SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
 SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
 SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/${SSTATE_PKGSPEC}"

-# explicitly make PV to depend on evaluated value of PV variable
-PV[vardepvalue] = "${PV}"
+# explicitly make SSTATE_PV to depend on evaluated value of PV variable
+SSTATE_PV[vardepvalue] = "${PV}"

 # We don't want the sstate to depend on things like the distro string
 # of the system, we let the sstate paths take care of this.

> >>> +            d2 = d.createCopy()
> >>> +            d2.setVar('PV', short_pv)
> >>> +            sstate_prefix = d2.getVar('SSTATE_PKG')
> >>> +
> >>> +    sstatepkg = sstate_prefix + '_'+ taskname + ".tgz"
> >>> +    if len(os.path.basename(sstatepkg)) > max_filename_len:
> >>> +        bb.error('Failed to shorten sstate filename')
> >>> +    return sstatepkg
> >>> +
> >>>  SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
> >>>  SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
> >>>  SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
> >>
> >> There is something else I noticed..  "SSTATE_PKGNAME" defined as:
> >>
> >> SSTATE_PKGNAME    =
> >> "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'),
> >> d.getVar('BB_UNIHASH'), d)}"
> >>
> >> From what I can tell, this really should be using the new sstate_path function
> >> in someway.
> > 
> > Hmm, you're right. I can't have covered that in my testing. :(
> > 
> >> Would it make more sense to define SSTATE_PKGNAME in such a way that it always
> >> resulted in something "short" enough, and in the right format, that it would
> >> always work?
> > 
> > I'd considered doing it that way. If SSTATE_OKGSPEC contained ${SSTATE_PV}
> > and that either had the value of ${PV} or the a truncated version of ${PV}
> > then the rest of the file could remain the same. However, truncating PV
> > without access to the rest of the spec would mean just picking some
> > arbitrary maximum PV length which is likely to be more conservative than
> > necessary.
> > 
> >> Adjusting or rewriting "generate_sstatefn" could still accomplish the PV change,
> >> but the max length of the string would need further shrinking to accommodate an
> >> unknown task length (which goes back to my previous comment).  If the 32 default
> >> is long enough then that shouldn't be a problem -- and may also resolve my
> >> concerns that something outside of sstate class could try to use that various
> >> and without the new magic function get the wrong results.
> > 
> > I wonder whether I can get away with applying per-task PV truncation in
> > generate_sstatefn without causing hash mismatches? That's worth a try.
> 
> The only cause of a task hash mismatch (actual hash, not filename) would be in
> SSTATE_PKGNAME is part of the hash itself.  I'd contend if it is, then we should
> exclude it.

I would have thought so to, but I'm unable to persuade myself that it is
definitely correct in all situations. If I have a fully built work tree,
and change SSTATE_PKGNAME then would I expect a subsequent build to add the
new filenames to the sstate cache?

Although I don't believe that I yet really understand sstate.bbclass, I've
done some more digging today to learn more. I thought I had a working
solution (including potentially different truncations for different task
names!) but then I discovered that nothing was ever successfully fetched
from the sstate cache. This turned out to be because I hadn't appreciated
the way that BB_HASHFILENAME was being used to transport variable values
into sstate_checkhashes, so I was ending up with default values for PN, PV
etc. :(

I can't see a way to make this work without transporting each of the
individual components of SSTATE_PKGSPEC and SSTATE_DIR, along with their
unexpanded versions through BB_HASHFILENAME. I can't tunnel the full
sstate filename because I don't know the task name at that point. :(

In case anyone cares, here's the patch that shows what I'm trying to do but
fails to work as described above:

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 3342c5ef50..bd2b76bab1 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -3,16 +3,40 @@ SSTATE_VERSION = "3"
 SSTATE_MANIFESTS ?= "${TMPDIR}/sstate-control"
 SSTATE_MANFILEPREFIX = "${SSTATE_MANIFESTS}/manifest-${SSTATE_MANMACH}-${PN}"

-def generate_sstatefn(spec, hash, d):
+# Generate an sstate filename that is guaranteed to not be longer than
+# the 255 character path component maximum for ext4. Returns a tuple
+# (sstate leaf filename, sstate full path)
+def generate_sstate_filename(d, taskname, hash = None, extension = '.tgz'):
+    # leave some room for '.siginfo' or '.XXXXXXXX' temporary filename suffix
+    max_filename_len = 246
+    if not hash:
+        hash = d.getVar('BB_UNIHASH')
     if not hash:
         hash = "INVALID"
-    return hash[:2] + "/" + spec + hash
+    sstate_prefix = d.getVar('SSTATE_PKGSPEC')
+    excess = len(sstate_prefix) + len(hash) + 1 + len(taskname) + len(extension) - max_filename_len
+    if excess > 0:
+        pv = d.getVar('PV')
+        if len(pv) >= excess and len(pv) >= 3:
+            short_pv = d.getVar('PV')[:-excess-3] + '...'
+            d2 = d.createCopy()
+            d2.setVar('PV', short_pv)
+            sstate_prefix = d2.getVar('SSTATE_PKGSPEC')
+
+    sstate_filename = sstate_prefix + hash + '_'+ taskname + extension
+    if len(sstate_filename) > max_filename_len:
+        bb.error('Failed to shorten sstate filename')
+
+    return (sstate_filename, d.getVar('SSTATE_DIR') + '/' + hash[:2] + '/' + sstate_filename)

 SSTATE_PKGARCH    = "${PACKAGE_ARCH}"
 SSTATE_PKGSPEC    = "sstate:${PN}:${PACKAGE_ARCH}${TARGET_VENDOR}-${TARGET_OS}:${PV}:${PR}:${SSTATE_PKGARCH}:${SSTATE_VERSION}:"
 SSTATE_SWSPEC     = "sstate:${PN}::${PV}:${PR}::${SSTATE_VERSION}:"
-SSTATE_PKGNAME    = "${SSTATE_EXTRAPATH}${@generate_sstatefn(d.getVar('SSTATE_PKGSPEC'), d.getVar('BB_UNIHASH'), d)}"
-SSTATE_PKG        = "${SSTATE_DIR}/${SSTATE_PKGNAME}"
+
+# Assigned in sstate_installpkg and sstate_package before running
+# tasks. Should not be used beforehand
+SSTATE_PKG        = "SSTATE_PKG_NOT_YET_ASSIGNED"
+
 SSTATE_EXTRAPATH   = ""
 SSTATE_EXTRAPATHWILDCARD = ""
 SSTATE_PATHSPEC   = "${SSTATE_DIR}/${SSTATE_EXTRAPATHWILDCARD}*/${SSTATE_PKGSPEC}"
@@ -322,9 +346,7 @@ def sstate_installpkg(ss, d):
     from oe.gpg_sign import get_signer

     sstateinst = d.expand("${WORKDIR}/sstate-install-%s/" % ss['task'])
-    sstatefetch = d.getVar('SSTATE_PKGNAME') + '_' + ss['task'] + ".tgz"
-    sstatepkg = d.getVar('SSTATE_PKG') + '_' + ss['task'] + ".tgz"
-
+    (sstatefetch, sstatepkg) = generate_sstate_filename(d, ss['task'])
     if not os.path.exists(sstatepkg):
         pstaging_fetch(sstatefetch, d)

@@ -617,7 +639,7 @@ def sstate_package(ss, d):
     tmpdir = d.getVar('TMPDIR')

     sstatebuild = d.expand("${WORKDIR}/sstate-build-%s/" % ss['task'])
-    sstatepkg = d.getVar('SSTATE_PKG') + '_'+ ss['task'] + ".tgz"
+    sstatepkg = generate_sstate_filename(d, ss['task'])[1]
     bb.utils.remove(sstatebuild, recurse=True)
     bb.utils.mkdirhier(sstatebuild)
     bb.utils.mkdirhier(os.path.dirname(sstatepkg))
@@ -813,9 +835,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,

     ret = []
     missed = []
-    extension = ".tgz"
-    if siginfo:
-        extension = extension + ".siginfo"
+    maybe_siginfo = '.siginfo' if siginfo else ''

     def gethash(task):
         if sq_unihash is not None:
@@ -844,7 +864,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,

         spec, extrapath, tname = getpathcomponents(task, d)

-        sstatefile = d.expand("${SSTATE_DIR}/" + extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
+        sstatefile = generate_sstate_filename(d, tname, gethash(task))[1] + maybe_siginfo

         if os.path.exists(sstatefile):
             bb.debug(2, "SState: Found valid sstate file %s" % sstatefile)
@@ -907,7 +927,7 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
             if task in ret:
                 continue
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + extension)
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))[1] + maybe_siginfo
             tasklist.append((task, sstatefile))

         if tasklist:
@@ -937,11 +957,11 @@ def sstate_checkhashes(sq_fn, sq_task, sq_hash, sq_hashfn, d, siginfo=False, *,
         evdata = {'missed': [], 'found': []};
         for task in missed:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))
             evdata['missed'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         for task in ret:
             spec, extrapath, tname = getpathcomponents(task, d)
-            sstatefile = d.expand(extrapath + generate_sstatefn(spec, gethash(task), d) + "_" + tname + ".tgz")
+            sstatefile = generate_sstate_filename(d, tname, gethash(task))
             evdata['found'].append( (sq_fn[task], sq_task[task], gethash(task), sstatefile ) )
         bb.event.fire(bb.event.MetadataEvent("MissedSstate", evdata), d)

@@ -1084,8 +1104,8 @@ python sstate_eventhandler() {
         if taskname in ["fetch", "unpack", "patch", "populate_lic", "preconfigure"] and swspec:
             d.setVar("SSTATE_PKGSPEC", "${SSTATE_SWSPEC}")
             d.setVar("SSTATE_EXTRAPATH", "")
-        sstatepkg = d.getVar('SSTATE_PKG')
-        bb.siggen.dump_this_task(sstatepkg + '_' + taskname + ".tgz" ".siginfo", d)
+        sstateinfo = generate_sstate_filename(d, taskname)[1] + ".siginfo"
+        bb.siggen.dump_this_task(sstateinfo, d)
 }

 SSTATE_PRUNE_OBSOLETEWORKDIR ?= "1"

Thanks.

Mike.


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-07-31 17:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-30 11:01 [PATCH] sstate: Truncate PV in sstate filenames that are too long Mike Crowe
2019-07-30 13:25 ` Mark Hatle
2019-07-30 13:49   ` Mike Crowe
2019-07-30 14:14     ` Mark Hatle
2019-07-31 11:13       ` Mike Crowe
2019-07-31 17:41       ` Mike Crowe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.