* Use of github /archive/ tarballs in SRC_URI
@ 2017-09-19  8:09 ` Burton, Ross
  0 siblings, 0 replies; 12+ messages in thread
From: Burton, Ross @ 2017-09-19  8:09 UTC (permalink / raw)
  To: OE-core, OpenEmbedded Devel List

Hi,

Some people have insisted for a long time that the dynamically generated
/archive/ tarballs at github cannot be trusted for long-term stability, as
they are generated using git-archive.  Others said that git-archive takes
measures to ensure the files are identical, and basic testing does indeed
show that.  Personally I was on the fence: a dynamically generated tarball
that is cached *could* change, but git-archive generates the same file
list and the tools are stable.

Then the tarball for one of Erlang's repositories changed, and the checksum
in the recipe caught it (thanks to Gunnar Andersson for reporting this).  The
extracted contents are identical, but the tarball itself has changed.  I
presume the old tarball expired from their cache and the new one was
generated using a later version of tar.

So we now have documented evidence that this can and does happen, and it's
quite frustrating when it does.  I suggest that use of github.com
/archive/ URLs be considered bad practice for the primary SRC_URI tarball.

I'm working my way through the recipes in oe-core that use /archive/ and
replacing them with either maintainer-generated tarballs or git clones, but
there are a number of recipes in meta-oe which could do with fixing:

$ git grep -l -E 'github\.com/.*/archive/'
meta-initramfs/recipes-devtools/grubby/grubby_8.40.bb
meta-multimedia/recipes-support/gst-instruments/gst-instruments_0.2.3.bb
meta-multimedia/recipes-support/libsrtp/libsrtp_1.5.2.bb
meta-networking/recipes-support/geoip/geoip-perl_1.50.bb
meta-oe/recipes-connectivity/libuv/libuv_1.11.0.bb
meta-oe/recipes-connectivity/libwebsockets/libwebsockets_2.1.0.bb
meta-oe/recipes-devtools/protobuf/protobuf-c_1.2.1.bb
meta-oe/recipes-devtools/python/python-cpuset_1.5.7.bb
meta-oe/recipes-extended/boinc/boinc-client_7.6.33.bb
meta-oe/recipes-graphics/libvncserver/libvncserver_0.9.11.bb
meta-oe/recipes-graphics/openjpeg/openjpeg_2.1.1.bb
meta-oe/recipes-kernel/crash/crash_7.1.9.bb
meta-oe/recipes-multimedia/mplayer/mpv_0.24.0.bb
meta-oe/recipes-support/hunspell/hunspell_1.6.1.bb
meta-oe/recipes-support/lm_sensors/lmsensors_3.4.0.bb
meta-oe/recipes-test/gtest/gtest_1.8.0.bb
meta-perl/recipes-perl/libmodule/libmodule-pluggable-perl_5.2.bb
meta-perl/recipes-perl/libmodule/libmodule-runtime-perl_0.015.bb

Ross

* Re: [oe] Use of github /archive/ tarballs in SRC_URI
  2017-09-19  8:09 ` Burton, Ross
@ 2017-09-19  8:32   ` Mikko.Rapeli
  -1 siblings, 0 replies; 12+ messages in thread
From: Mikko.Rapeli @ 2017-09-19  8:32 UTC (permalink / raw)
  To: ross.burton; +Cc: openembedded-devel, openembedded-core

Yes, we have been affected by this several times too.

In addition to fixing recipes, a sanity test to warn about the situation
would be nice to have so that the issue could be detected in other
meta layers too.

-Mikko

* Re: [oe] Use of github /archive/ tarballs in SRC_URI
  2017-09-19  8:32   ` Mikko.Rapeli
@ 2017-09-19 10:57     ` Burton, Ross
  -1 siblings, 0 replies; 12+ messages in thread
From: Burton, Ross @ 2017-09-19 10:57 UTC (permalink / raw)
  To: Mikko Rapeli; +Cc: OpenEmbedded Devel List, OE-core

On 19 September 2017 at 09:32, <Mikko.Rapeli@bmw.de> wrote:

> Yes, we have been affected by this several times too.
>
> In addition to fixing recipes, a sanity test to warn about the situation
> would be nice to have so that the issue could be detected in other
> meta layers too.
>

So I have a class adding a few tests to package_qa, including this one:

https://github.com/rossburton/meta-ross/blob/0f788385dbe13914e9e07d382793c494c4927ef9/classes/insanitier.bbclass#L93

If I get a moment I'll move it to insane.bbclass.
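
For layers that cannot pull in that class, the idea behind such a check can be sketched in a few lines of Python. This is only an illustration and is not the code from insanitier.bbclass: the helper name find_github_archive_uris and the regular expression are assumptions, and it works on a plain SRC_URI string rather than on the BitBake datastore.

```python
import re

# Hypothetical pattern: any scheme, github.com, <owner>/<repo>/archive/...
GITHUB_ARCHIVE_RE = re.compile(
    r"^[a-z]+://(www\.)?github\.com/[^/]+/[^/]+/archive/"
)


def find_github_archive_uris(src_uri):
    """Return the SRC_URI entries that point at github.com /archive/ tarballs."""
    offenders = []
    for entry in src_uri.split():
        # Drop fetcher parameters such as ";downloadfilename=..." before matching.
        url = entry.split(";", 1)[0]
        if GITHUB_ARCHIVE_RE.match(url):
            offenders.append(url)
    return offenders
```

A QA hook could call this with the expanded SRC_URI value and emit a warning for each returned URL; git:// clones of the same repository are deliberately not flagged.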

Ross

* Re: [oe] Use of github /archive/ tarballs in SRC_URI
  2017-09-19  8:09 ` Burton, Ross
@ 2017-09-19 14:34   ` Carlos Alberto Lopez Perez
  -1 siblings, 0 replies; 12+ messages in thread
From: Carlos Alberto Lopez Perez @ 2017-09-19 14:34 UTC (permalink / raw)
  To: OE-core, OpenEmbedded Devel List


On 19/09/17 10:09, Burton, Ross wrote:
> Then the tarball for one of Erlang's repositories changed, and was noticed
> by the checksum in the recipe (thanks Gunnar Andersson for reporting
> this).  The extracted contents are identical, but the tarball itself has
> changed.  I'm presuming this is due to the old tarball expiring in their
> cache, and a newly generated tarball using a later version of tar.

I don't think tar is to blame here; I suspect gzip.
I have just tested GNU tar (versions 1.29 and 1.26) and BSD tar (from
OpenBSD 5.8), and all three produced identical archives (same md5sum)
when invoked the way git archive does.

However, the .tar.gz file generated was different in the case of BSD.
It even had a different file size.

I bet that if you uncompress both files, the .tar will have the same
checksum in both cases.

I guess that a different version (or implementation) of gzip, or even
different local settings such as a more or less aggressive compression
level, could be the explanation here.

Maybe OE could gain a feature to optionally checksum the uncompressed
.tar file, to be used as a fallback second check if the first one fails?
If the first check fails but the second passes, a non-fatal WARN could
be issued.
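
A rough sketch of that two-stage check in Python, to make the proposal concrete. Nothing here is an existing OE or BitBake interface: check_tarball, its arguments and the "ok"/"warn"/"fail" return values are all made up for illustration, and SHA-256 is used only as an example digest.

```python
import gzip
import hashlib


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary file-like object in chunks so large tarballs fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_tarball(path, expected_gz_sha256, expected_tar_sha256):
    """Return 'ok', 'warn' or 'fail' (hypothetical result names).

    'ok'   : the compressed file matches the recorded checksum.
    'warn' : only the uncompressed .tar stream matches.
    'fail' : neither checksum matches.
    """
    with open(path, "rb") as compressed:
        if sha256_of_stream(compressed) == expected_gz_sha256:
            return "ok"
    with gzip.open(path, "rb") as uncompressed:
        if sha256_of_stream(uncompressed) == expected_tar_sha256:
            return "warn"
    return "fail"
```

This only helps when the regenerated .tar is byte-identical and the difference comes from the compression step; if the tar stream itself changes, both checks fail.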


* Re: [oe] Use of github /archive/ tarballs in SRC_URI
  2017-09-19 14:34   ` Carlos Alberto Lopez Perez
@ 2017-09-19 15:55     ` Burton, Ross
  -1 siblings, 0 replies; 12+ messages in thread
From: Burton, Ross @ 2017-09-19 15:55 UTC (permalink / raw)
  To: OE-core, OpenEmbedded Devel List

On 19 September 2017 at 15:34, Carlos Alberto Lopez Perez <clopez@igalia.com> wrote:

> On 19/09/17 10:09, Burton, Ross wrote:
> > Then the tarball for one of Erlang's repositories changed, and was noticed
> > by the checksum in the recipe (thanks Gunnar Andersson for reporting
> > this).  The extracted contents are identical, but the tarball itself has
> > changed.  I'm presuming this is due to the old tarball expiring in their
> > cache, and a newly generated tarball using a later version of tar.
>
> I don't think tar is the one to blame, but gzip.
> I have just tested GNU tar (versions 1.29 and 1.26) and BSD tar (from
> OpenBSD 5.8) and the 3 have produced identical archives (same md5sum)
> when invoked like git archive does.
>
> However, the .tar.gz file generated was different in the case of BSD.
> It even had a different file size.
>
> I bet that if you uncompress both files, the .tar will have the same
> checksum in both cases.
>

In the case I was discussing with Gunnar, the difference is in the tar
itself.

https://paste.fedoraproject.org/paste/rP67aSPn1IuYbQcz2UNY3g

They have identical length
> $ wc -c new-OTP-18.2.3.tar
> 159068160 new-OTP-18.2.3.tar
> 159068160 orig-OTP-18.2.3.tar
> ...but different content
> $ cmp new-OTP-18.2.3.tar orig-OTP-18.2.3.tar
> new-OTP-18.2.3.tar orig-OTP-18.2.3.tar differ: byte 79122433, line 2004431
>

Basically, both tar and gzip can cause problems.
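
To confirm that two tarballs which differ byte-for-byte still carry the same files, the members can be hashed individually. The following is a minimal sketch using Python's standard tarfile module; member_digests and same_contents are names invented for this example, and only regular file contents are compared, not permissions, timestamps or other header metadata.

```python
import hashlib
import tarfile


def member_digests(tar_path):
    """Map each regular file in the archive to the SHA-256 of its contents."""
    digests = {}
    # "r:*" lets tarfile detect plain, gzip, bzip2 or xz archives transparently.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def same_contents(path_a, path_b):
    """True when both archives hold the same file names with the same bytes."""
    return member_digests(path_a) == member_digests(path_b)
```

If same_contents() is true while cmp reports a difference, the change is in the tar headers or padding rather than in the source files.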

Ross

* Re: [oe] Use of github /archive/ tarballs in SRC_URI
       [not found]   ` <CAJTo0La5F6e8sADgUz7PQO9GzCDObGWdH7tQ7xD9wTaN74EDxw@mail.gmail.com>
@ 2017-09-19 17:02       ` Randy MacLeod
  0 siblings, 0 replies; 12+ messages in thread
From: Randy MacLeod @ 2017-09-19 17:02 UTC (permalink / raw)
  To: Burton, Ross, Patches and discussions about the oe-core layer,
	openembedded-devel

Adding oe-core and oe-devel back after checking privately with Ross.

On 2017-09-19 12:38 PM, Burton, Ross wrote:
> On 19 September 2017 at 17:29, Randy MacLeod <randy.macleod@windriver.com> wrote:
> 
>     On 2017-09-19 04:09 AM, Burton, Ross wrote:
> 
>         Then the tarball for one of Erlang's repositories changed, and was
>         noticed by the checksum in the recipe (thanks Gunnar Andersson for
>         reporting this).  The extracted contents are identical, but the
>         tarball itself has changed.  I'm presuming this is due to the old
>         tarball expiring in their cache, and a newly generated tarball
>         using a later version of tar.
>
>         So we now have documented evidence that this can and does happen,
>         and it's quite frustrating when this happens.  So, I suggest that
>         use of github.com /archive/ URLs be considered a bad practise for
>         the primary SRC_URI tarball.
> 
> 
>     Agreed but it's a shame to have to deal with this as users rather
>     than have github fix the problem.
> 
>     Could report a bug if it's not already known?
>     https://github.com/contact
> 
> 
> They know, and don't commit to the /archive/ tarballs being persistent.  
> They're for convenience and if the project owner wants a stable tarball 
> they should upload one (that's almost quoted verbatim from my mails with 
> github).
> 
> Last time I looked they were generated on demand using git-archive and 
> cached for "some time" on AWS.  If nobody accesses the tarball for a while
> it will fall off AWS and then be re-generated.  The output from
> git-archive will be identical, but e.g. a new tar may write a different
> byte stream for the same input.

Ugh, but I guess it makes sense from their point of view since
they're all about the git repos.

Thanks for the background info Ross,

-- 
# Randy MacLeod.  WR Linux
# Wind River an Intel Company

