* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Benjamin Kamath @ 2016-08-17 19:10 UTC
To: buildroot

Attempt to do a remote archive since it shortcuts us past a few steps when
available. Additionally, if the git server has uploadArchive.allowUnreachable
set to true, then this method can also work on arbitrary sha1s, offering a huge
speed advantage over a full clone.

Signed-off-by: Benjamin Kamath <kamath.ben@gmail.com>
---
 support/download/git | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/support/download/git b/support/download/git
index 416cd1b..043a6de 100755
--- a/support/download/git
+++ b/support/download/git
@@ -36,6 +36,15 @@ _git() {
 	eval ${GIT} "${@}"
 }
 
+# Try a remote archive, since it is as fast as a shallow clone and can give us
+# an archive directly. Also, if uploadArchive.allowUnreachable is set to true
+# on the remote, this will also work for arbitrary sha1s, and will offer a
+# considerable speedup over a full clone.
+printf "Doing remote archive\n"
+if _git archive --format=tar.gz --prefix=${basename}/ --remote=${repo} -o ${output} ${cset} 2>&1; then
+	exit 0
+fi
+
 # Try a shallow clone, since it is faster than a full clone - but that only
 # works if the version is a ref (tag or branch). Before trying to do a shallow
 # clone we check if ${cset} is in the list provided by git ls-remote. If not
-- 
2.7.4
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Thomas Petazzoni @ 2016-08-17 20:39 UTC
To: buildroot

Hello,

On Wed, 17 Aug 2016 12:10:16 -0700, Benjamin Kamath wrote:

> Attempt to do a remote archive since it shortcuts us past a few steps when
> available. Additionally, if the git server has uploadArchive.allowUnreachable
> set to true, then this method can also work on arbitrary sha1s, offering a huge
> speed advantage over a full clone.
>
> Signed-off-by: Benjamin Kamath <kamath.ben@gmail.com>
> ---
>  support/download/git | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/support/download/git b/support/download/git
> index 416cd1b..043a6de 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -36,6 +36,15 @@ _git() {
>  	eval ${GIT} "${@}"
>  }
>  
> +# Try a remote archive, since it is as fast as a shallow clone and can give us
> +# an archive directly. Also, if uploadArchive.allowUnreachable is set to true
> +# on the remote, this will also work for arbitrary sha1s, and will offer a
> +# considerable speedup over a full clone.
> +printf "Doing remote archive\n"
> +if _git archive --format=tar.gz --prefix=${basename}/ --remote=${repo} -o ${output} ${cset} 2>&1; then
> +	exit 0
> +fi

Are the tarballs produced using this method reproducible? We need the
tarballs produced here to be reproducible so that we can store hashes
for them in the package .hash file.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Benjamin Kamath @ 2016-08-17 21:06 UTC
To: buildroot

On Wed, Aug 17, 2016 at 1:39 PM, Thomas Petazzoni
<thomas.petazzoni@free-electrons.com> wrote:
>
> Hello,
>
>
> Are the tarballs produced using this method reproducible? We need the
> tarballs produced here to be reproducible so that we can store hashes
> for them in the package .hash file.
>
I've investigated this a little bit, and it does seem to create
reproducible tarballs. In my testing, the archives produced via
git archive --remote=... actually match checksums with those produced
via shallow clone, delete .git, tar + gzip. I may need to investigate
the git source a bit more to confirm.
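Benjamin's comparison can be repeated for a given repository with a few commands. The sketch below is not taken from the thread. It uses a throwaway local bare repository in place of a real remote (git accepts a local path for `--remote`). The `README` file, the `pkg-1.0/` prefix and the `HEAD` ref are placeholders. It compares the uncompressed tar streams, so the gzip layer is left out. Identical bytes for one repository and one git version are evidence of reproducibility, not a proof.

```shell
#!/usr/bin/env bash
# Sketch: compare a remote-generated archive with a locally generated one.
# A throwaway local bare repository stands in for the real remote (assumption).
set -e

tmp=$(mktemp -d)
repo="${tmp}/remote.git"

# Build a small repository with one commit (hypothetical content).
git init -q "${tmp}/work"
echo "hello" >"${tmp}/work/README"
git -C "${tmp}/work" add README
git -C "${tmp}/work" -c user.name=test -c user.email=test@example.com \
    commit -q -m "initial commit"
git clone -q --bare "${tmp}/work" "${repo}"

# 1) Ask the "remote" to build the archive (served by git-upload-archive).
git archive --remote="${repo}" --format=tar --prefix=pkg-1.0/ \
    -o "${tmp}/remote.tar" HEAD

# 2) Build the same archive from a local clone of the same commit.
git clone -q "${repo}" "${tmp}/clone"
git -C "${tmp}/clone" archive --format=tar --prefix=pkg-1.0/ \
    -o "${tmp}/local.tar" HEAD

# Compare the uncompressed tar streams byte for byte.
if cmp -s "${tmp}/remote.tar" "${tmp}/local.tar"; then
    echo "identical"
else
    echo "different"
fi
```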
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Yann E. MORIN @ 2016-08-17 21:03 UTC
To: buildroot

Benjamin, All,

On 2016-08-17 12:10 -0700, Benjamin Kamath spake thusly:
> Attempt to do a remote archive since it shortcuts us past a few steps when
> available. Additionally, if the git server has uploadArchive.allowUnreachable
> set to true, then this method can also work on arbitrary sha1s, offering a huge
> speed advantage over a full clone.
>
> Signed-off-by: Benjamin Kamath <kamath.ben@gmail.com>
> ---
>  support/download/git | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/support/download/git b/support/download/git
> index 416cd1b..043a6de 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -36,6 +36,15 @@ _git() {
>  	eval ${GIT} "${@}"
>  }
>  
> +# Try a remote archive, since it is as fast as a shallow clone and can give us
> +# an archive directly. Also, if uploadArchive.allowUnreachable is set to true
> +# on the remote, this will also work for arbitrary sha1s, and will offer a
> +# considerable speedup over a full clone.
> +printf "Doing remote archive\n"
> +if _git archive --format=tar.gz --prefix=${basename}/ --remote=${repo} -o ${output} ${cset} 2>&1; then
> +	exit 0
> +fi

NAK in the state.

If the package needs submodules, we can't ask the remote to generate
the archive for us, because git-archive does not know how to include
submodules.

So, maybe this would work:

    if [ ${recurse} -eq 0 ]; then
        if _git blabla remote archive; then
            exit 0
        fi
    fi

Also, as stated by Thomas, we want to generate reproducible archives, so
that we can check the hashes of archives. We go to great lengths to
generate such archives locally, but I don't see a guarantee that the
remote archive would be reproducible.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'
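A minimal sketch of the guard Yann suggests, written as a stand-alone shell function rather than as a patch to `support/download/git`. The function name `try_remote_archive` and its argument order are hypothetical. The arguments mirror the wrapper's `${recurse}`, `${basename}`, `${repo}`, `${output}` and `${cset}` variables, and a plain `git` call replaces the wrapper's `_git` helper. A non-zero return means "fall back to the clone-based path".

```shell
#!/usr/bin/env bash
# Hypothetical helper: try `git archive --remote` only when no submodules
# are needed. Returns 0 on success, non-zero to request the fallback path.
#   $1 recurse   1 if submodules must be included, 0 otherwise
#   $2 basename  top-level directory name inside the archive
#   $3 repo      remote repository URL (or local path)
#   $4 output    path of the .tar.gz to create
#   $5 cset      ref to archive (a bare sha1 is only accepted if the server
#                sets uploadArchive.allowUnreachable=true)
try_remote_archive() {
    local recurse="$1" basename="$2" repo="$3" output="$4" cset="$5"

    # git-archive cannot include submodules: let the caller clone instead.
    if [ "${recurse}" -ne 0 ]; then
        return 1
    fi

    printf "Doing remote archive\n"
    if git archive --format=tar.gz --prefix="${basename}/" \
        --remote="${repo}" -o "${output}" "${cset}"; then
        return 0
    fi

    # Do not leave a partial archive behind for the fallback path.
    rm -f "${output}"
    return 1
}
```

In the wrapper, the call would sit just before the shallow-clone attempt, for example `try_remote_archive "${recurse}" "${basename}" "${repo}" "${output}" "${cset}" && exit 0`, so that any failure (submodules requested, ref refused by the server, network error) falls through to the existing clone path. Removing the partial output on failure is an addition of this sketch, not something the patch does.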
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Benjamin Kamath @ 2016-08-17 21:13 UTC
To: buildroot

On Wed, Aug 17, 2016 at 2:03 PM, Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> Benjamin, All,
>
>>
>> +# Try a remote archive, since it is as fast as a shallow clone and can give us
>> +# an archive directly. Also, if uploadArchive.allowUnreachable is set to true
>> +# on the remote, this will also work for arbitrary sha1s, and will offer a
>> +# considerable speedup over a full clone.
>> +printf "Doing remote archive\n"
>> +if _git archive --format=tar.gz --prefix=${basename}/ --remote=${repo} -o ${output} ${cset} 2>&1; then
>> +	exit 0
>> +fi
>
> NAK in the state.
Is this related to the following paragraph or a separate issue?
>
> If the package needs submodules, we can't ask the remote to generate
> the archive for us, because git-archive does not know how to include
> submodules.
>
> So, maybe this would work:
>
>     if [ ${recurse} -eq 0 ]; then
>         if _git blabla remote archive; then
>             exit 0
>         fi
>     fi
Indeed, I hadn't thought about submodules. I think your suggestion
would be sufficient. After all, it should fall back to the older
behavior upon failure.
>
> Also, as stated by Thomas, we want to generate reproducible archives, so
> that we can check the hashes of archives. We go to great lengths to
> generate such archives locally, but I don't see a guarantee that the
> remote archive would be reproducible.
I'm quite certain the archive is reproducible but this requires a bit
more investigation to prove.
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Yann E. MORIN @ 2016-08-17 21:31 UTC
To: buildroot

Benjamin, All,

On 2016-08-17 14:13 -0700, Benjamin Kamath spake thusly:
> On Wed, Aug 17, 2016 at 2:03 PM, Yann E. MORIN <yann.morin.1998@free.fr> wrote:
> > Benjamin, All,
> >
> >>
> >> +# Try a remote archive, since it is as fast as a shallow clone and can give us
> >> +# an archive directly. Also, if uploadArchive.allowUnreachable is set to true
> >> +# on the remote, this will also work for arbitrary sha1s, and will offer a
> >> +# considerable speedup over a full clone.
> >> +printf "Doing remote archive\n"
> >> +if _git archive --format=tar.gz --prefix=${basename}/ --remote=${repo} -o ${output} ${cset} 2>&1; then
> >> +	exit 0
> >> +fi
> >
> > NAK in the state.
> Is this related to the following paragraph or a separate issue?

It's "NAK in the state" because of what I explained below. I'm OK with
this feature if:
  - the submodule support is handled (at least as I suggest),
  - the reproducibility of archives is guaranteed.

> > If the package needs submodules, we can't ask the remote to generate
> > the archive for us, because git-archive does not know how to include
> > submodules.
> >
> > So, maybe this would work:
> >
> >     if [ ${recurse} -eq 0 ]; then
> >         if _git blabla remote archive; then
> >             exit 0
> >         fi
> >     fi
> Indeed, I hadn't thought about submodules. I think your suggestion
> would be sufficient. After all, it should fall back to the older
> behavior upon failure.
>
> >
> > Also, as stated by Thomas, we want to generate reproducible archives, so
> > that we can check the hashes of archives. We go to great lengths to
> > generate such archives locally, but I don't see a guarantee that the
> > remote archive would be reproducible.
>
> I'm quite certain the archive is reproducible but this requires a bit
> more investigation to prove.

Well, I had a quick look at archive.c in the git git tree (weird to
write that!), and I can conclusively state neither that they are
reproducible, nor that they are not... :-/

There does not seem to be any call to sort() in there, nor are they
setting LC_COLLATE anywhere.

However, I've tried to generate two archives (locally) with different
collating rules (en_US.UTF-8, which does not differentiate between
upper and lower case, and C, which does), and the two archives had the
same sha1.

Inspecting the archives in both cases shows that the collating seems to
always be C, with uppercase always before lowercase, with .files before
non-dot files, and so on...

So, I think it is safe to assume that git-archive always generates
reproducible archives.

There. Solved that one for you! ;-)

Regards,
Yann E. MORIN.
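Yann's locale experiment can be sketched as follows. Everything here is an illustrative assumption: a throwaway repository holding `foo`, `Bar` and `.hidden` (names that `C` and `en_US.UTF-8` collation would order differently), archived twice under different `LC_ALL` values and compared byte for byte. If `en_US.UTF-8` is not installed, the second run simply uses the `C` locale and the comparison proves nothing. The check also covers only the uncompressed tar stream, not the gzip layer.

```shell
#!/usr/bin/env bash
# Sketch: does the locale's collation order change `git archive` output?
set -e

tmp=$(mktemp -d)
git init -q "${tmp}/repo"
cd "${tmp}/repo"

# Names chosen so that C and en_US.UTF-8 collation would sort them differently.
for f in foo Bar .hidden; do
    echo "${f}" >"${f}"
done
git add .
git -c user.name=test -c user.email=test@example.com commit -q -m "initial commit"

# Archive the same commit under two different collation settings.
LC_ALL=C           git archive --format=tar --prefix=p/ -o "${tmp}/c.tar" HEAD
LC_ALL=en_US.UTF-8 git archive --format=tar --prefix=p/ -o "${tmp}/en.tar" HEAD

cmp -s "${tmp}/c.tar" "${tmp}/en.tar" && echo "same bytes" || echo "differ"

# Entry order follows git's tree order, whatever the locale.
tar -tf "${tmp}/c.tar"
```

The listing order comes from the order of entries in git's tree objects, which git keeps sorted by name, comparing raw bytes; that is consistent with Yann's observation that the result always looks like `C` collation.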
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Thomas Petazzoni @ 2016-08-17 21:54 UTC
To: buildroot

Hello,

On Wed, 17 Aug 2016 23:31:02 +0200, Yann E. MORIN wrote:

> So, I think it is safe to assume that git-archive always generates
> reproducible archives.

So the only remaining reason not to use git archive all the time is to
support submodules?

Thomas
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Yann E. MORIN @ 2016-08-17 22:00 UTC
To: buildroot

On 2016-08-17 23:54 +0200, Thomas Petazzoni spake thusly:
> Hello,
>
> On Wed, 17 Aug 2016 23:31:02 +0200, Yann E. MORIN wrote:
>
> > So, I think it is safe to assume that git-archive always generates
> > reproducible archives.
>
> So the only remaining reason not to use git archive all the time is to
> support submodules?

Yes.

When adding submodule support, I pondered having two code paths: one
for non-submodules, which would use git-archive, and one where we would
do it all manually as we do today. But then I concluded that it was
better to have a single code path.

Note that, before submodules, we were happily using git-archive
locally, without even forcing the collating rules.

Regards,
Yann E. MORIN.
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Peter Korsgaard @ 2016-08-22 19:53 UTC
To: buildroot

>>>>> "Yann" == Yann E MORIN <yann.morin.1998@free.fr> writes:

Hi,

 > NAK in the state.

 > If the package needs submodules, we can't ask the remote to generate
 > the archive for us, because git-archive does not know how to include
 > submodules.

 > So, maybe this would work:

 >     if [ ${recurse} -eq 0 ]; then
 >         if _git blabla remote archive; then
 >             exit 0
 >         fi
 >     fi

Or alternatively, we could look at the alternative approach for handling
submodules, e.g. splicing git archive outputs.

 > Also, as stated by Thomas, we want to generate reproducible archives, so
 > that we can check the hashes of archives. We go to great lengths to
 > generate such archives locally, but I don't see a guarantee that the
 > remote archive would be reproducible.

Normal 'git archive' output should be reproducible, e.g. that is what we
used until recently.

-- 
Bye, Peter Korsgaard
* [Buildroot] [PATCH 1/1] support/download/git: Prioritize remote archive
From: Yann E. MORIN @ 2016-08-22 20:55 UTC
To: buildroot

Peter, All,

On 2016-08-22 21:53 +0200, Peter Korsgaard spake thusly:
> >>>>> "Yann" == Yann E MORIN <yann.morin.1998@free.fr> writes:
> > NAK in the state.
>
> > If the package needs submodules, we can't ask the remote to generate
> > the archive for us, because git-archive does not know how to include
> > submodules.
>
> > So, maybe this would work:
>
> >     if [ ${recurse} -eq 0 ]; then
> >         if _git blabla remote archive; then
> >             exit 0
> >         fi
> >     fi
>
> Or alternatively, we could look at the alternative approach for handling
> submodules, e.g. splicing git archive outputs.

And I think I already explained that this was not so trivial...

For example, I made this layout of git tree and submodules:

    foo/
    foo/.git/
    foo/foo          <- file with "FOO" in it
    foo/bar/
    foo/bar/.git
    foo/bar/bar      <- file with "BAR" in it
    foo/bar/buz/
    foo/bar/buz/.git
    foo/bar/buz/buz  <- file with "BUZ" in it

  - each git tree has a file named after the git tree and containing
    the name of the git tree in uppercase (just for fun and as a way to
    check what I did);
  - 'foo' is a git tree with a submodule 'bar';
  - 'bar' is a git tree with a submodule 'buz';
  - so, 'buz' is *not* a submodule of 'foo'.

    $ git submodule foreach -q --recursive 'printf "name=${name} path=${path} toplevel=${toplevel}\n"'
    name=bar path=bar toplevel=/home/ymorin/dev/buildroot/foo/git/foo
    name=buz path=buz toplevel=/home/ymorin/dev/buildroot/foo/git/foo/bar

So it means we have no easy way to get the relative path to the
sub-submodules. We have to extract them:

    $ git submodule foreach -q --recursive "printf \"reldir=\${toplevel#$(git rev-parse --show-toplevel)}/\${path}\n\""
    reldir=/bar
    reldir=/bar/buz

And then for each of them, we shoe-horn that path as a --prefix to git
archive.

This does not make our git wrapper any simpler:
  - we still need to try a shallow clone and fall back to a full clone,
  - we still need to fetch the special refs,
  - we still need to do checkouts (thus non-bare clones) because
    submodules are only known with a working tree,
  - we still need to init and update submodules, recursively.

The only slight simplification would be using git-archive instead of a
canned tar, but even then this git-archive command would be quite
complex (untested):

    $ git archive --prefix=${basename}/ --format=tar HEAD >"${output}.tmp"
    $ git submodule foreach -q --recursive \
        "git archive --prefix=${basename}\${toplevel#$(git rev-parse --show-toplevel)}/\${path}/ --format=tar HEAD" \
        >>"${output}.tmp"
    $ gzip -9 <"${output}.tmp" >"${output}"

Sorry, but this is totally unreadable... :-/

And this is only about replacing the *single* tar we have right now.
We'd still have to keep all the rest of the wrapper...

However, taking again my example git tree above:

    $ git archive --prefix=foo/ --format=tar HEAD >foo.tar
    $ ls -l foo.tar
    -rw-rw-r-- 1 ymorin ymorin 10240 Aug 22 22:37 foo.tar
    $ git submodule foreach -q --recursive "git archive --prefix=foo\${toplevel#$(git rev-parse --show-toplevel)}/\${path}/ --format=tar HEAD >>$(pwd)/foo.tar"
    $ ls -l foo.tar
    -rw-rw-r-- 1 ymorin ymorin 30720 Aug 22 22:37 foo.tar

So it seems the submodules were somewhat added to the archive, right?
Well, at least it seems the archive is ill-formed:

    $ tar tf foo.tar
    foo/
    foo/.gitmodules
    foo/bar/
    foo/foo

If I 'hexdump -Cv foo.tar', it looks like everything is in there,
though...

But git-archive generates a 'global pax header' (whatever that is) by
default. We can tell it not to, by using a special syntax when
specifying the tree-ish: using HEAD^{tree} instead of HEAD. Still no
luck extracting the archive... :-(

So I'm not sure where to go from here.

> > Also, as stated by Thomas, we want to generate reproducible archives, so
> > that we can check the hashes of archives. We go to great lengths to
> > generate such archives locally, but I don't see a guarantee that the
> > remote archive would be reproducible.
>
> Normal 'git archive' output should be reproducible, e.g. that is what we
> used until recently.

Yet, we did notice that, at one point, GitHub archives were *not*
reproducible...

Regards,
Yann E. MORIN.
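A likely explanation for the truncated listing, offered as a hedged aside rather than something established in the thread: every `git archive --format=tar` stream ends with tar's end-of-archive marker (at least two 512-byte zero blocks, padded to a 10240-byte record, which matches the 10240 and 30720 byte sizes in the experiment above), and GNU tar stops reading at the first such marker. GNU tar can either skip those markers when reading (`--ignore-zeros`) or splice archives properly (`--concatenate`, which writes the second archive over the first one's end marker). The sketch below shows both with two plain tarballs; the directory names are illustrative and GNU tar is assumed.

```shell
#!/usr/bin/env bash
# Sketch (GNU tar assumed): why `cat a.tar b.tar` looks truncated,
# and two ways to deal with it.
set -e

tmp=$(mktemp -d)
mkdir -p "${tmp}/a" "${tmp}/b"
echo FOO >"${tmp}/a/foo"
echo BAR >"${tmp}/b/bar"
tar -C "${tmp}" -cf "${tmp}/a.tar" a
tar -C "${tmp}" -cf "${tmp}/b.tar" b

# Naive splice: the end-of-archive blocks of a.tar stay in the middle.
cat "${tmp}/a.tar" "${tmp}/b.tar" >"${tmp}/spliced.tar"
echo "plain listing:"
tar -tf "${tmp}/spliced.tar"                   # stops after a/foo
echo "with --ignore-zeros:"
tar --ignore-zeros -tf "${tmp}/spliced.tar"    # also shows b/ and b/bar

# Proper splice: --concatenate appends b.tar over a.tar's end marker.
cp "${tmp}/a.tar" "${tmp}/joined.tar"
tar --concatenate -f "${tmp}/joined.tar" "${tmp}/b.tar"
echo "concatenated listing:"
tar -tf "${tmp}/joined.tar"
```

Applied to the experiment above, this would mean either reading `foo.tar` with `--ignore-zeros`, or producing one tar per submodule and joining them with `tar --concatenate` instead of appending with `>>`. Whether the resulting archive hashes stay stable across git and tar versions is a separate question.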