* [Buildroot] [PATCH v2 0/4] support/download/git using git fetch
@ 2016-12-02 15:21 Ricardo Martincoski
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone Ricardo Martincoski
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-02 15:21 UTC (permalink / raw)
  To: buildroot

All,

This is the rewrite of the git downloader using git init + git fetch instead of
git clone.

v1 had these patches:
 support/download/git: log checked out sha1
 test/support/download/git: new test
 support/download/git: do not use git clone

Changes v1 -> v2:
 - add DEVELOPERS entry to the series;
 - move the optimized download of sha1 to a separate patch (Arnout);
 - remove log of sha1 and the automated test from this series (they will be
   part of a follow-up series);
 - the diff of the main patch to master is now smaller and hopefully will make
   the review easier;
 - many improvements to the download script, described in the changelog of the
   first patch (most from Arnout);
 - a few corner cases documented (dumb http servers) or fixed (optimized download
   of tag using git 1.7.1 from RHEL6);
 - create a new RFC patch at the end of the series adding the fetch of 100
   commits from each branch as a middle ground between a depth 1 shallow fetch
   and a full fetch.

Advantages:
 - any type of ref can be used as a package version (sha1, tag, branch, special
   ref);
 - do a partial fetch when possible (good especially for large repos, like
   linux kernel);
 - a second fetch reuses the objects already downloaded;
 - sha1s use the optimized download when available.
Disadvantages:
 - some small repos (really small, without active development and with all refs
   packed) can require transferring more data from the server when compared to
   a normal git clone.

A comparison between the download size for the current master and for each patch
in this series can be found here:
https://gist.github.com/ricardo-martincoski/bab4ca3257ef292c7175df1e4fdc17e4

Ricardo Martincoski (4):
  support/download/git: do not use git clone
  support/download/git: optimized download of sha1
  DEVELOPERS: add entry for support/download/git
  support/download/git: shallow fetch of all branches

 DEVELOPERS           |   7 +--
 support/download/git | 124 ++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 98 insertions(+), 33 deletions(-)

-- 
2.11.0

* [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone
  2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
@ 2016-12-02 15:21 ` Ricardo Martincoski
  2017-02-07 16:48   ` Arnout Vandecappelle
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1 Ricardo Martincoski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-02 15:21 UTC (permalink / raw)
  To: buildroot

Rewrite the script using git init and git fetch instead of git clone,
based on some ideas discussed during the review of [1].

Always using git init + git fetch has these advantages:
- git fetch works with all kinds of refs, while git clone supports only
  branches and tags without the refs/heads/ and refs/tags/ prefixes;
- a fetch can be done for the head of those refs (the same as git clone
  -b but works for all refs);
- the objects already downloaded by a call to git fetch are reused in
  the next call.

First ask the remote for its references using git ls-remote and save the
output to a file. Later on, inspect the saved file for the desired
change set, determining if it is a named ref and so can be downloaded
using a shallow fetch.
Use an analytical solution to inspect the output of git ls-remote
instead of a single awk line. It makes each line of code simple: 'grep'
to check the entry is in the ls-remote output and 'cut' to actually get
the reference to use.
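
As an illustration only (this is not the code in the patch), the selection
of a named ref from the saved ls-remote output can also be written with
exact fixed-string matching on the ref column. The helper name
pick_named_ref and the order in which candidates are tried are assumptions
of this sketch. It prefers a branch over a tag with the same name, as the
patch does:

```shell
# Hypothetical helper, not part of the patch: print the complete ref name
# that matches the change set $1 in the saved "git ls-remote" listing $2.
# Returns non-zero when no named ref matches.
pick_named_ref() {
    cset="$1"
    refs_file="$2"
    # Try the change set as given, then with refs/, refs/heads/ and
    # refs/tags/ prepended. Only complete names (refs/...) are accepted.
    for candidate in "${cset}" "refs/${cset}" \
                     "refs/heads/${cset}" "refs/tags/${cset}"; do
        case "${candidate}" in
            refs/*) ;;
            *) continue ;;
        esac
        # -F: no regex, so '.', '{' or '$' in a name are taken literally.
        # -x: the whole ref column must match.
        if cut -f 2 "${refs_file}" | grep -Fxq -- "${candidate}"; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}
```

With such a helper the caller would only do
ref="$(pick_named_ref "${cset}" "${a_r}")" and fall back to a full fetch
when it fails.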

A concern that arises from this method is that the remote can change
between the git ls-remote and the git fetch, but in this case, once the
script creates the equivalent of a shallow clone, the fetch and the
checkout do what is expected:
- for a removed named reference (branch, tag or special ref), the fetch
  fails, falling back to a possibly successful checkout after all
  branches and tags are fetched (the checkout will only succeed in
  specific cases, e.g. the remote removed a branch but created a tag
  with the same name);
- for a changed named reference (branch, tag or special ref), the fetch
  and checkout are successful using the "new" sha1 from the remote.

Move each git checkout command next to its git fetch command, since now
the checkout can fail (if the change set is not yet fetched), falling
back to the next method (which downloads more objects from the remote).

When doing a full fetch in a local shallow copy, use --unshallow to
ensure the local copy is converted to a complete one.

If after the fetch of all branches and tags (equivalent to git clone)
the desired change set cannot be checked out, do a fetch of all
references (equivalent to git clone --mirror). This approach allows the
use of any unambiguous partial sha1 as package version and also allows
the use of sha1 of special refs, while keeping the usual bandwidth need
unchanged by avoiding downloading special refs when not needed.
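
The order of the fallbacks described above can be sketched as a small
driver that runs each method in turn and stops at the first one that
succeeds. The function run_stages and the stage names in the usage line
are hypothetical; in the patch the stages are written inline (shallow
fetch, full fetch, mirror fetch, each followed by its checkout):

```shell
# Hypothetical driver: each argument is the name of a shell function that
# does one fetch + checkout attempt and returns 0 on success.
run_stages() {
    for stage in "$@"; do
        printf 'Doing %s\n' "${stage}"
        if "${stage}"; then
            return 0
        fi
        printf '%s failed, falling back\n' "${stage}"
    done
    # Every stage failed: the change set cannot be checked out.
    return 1
}

# Usage, assuming the three functions are defined by the caller:
#   run_stages shallow_fetch full_fetch mirror_fetch
```

Later stages download more objects, so the cheap ones go first.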

[1] http://patchwork.ozlabs.org/patch/681841/

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
---
Please note that the fallback to a full clone when the remote only
supports dumb http transport, described in a comment I added to the
code, is not a new behavior of git init + git fetch. All git commands,
including git clone, refuse to use --depth for dumb http.

Changes v1 -> v2:
  - removed acked-by/tested-by since the diff to v1 is not trivial;
  - use grep -E to avoid lots of escaping (Arnout);
  - move each checkout together to the fetch it belongs (Arnout);
  - save the output of grep to a temporary file and then get the ref
    name using cut instead of using awk (Arnout);
    - notice I extended this by using 3 temporary files. It was done to
      avoid problems with funny (valid!) names of branches and tags, e.g.
      'bra{2}nch' or 'tag$$'. Now we first do a strict grep (grep -F) and
      then its output can be checked in a less strict way. Notice using
      grep -F is the only way to make '.' not match any character
      without using e.g. bash pattern substitution to escape the change
      set before passing to grep;
  - moved the optimized download of sha1 to a separate patch (Arnout);
  - use of head -1 to get the first match (this is an extension of the
    suggestion from Arnout of using tail -1 to get the last match);
  - use if [ "$ref" ] instead of a specific flag (Arnout);
  - use ${repo} as parameter to git fetch instead of adding a remote
    (Arnout);
  - wrap to 80 columns (Arnout);
    - only one line remains with 84 chars;
  - allow optimized download of tags when using ancient git client (e.g.
    1.7.1 in RHEL6);
  - the script is now much more similar to the version in current master
    (only one flag 'git_done' is used and the few empty lines v1 introduced
    were removed), so comparing which code is equivalent to each should
    be easier in this version;
  - do not use a separate namespace refs/buildroot since now ${ref} is
    always in the complete form refs/tags/tag;
  - use --unshallow to ensure a shallow copy is converted to a complete one;
  - added more precise comments for the cases the remote changed after
    git ls-remote;
  - added more comments;
  - update the commit message.
---
 support/download/git | 90 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 60 insertions(+), 30 deletions(-)

diff --git a/support/download/git b/support/download/git
index 792141183..0780c6a9e 100755
--- a/support/download/git
+++ b/support/download/git
@@ -38,44 +38,74 @@ _git() {
     eval ${GIT} "${@}"
 }
 
-# Try a shallow clone, since it is faster than a full clone - but that only
-# works if the version is a ref (tag or branch). Before trying to do a shallow
-# clone we check if ${cset} is in the list provided by git ls-remote. If not
-# we fall back on a full clone.
-#
-# Messages for the type of clone used are provided to ease debugging in case of
-# problems
+_git init ${verbose} "'${basename}'"
+
+pushd "${basename}" >/dev/null
+
+# Save temporary files inside the .git directory that will be deleted after the
+# checkout is done.
+a_r=".git/all_refs"
+c_r=".git/candidate_refs"
+m_r=".git/matching_refs"
+
+# Ask the server the list of all refs that can use the most optimized download
+# (git fetch --depth 1). In the case the remote gets updated between this
+# command and the git fetch, fall back to a full fetch.
+_git ls-remote "'${repo}'" >${a_r}
+
+# Do a strict filtering before hand, so we can use less strict checking to
+# determine a cset can use the optimized download. This way, refs with funny
+# names (e.g. 'bra{2}nch', 'tag$$') will simply fall back to a full fetch.
+if grep -F "${cset}" ${a_r} >${c_r} 2>/dev/null; then
+    if grep -E "\<(|(|refs/)(heads|tags)/)${cset}$" ${c_r} >${m_r} 2>/dev/null; then
+        # Support branches and tags in the simplified form.
+        # Support branches and tags and special refs in the form refs/tags/tag.
+        # NOTE: When using an ancient git client, the fetch of a tag in the
+        # simplified form fails and would fall back to a full fetch. Git version
+        # 1.7.1 (RHEL6) fails, 1.8.2.3 (RHEL5+EPEL) succeeds. Instead of using
+        # the received ${cset} as ref, always use the complete form.
+        # When the name is ambiguous (there is a branch and a tag with the same
+        # name), the branch is selected. This way we behave like git fetch, git
+        # clone and git checkout. To accomplish this we use the first match
+        # because output of git ls-remote is already sorted by ref.
+        ref="$(cut -f 2 ${m_r} | head -1)"
+    fi
+fi
 git_done=0
-if [ -n "$(_git ls-remote "'${repo}'" "'${cset}'" 2>&1)" ]; then
-    printf "Doing shallow clone\n"
-    if _git clone ${verbose} "${@}" --depth 1 -b "'${cset}'" "'${repo}'" "'${basename}'"; then
-        git_done=1
+if [ "${ref}" ]; then
+    printf "Doing shallow fetch, using '%s' to get '%s'\n" "${ref}" "${cset}"
+    # Because ${ref} is always in the complete form we don't need to create a
+    # separate namespace (i.e. refs/buildroot/) and just use the ref as is.
+    if _git fetch -u ${verbose} "${@}" --depth 1 "'${repo}'" \
+                  "'+${ref}:${ref}'" 2>&1; then
+        unshallow=--unshallow
+        if _git checkout -q "'${cset}'" 2>&1; then
+            git_done=1
+        else
+            printf "Checkout failed, falling back to doing a full fetch\n"
+        fi
     else
-        printf "Shallow clone failed, falling back to doing a full clone\n"
+        # It catches the case the remote supports only dumb http transport.
+        # It catches the case the remote removed the ref after git ls-remote.
+        printf "Shallow fetch failed, falling back to doing a full fetch\n"
     fi
 fi
 if [ ${git_done} -eq 0 ]; then
-    printf "Doing full clone\n"
-    _git clone ${verbose} "${@}" "'${repo}'" "'${basename}'"
+    printf "Doing full fetch\n"
+    # Fetch all branch and tag refs. The same as git clone.
+    _git fetch -u ${verbose} "${@}" ${unshallow} "'${repo}'" \
+               "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/heads/*'"
+    if _git checkout -q "'${cset}'" 2>&1; then
+        git_done=1
+    fi
 fi
-
-pushd "${basename}" >/dev/null
-
-# Try to get the special refs exposed by some forges (pull-requests for
-# github, changes for gerrit...). There is no easy way to know whether
-# the cset the user passed us is such a special ref or a tag or a sha1
-# or whatever else. We'll eventually fail at checking out that cset,
-# below, if there is an issue anyway. Since most of the cset we're gonna
-# have to clone are not such special refs, consign the output to oblivion
-# so as not to alarm unsuspecting users, but still trace it as a warning.
-if ! _git fetch origin "'${cset}:${cset}'" >/dev/null 2>&1; then
-    printf "Could not fetch special ref '%s'; assuming it is not special.\n" "${cset}"
+if [ ${git_done} -eq 0 ]; then
+    printf "Doing mirror fetch\n"
+    # Fetch all refs, including special refs. The same as git clone --mirror.
+    _git fetch -u ${verbose} "${@}" ${unshallow} "'${repo}'" "'+refs/*:refs/*'"
+    _git checkout -q "'${cset}'"
 fi
 
-# Checkout the required changeset, so that we can update the required
-# submodules.
-_git checkout -q "'${cset}'"
-
 # Get date of commit to generate a reproducible archive.
 # %cD is RFC2822, so it's fully qualified, with TZ and all.
 date="$( _git log -1 --pretty=format:%cD )"
-- 
2.11.0

* [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1
  2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone Ricardo Martincoski
@ 2016-12-02 15:21 ` Ricardo Martincoski
  2017-02-07 16:48   ` Arnout Vandecappelle
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 3/4] DEVELOPERS: add entry for support/download/git Ricardo Martincoski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-02 15:21 UTC (permalink / raw)
  To: buildroot

Shallow fetches don't work for downloading SHAs. However it's common to
track a tag or branch by the commit SHA so that a branch or tag doesn't
unexpectedly change.

We can take advantage of this scenario by searching the remote for a ref
that is equivalent to our sha, then shallow fetching that ref instead.

Regarding the case the remote changes between the git ls-remote and the
git fetch, once the script creates the equivalent of a shallow clone,
the fetch and the checkout do what is expected:
- for a sha1 of a removed ref (branch, tag or special ref), the fetch
  fails, falling back to the equivalent of a full clone;
- for a sha1 of a changed ref (branch, tag or special ref), the checkout
  fails (because the desired sha1 is not fetched), falling back to the
  equivalent of a full clone;
In both cases, the checkout after the equivalent of a full clone or
mirror clone is successful if the sha1 is still reachable from any ref.
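
A minimal sketch of the lookup, assuming the output of git ls-remote was
saved to a file with one "<sha1><TAB><ref>" entry per line. The helper
name ref_for_sha1 is hypothetical and the code is written for plain sh,
so it uses suffix removal instead of the bash pattern substitution used
in the patch:

```shell
# Hypothetical helper: print a ref from the saved listing $2 that points
# at the full sha1 $1. Returns non-zero when there is none.
ref_for_sha1() {
    sha="$1"
    refs_file="$2"
    # Only full sha1s are looked up: a partial one can be unambiguous in
    # the ls-remote output but ambiguous in the whole repo.
    [ "${#sha}" -eq 40 ] || return 1
    tab="$(printf '\t')"
    # HEAD is listed first, so take the last match to avoid it if possible.
    ref="$(grep -F -- "${sha}${tab}" "${refs_file}" | cut -f 2 | tail -n 1)"
    [ -n "${ref}" ] || return 1
    # The commit pointed to by an annotated tag is listed as "<tag>^{}",
    # but the name to fetch is the tag itself.
    peel_suffix='^{}'
    ref="${ref%"${peel_suffix}"}"
    printf '%s\n' "${ref}"
}
```

The ref printed here is then fetched with --depth 1, and the checkout of
the sha1 tells whether the remote still points to it.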

A similar patch was submitted in [1] using git clone.
[1] http://patchwork.ozlabs.org/patch/681841/

Reported-by: Brandon Maier <brandon.maier@rockwellcollins.com>
Reported-by: Bryce Ferguson <bryce.ferguson@rockwellcollins.com>
Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
---
Changes v1 -> v2:
  - moved the optimized download of sha1 to this separate patch
    (Arnout);
  - part of commit message copied from Brandon's patch;
  - only full sha1 are optimized because the output of git ls-remote is
    a subset of all sha1 and so cannot be used to validate a sha1 is
    unambiguous in the full set of sha1s from the repo (that also
    include tree-ish sha1s);
  - save the output of grep to a temporary file and then get the ref
    name using cut instead of using awk (Arnout);
  - any reference that points to the sha1 can be used, we just want to
    avoid HEAD because it is the most volatile one. So use tail -1 to
    get the last match instead of doing many 'if's to accomplish the
    same goal (Arnout);
  - added more comments, especially this: ref for sha1 pointed by
    annotated tags shows a ^{} at the end in the output of ls-remote;
  - use 'Pattern substitution' from bash to remove ^{}.
---
 support/download/git | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/support/download/git b/support/download/git
index 0780c6a9e..42048ad48 100755
--- a/support/download/git
+++ b/support/download/git
@@ -69,6 +69,18 @@ if grep -F "${cset}" ${a_r} >${c_r} 2>/dev/null; then
         # clone and git checkout. To accomplish this we use the first match
         # because output of git ls-remote is already sorted by ref.
         ref="$(cut -f 2 ${m_r} | head -1)"
+    elif grep "^${cset}\>" ${c_r} >${m_r} 2>/dev/null; then
+        # Support sha1 of branch head and commit pointed by tag (annotated or
+        # not) and sha1 of special refs head.
+        # Do not support partial sha1 because it is possible it is unambiguous
+        # in the subset of ls-remote but ambiguous in the full set of all sha1.
+        # A sha1 can be referenced by many names. Any reference can be used but
+        # avoid using HEAD if possible because it is the most volatile one.
+        # HEAD appears at the begin of ls-remote, so use the last match.
+        ref="$(cut -f 2 ${m_r} | tail -1)"
+        # The ref of the sha1 pointed by an annotated tag ends with tag^{} in
+        # the output of git ls-remote but we want the name of the tag to fetch.
+        ref=${ref/%"^{}"}
     fi
 fi
 git_done=0
@@ -82,6 +94,8 @@ if [ "${ref}" ]; then
         if _git checkout -q "'${cset}'" 2>&1; then
             git_done=1
         else
+            # It catches the case we want a sha1 but the remote changed the ref
+            # after git ls-remote.
             printf "Checkout failed, falling back to doing a full fetch\n"
         fi
     else
-- 
2.11.0

* [Buildroot] [PATCH v2 3/4] DEVELOPERS: add entry for support/download/git
  2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone Ricardo Martincoski
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1 Ricardo Martincoski
@ 2016-12-02 15:21 ` Ricardo Martincoski
  2016-12-02 15:21 ` [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches Ricardo Martincoski
  2016-12-06 17:22 ` [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Brandon Maier
  4 siblings, 0 replies; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-02 15:21 UTC (permalink / raw)
  To: buildroot

Also fix alphabetical order.

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
---
 DEVELOPERS | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/DEVELOPERS b/DEVELOPERS
index 684be4866..755971189 100644
--- a/DEVELOPERS
+++ b/DEVELOPERS
@@ -1229,12 +1229,13 @@ F:	package/ustream-ssl/
 N:	Renaud Aubin <root@renaud.io>
 F:	package/libhttpparser/
 
-N:	Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
-F:	package/atop/
-
 N:	Rhys Williams <github@wilberforce.co.nz>
 F:	package/lirc-tools/
 
+N:	Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
+F:	package/atop/
+F:	support/download/git
+
 N:	Richard Braun <rbraun@sceen.net>
 F:	package/curlftpfs/
 F:	package/tzdata/
-- 
2.11.0

* [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches
  2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
                   ` (2 preceding siblings ...)
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 3/4] DEVELOPERS: add entry for support/download/git Ricardo Martincoski
@ 2016-12-02 15:21 ` Ricardo Martincoski
  2016-12-06 17:13   ` Brandon Maier
  2017-02-07 16:49   ` Arnout Vandecappelle
  2016-12-06 17:22 ` [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Brandon Maier
  4 siblings, 2 replies; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-02 15:21 UTC (permalink / raw)
  To: buildroot

When a branch of a package is tracked using sha1 and the remote branch
moves (usually gets another commit), this script stops using a shallow
fetch since the sha1 is no longer the head of the branch, falling back
to a full clone.

As a middle ground between a --depth 1 fetch and a full clone, fetch all
branches with depth of 100. We have a good change to catch the desired
version reducing the data transfer from the remote when fetching from
large repos, especially hardware-specific linux trees.

In the case the desired version is not in this shallow fetch, fall back
to a full fetch that reuses the objects already downloaded.

This method causes an extra burden on the server side ('Compressing
objects'), usually a few seconds, but the potential reduction of data
transferred should be beneficial for both the user and the server.

https://github.com/raspberrypi/linux.git
100 commits from all branches 567.47 MiB
git clone 1.49 GiB

https://github.com/Freescale/linux-fslc.git
100 commits from all branches 944.72 MiB + 73.89 KiB
git clone 1.68 GiB

https://github.com/linux-sunxi/sunxi-mali-proprietary
100 commits from all branches 7.55 MiB
git clone 7.55 MiB

Reported-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
---
Changes v1 -> v2:
  - new RFC patch with the feature suggested by Arnout in [1].
    - I implemented only for branches, instead of branches and tags;
    - the code only runs for full sha1 because this shallow fetch has a
      subset of the sha1 from the repo and we could end up successfully
      checking out the wrong sha1 based on a partial sha1 when we really
      should fail the checkout (as if a git clone took place);
    - the check for full sha1 as change set is performed in a very
      simplistic way: cset has 40 char. Of course we could create a more
      complex check (e.g. 40 char in [0-9a-fA-F]) but I thought it was
      not worth the effort, since we cannot know for sure it is a sha1
      or not; and also in the worst case it falls back to a full clone;
    - the only case --depth is not supported for any git command is when
      the server supports only dumb http transport, in this case the
      script falls back to a full clone.
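
For reference, the stricter check mentioned above (40 char in
[0-9a-fA-F]) could look like the sketch below. The helper name
is_full_sha1 is hypothetical and, as said, even a match does not prove
that the change set is a sha1:

```shell
# Hypothetical helper: succeed only if $1 is exactly 40 hexadecimal digits.
is_full_sha1() {
    # -E: extended regex, -x: the whole line must match, -q: no output.
    printf '%s\n' "$1" | grep -Eqx '[0-9a-fA-F]{40}'
}
```

It would replace the length test in the condition, for example:
if [ ${git_done} -eq 0 ] && is_full_sha1 "${cset}"; then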

[1] http://patchwork.ozlabs.org/patch/690098/

Here are some measurements; notice most of them are not used by Buildroot
and are here just for comparison:

https://github.com/raspberrypi/linux.git
100 commits from all branches 567.47 MiB
git clone 1.49 GiB
git clone --mirror 1.50 GiB

https://github.com/Freescale/linux-fslc.git
100 commits from all branches 944.72 MiB + 73.89 KiB
git clone 1.68 GiB
git clone --mirror 1.69 GiB

http://arago-project.org/git/projects/am33x-cm3.git
100 commits from all branches fails, server supports only dumb http
git clone 5,6M
git clone --mirror 5,6M
(measured using du -s -h .git/objects/)

https://github.com/torvalds/linux.git
100 commits from all branches 469.40 MiB + 66.09 KiB
git clone 1.62 GiB
git clone --mirror 1.82 GiB

https://github.com/linux-sunxi/sunxi-mali-proprietary
100 commits from all branches 7.55 MiB
git clone 7.55 MiB
git clone --mirror 7.55 MiB

https://github.com/tmux/tmux.git
100 commits from all branches 1.13 MiB
git clone 6.62 MiB
git clone --mirror 6.91 MiB

https://github.com/hishamhm/htop.git
100 commits from all branches 1.62 MiB
git clone 1.92 MiB
git clone --mirror 2.20 MiB

git://git.buildroot.net/buildroot
100 commits from all branches 9.61 MiB
git clone 47.51 MiB
git clone --mirror 47.51 MiB

https://github.com/buildroot/buildroot.git
100 commits from all branches 7.82 MiB
git clone 61.94 MiB
git clone --mirror 64.72 MiB

https://github.com/laravel/framework.git
100 commits from all branches 9.12 MiB
git clone 24.09 MiB
git clone --mirror 41.05 MiB
---
 support/download/git | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/support/download/git b/support/download/git
index 42048ad48..59687a323 100755
--- a/support/download/git
+++ b/support/download/git
@@ -104,6 +104,26 @@ if [ "${ref}" ]; then
         printf "Shallow fetch failed, falling back to doing a full fetch\n"
     fi
 fi
+if [ ${git_done} -eq 0 -a ${#cset} -eq 40 ]; then
+    printf "Doing shallow fetch of all branches\n"
+    # When the version of a package is following a branch it is usual to use
+    # the sha1 instead of the branch name in order to assure reproducible
+    # builds. When a new commit is added to the branch in the upstream, the
+    # selected version is not anymore in the branch head, leading to a full
+    # fetch.
+    # As a middle ground between a --depth 1 fetch and a full fetch, fetch all
+    # branches with depth of 100. We have a good change to catch the desired
+    # version and it makes difference when fetching from large repos.
+    # Check the fetch is successful for the case the remote does not support
+    # smart http transport.
+    if _git fetch -u ${verbose} "${@}" --depth 100 "'${repo}'" \
+                  "'+refs/heads/*:refs/heads/*'" 2>&1; then
+        unshallow=--unshallow
+        if _git checkout -q "'${cset}'" 2>&1; then
+            git_done=1
+        fi
+    fi
+fi
 if [ ${git_done} -eq 0 ]; then
     printf "Doing full fetch\n"
     # Fetch all branch and tag refs. The same as git clone.
-- 
2.11.0

* [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches
  2016-12-02 15:21 ` [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches Ricardo Martincoski
@ 2016-12-06 17:13   ` Brandon Maier
  2016-12-06 19:30     ` Ricardo Martincoski
  2017-02-07 16:49   ` Arnout Vandecappelle
  1 sibling, 1 reply; 13+ messages in thread
From: Brandon Maier @ 2016-12-06 17:13 UTC (permalink / raw)
  To: buildroot

On Fri, Dec 2, 2016 at 9:21 AM, Ricardo Martincoski
<ricardo.martincoski@datacom.ind.br> wrote:
> As a middle ground between a --depth 1 fetch and a full clone, fetch all
> branches with depth of 100. We have a good change to catch the desired

Should this be "good chance"?

Also, is there any particular reason for depth 100?

> ...
> +    # As a middle ground between a --depth 1 fetch and a full fetch, fetch all
> +    # branches with depth of 100. We have a good change to catch the desired

Again, "good chance"?


Altogether good idea Ricardo, I like the "--depth 100" feature. This will
be helpful for our team as well.

* [Buildroot] [PATCH v2 0/4] support/download/git using git fetch
  2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
                   ` (3 preceding siblings ...)
  2016-12-02 15:21 ` [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches Ricardo Martincoski
@ 2016-12-06 17:22 ` Brandon Maier
  4 siblings, 0 replies; 13+ messages in thread
From: Brandon Maier @ 2016-12-06 17:22 UTC (permalink / raw)
  To: buildroot

Thanks for incorporating the shallow fetch into your patch. The full
rewrite to git init/fetch ended up with a much cleaner implementation than
trying to hack my stuff onto git clone.

I tested these 4 patches on some of our internal projects and it brought a
fresh "make source" from 9.1m down to 1.8m.

Tested-by: Brandon Maier <brandon.maier@rockwellcollins.com>

* [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches
  2016-12-06 17:13   ` Brandon Maier
@ 2016-12-06 19:30     ` Ricardo Martincoski
  2016-12-06 20:06       ` Arnout Vandecappelle
  0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Martincoski @ 2016-12-06 19:30 UTC (permalink / raw)
  To: buildroot

Brandon,

Thank you for your tests.

On December 6, 2016 3:13:08 PM, Brandon Maier wrote:
>> As a middle ground between a --depth 1 fetch and a full clone, fetch all 
>> branches with depth of 100. We have a good change to catch the desired 
>
>Should this be "good chance"? 

Yes.


>Also, is there any particular reason for depth 100? 

No particular reason. It was only a guess.


>> ... 
>> + # As a middle ground between a --depth 1 fetch and a full fetch, fetch all 
>> + # branches with depth of 100. We have a good change to catch the desired 
>
> Again, "good chance"? 

Yes.


> Altogether good idea Ricardio, I like the "--depth 100" feature. This will be helpful for our team as well.

Actually it was not my idea, Arnout suggested the feature at
http://patchwork.ozlabs.org/patch/690098/


Regards,
Ricardo Martincoski

* [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches
  2016-12-06 19:30     ` Ricardo Martincoski
@ 2016-12-06 20:06       ` Arnout Vandecappelle
  0 siblings, 0 replies; 13+ messages in thread
From: Arnout Vandecappelle @ 2016-12-06 20:06 UTC (permalink / raw)
  To: buildroot



On 06-12-16 20:30, Ricardo Martincoski wrote:
> Brandon,
> 
> Thank you for your tests.
> 
> On December 6, 2016 3:13:08 PM, Brandon Maier wrote:
>>> As a middle ground between a --depth 1 fetch and a full clone, fetch all 
>>> branches with depth of 100. We have a good change to catch the desired 
>>
>> Should this be "good chance"? 
> 
> Yes.
> 
> 
>> Also, is there any particular reason for depth 100? 
> 
> No particular reason. It was only a guess.

 The reasoning is mainly that the difference between depth 1 and depth 100 is
usually negligible. This may however not be true, particularly for repos with
large binaries.

 Even better would be to do an incremental search, start at 1, multiply by 10
every iteration until you find the wanted commit. But I'm not sure that the
savings warrant the additional complexity, and also the additional overhead of
reconnecting a few times.
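
A rough sketch of the incremental search described above, only to make
the idea concrete. The function names are hypothetical, ${repo} and
${cset} are the variables the download script already has, and the two
hooks are kept separate so the loop can be tried without a network:

```shell
# Hypothetical hook: shallow fetch of all branches at the depth given in $1.
fetch_at_depth() {
    git fetch -u --depth "$1" "${repo}" '+refs/heads/*:refs/heads/*'
}

# Hypothetical hook: true if ${cset} is present locally and is a commit.
have_commit() {
    git cat-file -e "${cset}^{commit}" 2>/dev/null
}

# Start at depth 1 and multiply by 10 every iteration until the wanted
# commit is found or the depth would exceed $1. Prints the depth that
# worked. Returns non-zero if a fetch fails (e.g. dumb http) or the
# commit is not found, so the caller can fall back to a full fetch.
deepen_until_found() {
    max_depth="$1"
    depth=1
    while [ "${depth}" -le "${max_depth}" ]; do
        fetch_at_depth "${depth}" >&2 || return 1
        if have_commit; then
            printf '%s\n' "${depth}"
            return 0
        fi
        depth=$((depth * 10))
    done
    return 1
}
```

Each iteration reconnects to the remote, which is the additional
overhead mentioned above.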


 Regards,
 Arnout


>>> ... 
>>> + # As a middle ground between a --depth 1 fetch and a full fetch, fetch all 
>>> + # branches with depth of 100. We have a good change to catch the desired 
>>
>> Again, "good chance"? 
> 
> Yes.
> 
> 
>> Altogether good idea Ricardio, I like the "--depth 100" feature. This will be helpful for our team as well.
> 
> Actually it was not my idea, Arnout suggested the feature at
> http://patchwork.ozlabs.org/patch/690098/
> 
> 
> Regards,
> Ricardo Martincoski
> 

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

* [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone Ricardo Martincoski
@ 2017-02-07 16:48   ` Arnout Vandecappelle
  2017-03-20  2:27     ` Ricardo Martincoski
  0 siblings, 1 reply; 13+ messages in thread
From: Arnout Vandecappelle @ 2017-02-07 16:48 UTC (permalink / raw)
  To: buildroot



On 02-12-16 16:21, Ricardo Martincoski wrote:
> Rewrite the script using git init and git fetch instead of git clone,
> based in some ideas discussed during the review of [1].
> 
> Always using git init + git fetch has these advantages:
> - git fetch works with all kinds of refs, while git clone supports only
>   branches and tags without the ref/heads/ and ref/tags/ prefixes;
> - a fetch can be done for the head of those refs (the same as git clone
>   -b but works for all refs);
> - the objects already downloaded by a call to git fetch are reused in
>   the next call.
> 
> First ask the remote for its references using git ls-remote and save the
> output to a file. Later on, inspect the saved file for the desired
> change set, determining if it is a named ref and so can be downloaded
> using a shallow fetch.
> Use an analytical solution to inspect the output of git ls-remote
> instead of a single awk line. It makes each line of code simple: 'grep'
> to check the entry is in the ls-remote output and 'cut' to actually get
> the reference to use.
> 
> A concern that arises from this method is that the remote can change
> between the git ls-remote and the git fetch, but in this case, once the
> script creates the equivalent of a shallow clone, the fetch and the
> checkout do what is expected:
> - for a removed named reference (branch, tag or special ref), the fetch
>   fails, falling back to a possibly successful checkout after all
>   branches and tags are fetched (the checkout will only succeed in
>   specific cases, e.g. the remote removed a branch but created a tag
>   with the same name);
> - for a changed named reference (branch, tag or special ref), the fetch
>   and checkout are successful using the "new" sha1 from the remote.
> 
> Move the git checkout command next to each git fetch command, since the
> checkout can now fail (if the change set is not yet fetched), falling
> back to the next method (which downloads more objects from the remote).
> 
> When doing a full fetch in a local shallow copy, use --unshallow to
> ensure the local copy is converted to a complete one.
> 
> If after the fetch of all branches and tags (equivalent to git clone)
> the desired change set cannot be checked out, do a fetch of all
> references (equivalent to git clone --mirror). This approach allows the
> use of any unambiguous partial sha1 as package version and also allows
> the use of sha1 of special refs, while keeping the usual bandwidth need
> unchanged by avoiding downloading special refs when not needed.
> 
> [1] http://patchwork.ozlabs.org/patch/681841/
> 
> Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>

Acked-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>

 I tested getting hashes from a few different hard-to-reach branches, and it
works very well (with the full series applied), using shallow clones wherever
possible and falling back on full mirror when necessary.
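
 For readers following along, the grep/cut inspection of the saved ls-remote
output described in the commit message can be sketched roughly as below. This
is not the code from the patch: the function name resolve_ref, the temporary
file and the sample refs are all assumptions made for illustration.

```shell
# Hypothetical sample of saved `git ls-remote` output: "<sha1><TAB><ref>".
ls_remote_file="$(mktemp)"
printf '%s\t%s\n' \
    1111111111111111111111111111111111111111 refs/heads/master \
    2222222222222222222222222222222222222222 refs/tags/v1.0 \
    3333333333333333333333333333333333333333 refs/changes/01/1/1 \
    > "${ls_remote_file}"

# Print the full ref name matching the requested version, trying a branch,
# then a tag, then the name as given (for special refs).
# Return 1 if the remote does not advertise any of them.
resolve_ref() {
    for candidate in "refs/heads/${1}" "refs/tags/${1}" "${1}"; do
        # 'cut' keeps only the ref column, 'grep' checks for an exact match.
        if cut -f 2 "${ls_remote_file}" | grep -q -x -F -e "${candidate}"; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}
```

 A version that resolves to a named ref can then be fetched shallowly; anything
else has to go through the larger fetches.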

[snip]
> +if [ ${git_done} -eq 0 ]; then
> +    printf "Doing mirror fetch\n"
> +    # Fetch all refs, including special refs. The same as git clone --mirror.
> +    _git fetch -u ${verbose} "${@}" ${unshallow} "'${repo}'" "'+refs/*:refs/*'"
> +    _git checkout -q "'${cset}'"
>  fi

 This bit is the only one which is still a little bit controversial. It really
should never be needed in any sane situation. Still, it feels safer to have
such a fallback so I'd keep it. The only disadvantage is that if a ref really
can't be fetched, it will take a long time before we discover that.
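
 To make the cost of that last resort concrete, here is a rough sketch of the
cascade, with stub functions in place of the real fetch + checkout steps. The
stub names and their return values are hypothetical; only the git_done flag
comes from the patch.

```shell
# Stubs standing in for the real "fetch + checkout" steps; the names and
# return values are hypothetical and chosen only to exercise the cascade.
shallow_fetch() { return 1; }  # pretend the wanted commit is not a ref tip
full_fetch()    { return 0; }  # pretend it is reachable from a branch or tag
mirror_fetch()  { return 0; }  # last resort, never reached in this run

git_done=0
used=""
for method in shallow_fetch full_fetch mirror_fetch; do
    # Try the next (more expensive) method only while nothing has succeeded.
    if [ ${git_done} -eq 0 ] && ${method}; then
        git_done=1
        used="${method}"
    fi
done
```

 If every earlier method fails, all of them have already been paid for by the
time the mirror fetch runs, which is why an unfetchable ref is slow to detect.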

 Regards,
 Arnout

>  
> -# Checkout the required changeset, so that we can update the required
> -# submodules.
> -_git checkout -q "'${cset}'"
> -
>  # Get date of commit to generate a reproducible archive.
>  # %cD is RFC2822, so it's fully qualified, with TZ and all.
>  date="$( _git log -1 --pretty=format:%cD )"
> 

* [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1
  2016-12-02 15:21 ` [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1 Ricardo Martincoski
@ 2017-02-07 16:48   ` Arnout Vandecappelle
  0 siblings, 0 replies; 13+ messages in thread
From: Arnout Vandecappelle @ 2017-02-07 16:48 UTC (permalink / raw)
  To: buildroot



On 02-12-16 16:21, Ricardo Martincoski wrote:
> Shallow fetches don't work for downloading sha1s. However, it's common to
> track a tag or branch by the commit SHA so that a branch or tag doesn't
> unexpectedly change.
> 
> We can take advantage of this scenario by searching the remote for a ref
> that is equivalent to our sha1, then shallow fetching that ref instead.
> 
> Regarding the case where the remote changes between the git ls-remote and the
> git fetch, once the script creates the equivalent of a shallow clone,
> the fetch and the checkout do what is expected:
> - for a sha1 of a removed ref (branch, tag or special ref), the fetch
>   fails, falling back to the equivalent of a full clone;
> - for a sha1 of a changed ref (branch, tag or special ref), the checkout
>   fails (because the desired sha1 is not fetched), falling back to the
>   equivalent of a full clone;
> In both cases, the checkout after the equivalent of a full clone or
> mirror clone is successful if the sha1 is still reachable from any ref.
> 
> A similar patch was submitted in [1] using git clone.
> [1] http://patchwork.ozlabs.org/patch/681841/
> 
> Reported-by: Brandon Maier <brandon.maier@rockwellcollins.com>
> Reported-by: Bryce Ferguson <bryce.ferguson@rockwellcollins.com>
> Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>

Acked-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
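
 As an illustration of the lookup described in the commit message (not the
code from the patch), finding a ref whose tip is a given sha1 in the saved
ls-remote output could look like this. The function name ref_for_sha1, the
temporary file and the sample sha1s are assumptions.

```shell
# Hypothetical sample of saved `git ls-remote` output: "<sha1><TAB><ref>".
ls_remote_file="$(mktemp)"
printf '%s\t%s\n' \
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa refs/heads/master \
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb refs/tags/v2.0 \
    > "${ls_remote_file}"

# Print the first ref whose tip is exactly the given full sha1. The output
# is empty if the sha1 is not the tip of any advertised ref.
ref_for_sha1() {
    tab="$(printf '\t')"
    grep "^${1}${tab}" "${ls_remote_file}" | cut -f 2 | head -n 1
}
```

 When a ref is found it can be shallow fetched and the sha1 checked out from
it; an empty result means falling back to the equivalent of a full clone.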

 Regards,
 Arnout

[snip]


* [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches
  2016-12-02 15:21 ` [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches Ricardo Martincoski
  2016-12-06 17:13   ` Brandon Maier
@ 2017-02-07 16:49   ` Arnout Vandecappelle
  1 sibling, 0 replies; 13+ messages in thread
From: Arnout Vandecappelle @ 2017-02-07 16:49 UTC (permalink / raw)
  To: buildroot



On 02-12-16 16:21, Ricardo Martincoski wrote:
> When a branch of a package is tracked using sha1 and the remote branch
> moves (usually gets another commit), this script stops using a shallow
> fetch since the sha1 is no longer the head of the branch, falling back
> to a full clone.
> 
> As a middle ground between a --depth 1 fetch and a full clone, fetch all
> branches with depth of 100. We have a good chance to catch the desired
> version, reducing the data transfer from the remote when fetching from
> large repos, especially hardware-specific linux trees.
> 
> In the case the desired version is not in this shallow fetch, fall back
> to a full fetch that reuses the objects already downloaded.
> 
> This method causes an extra burden on the server side ('Compressing
> objects'), usually a few seconds, but the potential reduction of data
> transferred should be beneficial for both the user and the server.
> 
> https://github.com/raspberrypi/linux.git
> 100 commits from all branches 567.47 MiB
> git clone 1.49 GiB
> 
> https://github.com/Freescale/linux-fslc.git
> 100 commits from all branches 944.72 MiB + 73.89 KiB
> git clone 1.68 GiB
> 
> https://github.com/linux-sunxi/sunxi-mali-proprietary
> 100 commits from all branches 7.55 MiB
> git clone 7.55 MiB
> 
> Reported-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
> Signed-off-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>

Acked-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>

 Regards,
 Arnout

[snip]


* [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone
  2017-02-07 16:48   ` Arnout Vandecappelle
@ 2017-03-20  2:27     ` Ricardo Martincoski
  0 siblings, 0 replies; 13+ messages in thread
From: Ricardo Martincoski @ 2017-03-20  2:27 UTC (permalink / raw)
  To: buildroot

Arnout,

On Tue, Feb 07, 2017 at 02:48 PM, Arnout Vandecappelle wrote:

> On 02-12-16 16:21, Ricardo Martincoski wrote:
[snip]
> Acked-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
> 
>  I tested getting hashes from a few different hard-to-reach branches, and it
> works very well (with the full series applied), using shallow clones wherever
> possible and falling back on full mirror when necessary.
> 
> [snip]
>> +if [ ${git_done} -eq 0 ]; then
>> +    printf "Doing mirror fetch\n"
>> +    # Fetch all refs, including special refs. The same as git clone --mirror.
>> +    _git fetch -u ${verbose} "${@}" ${unshallow} "'${repo}'" "'+refs/*:refs/*'"
>> +    _git checkout -q "'${cset}'"
>>  fi
> 
>  This bit is the only one which is still a little bit controversial. It really
> should never be needed for any sane situation. Still, it feels safer to have
> such a fallback so I'd keep it. The only disadvantage is that in case that a ref
> really can't be fetched, it will take a long time before we discover that.

The only cases (that I can think of) that fall back to this code are corner cases
with special refs: partial sha1, ref changed after ls-remote, ...
So I will remove it and resend, and I will mark the series as Changes Requested.

While at it I will make 2 more changes:
1) merge the DEVELOPERS entry into this patch;
2) fix the support for submodules that use relative paths.

I am developing the automated tests for git download using the test infra.
I noticed that downloading submodules from repos that do not use absolute URLs
inside .gitmodules fails.
The package sunxi-mali uses absolute URLs inside .gitmodules, so it downloads
fine.

Since a change in this patch is only needed when using submodules I want to use
this diff:
(a)
 # There might be submodules, so fetch them.
 if [ ${recurse} -eq 1 ]; then
+    # When .gitmodules contains relative paths, git submodule needs a remote
+    # named origin to generate the correct urls
+    _git remote add origin "'${repo}'"
     _git submodule update --init --recursive
 fi
Yes, I tested and it seems to need to be named origin.

But there are other solutions:

(b) register the remote using the same command from (a) just after git init, as
Brandon suggested in http://patchwork.ozlabs.org/patch/681841/ . All other
"'${repo}'" can be changed to origin.

(c) use something like 'git config' to manually set the url for the submodules.
I have not yet researched this option enough.

What do you think about item 2?

Also, since I am unsure, let me ask: should I carry or remove your Acked-by on
this patch?

Thank you,
Ricardo

end of thread, other threads:[~2017-03-20  2:27 UTC | newest]

Thread overview: 13+ messages
2016-12-02 15:21 [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Ricardo Martincoski
2016-12-02 15:21 ` [Buildroot] [PATCH v2 1/4] support/download/git: do not use git clone Ricardo Martincoski
2017-02-07 16:48   ` Arnout Vandecappelle
2017-03-20  2:27     ` Ricardo Martincoski
2016-12-02 15:21 ` [Buildroot] [PATCH v2 2/4] support/download/git: optimized download of sha1 Ricardo Martincoski
2017-02-07 16:48   ` Arnout Vandecappelle
2016-12-02 15:21 ` [Buildroot] [PATCH v2 3/4] DEVELOPERS: add entry for support/download/git Ricardo Martincoski
2016-12-02 15:21 ` [Buildroot] [RFC v2 4/4] support/download/git: shallow fetch of all branches Ricardo Martincoski
2016-12-06 17:13   ` Brandon Maier
2016-12-06 19:30     ` Ricardo Martincoski
2016-12-06 20:06       ` Arnout Vandecappelle
2017-02-07 16:49   ` Arnout Vandecappelle
2016-12-06 17:22 ` [Buildroot] [PATCH v2 0/4] support/download/git using git fetch Brandon Maier
