All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1
@ 2018-04-12  9:28 Ricardo Martincoski
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-12  9:28 UTC (permalink / raw)
  To: buildroot

Hello,

This series brings 3 patches trying to cover some corner cases when downloading
packages with method git.

The first 2 patches are intended to fix autobuilder failures.
http://autobuild.buildroot.net/?reason=dtv-scan-tables-07b18ecef17493ac0908a2e112eae3fe269da7fa

This occurs only with old versions of git when a full fetch is needed:
 fatal: reference is not a tree: sha1
These occur when the git cache is dirty:
 error: The following untracked working tree files would be overwritten by checkout:
 error: The following untracked working tree files would be removed by checkout:
A dirty cache can also cause hash mismatches for the tarball.

Patch 3 should fix the case of unshallowing the git cache (when a shallow fetch
downloaded a ref, and later a commit behind this ref is needed). This scenario
is relevant both during the buildroot development (a revert of a bump) and when
a build farm is shared by projects that use different versions of buildroot.

Regards,
Ricardo


Ricardo Martincoski (3):
  download/git: fix fetch all refs for old git
  download/git: recover dirty cache
  download/git: unshallow when fetching all refs

 support/download/git | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git
  2018-04-12  9:28 [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1 Ricardo Martincoski
@ 2018-04-12  9:28 ` Ricardo Martincoski
  2018-04-12 17:42   ` Yann E. MORIN
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache Ricardo Martincoski
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs Ricardo Martincoski
  2 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-12  9:28 UTC (permalink / raw)
  To: buildroot

With old versions of git, when the shallow fetch was not enough, the
download fails like this:
 Fetching all references
 Could not fetch special ref 'sha1'; assuming it is not special.
 fatal: reference is not a tree: sha1

At git version 1.9.0, the semantics of the option -t for git fetch
changed, see upstream commit [1]:
< 1.9.0: "fetch tags without also fetching other references";
>=1.9.0: "fetch tags *in addition to* other stuff".

Fall back to explicit refspec to support distros that use ancient
versions of git.

Fixes:
http://autobuild.buildroot.net/results/7a88d1aa9ea155f1527d2fa141207c676af85298

[1] https://github.com/git/git/commit/c5a84e92a2fe9e8748e32341c344d7a6c0f52a50

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
WARNING: please review carefully, especially related to the plus sign.

I created a special docker image replacing git 2.11.0 with 1.8.3 [2].
Using it together to a special .gitlab-ci.yml [3], I could test in
Gitlab CI the download, without previous git cache and without backup
site, of all packages in the tree that use site method git in these
combinations:
- [4] patch above master (20c1647f) using git 1.8.3;
- [5] patch above master (20c1647f) using git 2.11.0;
- [6] master (20c1647f) using git 1.8.3;
- [7] master (20c1647f) using git 2.11.0;
- [8] 2018.02.1 using git 1.8.3;
- [9] 2018.02.1 using git 2.11.0.

These packages fail with 'fatal: reference is not a tree' in the master
branch using git 1.8.3 but do not fail for the other 5 combinations above:
aer-inject
armbian-firmware
boot-wrapper-aarch64
dbus-triggerd
dropwatch
edid-decode
expedite
flashbench
gst1-interpipe
host-mxsldr
host-pseudo
host-squashfs
libbroadvoice
libg7221
libilbc
libsilk
libsoundtouch
libubox
libuci
libyuv
linux-firmware
linux at beaglebone_qt5_defconfig
ltrace
mmc-utils
net-tools
open-lldp
psplash
rtmpdump
slirp
squashfs
ti-sgx-demos
ti-sgx-km
ti-sgx-um
uboot at ci20_defconfig
uboot at roseapplepi_defconfig
ubus
uclibc-ng-test
uhttpd
ustream-ssl
x264
xdriver_xf86-video-intel
xdriver_xf86-video-voodoo
yavta

[2] https://gitlab.com/RicardoMartincoski/buildroot/commit/d293ba50fa8cd40eb5c396b2aa234c07e2118020
[3] http://patchwork.ozlabs.org/patch/896012/
[4] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20328681
[5] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20328707
[6] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20328727
[7] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20328745
[8] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20352543
[9] https://gitlab.com/RicardoMartincoski/buildroot/pipelines/20356539
---
 support/download/git | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/support/download/git b/support/download/git
index 381f3ceeb3..1b09e5c750 100755
--- a/support/download/git
+++ b/support/download/git
@@ -76,7 +76,7 @@ if [ -n "$(_git ls-remote origin "'${cset}'" 2>&1)" ]; then
 fi
 if [ ${git_done} -eq 0 ]; then
     printf "Fetching all references\n"
-    _git fetch origin -t
+    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
 fi
 
 # Try to get the special refs exposed by some forges (pull-requests for
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-12  9:28 [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1 Ricardo Martincoski
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
@ 2018-04-12  9:28 ` Ricardo Martincoski
  2018-04-12 17:48   ` Yann E. MORIN
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs Ricardo Martincoski
  2 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-12  9:28 UTC (permalink / raw)
  To: buildroot

In the case a previous operation in the git cache was abruptly
interrupted the git cache can be in a dirty state (removed files,
untracked files). This can lead to download failures during the
checkout, like this:
 error: The following untracked working tree files would be overwritten by checkout:
and/or this:
 error: The following untracked working tree files would be removed by checkout:

If an untracked file, directory or submodule is present after the
checkout, the tarball will have additional content, leading to a hash
mismatch.

When performing the checkout, use -f to allow it to occur even if there
are untracked files that would be overwritten or removed.

Use git clean to remove any untracked files before generating the
tarball. Use the second -f to ensure the repo is pristine even if
previous checked out ref contained a submodule that is not present in
the ref just checked out.

Fixes:
http://autobuild.buildroot.net/results/b418394f679ad269d587753302c540036793334c
Possibly fixes:
http://autobuild.buildroot.net/results/9b584a624bb08c5ab6d977b93f4b92ed4bd1d40f

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
I did not split this into 2 patches (the first one adding -f to checkout
and the second one adding git reset) because only adding -f to checkout
would make the git checkout command to succeed but the download would
still fails when checking the hash if any untracked file was left. In
this case the git cache also would still be dirty, so it is not really a
fix to a dirty cache.
---
 support/download/git | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/support/download/git b/support/download/git
index 1b09e5c750..68b5d920a1 100755
--- a/support/download/git
+++ b/support/download/git
@@ -92,7 +92,8 @@ fi
 
 # Checkout the required changeset, so that we can update the required
 # submodules.
-_git checkout -q "'${cset}'"
+_git checkout -f -q "'${cset}'"
+_git clean -ffdx
 
 # Get date of commit to generate a reproducible archive.
 # %cD is RFC2822, so it's fully qualified, with TZ and all.
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-12  9:28 [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1 Ricardo Martincoski
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache Ricardo Martincoski
@ 2018-04-12  9:28 ` Ricardo Martincoski
  2018-04-12 11:28   ` Thomas Petazzoni
  2 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-12  9:28 UTC (permalink / raw)
  To: buildroot

With any version of git, after the git cache was once populated using a
shallow fetch, a full fetch succeeds but doesn't download the commits
behind the reference that was fetched with --depth 1. In this case the
download fails like this:
 Fetching all references
 Could not fetch special ref 'sha1'; assuming it is not special.
 fatal: reference is not a tree: sha1

One scenario in buildroot development that would trigger this sequence
is for example when a package is bumped on master branch, some users
create the git cache at this time, then the bump is reverted.

Another scenario is when giving maintenance to a product version that
uses one version of buildroot using the same build farm that uses a
newer version of buildroot.

Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
---
WARNING: this patch is in early stage, really RFC. I tested it in a
short period of time.

I tested it by first creating the cache for dt-utils in its current
version, then I changed dt-utils version to the commit immediately
before it (ea1d08e26b73a4855165ed56043fa439ff9948cb). Without this patch
the fetch of all refs succeeds but does not bring the desired commit and
the checkout fails.
---
 support/download/git | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/support/download/git b/support/download/git
index 68b5d920a1..8216da0a71 100755
--- a/support/download/git
+++ b/support/download/git
@@ -76,7 +76,15 @@ if [ -n "$(_git ls-remote origin "'${cset}'" 2>&1)" ]; then
 fi
 if [ ${git_done} -eq 0 ]; then
     printf "Fetching all references\n"
-    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
+    # After a shallow fetch was once sucessful in this repo, the only way to
+    # ensure the commits behind its current depth are available is to unshallow.
+    # Git versions older than 2.15.0 do not have a command to know if a repo is
+    # shallow, so test for the actual file.
+    if [ -f "$(git rev-parse --git-dir)/shallow" ]; then
+        unshallow=--unshallow
+    fi
+    _git fetch origin -u ${unshallow} \
+               "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
 fi
 
 # Try to get the special refs exposed by some forges (pull-requests for
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs Ricardo Martincoski
@ 2018-04-12 11:28   ` Thomas Petazzoni
  2018-04-12 17:33     ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Thomas Petazzoni @ 2018-04-12 11:28 UTC (permalink / raw)
  To: buildroot

Hello,

Thanks a lot Ricardo for all this work on the download issues, much
appreciated. One question on this specific commit.

On Thu, 12 Apr 2018 06:28:55 -0300, Ricardo Martincoski wrote:
> With any version of git, after the git cache was once populated using a
> shallow fetch, a full fetch succeeds but doesn't download the commits
> behind the reference that was fetched with --depth 1. In this case the
> download fails like this:
>  Fetching all references
>  Could not fetch special ref 'sha1'; assuming it is not special.
>  fatal: reference is not a tree: sha1
> 
> One scenario in buildroot development that would trigger this sequence
> is for example when a package is bumped on master branch, some users
> create the git cache at this time, then the bump is reverted.
> 
> Another scenario is when giving maintenance to a product version that
> uses one version of buildroot using the same build farm that uses a
> newer version of buildroot.

In the light of this, wouldn't it be simpler to stop doing shallow
clones at all ? A shallow clone did make a lot of sense back when we
didn't cache the Git repositories. But now that we are caching the
results, does it make sense to keep the complexity of the code to use
shallow clones ?

Note that this is really an open question, there are possibly some
convincing argument for us to keep a shallow clone strategy.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-12 11:28   ` Thomas Petazzoni
@ 2018-04-12 17:33     ` Yann E. MORIN
  2018-04-13 18:32       ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-12 17:33 UTC (permalink / raw)
  To: buildroot

Thomas, Ricardo, All,

On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:
> Thanks a lot Ricardo for all this work on the download issues, much
> appreciated.

Indeed, thanks!

> One question on this specific commit.
> 
> On Thu, 12 Apr 2018 06:28:55 -0300, Ricardo Martincoski wrote:
> > With any version of git, after the git cache was once populated using a
> > shallow fetch, a full fetch succeeds but doesn't download the commits
> > behind the reference that was fetched with --depth 1. In this case the
> > download fails like this:
> >  Fetching all references
> >  Could not fetch special ref 'sha1'; assuming it is not special.
> >  fatal: reference is not a tree: sha1
> > 
> > One scenario in buildroot development that would trigger this sequence
> > is for example when a package is bumped on master branch, some users
> > create the git cache at this time, then the bump is reverted.
> > 
> > Another scenario is when giving maintenance to a product version that
> > uses one version of buildroot using the same build farm that uses a
> > newer version of buildroot.
> 
> In the light of this, wouldn't it be simpler to stop doing shallow
> clones at all ? A shallow clone did make a lot of sense back when we
> didn't cache the Git repositories. But now that we are caching the
> results, does it make sense to keep the complexity of the code to use
> shallow clones ?
> 
> Note that this is really an open question, there are possibly some
> convincing argument for us to keep a shallow clone strategy.

Indeed, I don't think it makes sense anymore...

So, intead of fixing it, let's just do full clones now.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
@ 2018-04-12 17:42   ` Yann E. MORIN
  2018-04-13 18:23     ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-12 17:42 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
> With old versions of git, when the shallow fetch was not enough, the
> download fails like this:
>  Fetching all references
>  Could not fetch special ref 'sha1'; assuming it is not special.
>  fatal: reference is not a tree: sha1
> 
> At git version 1.9.0, the semantics of the option -t for git fetch
> changed, see upstream commit [1]:
> < 1.9.0: "fetch tags without also fetching other references";
> >=1.9.0: "fetch tags *in addition to* other stuff".
> 
> Fall back to explicit refspec to support distros that use ancient
> versions of git.
[--SNIP--]
> diff --git a/support/download/git b/support/download/git
> index 381f3ceeb3..1b09e5c750 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -76,7 +76,7 @@ if [ -n "$(_git ls-remote origin "'${cset}'" 2>&1)" ]; then
>  fi
>  if [ ${git_done} -eq 0 ]; then
>      printf "Fetching all references\n"
> -    _git fetch origin -t
> +    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"

Why don't we just run:

    git fetch origin
    git fetch origin -t

As I understand it:

                    ancient git         | new git
--------------------------------------------------------------------
git fetch     | fetch all refs but tags | fetches all refs but tags
git fetch -t  | fetches only tags       | fetch all refs and tags

Right?

Regards,
Yann E. MORIN.

>  fi
>  
>  # Try to get the special refs exposed by some forges (pull-requests for
> -- 
> 2.14.1
> 

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-12  9:28 ` [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache Ricardo Martincoski
@ 2018-04-12 17:48   ` Yann E. MORIN
  2018-04-13 18:28     ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-12 17:48 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
> In the case a previous operation in the git cache was abruptly
> interrupted the git cache can be in a dirty state (removed files,
> untracked files). This can lead to download failures during the
> checkout, like this:
>  error: The following untracked working tree files would be overwritten by checkout:
> and/or this:
>  error: The following untracked working tree files would be removed by checkout:
> 
> If an untracked file, directory or submodule is present after the
> checkout, the tarball will have additional content, leading to a hash
> mismatch.
> 
> When performing the checkout, use -f to allow it to occur even if there
> are untracked files that would be overwritten or removed.
> 
> Use git clean to remove any untracked files before generating the
> tarball. Use the second -f to ensure the repo is pristine even if
> previous checked out ref contained a submodule that is not present in
> the ref just checked out.
> 
> Fixes:
> http://autobuild.buildroot.net/results/b418394f679ad269d587753302c540036793334c
> Possibly fixes:
> http://autobuild.buildroot.net/results/9b584a624bb08c5ab6d977b93f4b92ed4bd1d40f

I'm not very happy of those tricks... If the repos is broken, it may be
in a state that repairing it is not even possible at all, in case a
critical git data is missing/damaged.

If the repo is broken, either:

  - we ditch the repository and restart frm scratch,

  - or we just bail out and tell the user what hapened, so they can take
    action (e.g. remove the broken repo manually).

Regards,
Yann E. MORIN.

> Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
> Cc: "Yann E. MORIN" <yann.morin.1998@free.fr>
> ---
> I did not split this into 2 patches (the first one adding -f to checkout
> and the second one adding git reset) because only adding -f to checkout
> would make the git checkout command to succeed but the download would
> still fails when checking the hash if any untracked file was left. In
> this case the git cache also would still be dirty, so it is not really a
> fix to a dirty cache.
> ---
>  support/download/git | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/support/download/git b/support/download/git
> index 1b09e5c750..68b5d920a1 100755
> --- a/support/download/git
> +++ b/support/download/git
> @@ -92,7 +92,8 @@ fi
>  
>  # Checkout the required changeset, so that we can update the required
>  # submodules.
> -_git checkout -q "'${cset}'"
> +_git checkout -f -q "'${cset}'"
> +_git clean -ffdx
>  
>  # Get date of commit to generate a reproducible archive.
>  # %cD is RFC2822, so it's fully qualified, with TZ and all.
> -- 
> 2.14.1
> 

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git
  2018-04-12 17:42   ` Yann E. MORIN
@ 2018-04-13 18:23     ` Ricardo Martincoski
  2018-04-15 12:12       ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-13 18:23 UTC (permalink / raw)
  To: buildroot

Hello,

Thank you for your review of the series.

On Thu, Apr 12, 2018 at 02:42 PM, Yann E. MORIN wrote:

>> -    _git fetch origin -t
>> +    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
> 
> Why don't we just run:
> 
>     git fetch origin
>     git fetch origin -t

One extra connection to the server ...
but much more readable/maintainable!
I will do. I will change this patch to Changes Requested.

> As I understand it:
> 
>                     ancient git         | new git
> --------------------------------------------------------------------
> git fetch     | fetch all refs but tags | fetches all refs but tags
> git fetch -t  | fetches only tags       | fetch all refs and tags
> 
> Right?

Yes.
Nit: actually 'git fetch' does fetch some tags, but not necessarily all. Only
tags in commits also reachable by branches are fetched.


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-12 17:48   ` Yann E. MORIN
@ 2018-04-13 18:28     ` Ricardo Martincoski
  2018-04-15 12:02       ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-13 18:28 UTC (permalink / raw)
  To: buildroot

Hello,

On Thu, Apr 12, 2018 at 02:48 PM, Yann E. MORIN wrote:

> On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
>> When performing the checkout, use -f to allow it to occur even if there
>> are untracked files that would be overwritten or removed.
>> 
>> Use git clean to remove any untracked files before generating the
>> tarball. Use the second -f to ensure the repo is pristine even if
>> previous checked out ref contained a submodule that is not present in
>> the ref just checked out.
[snip]
> 
> I'm not very happy of those tricks... If the repos is broken, it may be
> in a state that repairing it is not even possible at all, in case a
> critical git data is missing/damaged.

I created the series thinking only on git repos that are sane but have a dirty
worktree. I should have used "dirty worktree" in the commit log.

Rethinking now, taking into account a broken repo, and also the fact that the
default download directory is inside buildroot, this patch is a bit dangerous.

With a completely broken repo in the git cache, the commands called by the
download script end up executed in any repo that eventually contains that file
(parent \(of parent \)\+ directory), so:
$ cd buildroot
$ git init dl/package/git
$ cd dl/package/git
$ rm .git/HEAD # simulate a completely broken repo
$ git <command>
The git command would be executed in the buildroot tree.
A fetch to the wrong repo is bad, but not that bad.
A checkout would fail.
But checkout -f and git clean would succeed, potentially making the user to lose
changes.
I will change this patch to Changes Requested.

A way to avoid this, once the script entered the directory, is to force the
git-dir:
$ git --git-dir=.git <command>
This command would fail for a too-much-broken repo.

This is important also if we test the repo with git fsck before ditch+restart,
otherwise it would be possible to return OK based on the wrong repo.

> 
> If the repo is broken, either:
> 
>   - we ditch the repository and restart frm scratch,
> 
>   - or we just bail out and tell the user what hapened, so they can take
>     action (e.g. remove the broken repo manually).

When the repo is broken (detectable by git fsck) I agree this is the best
approach.

But I think a sane git repo (detectable by git fsck) that is in a dirty state
(in the sense of git clean) is a different case.
It can happen for example if a previous checkout operation was abruptly
interrupted during a git checkout. In this case it seems to me a bit extreme
(but not incorrect of course) to remove the repo and restart from scratch.

For the case a file from the worktree is missing the current 'git checkout'
succeeds and the problem shows up as a hash mismatch.
Using 'git checkout -f' the repo is recovered.
For the case a git object needed by the checkout is missing the current 'git
checkout' succeeds and the problem shows up as a hash mismatch.
Using 'git checkout -f' the checkout fails like this:
 error: invalid object

The git clean makes sure, among other things, that the worktree is clean even in
the case the previous checkout contained a submodule that is no longer present
in the new checkout. Maybe there is another command that ensures this, but I
don't know about yet.
Maybe I am missing something, but as I see, for a package with submodule, if
a submodule is removed from the upstream project, it would trigger a
ditch+restart too. Again, I think this is a bit extreme (more than git clean
-ffdx) but not wrong.

In order to detect broken repos (and trigger the ditch) we could start the
script by running fsck. Right?
Do you have a local patch for this ditch+restart? Or should I create one?


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-12 17:33     ` Yann E. MORIN
@ 2018-04-13 18:32       ` Ricardo Martincoski
  2018-04-15 12:08         ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-13 18:32 UTC (permalink / raw)
  To: buildroot

Hello,

On Thu, Apr 12, 2018 at 02:33 PM, Yann E. MORIN wrote:

> On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:

>> > One scenario in buildroot development that would trigger this sequence
>> > is for example when a package is bumped on master branch, some users
>> > create the git cache at this time, then the bump is reverted.
>> > 
>> > Another scenario is when giving maintenance to a product version that
>> > uses one version of buildroot using the same build farm that uses a
>> > newer version of buildroot.
>> 
>> In the light of this, wouldn't it be simpler to stop doing shallow
>> clones at all ? A shallow clone did make a lot of sense back when we
>> didn't cache the Git repositories. But now that we are caching the
>> results, does it make sense to keep the complexity of the code to use
>> shallow clones ?
>> 
>> Note that this is really an open question, there are possibly some
>> convincing argument for us to keep a shallow clone strategy.
> 
> Indeed, I don't think it makes sense anymore...
> 
> So, intead of fixing it, let's just do full clones now.

I agree it is less needed now than before.

The worst scenario (linux trees) is covered by:
- the use of tarballs from GitHub in the defconfigs, for newcomers who sometimes
  just want to use a defconfig, add few packages and use the image in a new
  board;
- in the long run, due to the git cache feature.

But I think people will miss this feature. At least I will.

In the case of someone always using tags from a linux repo, this person
would never need to download the full repo if the shallow fetch remains.

When someone wants just to try a package, for example, a debug tool, which
maybe will never be used again, this person would need to do a full clone,
instead of potentially do a shallow fetch.

The repo for some packages can be located on slow servers (not necessarily a
package in the tree, it can be on a br2-external).

And the least appealing argument: I was willing to recreate
http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-13 18:28     ` Ricardo Martincoski
@ 2018-04-15 12:02       ` Yann E. MORIN
  2018-04-16  2:54         ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-15 12:02 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-13 15:28 -0300, Ricardo Martincoski spake thusly:
> On Thu, Apr 12, 2018 at 02:48 PM, Yann E. MORIN wrote:
> > On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
> >> When performing the checkout, use -f to allow it to occur even if there
> >> are untracked files that would be overwritten or removed.
> >> 
> >> Use git clean to remove any untracked files before generating the
> >> tarball. Use the second -f to ensure the repo is pristine even if
> >> previous checked out ref contained a submodule that is not present in
> >> the ref just checked out.
> [snip]
> > 
> > I'm not very happy of those tricks... If the repos is broken, it may be
> > in a state that repairing it is not even possible at all, in case a
> > critical git data is missing/damaged.
> 
> I created the series thinking only on git repos that are sane but have a dirty
> worktree. I should have used "dirty worktree" in the commit log.

Right, these are indeed two different cases.

> Rethinking now, taking into account a broken repo, and also the fact that the
> default download directory is inside buildroot, this patch is a bit dangerous.
> 
> With a completely broken repo in the git cache, the commands called by the
> download script end up executed in any repo that eventually contains that file
> (parent \(of parent \)\+ directory), so:
> $ cd buildroot
> $ git init dl/package/git
> $ cd dl/package/git
> $ rm .git/HEAD # simulate a completely broken repo
> $ git <command>
> The git command would be executed in the buildroot tree.
> A fetch to the wrong repo is bad, but not that bad.
> A checkout would fail.
> But checkout -f and git clean would succeed, potentially making the user to lose
> changes.

Arg, indeed that is dangerous...

> I will change this patch to Changes Requested.
> 
> A way to avoid this, once the script entered the directory, is to force the
> git-dir:
> $ git --git-dir=.git <command>
> This command would fail for a too-much-broken repo.

We recently reverted use of -C becasue it was not supported by old git
versions. What is the oldest git version to support --git-dir? It seems
it was introduced in v.1.4.2, dated 2006-08-12. Is that old enough that
all the distros we care about have that version or later?

> This is important also if we test the repo with git fsck before ditch+restart,
> otherwise it would be possible to return OK based on the wrong repo.
> 
> > 
> > If the repo is broken, either:
> > 
> >   - we ditch the repository and restart frm scratch,
> > 
> >   - or we just bail out and tell the user what hapened, so they can take
> >     action (e.g. remove the broken repo manually).
> 
> When the repo is broken (detectable by git fsck) I agree this is the best
> approach.
> 
> But I think a sane git repo (detectable by git fsck) that is in a dirty state
> (in the sense of git clean) is a different case.
> It can happen for example if a previous checkout operation was abruptly
> interrupted during a git checkout. In this case it seems to me a bit extreme
> (but not incorrect of course) to remove the repo and restart from scratch.
> 
> For the case a file from the worktree is missing the current 'git checkout'
> succeeds and the problem shows up as a hash mismatch.
> Using 'git checkout -f' the repo is recovered.
> For the case a git object needed by the checkout is missing the current 'git
> checkout' succeeds and the problem shows up as a hash mismatch.
> Using 'git checkout -f' the checkout fails like this:
>  error: invalid object
> 
> The git clean makes sure, among other things, that the worktree is clean even in
> the case the previous checkout contained a submodule that is no longer present
> in the new checkout. Maybe there is another command that ensures this, but I
> don't know about yet.
> Maybe I am missing something, but as I see, for a package with submodule, if
> a submodule is removed from the upstream project, it would trigger a
> ditch+restart too. Again, I think this is a bit extreme (more than git clean
> -ffdx) but not wrong.
> 
> In order to detect broken repos (and trigger the ditch) we could start the
> script by running fsck. Right?

So, basically, we would do something along those lines:

    _git() {
        eval ${GIT} --git-dir=.git "${@}"
    }

    _git fsck || { rm -rf ${git_cache}; exec "${0}" "${@}"; exit 1; }
    _git fetch
    _git fetch -t
    _git checkout ${ref}
    _git clean -dX
    _git checkout -- .
    _git submodules update

Of course, that would require using appropriate options to fsck to bail
out.

But what to do if any if the following actions fails? Should we simply
exit, or should we clean up and clone again?

I can see shere that could go wrong: the ref does not exist, so the
first checkout fails, so we ditch the repository, clone again, and
checkout again fails...

My opinion, for what it's worth, is to clan only on the fsck. Any other
failure should be left to the user to handle. Maybe with just a little
message saying something like:

    If you are not sure how to solve this, remove ${cache_dir}.

Thoughts?

> Do you have a local patch for this ditch+restart? Or should I create one?

I am working on that...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-13 18:32       ` Ricardo Martincoski
@ 2018-04-15 12:08         ` Yann E. MORIN
  2018-04-16  3:04           ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-15 12:08 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-13 15:32 -0300, Ricardo Martincoski spake thusly:
> On Thu, Apr 12, 2018 at 02:33 PM, Yann E. MORIN wrote:
> > On 2018-04-12 13:28 +0200, Thomas Petazzoni spake thusly:
> 
> >> > One scenario in buildroot development that would trigger this sequence
> >> > is for example when a package is bumped on master branch, some users
> >> > create the git cache at this time, then the bump is reverted.
> >> > 
> >> > Another scenario is when giving maintenance to a product version that
> >> > uses one version of buildroot using the same build farm that uses a
> >> > newer version of buildroot.
> >> 
> >> In the light of this, wouldn't it be simpler to stop doing shallow
> >> clones at all ? A shallow clone did make a lot of sense back when we
> >> didn't cache the Git repositories. But now that we are caching the
> >> results, does it make sense to keep the complexity of the code to use
> >> shallow clones ?
> >> 
> >> Note that this is really an open question, there are possibly some
> >> convincing argument for us to keep a shallow clone strategy.
> > 
> > Indeed, I don't think it makes sense anymore...
> > 
> > So, intead of fixing it, let's just do full clones now.
> 
> I agree it is less needed now than before.

OK.

> The worst scenario (linux trees) is covered by:
> - the use of tarballs from GitHub in the defconfigs, for newcomers who sometimes
>   just want to use a defconfig, add few packages and use the image in a new
>   board;
> - in the long run, due to the git cache feature.
> 
> But I think people will miss this feature. At least I will.

Except for the linux git tree and a very few other packages we get from
git, most git tree are reasonably sized, so that the overhead of doign a
full clone is not much as comp[ared to the shallow clone.

And since it also makes our code much simpler and more maintaineable,
that's still a win for me...

> In the case of someone always using tags from a linux repo, this person
> would never need to download the full repo if the shallow fetch remains.
> 
> When someone wants just to try a package, for example, a debug tool, which
> maybe will never be used again, this person would need to do a full clone,
> instead of potentially do a shallow fetch.
> 
> The repo for some packages can be located on slow servers (not necessarily a
> package in the tree, it can be on a br2-external).
> 
> And the least appealing argument: I was willing to recreate
> http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.

I'm still not convinced any more than I previously was. I.e. I still think
we should go with full clone.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git
  2018-04-13 18:23     ` Ricardo Martincoski
@ 2018-04-15 12:12       ` Yann E. MORIN
  2018-04-16  2:17         ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-15 12:12 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-13 15:23 -0300, Ricardo Martincoski spake thusly:
> On Thu, Apr 12, 2018 at 02:42 PM, Yann E. MORIN wrote:
> >> -    _git fetch origin -t
> >> +    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
> > Why don't we just run:
> >     git fetch origin
> >     git fetch origin -t
> 
> One extra connection to the server ...
> but much more readable/maintainable!
> I will do. I will change this patch to Changes Requested.

Thanks. I don't care much about an extra round-trip to the server,
especially if that makes the code easier to read.

> > As I understand it:
> >                     ancient git         | new git
> > --------------------------------------------------------------------
> > git fetch     | fetch all refs but tags | fetches all refs but tags
> > git fetch -t  | fetches only tags       | fetch all refs and tags
> Nit: actually 'git fetch' does fetch some tags, but not necessarily all. Only
> tags in commits also reachable by branches are fetched.

OK, but overall, we would cover all cases with the two git-fetch,
whether we use an old git or a newer one, and that's the important
info.

Thanks! :-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git
  2018-04-15 12:12       ` Yann E. MORIN
@ 2018-04-16  2:17         ` Ricardo Martincoski
  0 siblings, 0 replies; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-16  2:17 UTC (permalink / raw)
  To: buildroot

Hello,

> On 2018-04-13 15:23 -0300, Ricardo Martincoski spake thusly:
>> On Thu, Apr 12, 2018 at 02:42 PM, Yann E. MORIN wrote:
>> >> -    _git fetch origin -t
>> >> +    _git fetch origin -u "'+refs/tags/*:refs/tags/*'" "'+refs/heads/*:refs/remotes/origin/*'"
>> > Why don't we just run:
>> >     git fetch origin
>> >     git fetch origin -t
>> 
>> One extra connection to the server ...
>> but much more readable/maintainable!
>> I will do. I will change this patch to Changes Requested.
> 
> Thanks. I don't care much about an extra round-trip to the server,
> especially if that makes the code easier to read.

I agree.
And a few years from now, when we stop supporting distros with git <1.9.0 we
can remove 'git fetch origin' if we wish.

BTW, the order of the commands is important here. Some dumb http servers don't
work well with old git versions if 'git fetch origin -t' is used first, see:
https://gitlab.com/RicardoMartincoski/buildroot/-/jobs/62895510
But in the order you propose (and I see you used in your series) it works fine:
https://gitlab.com/RicardoMartincoski/buildroot/-/jobs/62897927
So we are good.

> OK, but overall, we would cover all cases with the two git-fetch,
> whether we use an old git or a newer one, and that's the important
> info.

Agreed.


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-15 12:02       ` Yann E. MORIN
@ 2018-04-16  2:54         ` Ricardo Martincoski
  2018-04-16 16:01           ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-16  2:54 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, Apr 15, 2018 at 09:02 AM, Yann E. MORIN wrote:

> On 2018-04-13 15:28 -0300, Ricardo Martincoski spake thusly:
>> On Thu, Apr 12, 2018 at 02:48 PM, Yann E. MORIN wrote:
>> > On 2018-04-12 06:28 -0300, Ricardo Martincoski spake thusly:
>> A way to avoid this, once the script entered the directory, is to force the
>> git-dir:
>> $ git --git-dir=.git <command>
>> This command would fail for a too-much-broken repo.
> 
> We recently reverted use of -C becasue it was not supported by old git
> versions. What is the oldest git version to support --git-dir? It seems
> it was introduced in v.1.4.2, dated 2006-08-12. Is that old enough that
> all the distros we care about have that version or later?

I think so. I added Thomas to the thread.
In the past some similar decisions were taken based on the oldest supported
RHEL version.
RHEL 6 has EOL on November 30, 2020. It included git 1.7.1 last time I checked
(one year ago!).

Concerning the code, an alternative is: 
GIT_DIR=.git git <command>
It works with fsck. Hopefully it always worked, but I did not checked the
git/.git history.

I think it is important to use this with fsck, otherwise for a too-much-broken
repo (say, .git/HEAD is missing) git would search the parent directory, then
the parent of the parent, and so on, and in the case it is a subdir of
buildroot (or any other git repo) the fsck would return OK for the wrong repo.

>> In order to detect broken repos (and trigger the ditch) we could start the
>> script by running fsck. Right?
> 
> So, basically, we would do something along those lines:
> 
>     _git() {
>         eval ${GIT} --git-dir=.git "${@}"
>     }
> 
>     _git fsck || { rm -rf ${git_cache}; exec "${0}" "${@}"; exit 1; }
>     _git fetch
>     _git fetch -t
>     _git checkout ${ref}
>     _git clean -dX
>     _git checkout -- .
>     _git submodules update

In the overall I agree with this sequence. Below some nits:

I would run fsck using GIT_DIR before the script enters {git_cache}, this way it
is easy to remove the broken git cache and run init again, and the script
continues without the need to call itself.
GIT_DIR works with git 1.8.3, I did not tested with 1.7.1

When we have -f in checkout (I see in your WIP series that you considered to add
it) do we need the '_git checkout -- .'?

I was initially thinking in: first run fsck forcing the GIT_DIR and then trust
the repo is sane, but your approach of adding it to _git() seems even more
robust, it covers even the case someone (not buildroot) messes with the git
cache in the wrong moment.
Maybe it would be possible to even remove pushd/popd altogether. Of course it
would cause changes to find and tar command lines again, so I don't think it's
worth the risk of finding more corner cases with tool versions than the ones
already solved.

> 
> Of course, that would require using appropriate options to fsck to bail
> out.

Yes. Some interesting ones I listed below:
--no-dangling: AFAIK they cause no harm;
--no-reflogs: not sure;
--full: this is for the case someone is abusing the git cache with alternates,
        should we care?

I find out that different git versions also use different sets of errors to
return non-zero code. Not so different, don't worry.
Old versions return 0 for few errors (but the error: message is printed) 
git 1.8.3: return code 0 for a missing sha1 object pointed by a tag, printing:
 error: refs/tags/tag_b does not point to a valid object!
git 2.14.1: return code 2 in the same case
 error: refs/tags/tag_b: invalid sha1 pointer 1f95d47cc18a9ed4e1eab9b71fe2009c9555448d
BUT, as we always do a fetch before checkout, the fetch fixes it!
So again, we are good. I don't think it needs extra code.

> 
> But what to do if any if the following actions fails? Should we simply
> exit, or should we clean up and clone again?
> 
> I can see shere that could go wrong: the ref does not exist, so the
> first checkout fails, so we ditch the repository, clone again, and
> checkout again fails...
> 
> My opinion, for what it's worth, is to clan only on the fsck. Any other
> failure should be left to the user to handle. Maybe with just a little
> message saying something like:
> 
>     If you are not sure how to solve this, remove ${cache_dir}.
> 
> Thoughts?

I agree, to clean only on the fsck is better.

The user-friendly message is not *needed* IMO.
But if you find an easy way to do that, it would be nice to have.
Maybe in _git() but not in the fsck case.
Maybe a trap? Not sure.

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs
  2018-04-15 12:08         ` Yann E. MORIN
@ 2018-04-16  3:04           ` Ricardo Martincoski
  0 siblings, 0 replies; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-16  3:04 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, Apr 15, 2018 at 09:08 AM, Yann E. MORIN wrote:

>> The worst scenario (linux trees) is covered by:
>> - the use of tarballs from GitHub in the defconfigs, for newcomers who sometimes
>>   just want to use a defconfig, add few packages and use the image in a new
>>   board;
>> - in the long run, due to the git cache feature.
>> 
>> But I think people will miss this feature. At least I will.
> 
> Except for the linux git tree and a very few other packages we get from
> git, most git tree are reasonably sized, so that the overhead of doign a
> full clone is not much as comp[ared to the shallow clone.
> 
> And since it also makes our code much simpler and more maintaineable,

Much simpler code indeed. I see in your WIP series.

> that's still a win for me...

Unrelated question:
Should we add something about the git cache to 'Migrating from older Buildroot
versions'?
I am not sure because it does not really need any action from the user. It only
changes the behavior. For better in the long run.

An alternative is to reformat that e-mail that Thomas sent with a nice summary
of the feature and add it to 'Download infrastructure'.
If you don't find an easy way to add a user-friendly message in the rare case
the download infra cannot recover the cache (after your WIP series), the
message
     If you are not sure how to solve this, remove the git cache.
Could be added to this section of the manual.

>> In the case of someone always using tags from a linux repo, this person
>> would never need to download the full repo if the shallow fetch remains.
>> 
>> When someone wants just to try a package, for example, a debug tool, which
>> maybe will never be used again, this person would need to do a full clone,
>> instead of potentially do a shallow fetch.
>> 
>> The repo for some packages can be located on slow servers (not necessarily a
>> package in the tree, it can be on a br2-external).
>> 
>> And the least appealing argument: I was willing to recreate
>> http://patchwork.ozlabs.org/patch/702006/ on top of the new infra.
> 
> I'm still not convinced any more than I previously was. I.e. I still think
> we should go with full clone.

OK. If someone else brings a more convincing argument in the future we can
always re-add it covering all its corner cases.
I will change this patch to Rejected.

For my use case of sha1 from Gerrit, I will pursue different paths...
Maybe tuning the server side options to generate tarballs.
Maybe adding support to register download methods from a br2-external.

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-16  2:54         ` Ricardo Martincoski
@ 2018-04-16 16:01           ` Yann E. MORIN
  2018-04-16 20:56             ` Yann E. MORIN
  2018-04-17  4:45             ` Ricardo Martincoski
  0 siblings, 2 replies; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-16 16:01 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-15 23:54 -0300, Ricardo Martincoski spake thusly:
> On Sun, Apr 15, 2018 at 09:02 AM, Yann E. MORIN wrote:
[--SNIP--]
> > Of course, that would require using appropriate options to fsck to bail
> > out.
> 
> Yes. Some interesting ones I listed below:
> --no-dangling: AFAIK they cause no harm;
> --no-reflogs: not sure;
> --full: this is for the case someone is abusing the git cache with alternates,
>         should we care?

So I played with git-fsck in quite a few setups, and it irremediably
exits with a non-zero status when something is wrong.

However, on my machine, it takes about 8min to fsck the Linux git tree,
which is a huge amount of time (much more than it takes to do the build
of said kernel).

So, I am a bit reluctant at using git-fsck.

I'm trying to see if we can find a faster way to detect if the git tree
is sane or not. After all, we only need a sanity check, not repairing.
If it is not sane, we ditch it and reclone.

So, maybe just running "git status" or any other fast action should
probably be enough.

Thoughts?

> I find out that different git versions also use different sets of errors to
> return non-zero code. Not so different, don't worry.
> Old versions return 0 for few errors (but the error: message is printed) 
> git 1.8.3: return code 0 for a missing sha1 object pointed by a tag, printing:
>  error: refs/tags/tag_b does not point to a valid object!
> git 2.14.1: return code 2 in the same case
>  error: refs/tags/tag_b: invalid sha1 pointer 1f95d47cc18a9ed4e1eab9b71fe2009c9555448d
> BUT, as we always do a fetch before checkout, the fetch fixes it!
> So again, we are good. I don't think it needs extra code.

Hmm... As long as the repo is not _broken_ and we can recover with the
fetch / clean/ checkout actions, I'm OK.

> > But what to do if any if the following actions fails? Should we simply
> > exit, or should we clean up and clone again?
> > 
> > I can see shere that could go wrong: the ref does not exist, so the
> > first checkout fails, so we ditch the repository, clone again, and
> > checkout again fails...
> > 
> > My opinion, for what it's worth, is to clan only on the fsck. Any other
> > failure should be left to the user to handle. Maybe with just a little
> > message saying something like:
> > 
> >     If you are not sure how to solve this, remove ${cache_dir}.
> > 
> > Thoughts?
> 
> I agree, to clean only on the fsck is better.
> 
> The user-friendly message is not *needed* IMO.
> But if you find an easy way to do that, it would be nice to have.
> Maybe in _git() but not in the fsck case.
> Maybe a trap? Not sure.

Well, a trap is pretty simple, yes.

But let's leave that out for now, we already have enough complexity to
handle...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-16 16:01           ` Yann E. MORIN
@ 2018-04-16 20:56             ` Yann E. MORIN
       [not found]               ` <1ffe206e-4b1e-c233-a511-ba4c3a8cb5f0@armadeus.com>
  2018-04-17  4:45             ` Ricardo Martincoski
  1 sibling, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-16 20:56 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

To all autobuilder runners: could you create a tarball of the failed
dtv-scan-tables git tree on yuour autobuilders, from each of your
instances, please?

I got mine, butonly one is really broken (missing HEAD). I'd like to
get a few other breakage to investigate furhter the git backend fixes
and be sure they work as expected.

Thanks!

Otherwise, Ricardo, more comments below...

On 2018-04-16 18:01 +0200, Yann E. MORIN spake thusly:
> On 2018-04-15 23:54 -0300, Ricardo Martincoski spake thusly:
> > On Sun, Apr 15, 2018 at 09:02 AM, Yann E. MORIN wrote:
> [--SNIP--]
> > > Of course, that would require using appropriate options to fsck to bail
> > > out.
> > 
> > Yes. Some interesting ones I listed below:
> > --no-dangling: AFAIK they cause no harm;
> > --no-reflogs: not sure;
> > --full: this is for the case someone is abusing the git cache with alternates,
> >         should we care?
> 
> So I played with git-fsck in quite a few setups, and it irremediably
> exits with a non-zero status when something is wrong.

And now I've had a case where git-fsck would barf, because of missing
objects. The git tree was not salvageable with git-fsck.

But then, git-fetch retrieved the missing blobs, and everything was back
to being fine.

So, we'd have situations where we'd detect a borked repository, so we'd
re-clone it from scratch, when a simple fetch would have retrieve the
missing parts...

Hoowever, I'd prefer that we play safe, and download more than
necessary, but be sure we end up with a sane situation.

And I still think that running git-status is enough to detect a borked
repository, rather than the expensive git-fsck.

Thoughts?

Regards,
Yann E. MORIN.

> However, on my machine, it takes about 8min to fsck the Linux git tree,
> which is a huge amount of time (much more than it takes to do the build
> of said kernel).
> 
> So, I am a bit reluctant at using git-fsck.
> 
> I'm trying to see if we can find a faster way to detect if the git tree
> is sane or not. After all, we only need a sanity check, not repairing.
> If it is not sane, we ditch it and reclone.
> 
> So, maybe just running "git status" or any other fast action should
> probably be enough.
> 
> Thoughts?
> 
> > I find out that different git versions also use different sets of errors to
> > return non-zero code. Not so different, don't worry.
> > Old versions return 0 for few errors (but the error: message is printed) 
> > git 1.8.3: return code 0 for a missing sha1 object pointed by a tag, printing:
> >  error: refs/tags/tag_b does not point to a valid object!
> > git 2.14.1: return code 2 in the same case
> >  error: refs/tags/tag_b: invalid sha1 pointer 1f95d47cc18a9ed4e1eab9b71fe2009c9555448d
> > BUT, as we always do a fetch before checkout, the fetch fixes it!
> > So again, we are good. I don't think it needs extra code.
> 
> Hmm... As long as the repo is not _broken_ and we can recover with the
> fetch / clean/ checkout actions, I'm OK.
> 
> > > But what to do if any if the following actions fails? Should we simply
> > > exit, or should we clean up and clone again?
> > > 
> > > I can see shere that could go wrong: the ref does not exist, so the
> > > first checkout fails, so we ditch the repository, clone again, and
> > > checkout again fails...
> > > 
> > > My opinion, for what it's worth, is to clan only on the fsck. Any other
> > > failure should be left to the user to handle. Maybe with just a little
> > > message saying something like:
> > > 
> > >     If you are not sure how to solve this, remove ${cache_dir}.
> > > 
> > > Thoughts?
> > 
> > I agree, to clean only on the fsck is better.
> > 
> > The user-friendly message is not *needed* IMO.
> > But if you find an easy way to do that, it would be nice to have.
> > Maybe in _git() but not in the fsck case.
> > Maybe a trap? Not sure.
> 
> Well, a trap is pretty simple, yes.
> 
> But let's leave that out for now, we already have enough complexity to
> handle...
> 
> Regards,
> Yann E. MORIN.
> 
> -- 
> .-----------------.--------------------.------------------.--------------------.
> |  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
> | +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
> | +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
> | http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
> '------------------------------^-------^------------------^--------------------'
> _______________________________________________
> buildroot mailing list
> buildroot at busybox.net
> http://lists.busybox.net/mailman/listinfo/buildroot

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-16 16:01           ` Yann E. MORIN
  2018-04-16 20:56             ` Yann E. MORIN
@ 2018-04-17  4:45             ` Ricardo Martincoski
  2018-04-17  7:04               ` Ricardo Martincoski
  1 sibling, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-17  4:45 UTC (permalink / raw)
  To: buildroot

Hello,

On Mon, Apr 16, 2018 at 01:01 PM, Yann E. MORIN wrote:

> On 2018-04-15 23:54 -0300, Ricardo Martincoski spake thusly:
>> On Sun, Apr 15, 2018 at 09:02 AM, Yann E. MORIN wrote:
> [--SNIP--]
>> > Of course, that would require using appropriate options to fsck to bail
>> > out.
>> 
>> Yes. Some interesting ones I listed below:
>> --no-dangling: AFAIK they cause no harm;
>> --no-reflogs: not sure;
>> --full: this is for the case someone is abusing the git cache with alternates,
>>         should we care?
> 
> So I played with git-fsck in quite a few setups, and it irremediably
> exits with a non-zero status when something is wrong.
> 
> However, on my machine, it takes about 8min to fsck the Linux git tree,
> which is a huge amount of time (much more than it takes to do the build
> of said kernel).

Argh. Indeed it tests too much. And it takes too much time.
Sorry my bad suggestion.

> So, I am a bit reluctant at using git-fsck.
> 
> I'm trying to see if we can find a faster way to detect if the git tree
> is sane or not. After all, we only need a sanity check, not repairing.
> If it is not sane, we ditch it and reclone.

Agreed.

> So, maybe just running "git status" or any other fast action should
> probably be enough.

I think 'git status' tests too much too. It's more related to the worktree
status than to the repo (objects) status and we don't want to ditch the repo for
a dirty worktree, git checkout -f and git clean -ffdx fix it for us.

Maybe an even simpler command: 'git log -1' with the right options (to avoid
failing if a checkout did not occur yet) or even other command, like 'git
remote' or 'git tag'. But see below...

> 
> Thoughts?
> 

Maybe the solution is to use a reactive approach.

With your approach of using --git-dir (or the alternate GIT_DIR=) for every
command, we don't need a specific command to do the sanity check.

If the repo is totally broken (e.g. .git/HEAD missing) the 'git remote' fails.

If some object that the ref-to-be-checked-out needs is missing, 'git checkout
-f' fails. Git is smart, it only fails if there is not a checked out copy of the
file that matches the hash of the missing object.
If the fetch was successful but the ref we need was not download (i.e. the case
you described of someone using the wrong ref), the checkout also fails.
So when the checkout fails we could test if the ref we need is in the local
repo. If it is not present, we assume the repo is minimally sane and bail out.
If it is present (and the checkout failed) something really wrong is going on.
I guess 'git log -1 ref' will return the correct results for this test.

    _git() {
        eval ${GIT} --git-dir=.git "${@}"
    }
    _git init
    _git remote
    _git fetch
    _git fetch -t
    _git checkout -f ${ref}
    if failed
        if git log -1 ref
            ditch and restart
        else
            bail out
    _git clean -dX
    _git submodules update

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17  4:45             ` Ricardo Martincoski
@ 2018-04-17  7:04               ` Ricardo Martincoski
  2018-04-17  8:10                 ` Arnout Vandecappelle
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-17  7:04 UTC (permalink / raw)
  To: buildroot

Hello,

On Tue, Apr 17, 2018 at 01:45 AM, Ricardo Martincoski wrote:

>     _git() {
>         eval ${GIT} --git-dir=.git "${@}"
>     }
>     _git init
>     _git remote
>     _git fetch
>     _git fetch -t
>     _git checkout -f ${ref}
>     if failed
>         if git log -1 ref
>             ditch and restart
>         else
>             bail out
>     _git clean -dX
>     _git submodules update

But let's take a step back...

A dirty worktree is not so hard to find in practice. It happens when certain git
commands are abruptly interrupted. As Thomas wrote in another thread, his
autobuilder instance is sometimes restarted with the script running. Therefore
the abrupt interruption will eventually occur during a checkout, possibly
causing a dirty worktree.
But the right commands (I believe they are checkout -f, git clean -ffdx) will
fix it, assuming the repo is sane.

But a broken repo is usually associated either to hardware failures, especially
those repos that cannot be recovered by 'git init' and/or 'git fetch', or to
misuse, that from the git perspective is what the autobuilder script is
currently doing by eventually removing random individual files from inside a
.git dir.

And the implementation of ditch+restart seems to be more complicated than it
seemed to me before.

So, back to your proposal for a broken repo:
either:
 - ditch+restart
 - bail out

I changed my mind and I think we should adopt bail out for now.
It will keep the script simple, yet robust.
ditch+restart can be implemented later if we want/need.

Thoughts?


    _git() {
        eval ${GIT} --git-dir=.git "${@}"
    }
    _git init
    _git remote
    _git fetch
    _git fetch -t
    _git checkout -f ${ref}
    _git clean -dX
    _git submodules update

A simple improvement to the script to recover few broken repo scenarios is to
make git init unconditional. From its help:
"Running git init in an existing repository is safe. It will not overwrite
things that are already there."
For example, running 'git init' in a repo without .git/HEAD fixes it.

Here is a squashed, no commit log, no comments on the code, hackish preview:
https://gitlab.com/RicardoMartincoski/buildroot/commit/f4e20bb62b761bb6c82a29acbd2f8d56d90e4e9c

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17  7:04               ` Ricardo Martincoski
@ 2018-04-17  8:10                 ` Arnout Vandecappelle
  2018-04-17  8:56                   ` Ricardo Martincoski
  0 siblings, 1 reply; 27+ messages in thread
From: Arnout Vandecappelle @ 2018-04-17  8:10 UTC (permalink / raw)
  To: buildroot



On 17-04-18 09:04, Ricardo Martincoski wrote:
> Hello,
> 
> On Tue, Apr 17, 2018 at 01:45 AM, Ricardo Martincoski wrote:
> 
>>     _git() {
>>         eval ${GIT} --git-dir=.git "${@}"
>>     }
>>     _git init
>>     _git remote
>>     _git fetch
>>     _git fetch -t
>>     _git checkout -f ${ref}
>>     if failed
>>         if git log -1 ref
>>             ditch and restart
>>         else
>>             bail out
>>     _git clean -dX
>>     _git submodules update
> 
> But let's take a step back...
> 
> A dirty worktree is not so hard to find in practice. It happens when certain git
> commands are abruptly interrupted. As Thomas wrote in another thread, his
> autobuilder instance is sometimes restarted with the script running. Therefore
> the abrupt interruption will eventually occur during a checkout, possibly
> causing a dirty worktree.
> But the right commands (I believe they are checkout -f, git clean -ffdx) will
> fix it, assuming the repo is sane.
> 
> But a broken repo is usually associated either to hardware failures, especially
> those repos that cannot be recovered by 'git init' and/or 'git fetch', or to
> misuse, that from the git perspective is what the autobuilder script is
> currently doing by eventually removing random individual files from inside a
> .git dir.

 I was going to say exactly the same yesterday evening but didn't have the time
to think it through. All the fsck or status or whatever checks are only needed
to detect a broken .git directory, and this should never happen in normal git
operation.


> And the implementation of ditch+restart seems to be more complicated than it
> seemed to me before.
> 
> So, back to your proposal for a broken repo:
> either:
>  - ditch+restart
>  - bail out
> 
> I changed my mind and I think we should adopt bail out for now.
> It will keep the script simple, yet robust.
> ditch+restart can be implemented later if we want/need.

 Since a broken repo supposedly should never occur, I think we simply don't need
to handle that case.

> Thoughts?
> 
> 
>     _git() {
>         eval ${GIT} --git-dir=.git "${@}"

 I do indeed think that passing git-dir explicitly is a good idea. However, it
should be passed as GIT_DIR in the environment instead of --git-dir so it also
works with older versions of git. I've just checked git history, it seems that
GIT_DIR in the environment was supported since v0.99 (i.e. from before the first
tag in the repo).


>     }
>     _git init
>     _git remote
>     _git fetch
>     _git fetch -t
>     _git checkout -f ${ref}
>     _git clean -dX
>     _git submodules update
> 
> A simple improvement to the script to recover few broken repo scenarios is to
> make git init unconditional. From its help:
> "Running git init in an existing repository is safe. It will not overwrite
> things that are already there."
> For example, running 'git init' in a repo without .git/HEAD fixes it.

 That is a *great* idea.


> Here is a squashed, no commit log, no comments on the code, hackish preview:
> https://gitlab.com/RicardoMartincoski/buildroot/commit/f4e20bb62b761bb6c82a29acbd2f8d56d90e4e9c

 Looks good to me, with the following mods:

* use GIT_DIR
* add comment above git init explaining that it is safe
* explain why -ff is needed for git clean
* Removing the shallow fetch should be done in a separate patch (I'm not
entirely convinced yet that it is needed).

 Regards,
 Arnout

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17  8:10                 ` Arnout Vandecappelle
@ 2018-04-17  8:56                   ` Ricardo Martincoski
  2018-04-17 10:36                     ` Yann E. MORIN
  0 siblings, 1 reply; 27+ messages in thread
From: Ricardo Martincoski @ 2018-04-17  8:56 UTC (permalink / raw)
  To: buildroot

Hello,

On Tue, Apr 17, 2018 at 05:10 AM, Arnout Vandecappelle wrote:

> On 17-04-18 09:04, Ricardo Martincoski wrote:

Thank you for your comments.

>> Here is a squashed, no commit log, no comments on the code, hackish preview:
>> https://gitlab.com/RicardoMartincoski/buildroot/commit/f4e20bb62b761bb6c82a29acbd2f8d56d90e4e9c
> 
>  Looks good to me, with the following mods:
> 
> * use GIT_DIR
> * add comment above git init explaining that it is safe

> * explain why -ff is needed for git clean
> * Removing the shallow fetch should be done in a separate patch (I'm not
> entirely convinced yet that it is needed).

These 2 are already covered by Yann's WIP series, with comments and proper
commit logs:
https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/git-robust

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17  8:56                   ` Ricardo Martincoski
@ 2018-04-17 10:36                     ` Yann E. MORIN
  2018-04-17 11:58                       ` Arnout Vandecappelle
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-17 10:36 UTC (permalink / raw)
  To: buildroot

Ricardo, Arnout, All,

On 2018-04-17 05:56 -0300, Ricardo Martincoski spake thusly:
> On Tue, Apr 17, 2018 at 05:10 AM, Arnout Vandecappelle wrote:
> > On 17-04-18 09:04, Ricardo Martincoski wrote:
> >> Here is a squashed, no commit log, no comments on the code, hackish preview:
> >> https://gitlab.com/RicardoMartincoski/buildroot/commit/f4e20bb62b761bb6c82a29acbd2f8d56d90e4e9c

ACK. I'll vampirise the git-init part, then.

> >  Looks good to me, with the following mods:
> > 
> > * use GIT_DIR
> > * add comment above git init explaining that it is safe
> > * explain why -ff is needed for git clean

ACK ?3

> > * Removing the shallow fetch should be done in a separate patch (I'm not
> > entirely convinced yet that it is needed).

IIRC, Ricardo explained the case in a previous message: there are cases
where we may miss blobs in a tree.

But anyway, the current code only works by chance. Let's assume you want
to use tag v4.17-rc1 from Linus' tree. The code would currently do
something like (without an existing git cache)

    mkdir git
    git init git
    cd git
    git remote add origin https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    git fetch origin --depth 1 v4.17-rc1
    if ! git fetch origin v4.17-rc1:v4.17-rc1 ; then
        echo "warning"
    fi
    git checkout v4.17-rc1

The if-block is initially there to fetch special refs created by things
like gerrit or github PRs. If you do the above, you get the warning,
i.e. the if-condition is false, i.e. fetching the tag that way fails
with exit-code 1, and this message:

    error: cannot update the ref 'refs/heads/v4.17-rc1': Trying to write
    non-commit object a79c33a10ce2764c62fb8af6cbb571752d55c1c0 to branch
    refs/heads/v4.17-rc1
    From https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
     ! [new tag]         v4.17-rc1  -> v4.17-rc1  (unable to update local ref)
     * [new tag]         v4.17-rc1  -> v4.17-rc1

But since it is in a if-ciondition, we just print a warning, and
continue.

Then and the checkout succeeds.

But! If you didn't try to fetch the special refs, the checkout would
have failed:

    error: pathspec 'v4.17-rc1' did not match any file(s) known to git.

I.e. we have code that just happens to work by chance (or rather, by
side-effects and luck).

So, even if the missing-blob problem could be solved another way, our
shallow fetch is anyway already borked, but we were lucky so far not to
fall for it...

I still think that this shalow-fetch dance is too risky, and that having
a sane git cache helps so much more in the long run, so much so that we
should just ditch the shallow fetch support.

> These 2 are already covered by Yann's WIP series, with comments and proper
> commit logs:
> https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/git-robust

With the last two to be swapped.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
       [not found]               ` <1ffe206e-4b1e-c233-a511-ba4c3a8cb5f0@armadeus.com>
@ 2018-04-17 10:42                 ` Yann E. MORIN
  2018-04-17 11:30                   ` Thomas Petazzoni
  0 siblings, 1 reply; 27+ messages in thread
From: Yann E. MORIN @ 2018-04-17 10:42 UTC (permalink / raw)
  To: buildroot

Julien, All,

On 2018-04-17 12:05 +0200, Julien Boibessot spake thusly:
> On 16/04/2018 22:56, Yann E. MORIN wrote:
> 
>  Ricardo, All,To all autobuilder runners: could you create a tarball of the failed
>  dtv-scan-tables git tree on yuour autobuilders, from each of your
>  instances, please?
> 
> I'm not sure to understand what you reallly need, could you please clarify ?

You are running an autobuilder. That autobuilder will run 'n' builds in
parallel, each in a directory instance-0, instance-1... In each of those
instances, there is a dl/ sub-dir.

In that dl/ sub-dir, there isyet another sub-dir per-package, of which
dtv-scan-tables/

Buildroot downloads dtv-scan-tables as a git clone, stored in the git/
directory of the aforementioned dtv-scan-tables/ directory.

I would need that you provide all thos git trees from all the instances
your autobuilder runs, i.e. something like:

    tar cJf dtv-scan-tables-git.tar.xz instance-*/dl/dtv-scan-tables/git/

And then send me dtv-scan-tables-git.tar.xz

Please! ;-)

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17 10:42                 ` Yann E. MORIN
@ 2018-04-17 11:30                   ` Thomas Petazzoni
  0 siblings, 0 replies; 27+ messages in thread
From: Thomas Petazzoni @ 2018-04-17 11:30 UTC (permalink / raw)
  To: buildroot

Hello,

On Tue, 17 Apr 2018 12:42:13 +0200, Yann E. MORIN wrote:

> Please! ;-)

On my side, I looked at the squashfs failures:

  http://autobuild.buildroot.net/?submitter=Thomas+Petazzoni+(Free+Electrons+server)&reason=%squashfs%

I looked at instance-1 and instance-2, both of which are failing, and
they look the same on the filesystem.

instance-2 looks like this:

$ tree -a instance-2/dl/squashfs
instance-2/dl/squashfs
|-- git
|   `-- .git
|       |-- FETCH_HEAD
|       |-- HEAD
|       |-- branches
|       |-- config
|       |-- description
|       |-- hooks
|       |   |-- applypatch-msg.sample
|       |   |-- commit-msg.sample
|       |   |-- post-update.sample
|       |   |-- pre-applypatch.sample
|       |   |-- pre-commit.sample
|       |   |-- pre-rebase.sample
|       |   |-- prepare-commit-msg.sample
|       |   `-- update.sample
|       |-- info
|       |   `-- exclude
|       |-- objects
|       |   |-- info
|       |   `-- pack
|       `-- refs
|           |-- heads
|           `-- tags
`-- squashfs-3de1687d7432ea9b302c2db9521996f506c140a3.tar.gz

So basically, there is a tarball of the previous squashfs version, and
the Git cache doesn't contain anything useful.

See
http://autobuild.buildroot.net/results/4f8/4f88b0aea78a8c8be245ec26e8e53f32d8a90eb5/build-end.log
for the actual build failure.

Best regards,

Thomas Petazzoni
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache
  2018-04-17 10:36                     ` Yann E. MORIN
@ 2018-04-17 11:58                       ` Arnout Vandecappelle
  0 siblings, 0 replies; 27+ messages in thread
From: Arnout Vandecappelle @ 2018-04-17 11:58 UTC (permalink / raw)
  To: buildroot



On 17-04-18 12:36, Yann E. MORIN wrote:
> Ricardo, Arnout, All,
> 
> On 2018-04-17 05:56 -0300, Ricardo Martincoski spake thusly:
>> On Tue, Apr 17, 2018 at 05:10 AM, Arnout Vandecappelle wrote:
[snip]
>>> * Removing the shallow fetch should be done in a separate patch (I'm not
>>> entirely convinced yet that it is needed).
> 
> IIRC, Ricardo explained the case in a previous message: there are cases
> where we may miss blobs in a tree.
> 
> But anyway, the current code only works by chance. Let's assume you want
> to use tag v4.17-rc1 from Linus' tree. The code would currently do
> something like (without an existing git cache)
> 
>     mkdir git
>     git init git
>     cd git
>     git remote add origin https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>     git fetch origin --depth 1 v4.17-rc1
>     if ! git fetch origin v4.17-rc1:v4.17-rc1 ; then
>         echo "warning"
>     fi
>     git checkout v4.17-rc1
> 
> The if-block is initially there to fetch special refs created by things
> like gerrit or github PRs. If you do the above, you get the warning,
> i.e. the if-condition is false, i.e. fetching the tag that way fails
> with exit-code 1, and this message:
> 
>     error: cannot update the ref 'refs/heads/v4.17-rc1': Trying to write
>     non-commit object a79c33a10ce2764c62fb8af6cbb571752d55c1c0 to branch
>     refs/heads/v4.17-rc1
>     From https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
>      ! [new tag]         v4.17-rc1  -> v4.17-rc1  (unable to update local ref)
>      * [new tag]         v4.17-rc1  -> v4.17-rc1
> 
> But since it is in a if-ciondition, we just print a warning, and
> continue.
> 
> Then and the checkout succeeds.
> 
> But! If you didn't try to fetch the special refs, the checkout would
> have failed:
> 
>     error: pathspec 'v4.17-rc1' did not match any file(s) known to git.
> 
> I.e. we have code that just happens to work by chance (or rather, by
> side-effects and luck).
> 
> So, even if the missing-blob problem could be solved another way, our
> shallow fetch is anyway already borked, but we were lucky so far not to
> fall for it...

 Great explanation -- with just a single typo, if-ciondition :-). Would be good
to add that to the commit log (the explanation, not the typo).


> I still think that this shalow-fetch dance is too risky, and that having
> a sane git cache helps so much more in the long run, so much so that we
> should just ditch the shallow fetch support.

 Yeah, given the current sorry state, it's probably better to break it down
completely and then build up something stable again.

 Regards,
 Arnout

-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-04-17 11:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12  9:28 [Buildroot] [RFC PATCH 0/3] fix some corner cases for download/git v1 Ricardo Martincoski
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 1/3] download/git: fix fetch all refs for old git Ricardo Martincoski
2018-04-12 17:42   ` Yann E. MORIN
2018-04-13 18:23     ` Ricardo Martincoski
2018-04-15 12:12       ` Yann E. MORIN
2018-04-16  2:17         ` Ricardo Martincoski
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 2/3] download/git: recover dirty cache Ricardo Martincoski
2018-04-12 17:48   ` Yann E. MORIN
2018-04-13 18:28     ` Ricardo Martincoski
2018-04-15 12:02       ` Yann E. MORIN
2018-04-16  2:54         ` Ricardo Martincoski
2018-04-16 16:01           ` Yann E. MORIN
2018-04-16 20:56             ` Yann E. MORIN
     [not found]               ` <1ffe206e-4b1e-c233-a511-ba4c3a8cb5f0@armadeus.com>
2018-04-17 10:42                 ` Yann E. MORIN
2018-04-17 11:30                   ` Thomas Petazzoni
2018-04-17  4:45             ` Ricardo Martincoski
2018-04-17  7:04               ` Ricardo Martincoski
2018-04-17  8:10                 ` Arnout Vandecappelle
2018-04-17  8:56                   ` Ricardo Martincoski
2018-04-17 10:36                     ` Yann E. MORIN
2018-04-17 11:58                       ` Arnout Vandecappelle
2018-04-12  9:28 ` [Buildroot] [RFC PATCH 3/3] download/git: unshallow when fetching all refs Ricardo Martincoski
2018-04-12 11:28   ` Thomas Petazzoni
2018-04-12 17:33     ` Yann E. MORIN
2018-04-13 18:32       ` Ricardo Martincoski
2018-04-15 12:08         ` Yann E. MORIN
2018-04-16  3:04           ` Ricardo Martincoski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.