All of lore.kernel.org
 help / color / mirror / Atom feed
* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
@ 2018-04-13 13:34 Ricardo Martincoski
  2018-04-15 19:42 ` Thomas Petazzoni
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Ricardo Martincoski @ 2018-04-13 13:34 UTC (permalink / raw)
  To: buildroot

The current logic that removes 5 files at random to force the download
to be tested eventually ends up removing files from the git
repositories used as cache by the download infra.

When a file that was previously checked out in a git cache is removed,
the next 'git checkout' succeeds but does not checkout that file. The
repo remains with 'Changes not staged for commit', the tarball is
generated and the hash mismatches, causing a build failure.

Ensure only the tarballs are removed by excluding any file inside a git
repository from the list of potential files to remove.
The download infra for git method uses 'git fetch' when regenerating the
tarball, this command fails as expected if the upstream location
changed.

Reported-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reported-by: Yann E. MORIN <yann.morin.1998@free.fr>
Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: Peter Korsgaard <peter@korsgaard.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Yann E. MORIN <yann.morin.1998@free.fr>
---
NOTICE: This fix to autobuilder script will not stop the current build
(download) failures. It will only stop causing new failures.
We will need also either the manual removal of the broken
'dl/package/git' from the autobuilder instances or a change to the
download infra to automatically recover from a broken git cache.

I tested this by first populating the instance-0/dl/ with 19GB of
tarballs that I have locally, then I added 'continue' in the main loop
of the script just after prepare_build() returned.
After the script removed all tarballs from instance-0/dl/, I manually
checked there are git repositories and that they are sane (using git
fsck) and clean in the sense of git clean (using git status).
Tested with Python 2.7.14
---
 scripts/autobuild-run | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 33d0ae9..6a01151 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -291,6 +291,9 @@ def prepare_build(**kwargs):
     # recursively find files under root
     def find_files(root):
         for r, d, f in os.walk(root):
+            if '.git' in d:
+                d[:] = list()
+                continue
             for i in f:
                 yield os.path.join(r, i)
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-13 13:34 [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir Ricardo Martincoski
@ 2018-04-15 19:42 ` Thomas Petazzoni
  2018-04-15 19:49   ` Yann E. MORIN
  2018-04-15 22:23   ` Ricardo Martincoski
  2018-04-16 16:09 ` Thomas Petazzoni
  2018-04-17  7:12 ` [Buildroot] [PATCH v2] " Ricardo Martincoski
  2 siblings, 2 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-15 19:42 UTC (permalink / raw)
  To: buildroot

Hello,

On Fri, 13 Apr 2018 10:34:31 -0300, Ricardo Martincoski wrote:

> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> index 33d0ae9..6a01151 100755
> --- a/scripts/autobuild-run
> +++ b/scripts/autobuild-run
> @@ -291,6 +291,9 @@ def prepare_build(**kwargs):
>      # recursively find files under root
>      def find_files(root):
>          for r, d, f in os.walk(root):
> +            if '.git' in d:
> +                d[:] = list()
> +                continue

Just to check how this is supposed to work. We have this:

 dl/<package>/git/.git

Correct ?

So you're idea is that when we are inside dl/<package>/git/, one of the
sub-directories is .git, and therefore we shouldn't remove anything in
dl/<package>/git/ ?

My concern is that I'm not sure if what you've done prevents from
removing files inside dl/<package>/git or only inside
dl/<package>/git/.git. I would find it more to do something like:

	if "git" in d:
		d.remove("git")

but perhaps you haven't done this for some good reason ?

Another concern is how to fix those autobuilders that have already
removed some random files from their cached Git repositories? Should we
ask the people who run those autobuilders to entirely wipe the download
folders of their autobuilder instances ? Or do we have a smart (but
simple) thing to do to avoid this ?

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 19:42 ` Thomas Petazzoni
@ 2018-04-15 19:49   ` Yann E. MORIN
  2018-04-15 20:02     ` Thomas Petazzoni
  2018-04-15 23:18     ` Ricardo Martincoski
  2018-04-15 22:23   ` Ricardo Martincoski
  1 sibling, 2 replies; 15+ messages in thread
From: Yann E. MORIN @ 2018-04-15 19:49 UTC (permalink / raw)
  To: buildroot

Thomas, Ricardo, All,

On 2018-04-15 21:42 +0200, Thomas Petazzoni spake thusly:
> On Fri, 13 Apr 2018 10:34:31 -0300, Ricardo Martincoski wrote:
> 
> > diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> > index 33d0ae9..6a01151 100755
> > --- a/scripts/autobuild-run
> > +++ b/scripts/autobuild-run
> > @@ -291,6 +291,9 @@ def prepare_build(**kwargs):
> >      # recursively find files under root
> >      def find_files(root):
> >          for r, d, f in os.walk(root):
> > +            if '.git' in d:
> > +                d[:] = list()
> > +                continue
> 
> Just to check how this is supposed to work. We have this:
> 
>  dl/<package>/git/.git
> 
> Correct ?
> 
> So you're idea is that when we are inside dl/<package>/git/, one of the
> sub-directories is .git, and therefore we shouldn't remove anything in
> dl/<package>/git/ ?
> 
> My concern is that I'm not sure if what you've done prevents from
> removing files inside dl/<package>/git or only inside
> dl/<package>/git/.git. I would find it more to do something like:
> 
> 	if "git" in d:
> 		d.remove("git")
> 
> but perhaps you haven't done this for some good reason ?
> 
> Another concern is how to fix those autobuilders that have already
> removed some random files from their cached Git repositories? Should we
> ask the people who run those autobuilders to entirely wipe the download
> folders of their autobuilder instances ? Or do we have a smart (but
> simple) thing to do to avoid this ?

For this last part, we've already discused this with Ricardo in another
thread: we run git-fsck, and if there is an error, we ditch the git tree
and clone from scratch.

See the current work I have to further robustify (does that even exist?)
the git backend:
    https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/git-robust

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 19:49   ` Yann E. MORIN
@ 2018-04-15 20:02     ` Thomas Petazzoni
  2018-04-15 20:10       ` Yann E. MORIN
  2018-04-15 23:18     ` Ricardo Martincoski
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-15 20:02 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, 15 Apr 2018 21:49:15 +0200, Yann E. MORIN wrote:

> > Just to check how this is supposed to work. We have this:
> > 
> >  dl/<package>/git/.git
> > 
> > Correct ?
> > 
> > So you're idea is that when we are inside dl/<package>/git/, one of the
> > sub-directories is .git, and therefore we shouldn't remove anything in
> > dl/<package>/git/ ?
> > 
> > My concern is that I'm not sure if what you've done prevents from
> > removing files inside dl/<package>/git or only inside
> > dl/<package>/git/.git. I would find it more to do something like:
> > 
> > 	if "git" in d:
> > 		d.remove("git")
> > 
> > but perhaps you haven't done this for some good reason ?
> > 
> > Another concern is how to fix those autobuilders that have already
> > removed some random files from their cached Git repositories? Should we
> > ask the people who run those autobuilders to entirely wipe the download
> > folders of their autobuilder instances ? Or do we have a smart (but
> > simple) thing to do to avoid this ?  
> 
> For this last part, we've already discused this with Ricardo in another
> thread: we run git-fsck, and if there is an error, we ditch the git tree
> and clone from scratch.

OK, so in practice removing random files inside the git/ folders should
not cause any problem ?

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 20:02     ` Thomas Petazzoni
@ 2018-04-15 20:10       ` Yann E. MORIN
  2018-04-15 20:38         ` Thomas Petazzoni
  0 siblings, 1 reply; 15+ messages in thread
From: Yann E. MORIN @ 2018-04-15 20:10 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2018-04-15 22:02 +0200, Thomas Petazzoni spake thusly:
> On Sun, 15 Apr 2018 21:49:15 +0200, Yann E. MORIN wrote:
> > > Just to check how this is supposed to work. We have this:
> > > 
> > >  dl/<package>/git/.git
> > > 
> > > Correct ?
> > > 
> > > So you're idea is that when we are inside dl/<package>/git/, one of the
> > > sub-directories is .git, and therefore we shouldn't remove anything in
> > > dl/<package>/git/ ?
> > > 
> > > My concern is that I'm not sure if what you've done prevents from
> > > removing files inside dl/<package>/git or only inside
> > > dl/<package>/git/.git. I would find it more to do something like:
> > > 
> > > 	if "git" in d:
> > > 		d.remove("git")
> > > 
> > > but perhaps you haven't done this for some good reason ?
> > > 
> > > Another concern is how to fix those autobuilders that have already
> > > removed some random files from their cached Git repositories? Should we
> > > ask the people who run those autobuilders to entirely wipe the download
> > > folders of their autobuilder instances ? Or do we have a smart (but
> > > simple) thing to do to avoid this ?  
> > 
> > For this last part, we've already discused this with Ricardo in another
> > thread: we run git-fsck, and if there is an error, we ditch the git tree
> > and clone from scratch.
> 
> OK, so in practice removing random files inside the git/ folders should
> not cause any problem ?

Well, if you remove files from the working copy, no. The next git-checkout
would fix it to a certain extent, which can be covered by a mix of
git-clean + git-checkout, again to a certain extent...

However, if you remove files from below .git, you can cause heavy
breakage. Even removing a single object will cause the tree to get
uterly broken, not telling what happens when one removes a ref or
HEAD.

So, runniong git-fsck would catch the cases where something is missing
in .git, in which case we're toast and the git cache is as good as if
it were not present.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 20:10       ` Yann E. MORIN
@ 2018-04-15 20:38         ` Thomas Petazzoni
  2018-04-15 20:59           ` Yann E. MORIN
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-15 20:38 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, 15 Apr 2018 22:10:26 +0200, Yann E. MORIN wrote:

> > OK, so in practice removing random files inside the git/ folders should
> > not cause any problem ?  
> 
> Well, if you remove files from the working copy, no. The next git-checkout
> would fix it to a certain extent, which can be covered by a mix of
> git-clean + git-checkout, again to a certain extent...
> 
> However, if you remove files from below .git, you can cause heavy
> breakage. Even removing a single object will cause the tree to get
> uterly broken, not telling what happens when one removes a ref or
> HEAD.
> 
> So, runniong git-fsck would catch the cases where something is missing
> in .git, in which case we're toast and the git cache is as good as if
> it were not present.

Still not following you. Are you saying that today we do *not* run
git-fsck, and therefore a borked repository is not detected, but that
we could add a call to git-fsck to detect borked repositories, and
therefore avoid the problems of the autobuilders already having borked
Git caches ?

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 20:38         ` Thomas Petazzoni
@ 2018-04-15 20:59           ` Yann E. MORIN
  0 siblings, 0 replies; 15+ messages in thread
From: Yann E. MORIN @ 2018-04-15 20:59 UTC (permalink / raw)
  To: buildroot

Thomas, All,

On 2018-04-15 22:38 +0200, Thomas Petazzoni spake thusly:
> On Sun, 15 Apr 2018 22:10:26 +0200, Yann E. MORIN wrote:
> 
> > > OK, so in practice removing random files inside the git/ folders should
> > > not cause any problem ?  
> > 
> > Well, if you remove files from the working copy, no. The next git-checkout
> > would fix it to a certain extent, which can be covered by a mix of
> > git-clean + git-checkout, again to a certain extent...
> > 
> > However, if you remove files from below .git, you can cause heavy
> > breakage. Even removing a single object will cause the tree to get
> > uterly broken, not telling what happens when one removes a ref or
> > HEAD.
> > 
> > So, runniong git-fsck would catch the cases where something is missing
> > in .git, in which case we're toast and the git cache is as good as if
> > it were not present.
> 
> Still not following you. Are you saying that today we do *not* run
> git-fsck, and therefore a borked repository is not detected, but that
> we could add a call to git-fsck to detect borked repositories, and
> therefore avoid the problems of the autobuilders already having borked
> Git caches ?

Basically, yes, that is what I wanted to say.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 19:42 ` Thomas Petazzoni
  2018-04-15 19:49   ` Yann E. MORIN
@ 2018-04-15 22:23   ` Ricardo Martincoski
  2018-04-16  9:01     ` Yann E. MORIN
  2018-04-16 16:05     ` Thomas Petazzoni
  1 sibling, 2 replies; 15+ messages in thread
From: Ricardo Martincoski @ 2018-04-15 22:23 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, Apr 15, 2018 at 04:42 PM, Thomas Petazzoni wrote:
> On Fri, 13 Apr 2018 10:34:31 -0300, Ricardo Martincoski wrote:
> 
>> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
>> index 33d0ae9..6a01151 100755
>> --- a/scripts/autobuild-run
>> +++ b/scripts/autobuild-run
>> @@ -291,6 +291,9 @@ def prepare_build(**kwargs):
>>      # recursively find files under root
>>      def find_files(root):
>>          for r, d, f in os.walk(root):
>> +            if '.git' in d:
>> +                d[:] = list()
>> +                continue
> 
> Just to check how this is supposed to work. We have this:
> 
>  dl/<package>/git/.git
> 
> Correct ?

Right.

> So you're idea is that when we are inside dl/<package>/git/, one of the
> sub-directories is .git, and therefore we shouldn't remove anything in
> dl/<package>/git/ ?

Yes. My idea is to keep the behavior described in the comment:
# Remove 5 random files from the download directory. Removing
# random files from the download directory allows to ensure we
# regularly re-download files to check that their upstream
# location is still correct.

So we keep testing in autobuilders that the upstream location is correct (the
git fetch fails and therefore the download script fails if the server does not
respond even in the case we already have the commits and references we need in
the cache; it is the correct behavior IMO), and we also test the generation of
tarball, but we do not test the re-download with a clean git cache; this could
be done later on autobuilder with extra code (maybe remove only 1 git cache
every run together to 5 tarballs?), but the most important scenario is already
tested: the use of git cache in the long run.

> 
> My concern is that I'm not sure if what you've done prevents from
> removing files inside dl/<package>/git or only inside
> dl/<package>/git/.git. I would find it more to do something like:

It prevents from removing files inside dl/<package>/git

> 
> 	if "git" in d:
> 		d.remove("git")
> 
> but perhaps you haven't done this for some good reason ?

Yes, we want to eventually remove dl/git/git-2.16.3.tar.xz
Sorry. It should be in a comment or in the commit log.

> 
> Another concern is how to fix those autobuilders that have already
> removed some random files from their cached Git repositories? Should we
> ask the people who run those autobuilders to entirely wipe the download
> folders of their autobuilder instances ? Or do we have a smart (but
> simple) thing to do to avoid this ?

I agree with Yann about the ditch + restart for broken git cache.
But if you want, maybe in one autobuilder instance we can manually remove all
git caches just to know earlier (before applying Yann's series) that no more
download issues will occur, since this change to the script prevents corrupting
new git caches.
I guess but did not tested this command would be enough:
rm -r instance-0/dl/*/git
Your call. I have no preference.

But leaving at least one instance not fixed (with some git cache broken) can
help to test Yann's series.

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 19:49   ` Yann E. MORIN
  2018-04-15 20:02     ` Thomas Petazzoni
@ 2018-04-15 23:18     ` Ricardo Martincoski
  1 sibling, 0 replies; 15+ messages in thread
From: Ricardo Martincoski @ 2018-04-15 23:18 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, Apr 15, 2018 at 04:49 PM, Yann E. MORIN wrote:

>> Another concern is how to fix those autobuilders that have already
>> removed some random files from their cached Git repositories? Should we
>> ask the people who run those autobuilders to entirely wipe the download
>> folders of their autobuilder instances ? Or do we have a smart (but
>> simple) thing to do to avoid this ?
> 
> For this last part, we've already discused this with Ricardo in another
> thread: we run git-fsck, and if there is an error, we ditch the git tree
> and clone from scratch.
> 
> See the current work I have to further robustify (does that even exist?)
> the git backend:
>     https://git.buildroot.org/~ymorin/git/buildroot/log/?h=yem/git-robust

Thank you for working on this. I can review the series.

I have some work in progress too (in early stages), but your series cover more
cases. If you like some lines of code from it, feel free to use them.
https://gitlab.com/RicardoMartincoski/buildroot/commits/ditch-restart-git-v1

The ditch and re-clone I did in a different way, without calling the script
again. Maybe I missed something.


Regards,
Ricardo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 22:23   ` Ricardo Martincoski
@ 2018-04-16  9:01     ` Yann E. MORIN
  2018-04-16 16:05     ` Thomas Petazzoni
  1 sibling, 0 replies; 15+ messages in thread
From: Yann E. MORIN @ 2018-04-16  9:01 UTC (permalink / raw)
  To: buildroot

Ricardo, All,

On 2018-04-15 19:23 -0300, Ricardo Martincoski spake thusly:
> On Sun, Apr 15, 2018 at 04:42 PM, Thomas Petazzoni wrote:
[--SNIP--]
> > So you're idea is that when we are inside dl/<package>/git/, one of the
> > sub-directories is .git, and therefore we shouldn't remove anything in
> > dl/<package>/git/ ?
> 
> Yes. My idea is to keep the behavior described in the comment:
> # Remove 5 random files from the download directory. Removing
> # random files from the download directory allows to ensure we
> # regularly re-download files to check that their upstream
> # location is still correct.
> 
> So we keep testing in autobuilders that the upstream location is correct (the
> git fetch fails and therefore the download script fails if the server does not
> respond even in the case we already have the commits and references we need in
> the cache; it is the correct behavior IMO), and we also test the generation of
> tarball, but we do not test the re-download with a clean git cache; this could
> be done later on autobuilder with extra code (maybe remove only 1 git cache
> every run together to 5 tarballs?), but the most important scenario is already
> tested: the use of git cache in the long run.

I think that we should also get rid of a full git clone.

> > Another concern is how to fix those autobuilders that have already
> > removed some random files from their cached Git repositories? Should we
> > ask the people who run those autobuilders to entirely wipe the download
> > folders of their autobuilder instances ? Or do we have a smart (but
> > simple) thing to do to avoid this ?
> I agree with Yann about the ditch + restart for broken git cache.
> But if you want, maybe in one autobuilder instance we can manually remove all
> git caches just to know earlier (before applying Yann's series) that no more
> download issues will occur, since this change to the script prevents corrupting
> new git caches.
> I guess but did not tested this command would be enough:
> rm -r instance-0/dl/*/git
> Your call. I have no preference.
> 
> But leaving at least one instance not fixed (with some git cache broken) can
> help to test Yann's series.

I got my hands on a borked dtv-scan-tables tree, so I can test locally.

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-15 22:23   ` Ricardo Martincoski
  2018-04-16  9:01     ` Yann E. MORIN
@ 2018-04-16 16:05     ` Thomas Petazzoni
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-16 16:05 UTC (permalink / raw)
  To: buildroot

Hello,

On Sun, 15 Apr 2018 19:23:09 -0300, Ricardo Martincoski wrote:

> > So you're idea is that when we are inside dl/<package>/git/, one of the
> > sub-directories is .git, and therefore we shouldn't remove anything in
> > dl/<package>/git/ ?  
> 
> Yes. My idea is to keep the behavior described in the comment:
> # Remove 5 random files from the download directory. Removing
> # random files from the download directory allows to ensure we
> # regularly re-download files to check that their upstream
> # location is still correct.
> 
> So we keep testing in autobuilders that the upstream location is correct (the
> git fetch fails and therefore the download script fails if the server does not
> respond even in the case we already have the commits and references we need in
> the cache; it is the correct behavior IMO), and we also test the generation of
> tarball, but we do not test the re-download with a clean git cache; this could
> be done later on autobuilder with extra code (maybe remove only 1 git cache
> every run together to 5 tarballs?), but the most important scenario is already
> tested: the use of git cache in the long run.

OK. I think we should indeed remove git caches once in a while. Perhaps
less often than tarballs, indeed.

> It prevents from removing files inside dl/<package>/git
> 
> > 
> > 	if "git" in d:
> > 		d.remove("git")
> > 
> > but perhaps you haven't done this for some good reason ?  
> 
> Yes, we want to eventually remove dl/git/git-2.16.3.tar.xz

Ah, yes, indeed, the git package. OK, got it, makes sense.

> > Another concern is how to fix those autobuilders that have already
> > removed some random files from their cached Git repositories? Should we
> > ask the people who run those autobuilders to entirely wipe the download
> > folders of their autobuilder instances ? Or do we have a smart (but
> > simple) thing to do to avoid this ?  
> 
> I agree with Yann about the ditch + restart for broken git cache.
> But if you want, maybe in one autobuilder instance we can manually remove all
> git caches just to know earlier (before applying Yann's series) that no more
> download issues will occur, since this change to the script prevents corrupting
> new git caches.
> I guess but did not tested this command would be enough:
> rm -r instance-0/dl/*/git
> Your call. I have no preference.

I think we should ditch the autobuilder git caches, to make the
autobuilders a bit more green. They are really red right now :)

I think we'll get bogus git cache once in a while anyway. For example,
my autobuilder instance gets rebooted regularly, without the build
process being interrupted. So there are good chances that a git
download or fetch will be interrupted at some point, leaving the git
cache in a bogus state. We will then see if it gets fixed.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-13 13:34 [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir Ricardo Martincoski
  2018-04-15 19:42 ` Thomas Petazzoni
@ 2018-04-16 16:09 ` Thomas Petazzoni
  2018-04-17  4:38   ` Ricardo Martincoski
  2018-04-17  7:12 ` [Buildroot] [PATCH v2] " Ricardo Martincoski
  2 siblings, 1 reply; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-16 16:09 UTC (permalink / raw)
  To: buildroot

Hello,

On Fri, 13 Apr 2018 10:34:31 -0300, Ricardo Martincoski wrote:

> diff --git a/scripts/autobuild-run b/scripts/autobuild-run
> index 33d0ae9..6a01151 100755
> --- a/scripts/autobuild-run
> +++ b/scripts/autobuild-run
> @@ -291,6 +291,9 @@ def prepare_build(**kwargs):
>      # recursively find files under root
>      def find_files(root):
>          for r, d, f in os.walk(root):
> +            if '.git' in d:
> +                d[:] = list()

Python question: is there any advantage in doing d[:] = list() instead
of just d = list() ?

Also, do we need to actually set d to the empty list ? We're anyway
doing a continue, and skipping over to the next iteration, where a new
value of d will be returned by os.walk(). Am I missing something ?

I think I'm also missing how this will prevent from iterating in
sub-directories, like git/<something>/foo/. I guess I need to read
again how os.walk() is working exactly.

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir
  2018-04-16 16:09 ` Thomas Petazzoni
@ 2018-04-17  4:38   ` Ricardo Martincoski
  0 siblings, 0 replies; 15+ messages in thread
From: Ricardo Martincoski @ 2018-04-17  4:38 UTC (permalink / raw)
  To: buildroot

Hello,

On Mon, Apr 16, 2018 at 01:09 PM, Thomas Petazzoni wrote:

>>      def find_files(root):
>>          for r, d, f in os.walk(root):
>> +            if '.git' in d:
>> +                d[:] = list()
> 
> Python question: is there any advantage in doing d[:] = list() instead
> of just d = list() ?

os.walk returns a mutable object, so d acts similar to a pointer in C language.
See this:
>>> a=[1,2]
>>> b=a
>>> a=list()
>>> a.append(3)
>>> print(a,b)
([3], [1, 2])
>>> a=[1,2]
>>> b=a
>>> a[:]=list()
>>> a.append(3)
>>> print(a,b)
([3], [3])
If we do just d = list() we are not changing the pointer that os.walk will use
in the next iteration. In the end, nothing is excluded from the search.

> Also, do we need to actually set d to the empty list ? We're anyway
> doing a continue, and skipping over to the next iteration, where a new
> value of d will be returned by os.walk(). Am I missing something ?

Yes, we need to set it to an empty list. When we detect there is a subdir .git,
os.walk is already iterating inside the dl/<package>/git directory. d contains
the list of all immediate subdirs: .git and any subdir that is checked out in
the git worktree. The next iteration of os.walk will pick up the list of subdirs
and continue. So in the end, just using 'continue' would exclude only the files
immediately inside the dl/<package>/git but not subdirs and their files nor the
.git tree. When we set the list of subdirs to empty is the same as removing all
elements one by one, but we don't have 'd.remove(all)'.
The closest valid syntax is:
del d[:]

Rethinking now, this use with 'del' is the most correct one.
d[:] = list() creates a new object of the type list and makes the pointer that
os.walk uses as subdirs to point to this new object. The old object is dangling,
nothing points to it anymore and it will hopefully be removed at some point by
the interpreter (I don't know the internals of the garbage collector).
del d[:] does not create a new object, but actually removes all elements from
the list object that already exists.

> 
> I think I'm also missing how this will prevent from iterating in
> sub-directories, like git/<something>/foo/. I guess I need to read
> again how os.walk() is working exactly.

Me too! Notice it mentions 'using del'.
"When topdown is True, the caller can modify the dirnames list in-place (perhaps
using del or slice assignment), and walk() will only recurse into the
subdirectories whose names remain in dirnames; this can be used to prune the
search"

So I will resend after changing to 'del' and adding a comment to the code or
commit log to clarify why we check for '.git' instead of 'git'.

Regards,
Ricardo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH v2] autobuild-run: remove only tarballs from download dir
  2018-04-13 13:34 [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir Ricardo Martincoski
  2018-04-15 19:42 ` Thomas Petazzoni
  2018-04-16 16:09 ` Thomas Petazzoni
@ 2018-04-17  7:12 ` Ricardo Martincoski
  2018-04-19 21:39   ` Thomas Petazzoni
  2 siblings, 1 reply; 15+ messages in thread
From: Ricardo Martincoski @ 2018-04-17  7:12 UTC (permalink / raw)
  To: buildroot

The current logic that removes 5 files at random to force the download
to be tested eventually ends up removing files from the git
repositories used as cache by the download infra.

When a file that was previously checked out in a git cache is removed,
the next 'git checkout' succeeds but does not checkout that file. The
repo remains with 'Changes not staged for commit', the tarball is
generated and the hash mismatches, causing a build failure.

Ensure only the tarballs are removed by excluding any file inside a git
repository from the list of potential files to remove.
The download infra for git method uses 'git fetch' when regenerating the
tarball, this command fails as expected if the upstream location
changed.

Reported-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reported-by: Yann E. MORIN <yann.morin.1998@free.fr>
Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Cc: Peter Korsgaard <peter@korsgaard.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Yann E. MORIN <yann.morin.1998@free.fr>
---
Changes v1 -> v2: (after review by Thomas Petazzoni)
  - add comment to the code to clarify why we test for '.git' and not
    'git';
  - use the most correct syntax: del;

I tested this by first populating the instance-0/dl/ with 22GB of
tarballs + git cache that I have locally, then I added 'continue' in the
main loop of the script just after prepare_build() returned.
After the script removed all tarballs from instance-0/dl/, I manually
checked the tarball for git package was removed, there are git
repositories and they are sane (using git fsck) and have a clean
worktree (using git status).
Tested with Python 2.7.14
---
 scripts/autobuild-run | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/scripts/autobuild-run b/scripts/autobuild-run
index 33d0ae9..5b3d1c5 100755
--- a/scripts/autobuild-run
+++ b/scripts/autobuild-run
@@ -291,6 +291,13 @@ def prepare_build(**kwargs):
     # recursively find files under root
     def find_files(root):
         for r, d, f in os.walk(root):
+            # do not remove individual files from git caches. 'git' can
+            # be either dl/<package>/git or dl/git and we want to
+            # eventually remove tarballs for the git package, so check
+            # for '.git' instead to match only dl/<package>/git/.git .
+            if '.git' in d:
+                del d[:]
+                continue
             for i in f:
                 yield os.path.join(r, i)
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Buildroot] [PATCH v2] autobuild-run: remove only tarballs from download dir
  2018-04-17  7:12 ` [Buildroot] [PATCH v2] " Ricardo Martincoski
@ 2018-04-19 21:39   ` Thomas Petazzoni
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2018-04-19 21:39 UTC (permalink / raw)
  To: buildroot

Hello,

On Tue, 17 Apr 2018 04:12:46 -0300, Ricardo Martincoski wrote:
> The current logic that removes 5 files at random to force the download
> to be tested eventually ends up removing files from the git
> repositories used as cache by the download infra.
> 
> When a file that was previously checked out in a git cache is removed,
> the next 'git checkout' succeeds but does not checkout that file. The
> repo remains with 'Changes not staged for commit', the tarball is
> generated and the hash mismatches, causing a build failure.
> 
> Ensure only the tarballs are removed by excluding any file inside a git
> repository from the list of potential files to remove.
> The download infra for git method uses 'git fetch' when regenerating the
> tarball, this command fails as expected if the upstream location
> changed.
> 
> Reported-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Reported-by: Yann E. MORIN <yann.morin.1998@free.fr>
> Signed-off-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
> Cc: Peter Korsgaard <peter@korsgaard.com>
> Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
> Cc: Yann E. MORIN <yann.morin.1998@free.fr>
> ---
> Changes v1 -> v2: (after review by Thomas Petazzoni)
>   - add comment to the code to clarify why we test for '.git' and not
>     'git';
>   - use the most correct syntax: del;

I've applied to buildroot-test since a few days, forgot to notify you
and the mailing list, sorry about that. See
https://git.buildroot.org/buildroot-test/commit/?id=cb1c4829b1d1ab660736811101fb6d988a8d14e7

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-04-19 21:39 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-13 13:34 [Buildroot] [PATCH] autobuild-run: remove only tarballs from download dir Ricardo Martincoski
2018-04-15 19:42 ` Thomas Petazzoni
2018-04-15 19:49   ` Yann E. MORIN
2018-04-15 20:02     ` Thomas Petazzoni
2018-04-15 20:10       ` Yann E. MORIN
2018-04-15 20:38         ` Thomas Petazzoni
2018-04-15 20:59           ` Yann E. MORIN
2018-04-15 23:18     ` Ricardo Martincoski
2018-04-15 22:23   ` Ricardo Martincoski
2018-04-16  9:01     ` Yann E. MORIN
2018-04-16 16:05     ` Thomas Petazzoni
2018-04-16 16:09 ` Thomas Petazzoni
2018-04-17  4:38   ` Ricardo Martincoski
2018-04-17  7:12 ` [Buildroot] [PATCH v2] " Ricardo Martincoski
2018-04-19 21:39   ` Thomas Petazzoni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.