* [PATCH] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
@ 2022-12-05 22:00 Marek Vasut
  2022-12-06 10:29 ` [bitbake-devel] " Quentin Schulz
  0 siblings, 1 reply; 2+ messages in thread
From: Marek Vasut @ 2022-12-05 22:00 UTC (permalink / raw)
  To: bitbake-devel
  Cc: Marek Vasut, Peter Kjellerstedt, Martin Jansa, Mikko.Rapeli,
	Quentin Schulz, Richard Purdie

The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
single ref in the remote repository, together with the objects those
refs point to. This works poorly with gitlab and github, which use the
remote git repository to track their own metadata, such as merge
requests and CI pipelines.

Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
and refs/keep-around/*, and they all contain massive amounts of data that
are useless for bitbake builds. The amount of useless data can in fact
be so massive (e.g. with the FDO mesa.git repository) that some proxies
may outright terminate the 'git fetch' connection, making it appear as
if bitbake got stuck on 'git fetch' with no output.

To avoid fetching all this useless metadata, tweak the git fetcher such
that it only fetches refs/heads/* and refs/tags/*. Avoid using negative
refspecs, as those are only available in recent git versions.

Per feedback on the ML, Gerrit may push commits outside of branches or
tags during CI runs, which currently works with the 'nobranch=1' fetcher
parameter. To retain this functionality, keep fetching everything when
'nobranch=1' is set. This still avoids fetching massive amounts of data
in the common case, since 'nobranch=1' is rare. Update the 'nobranch'
documentation accordingly.

Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: Martin Jansa <Martin.Jansa@gmail.com>
Cc: Mikko.Rapeli@bmw.de
Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
Cc: Quentin Schulz <foss+yocto@0leil.net>
Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
---
V1: - Add RB from Peter
    - Keep fetching everything in case of nobranch=1
    - Update nobranch documentation
---
 doc/bitbake-user-manual/bitbake-user-manual-fetching.rst | 4 ++--
 lib/bb/fetch2/git.py                                     | 8 ++++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
index 9c269ca8..e86a4d86 100644
--- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
+++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
@@ -424,8 +424,8 @@ This fetcher supports the following parameters:
 
 -  *"nobranch":* Tells the fetcher to not check the SHA validation for
    the branch when set to "1". The default is "0". Set this option for
-   the recipe that refers to the commit that is valid for a tag instead
-   of the branch.
+   the recipe that refers to the commit that is valid for a any namespace
+   instead of the branch.
 
 -  *"bareclone":* Tells the fetcher to clone a bare clone into the
    destination directory without checking out a working tree. Only the
diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
index 578edc59..c80e8e5c 100644
--- a/lib/bb/fetch2/git.py
+++ b/lib/bb/fetch2/git.py
@@ -44,7 +44,7 @@ Supported SRC_URI options are:
 
 - nobranch
    Don't check the SHA validation for branch. set this option for the recipe
-   referring to commit which is valid in tag instead of branch.
+   referring to commit which is valid in any namespace instead of branch.
    The default is "0", set nobranch=1 if needed.
 
 - usehead
@@ -382,7 +382,11 @@ class Git(FetchMethod):
               runfetchcmd("%s remote rm origin" % ud.basecmd, d, workdir=ud.clonedir)
 
             runfetchcmd("%s remote add --mirror=fetch origin %s" % (ud.basecmd, shlex.quote(repourl)), d, workdir=ud.clonedir)
-            fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
+
+            if ud.nobranch:
+                fetch_cmd = "LANG=C %s fetch -f --progress %s refs/*:refs/*" % (ud.basecmd, shlex.quote(repourl))
+            else:
+                fetch_cmd = "LANG=C %s fetch -f --progress %s refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*" % (ud.basecmd, shlex.quote(repourl))
             if ud.proto.lower() != 'file':
                 bb.fetch2.check_network_access(d, fetch_cmd, ud.url)
             progresshandler = GitProgressHandler(d)
-- 
2.35.1
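
For illustration only, the refspec selection made by the hunk above can
be sketched outside of bitbake as follows. This is a minimal sketch, not
the fetcher's actual code: the helper names build_fetch_cmd() and
selected_refs() are hypothetical, the plain "git" base command and the
example URL are assumptions, and fnmatch is used only to approximate how
a '*' in a refspec matches ref names.

```python
import fnmatch
import shlex

# Refspecs as used in the patch: everything when nobranch=1 is set,
# otherwise only branches and tags.
ALL_REFS = ["refs/*:refs/*"]
HEADS_AND_TAGS = ["refs/heads/*:refs/heads/*", "refs/tags/*:refs/tags/*"]


def build_fetch_cmd(basecmd, repourl, nobranch):
    """Hypothetical helper: build the 'git fetch' command line as a string."""
    refspecs = ALL_REFS if nobranch else HEADS_AND_TAGS
    return "LANG=C %s fetch -f --progress %s %s" % (
        basecmd, shlex.quote(repourl), " ".join(refspecs))


def selected_refs(remote_refs, nobranch):
    """Hypothetical helper: list the remote refs the refspecs would cover.

    Only the source side (left of ':') of each refspec is matched here;
    fnmatch's '*' also matches '/', as a refspec glob does.
    """
    patterns = [r.split(":", 1)[0] for r in (ALL_REFS if nobranch else HEADS_AND_TAGS)]
    return [ref for ref in remote_refs
            if any(fnmatch.fnmatchcase(ref, pat) for pat in patterns)]


if __name__ == "__main__":
    # Example ref names modelled on the namespaces named in the commit message.
    refs = [
        "refs/heads/main",
        "refs/tags/v1.0",
        "refs/merge-requests/1/head",
        "refs/pipelines/42",
        "refs/keep-around/abc",
    ]
    print(build_fetch_cmd("git", "https://example.com/repo.git", nobranch=False))
    print(selected_refs(refs, nobranch=False))
    print(selected_refs(refs, nobranch=True))
```

With nobranch unset, only the branch and tag refs are selected, so the
gitlab-specific namespaces are never requested from the server; with
nobranch=1, the previous fetch-everything behaviour is kept.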




* Re: [bitbake-devel] [PATCH] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata
  2022-12-05 22:00 [PATCH] fetch2/git: Prevent git fetcher from fetching gitlab repository metadata Marek Vasut
@ 2022-12-06 10:29 ` Quentin Schulz
  0 siblings, 0 replies; 2+ messages in thread
From: Quentin Schulz @ 2022-12-06 10:29 UTC (permalink / raw)
  To: Marek Vasut, bitbake-devel
  Cc: Peter Kjellerstedt, Martin Jansa, Mikko.Rapeli, Quentin Schulz,
	Richard Purdie

Hi Marek,

On 12/5/22 23:00, Marek Vasut wrote:
> The bitbake git fetcher currently fetches 'refs/*:refs/*', i.e. every
> single object in the remote repository. This works poorly with gitlab
> and github, which use the remote git repository to track its metadata
> like merge requests, CI pipelines and such.
> 
> Specifically, gitlab generates refs/merge-requests/*, refs/pipelines/*
> and refs/keep-around/* and they all contain massive amount of data that
> are useless for the bitbake build purposes. The amount of useless data
> can in fact be so massive (e.g. with FDO mesa.git repository) that some
> proxies may outright terminate the 'git fetch' connection, and make it
> appear as if bitbake got stuck on 'git fetch' with no output.
> 
> To avoid fetching all these useless metadata, tweak the git fetcher such
> that it only fetches refs/heads/* and refs/tags/* . Avoid using negative
> refspecs as those are only available in new git versions.
> 
> Per feedback on the ML, Gerrit may push commits outsides of branches or
> tags during CI runs, which currently works with the 'nobranch=1' fetcher
> parameter. To retain this functionality, keep fetching everything in case
> the 'nobranch=1' is present. This still avoids fetching massive amount of
> data in the common case, since 'nobranch=1' is rare. Update 'nobranch'
> documentation.
> 
> Reviewed-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Signed-off-by: Marek Vasut <marex@denx.de>
> ---
> Cc: Martin Jansa <Martin.Jansa@gmail.com>
> Cc: Mikko.Rapeli@bmw.de
> Cc: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: Quentin Schulz <foss+yocto@0leil.net>
> Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
> V1: - Add RB from Peter
>      - Keep fetching everything in case of nobranch=1
>      - Update nobranch documentation
> ---
>   doc/bitbake-user-manual/bitbake-user-manual-fetching.rst | 4 ++--
>   lib/bb/fetch2/git.py                                     | 8 ++++++--
>   2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> index 9c269ca8..e86a4d86 100644
> --- a/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> +++ b/doc/bitbake-user-manual/bitbake-user-manual-fetching.rst
> @@ -424,8 +424,8 @@ This fetcher supports the following parameters:
>   
>   -  *"nobranch":* Tells the fetcher to not check the SHA validation for
>      the branch when set to "1". The default is "0". Set this option for
> -   the recipe that refers to the commit that is valid for a tag instead
> -   of the branch.
> +   the recipe that refers to the commit that is valid for a any namespace
> +   instead of the branch.

I don't have much knowledge of git terminology, but "namespace" does not
help me much in understanding when I should set this parameter. Could we
rephrase this to make explicit exactly which scenario calls for it?

also, s/a any/any/

>   
>   -  *"bareclone":* Tells the fetcher to clone a bare clone into the
>      destination directory without checking out a working tree. Only the
> diff --git a/lib/bb/fetch2/git.py b/lib/bb/fetch2/git.py
> index 578edc59..c80e8e5c 100644
> --- a/lib/bb/fetch2/git.py
> +++ b/lib/bb/fetch2/git.py
> @@ -44,7 +44,7 @@ Supported SRC_URI options are:
>   
>   - nobranch
>      Don't check the SHA validation for branch. set this option for the recipe
> -   referring to commit which is valid in tag instead of branch.
> +   referring to commit which is valid in any namespace instead of branch.

Ditto.

I'd personally split this into two commits, one for the code and one for
the docs. I'm afraid that a combined commit could make reverting the code
change difficult because of the docs.

Cheers,
Quentin


