All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fetch/git: download LFS content too during do_fetch
@ 2021-01-19 20:59 matt.hoosier
  2021-01-22 11:10 ` [bitbake-devel] " Richard Purdie
  2021-01-22 16:55 ` [PATCH v2] bitbake: " Matt Hoosier
  0 siblings, 2 replies; 4+ messages in thread
From: matt.hoosier @ 2021-01-19 20:59 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Matt Hoosier

Insert an explicit call-out to git-lfs to fetch all blobs needed by
each ref named in SRC_URI, during the fetch() function. This avoids
the default behavior of Git LFS to wait until 'git checkout' to
begin downloading the blobs pointed to by LFS records. Network
access is not allowed at that point in the recipe's lifecycle.

Signed-off-by: Matt Hoosier <matt.hoosier@garmin.com>
---
 bitbake/lib/bb/fetch2/git.py | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index d255afeb36..847d7ff5cc 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -378,6 +378,10 @@ class Git(FetchMethod):
             if missing_rev:
                 raise bb.fetch2.FetchError("Unable to find revision %s even from upstream" % missing_rev)
 
+        if self._contains_lfs(ud, d, ud.clonedir) and self._need_lfs(ud):
+            for name in ud.names:
+                runfetchcmd("git-lfs fetch origin %s" % ud.revisions[name], d, workdir=ud.clonedir)
+
     def build_mirror_data(self, ud, d):
         if ud.shallow and ud.write_shallow_tarballs:
             if not os.path.exists(ud.fullshallow):
@@ -473,7 +477,7 @@ class Git(FetchMethod):
         if os.path.exists(destdir):
             bb.utils.prunedir(destdir)
 
-        need_lfs = ud.parm.get("lfs", "1") == "1"
+        need_lfs = self._need_lfs(ud)
 
         if not need_lfs:
             ud.basecmd = "GIT_LFS_SKIP_SMUDGE=1 " + ud.basecmd
@@ -562,6 +566,9 @@ class Git(FetchMethod):
             raise bb.fetch2.FetchError("The command '%s' gave output with more then 1 line unexpectedly, output: '%s'" % (cmd, output))
         return output.split()[0] != "0"
 
+    def _need_lfs(self, ud):
+        return ud.parm.get("lfs", "1") == "1"
+
     def _contains_lfs(self, ud, d, wd):
         """
         Check if the repository has 'lfs' (large file) content
@@ -572,8 +579,14 @@ class Git(FetchMethod):
         else:
             branchname = "master"
 
-        cmd = "%s grep lfs origin/%s:.gitattributes | wc -l" % (
-            ud.basecmd, ud.branches[ud.names[0]])
+        # The bare clonedir doesn't use the remote names; it has the branch immediately.
+        if wd == ud.clonedir:
+            refname = ud.branches[ud.names[0]]
+        else:
+            refname = "origin/%s" % ud.branches[ud.names[0]]
+
+        cmd = "%s grep lfs %s:.gitattributes | wc -l" % (
+            ud.basecmd, refname)
 
         try:
             output = runfetchcmd(cmd, d, quiet=True, workdir=wd)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [bitbake-devel] [PATCH] fetch/git: download LFS content too during do_fetch
  2021-01-19 20:59 [PATCH] fetch/git: download LFS content too during do_fetch matt.hoosier
@ 2021-01-22 11:10 ` Richard Purdie
  2021-01-22 16:57   ` Matt Hoosier
  2021-01-22 16:55 ` [PATCH v2] bitbake: " Matt Hoosier
  1 sibling, 1 reply; 4+ messages in thread
From: Richard Purdie @ 2021-01-22 11:10 UTC (permalink / raw)
  To: matt.hoosier, bitbake-devel

On Tue, 2021-01-19 at 14:59 -0600, Matt Hoosier via
lists.openembedded.org wrote:
> Insert an explicit call-out to git-lfs to fetch all blobs needed by
> each ref named in SRC_URI, during the fetch() function. This avoids
> the default behavior of Git LFS to wait until 'git checkout' to
> begin downloading the blobs pointed to by LFS records. Network
> access is not allowed at that point in the recipe's lifecycle.
> 
> Signed-off-by: Matt Hoosier <matt.hoosier@garmin.com>
> ---
>  bitbake/lib/bb/fetch2/git.py | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)

This did cause some fallout on the automated testing infrastructure:

https://autobuilder.yoctoproject.org/typhoon/#/builders/23/builds/3192

(the vulkan-samples recipe seems to find lfs contents somewhere we
didn't know about)

bitbake-selftest failing:

https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/1745
https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/1745

since we don't use/install git-lfs on the infrastructure.

We should perhaps skip the test if git-lfs isn't installed?

Cheers,

Richard




^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2] bitbake: fetch/git: download LFS content too during do_fetch
  2021-01-19 20:59 [PATCH] fetch/git: download LFS content too during do_fetch matt.hoosier
  2021-01-22 11:10 ` [bitbake-devel] " Richard Purdie
@ 2021-01-22 16:55 ` Matt Hoosier
  1 sibling, 0 replies; 4+ messages in thread
From: Matt Hoosier @ 2021-01-22 16:55 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Matt Hoosier

Insert an explicit pass to fetch all blobs needed by Git LFS, during the
fetch() function. This avoids the default behavior of Git LFS to wait
until 'git checkout' to begin downloading the blobs pointed to by LFS records.
Network access is not allowed at that point in the recipe's lifecycle.

[YOCTO #14191]

Signed-off-by: Matt Hoosier <matt.hoosier@garmin.com>
---
 bitbake/lib/bb/fetch2/git.py  | 44 ++++++++++++++++++++++++++++++++---
 bitbake/lib/bb/tests/fetch.py | 28 +++++++++++++++-------
 2 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/git.py b/bitbake/lib/bb/fetch2/git.py
index df9538a60d..e3ba80a3f5 100644
--- a/bitbake/lib/bb/fetch2/git.py
+++ b/bitbake/lib/bb/fetch2/git.py
@@ -384,6 +384,35 @@ class Git(FetchMethod):
             if missing_rev:
                 raise bb.fetch2.FetchError("Unable to find revision %s even from upstream" % missing_rev)
 
+        if self._contains_lfs(ud, d, ud.clonedir) and self._need_lfs(ud):
+            # Unpack temporary working copy, use it to run 'git checkout' to force pre-fetching
+            # of all LFS blobs needed at the the srcrev.
+            #
+            # It would be nice to just do this inline here by running 'git-lfs fetch'
+            # on the bare clonedir, but that operation requires a working copy on some
+            # releases of Git LFS.
+            tmpdir = tempfile.mkdtemp(dir=d.getVar('DL_DIR'))
+            try:
+                # Do the checkout. This implicitly involves a Git LFS fetch.
+                self.unpack(ud, tmpdir, d)
+
+                # Scoop up a copy of any stuff that Git LFS downloaded. Merge them into
+                # the bare clonedir.
+                #
+                # As this procedure is invoked repeatedly on incremental fetches as
+                # a recipe's SRCREV is bumped throughout its lifetime, this will
+                # result in a gradual accumulation of LFS blobs in <ud.clonedir>/lfs
+                # corresponding to all the blobs reachable from the different revs
+                # fetched across time.
+                #
+                # Only do this if the unpack resulted in a .git/lfs directory being
+                # created; this only happens if at least one blob needed to be
+                # downloaded.
+                if os.path.exists(os.path.join(tmpdir, "git", ".git", "lfs")):
+                    runfetchcmd("tar -cf - lfs | tar -xf - -C %s" % ud.clonedir, d, workdir="%s/git/.git" % tmpdir)
+            finally:
+                bb.utils.remove(tmpdir, recurse=True)
+
     def build_mirror_data(self, ud, d):
         if ud.shallow and ud.write_shallow_tarballs:
             if not os.path.exists(ud.fullshallow):
@@ -479,7 +508,7 @@ class Git(FetchMethod):
         if os.path.exists(destdir):
             bb.utils.prunedir(destdir)
 
-        need_lfs = ud.parm.get("lfs", "1") == "1"
+        need_lfs = self._need_lfs(ud)
 
         if not need_lfs:
             ud.basecmd = "GIT_LFS_SKIP_SMUDGE=1 " + ud.basecmd
@@ -568,6 +597,9 @@ class Git(FetchMethod):
             raise bb.fetch2.FetchError("The command '%s' gave output with more then 1 line unexpectedly, output: '%s'" % (cmd, output))
         return output.split()[0] != "0"
 
+    def _need_lfs(self, ud):
+        return ud.parm.get("lfs", "1") == "1"
+
     def _contains_lfs(self, ud, d, wd):
         """
         Check if the repository has 'lfs' (large file) content
@@ -578,8 +610,14 @@ class Git(FetchMethod):
         else:
             branchname = "master"
 
-        cmd = "%s grep lfs origin/%s:.gitattributes | wc -l" % (
-            ud.basecmd, ud.branches[ud.names[0]])
+        # The bare clonedir doesn't use the remote names; it has the branch immediately.
+        if wd == ud.clonedir:
+            refname = ud.branches[ud.names[0]]
+        else:
+            refname = "origin/%s" % ud.branches[ud.names[0]]
+
+        cmd = "%s grep lfs %s:.gitattributes | wc -l" % (
+            ud.basecmd, refname)
 
         try:
             output = runfetchcmd(cmd, d, quiet=True, workdir=wd)
diff --git a/bitbake/lib/bb/tests/fetch.py b/bitbake/lib/bb/tests/fetch.py
index 1452d76151..031ebb7171 100644
--- a/bitbake/lib/bb/tests/fetch.py
+++ b/bitbake/lib/bb/tests/fetch.py
@@ -2089,13 +2089,14 @@ class GitLfsTest(FetcherTest):
             cwd = self.gitdir
         return bb.process.run(cmd, cwd=cwd)[0]
 
-    def fetch(self, uri=None):
+    def fetch(self, uri=None, download=True):
         uris = self.d.getVar('SRC_URI').split()
         uri = uris[0]
         d = self.d
 
         fetcher = bb.fetch2.Fetch(uris, d)
-        fetcher.download()
+        if download:
+            fetcher.download()
         ud = fetcher.ud[uri]
         return fetcher, ud
 
@@ -2105,16 +2106,21 @@ class GitLfsTest(FetcherTest):
         uri = 'git://%s;protocol=file;subdir=${S};lfs=1' % self.srcdir
         self.d.setVar('SRC_URI', uri)
 
-        fetcher, ud = self.fetch()
+        # Careful: suppress initial attempt at downloading until
+        # we know whether git-lfs is installed.
+        fetcher, ud = self.fetch(uri=None, download=False)
         self.assertIsNotNone(ud.method._find_git_lfs)
 
-        # If git-lfs can be found, the unpack should be successful
-        ud.method._find_git_lfs = lambda d: True
-        shutil.rmtree(self.gitdir, ignore_errors=True)
-        fetcher.unpack(self.d.getVar('WORKDIR'))
+        # If git-lfs can be found, the unpack should be successful. Only
+        # attempt this with the real live copy of git-lfs installed.
+        if ud.method._find_git_lfs(self.d):
+            fetcher.download()
+            shutil.rmtree(self.gitdir, ignore_errors=True)
+            fetcher.unpack(self.d.getVar('WORKDIR'))
 
         # If git-lfs cannot be found, the unpack should throw an error
         with self.assertRaises(bb.fetch2.FetchError):
+            fetcher.download()
             ud.method._find_git_lfs = lambda d: False
             shutil.rmtree(self.gitdir, ignore_errors=True)
             fetcher.unpack(self.d.getVar('WORKDIR'))
@@ -2125,10 +2131,16 @@ class GitLfsTest(FetcherTest):
         uri = 'git://%s;protocol=file;subdir=${S};lfs=0' % self.srcdir
         self.d.setVar('SRC_URI', uri)
 
+        # In contrast to test_lfs_enabled(), allow the implicit download
+        # done by self.fetch() to occur here. The point of this test case
+        # is to verify that the fetcher can survive even if the source
+        # repository has Git LFS usage configured.
         fetcher, ud = self.fetch()
         self.assertIsNotNone(ud.method._find_git_lfs)
 
-        # If git-lfs can be found, the unpack should be successful
+        # If git-lfs can be found, the unpack should be successful. A
+        # live copy of git-lfs is not required for this case, so
+        # unconditionally forge its presence.
         ud.method._find_git_lfs = lambda d: True
         shutil.rmtree(self.gitdir, ignore_errors=True)
         fetcher.unpack(self.d.getVar('WORKDIR'))
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [bitbake-devel] [PATCH] fetch/git: download LFS content too during do_fetch
  2021-01-22 11:10 ` [bitbake-devel] " Richard Purdie
@ 2021-01-22 16:57   ` Matt Hoosier
  0 siblings, 0 replies; 4+ messages in thread
From: Matt Hoosier @ 2021-01-22 16:57 UTC (permalink / raw)
  To: richard.purdie, bitbake-devel

On Fri, 2021-01-22 at 11:10 +0000, Richard Purdie wrote:
> CAUTION - EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust the sender and know the content is safe.
> 
> 
> On Tue, 2021-01-19 at 14:59 -0600, Matt Hoosier via
> lists.openembedded.org wrote:
> 
> Insert an explicit call-out to git-lfs to fetch all blobs needed by
> each ref named in SRC_URI, during the fetch() function. This avoids
> the default behavior of Git LFS to wait until 'git checkout' to
> begin downloading the blobs pointed to by LFS records. Network
> access is not allowed at that point in the recipe's lifecycle.
> 
> Signed-off-by: Matt Hoosier <matt.hoosier@garmin.com>
> ---
>  bitbake/lib/bb/fetch2/git.py | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> This did cause some fallout on the automated testing infrastructure:
> 
> https://urldefense.com/v3/__https://autobuilder.yoctoproject.org/typhoon/*/builders/23/builds/3192__;Iw!!EJc4YC3iFmQ!F5VI227xahpD9W8qvFAQNkq-WFCtSj7ZJoFeD079am8wTXcaYOOlEx81JEtObz3yoA$
> 
> (the vulkan-samples recipe seems to find lfs contents somewhere we
> didn't know about)

I had a look, and this one ended up being caused by some reliance I
made on a feature of Git LFS to operate without a working copy. That
turns out not to be supported across all releases of Git LFS, so I
retooled the mechanism to make a temporary checkout, fetch what's
needed, and then manually merge the fetched blobs back into the main
(bare) clonedir.

> 
> bitbake-selftest failing:
> 
> https://urldefense.com/v3/__https://autobuilder.yoctoproject.org/typhoon/*/builders/79/builds/1745__;Iw!!EJc4YC3iFmQ!F5VI227xahpD9W8qvFAQNkq-WFCtSj7ZJoFeD079am8wTXcaYOOlEx81JEs6AvEIFw$
> https://urldefense.com/v3/__https://autobuilder.yoctoproject.org/typhoon/*/builders/79/builds/1745__;Iw!!EJc4YC3iFmQ!F5VI227xahpD9W8qvFAQNkq-WFCtSj7ZJoFeD079am8wTXcaYOOlEx81JEs6AvEIFw$
> 
> since we don't use/install git-lfs on the infrastructure.
> 
> We should perhaps skip the test if git-lfs isn't installed?

Yes, I agree. I've retooled the self-test case to avoid exercising the
"SRC_URI=....;lfs=1" case if the helper binary isn't installed.

I've updated the patch with all of this, available now at 
https://lists.openembedded.org/g/bitbake-devel/message/11918.

-Matt

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-01-22 16:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19 20:59 [PATCH] fetch/git: download LFS content too during do_fetch matt.hoosier
2021-01-22 11:10 ` [bitbake-devel] " Richard Purdie
2021-01-22 16:57   ` Matt Hoosier
2021-01-22 16:55 ` [PATCH v2] bitbake: " Matt Hoosier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.