* Options for allowing Git mirror tarballs to avoid stomping each other
@ 2021-05-06 14:38 Matt Hoosier
  2021-05-08  9:51 ` [bitbake-devel] " Richard Purdie
  0 siblings, 1 reply; 2+ messages in thread
From: Matt Hoosier @ 2021-05-06 14:38 UTC (permalink / raw)
  To: bitbake-devel

Hi,

The Git fetcher generally assumes that repositories it fetches are well-behaved, in the sense that content fetchable from a head today will continue to be reachable on that head tomorrow. There's the 'rebaseable' SRC_URI parameter that can be used to override that assumption on a repository-by-repository basis (it's set to 0 by default).

I'm concerned that when several projects of different vintages publish mirror tarballs into, say, a common HTTP-served directory, this assumption is going to cause trouble.

The mirror tarballs of a Git repository are named based only on the URL (not the revision). So, for example, git://sourceware.org/git/glibc.git;branch=release/2.24/master;name=glibc maps to a mirror tarball named git2_sourceware.org.git.glibc.git.tar.gz.
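A minimal sketch of that naming rule, for illustration only. The function name mirror_tarball_name is hypothetical and this is not the fetcher's actual code; it only assumes what the example above shows, namely that the name is built from the host and path, with the ';' parameters such as branch playing no part.

```python
from urllib.parse import urlsplit


def mirror_tarball_name(src_uri: str) -> str:
    """Hypothetical helper: derive a mirror tarball name from a SRC_URI entry.

    Assumption (taken from the example in the message): only the host and
    path contribute to the name; ';key=value' parameters are ignored.
    """
    # Drop the ';branch=...;name=...' parameters before parsing the URL.
    base = src_uri.split(";", 1)[0]
    parts = urlsplit(base)
    # Flatten the path by turning '/' into '.', e.g. '/git/glibc.git' -> '.git.glibc.git'.
    flat_path = parts.path.replace("/", ".")
    return f"git2_{parts.netloc}{flat_path}.tar.gz"


name_224 = mirror_tarball_name(
    "git://sourceware.org/git/glibc.git;branch=release/2.24/master;name=glibc"
)
name_225 = mirror_tarball_name(
    "git://sourceware.org/git/glibc.git;branch=release/2.25/master;name=glibc"
)
print(name_224)
print(name_224 == name_225)
```

Both branches map to the same filename, which is the root of the collision described below.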

Suppose for the sake of argument that I have product A on a version of OE whose glibc is fetching release 2.25 and product B built using a version of OE whose glibc recipe is fetching release 2.24.

Product A goes through its release cycle, which includes an exhaustive pass of fetching and generating mirror tarballs. These are copied over to an internal premirror site for safe-keeping in case upstream Git of glibc disappears or has an outage. So the pre-mirrored tarball includes revision history for stuff up through 2.25.

Now Product B does all the same steps. This will mean that its mirror tarball of glibc gets sent to the premirror server, and now the premirror tarball includes revision history only up through 2.24.

If Product A needs to reconstruct its sources using the premirror site, we're in trouble. The history for 2.25 is no longer present in the tarball stored under the filename that Product A expects.
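The sequence above can be modelled as a toy simulation. This is a sketch under stated assumptions: the premirror is represented as a plain dictionary keyed by filename, the history of a tarball is reduced to a set of release labels, and the names publish and can_reconstruct are hypothetical.

```python
# Toy model of a shared premirror directory: filename -> releases whose
# history is contained in the tarball stored under that filename.
premirror: dict[str, set[str]] = {}

TARBALL = "git2_sourceware.org.git.glibc.git.tar.gz"


def publish(filename: str, history: set[str]) -> None:
    """Copy a tarball to the premirror, replacing any file of the same name."""
    premirror[filename] = set(history)


def can_reconstruct(filename: str, needed_release: str) -> bool:
    """Check whether the stored tarball still contains the needed release."""
    return needed_release in premirror.get(filename, set())


# Product A publishes first; its tarball has history up through 2.25.
publish(TARBALL, {"2.24", "2.25"})
# Product B publishes later; its tarball only goes up through 2.24.
publish(TARBALL, {"2.24"})

product_a_ok = can_reconstruct(TARBALL, "2.25")
product_b_ok = can_reconstruct(TARBALL, "2.24")
print(product_a_ok, product_b_ok)
```

After the second upload, Product B can still rebuild from the premirror but Product A cannot, even though nothing was removed upstream.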

All this has happened even though glibc itself is a well-behaved repository. No maintainer there has removed history from a branch or anything like that. The loss is just a side-effect of local operations done with two different releases of Yocto-based products.

Short of doctoring all the recipes to set 'rebaseable=1' or resigning myself to keeping a completely separate premirror directory for each release of each product, I can't see an obvious way to avoid these history-losing collisions.

Would it make sense to unconditionally build the revision number into the Git mirror tarballs so there's no tension between competing copies of products' Git mirror tarballs?
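To make the proposal concrete, here is a sketch of what a revision-qualified name could look like. The naming scheme, the function versioned_mirror_tarball_name, and the revision strings are all hypothetical placeholders, not an existing BitBake format.

```python
def versioned_mirror_tarball_name(host: str, path: str, revision: str) -> str:
    """Hypothetical scheme: append the fetched revision to the tarball name."""
    flat_path = path.replace("/", ".")
    return f"git2_{host}{flat_path}_{revision}.tar.gz"


# Placeholder revision identifiers, one per product's pinned glibc revision.
name_a = versioned_mirror_tarball_name("sourceware.org", "/git/glibc.git", "rev-used-by-product-a")
name_b = versioned_mirror_tarball_name("sourceware.org", "/git/glibc.git", "rev-used-by-product-b")
print(name_a)
print(name_b)
```

With such a scheme the two products would no longer overwrite each other, at the cost of storing a separate tarball of the repository for every revision published.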

-Matt



* Re: [bitbake-devel] Options for allowing Git mirror tarballs to avoid stomping each other
  2021-05-06 14:38 Options for allowing Git mirror tarballs to avoid stomping each other Matt Hoosier
@ 2021-05-08  9:51 ` Richard Purdie
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Purdie @ 2021-05-08  9:51 UTC (permalink / raw)
  To: matt.hoosier, bitbake-devel

On Thu, 2021-05-06 at 14:38 +0000, Matt Hoosier via lists.openembedded.org wrote:
> Hi,
>  
> The Git fetcher generally assumes that repositories it fetches are well-behaved,
> in the sense that content fetchable from a head today will continue to be reachable
> on that head tomorrow. There's the 'rebaseable' SRC_URI parameter that can be used
> to override that assumption on a repository-by-repository basis (it's set to 0 by default).
>  
> I'm concerned that when more than one project of different vintages are publishing 
> mirror tarballs into, say, a common HTTP-served directory, this assumption is going
> to cause trouble.
>  
> The mirror tarballs of a Git repository are named based only on the URL (not the
> revision). So, for example, git://sourceware.org/git/glibc.git;branch=release/2.24/master;name=glibc 
> maps to a mirror tarball named git2_sourceware.org.git.glibc.git.tar.gz.
>  
> Suppose for the sake of argument that I have product A on a version of OE whose 
> glibc is fetching release 2.25 and product B built using a version of OE whose 
> glibc recipe is fetching release 2.24.
>  
> Product A goes through its release cycle, which includes an exhaustive pass of 
> fetching and generating mirror tarballs. These are copied over to an internal 
> premirror site for safe-keeping in case upstream Git of glibc disappears or has
> an outage. So the pre-mirrored tarball includes revision history for stuff up 
> through 2.25.
>  
> Now Product B does all the same steps. This will mean that its mirror tarball 
> of glibc gets sent to the premirror server, and now the premirror tarball 
> includes revision history only up through 2.24.
>  
> If Product A needs to reconstruct its sources using the premirror site, we're in 
> trouble. The history for 2.25 isn't there anymore in the tarball filename that it
> expects.
>  
> All this has happened even though glibc itself is a well-behaved repository. No
> maintainer there has removed history from a branch or anything like that. The loss 
> is just a side-effect of local operations done with two different releases of 
> Yocto-based products.
>  
> Short of doctoring all the recipes to set 'rebaseable=1' or resigning myself to
> have a completely separate premirror directory for each release of each product, 
> I can't see an obvious way to avoid these history-losing collisions.
>  
> Would it make sense to unconditionally build the revision number into the Git 
> mirror tarballs so there's no tension between competing copies of products' Git
> mirror tarballs?

These are some really good questions. We did once have versioned tarballs as you
mention. The challenge is that they end up generating huge amounts of data in day
to day usage and users hate them for the inefficiency. If you put the whole git repo
metadata in, it's wasteful and if you don't there is no way to incrementally update.

We changed to a model where we're supposed to be putting all the git metadata into
the mirror tarballs. I suspect we made something more efficient somewhere (or git 
changed) and now we're only doing partial fetches. The fix is probably to ensure we
do fetch everything from the upstream when generation of the tarballs is enabled.

We do have to be careful, as we also need to preserve things that used to exist
upstream but have since been deleted there, e.g. where there was a "master" branch
and it is now "main".
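As a rough illustration of that constraint (a sketch, not the fetcher's behaviour): if the refs of a mirror are modelled as a mapping from ref name to commit, refreshing the mirror should add and update refs from upstream without dropping refs that upstream has deleted. The function merge_refs and the commit identifiers are hypothetical.

```python
def merge_refs(mirror: dict[str, str], upstream: dict[str, str]) -> dict[str, str]:
    """Refresh a mirror's refs from upstream without pruning deleted refs."""
    merged = dict(mirror)    # keep everything the mirror already has
    merged.update(upstream)  # add new refs and move existing ones forward
    return merged


# The mirror was made when upstream still had "master"; upstream has
# since renamed the branch to "main".
mirror_refs = {"refs/heads/master": "commit-old"}
upstream_refs = {"refs/heads/main": "commit-new"}

merged_refs = merge_refs(mirror_refs, upstream_refs)
print(sorted(merged_refs))
```

Older products that still reference the deleted branch keep working, while newer ones see the renamed branch.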

Cheers,

Richard





