* [PATCH RFC 00/21] Git repository sharing for kernel (and other) repos
@ 2021-04-02 17:15 Paul Gortmaker
From: Paul Gortmaker @ 2021-04-02 17:15 UTC (permalink / raw)
  To: Bruce Ashfield, Richard Purdie; +Cc: linux-yocto, bitbake-devel

It doesn't seem that long ago (kernel 3.x era) that the kernel repo was
less than 1G in size, and while it wasn't actively shared, it was kind of
one of those "we'll deal with that later" type things.

But today, where more people care about CI/CD - if you make use of both
linux-yocto and linux-yocto-dev - well, you are looking at over 3G and
2.2G respectively.  So that is over 5G - and we still don't have any
effective sharing between them, or any formalized method for which another
derived kernel in another layer can share and avoid duplicating downloaded
data.  So, add in a 3rd party kernel, such as a SoC variant (e.g. r-pi
layer) and you've now got another 3G of kernel, and are up to 8G.

What I'm going to show is that we can do all that in less than 2G.

OK, so what are we doing at a relatively high level?  We are treating
external repositories (mainline, stable, preempt-rt) as core building
blocks, and exposing them as "jumping off" reference points for our two
core kernel repositories, and for any other 3rd party kernel derivatives,
so that we never download the same common git objects twice.  In addition,
we selectively download/fetch from the stable and -rt repos so we only
pull a few megs and skip the dead end leaf nodes based on 2.x and 3.x and
4.x kernel versions we simply don't care about anymore.  We use the same
selective approach on our linux-yocto and linux-yocto-dev repos as well.

This is all achieved right at the fetch level, using what are essentially
"fetch-only" meta-packages that establish the core components to be
shared/referenced.  These fetch only recipes are in turn triggered by the
parent recipe having a fetch dependency on their desired reference.  

Right now, if you do:

     bitbake -c cleanall linux-yocto-dev ; bitbake -c fetch linux-yocto-dev

what will happen is that you will wget 920M of a pre-mirror tarball of a
2014 copy of a 3.17 yocto-dev kernel and all its useless branches.  It
gets untarred (and deleted) and then a fetch chews and grinds on it doing
the mega uplift from 3.17 --> 5.12 for a final 2.2G repo spread across
five git packs (4 from 2014 and one new one).  This all takes 10-15
minutes even on a well connected fast machine.  Whee!

With this series, the above fetch will take ~30s because we never throw
away the mainline/stable/rt content, and hence only fetch less than 20MB
of yocto-dev specific git objects from the yocto server/repo.

The kernel as components:
-------------------------
Let's look at what we really need, in order to build any BSP on v5.10 from
linux-yocto, and similarly for v5.12 on linux-yocto-dev; showing the sizes
of repacked git bare repos, and references symbolically, with the core
mainline changes up to and including v5.10 as shared:

1.5G    git.kernel.org.torvalds.linux-5.10
35M     |--> git.kernel.org.stable.linux-5.10.y
7.1M         |--> git.kernel.org.preempt-rt.linux-5.10.y
38M               |--> git.yoctoproject.org.linux-yocto.git

        linux-5.10 (shared base)
148M    |--> git.kernel.org.torvalds.linux-master (v5.10 --> latest)
2.9M         |--> git.kernel.org.stable.linux-5.12.y
5.6M              |--> git.kernel.org.preempt-rt.linux-5.12.y
18M                    |--> git.yoctoproject.org.linux-yocto-dev.git

1516+35+7+38+148+3+6+18 = 1771.  So that is 1.7G.  Add in ~100MB more for
stable-5.4 and v5.4-rt and yocto v5.4 BSPs if you want them too.  That
would be pretty sweet, right?  A heck of a lot less than 5 gigs!   But can
we do that?  Actually, yes we can.  And we can open the door for other
sharing as well, and even split that big 1.5G chunk into smaller chunks.
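
As a quick check of the arithmetic above, here is a small Python sketch.
The dictionary keys are just shortened labels from the diagram, the sizes
are the rounded MB values used in the sum, and the 3000 + 2200 figure for
the unshared case is a rough decimal reading of "over 3G and 2.2G":

```python
# Sizes in MB, as used in the sum above (rounded from the diagram).
components_mb = {
    "torvalds.linux-5.10": 1516,
    "stable.linux-5.10.y": 35,
    "preempt-rt.linux-5.10.y": 7,
    "linux-yocto": 38,
    "torvalds.linux-master": 148,
    "stable.linux-5.12.y": 3,
    "preempt-rt.linux-5.12.y": 6,
    "linux-yocto-dev": 18,
}

total_mb = sum(components_mb.values())

# Assumption: rough size of unshared linux-yocto + linux-yocto-dev today.
unshared_mb = 3000 + 2200

print("shared total: %d MB" % total_mb)
print("fraction of unshared download: %.0f%%" % (100.0 * total_mb / unshared_mb))
```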

A bit more on that sharing, since it is important, and easy to gloss over.
Consider the r-pi kernel repo, for no other reason than everyone is
familiar with it.  As of today, if you clone it w/o making any effort to
share/reference, you'll get just shy of 3G.  Instead, if you selectively
fetch the rpi-5.10 (and reference stable-5.10 above) you'll get 6M and
another 4M for rpi-5.12 (referencing linux-master above).  So you'd be
using 10M of that 3G (about 0.33%) but since we've not made it easy for
sharing, many custom SoC/SDK kernels are similar, and you clone a whole
bunch of stuff with 99% shared DNA with what you already have.

Similar sizes/factors are in play when we consider the linux-stable and
preempt-rt repositories - multiple gigs as a whole, but as we see above,
the selected v5.10 components are only 35M and 7M respectively.

Objectives:
-----------
There were some key goals at play, even if I sort of back-declare them
now with some level of revisionist reflection:

-enable sharing off of key universally public reference repos for everyone

-compartmentalize tech blocks to eliminate overlapping downloads of git objects

-reduce download size further where possible (exclusion of content/gc/repack)

-be ready to absorb further kernel growth and also EOL dead-end leaf nodes

-make only minimal specific objective based changes to the git fetcher

-remain compatible with the generally accepted Yocto workflows/design.

With that in mind, it makes sense to now consider the above requirements
in a bit more detail.

Consider the 1st and last together - share + compatible.  Adding a one
line patch to the git fetcher to support "--reference" would enable a
level of sharing but we'd have stuff spilling outside the normal download
paths, and absolute paths inside SRC_URIs and non-portable (pre)mirror
tarballs - failing miserably on the compatible goal.  So we start by only
allowing repos to reference others which are peers in the download dir -
no /home/bob/my-super-kernel type stuff.  This goes a long way towards
keeping a portable download dir and SRC_URIs clean of absolute paths, and
remaining compatible with the (pre)mirror type Yocto work flow and use
cases.  As will be seen, we don't even really expose the "--reference"
flag use outside of the internal fetcher code.
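
To make the "peers in the download dir" rule concrete, here is a minimal
Python sketch of such a check.  The helper name and the example paths are
hypothetical and are not taken from the fetcher patches; it assumes POSIX
style paths and only illustrates rejecting anything that is not a plain
name living directly in the download dir:

```python
import os

def resolve_peer_reference(dl_git_dir, ref_name):
    """Return the path of ref_name inside dl_git_dir, or raise ValueError.

    Hypothetical helper: only a plain directory name is accepted, so
    absolute paths and anything containing a path separator are refused.
    """
    if (os.path.isabs(ref_name) or "/" in ref_name
            or ref_name in ("", ".", "..")):
        raise ValueError("reference must be a peer in the download dir: %r"
                         % ref_name)
    return os.path.join(dl_git_dir, ref_name)

# A peer repo name is fine ...
print(resolve_peer_reference("/dl/git2", "linux-5.10.git"))

# ... but an absolute path outside the download dir is refused.
try:
    resolve_peer_reference("/dl/git2", "/home/bob/my-super-kernel")
except ValueError as err:
    print("rejected:", err)
```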

Technology blocks -- Looking back ~10 years, we did a lot more "kernel
patching" (via git am) than just using (and merging) pre-applied commits
in a branch of a technology feature - such as "stable" or "preempt-rt".
But now the Yocto tree builds on the -rt tree which builds on stable which
builds on mainline.  We can see how they are chained together in the
ascii nesting diagram above.  If we ignored this internal ordering, we
could end up with the stable content (git objects) duplicated inside the
preempt-rt repo and even duplicated in the linux-yocto repos themselves.
By exposing that building block on building block, we also get the
universally recognized share points (mainline, stable-5.x, rt-5.x) that
can be used by anyone to complete their full git object history.
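
The chaining described above can be sketched with git's alternates
mechanism, where a repo's objects/info/alternates file lists other object
directories to borrow from, one path per line, with relative paths taken
relative to the object directory.  The Python below is only an
illustration: the helper name, the short repo names and the use of
relative paths are assumptions, and the directories are empty placeholders
rather than real git repos:

```python
import os
import tempfile

def add_alternate(repo, peer):
    """Append peer's object dir to repo's alternates file if not present."""
    objects = os.path.join(repo, "objects")
    # Relative to the object directory, so the download dir stays portable.
    rel = os.path.relpath(os.path.join(peer, "objects"), start=objects)
    alt_file = os.path.join(objects, "info", "alternates")
    os.makedirs(os.path.dirname(alt_file), exist_ok=True)
    existing = []
    if os.path.exists(alt_file):
        with open(alt_file) as f:
            existing = f.read().splitlines()
    if rel not in existing:
        with open(alt_file, "a") as f:
            f.write(rel + "\n")
    return rel

# Placeholder download dir with one directory per building block.
dl = tempfile.mkdtemp()
chain = ["linux-5.10", "stable-5.10", "rt-5.10", "linux-yocto"]
for name in chain:
    os.makedirs(os.path.join(dl, name, "objects"))

# Each block borrows objects from the block it builds on.
for child, parent in zip(chain[1:], chain):
    rel = add_alternate(os.path.join(dl, child), os.path.join(dl, parent))
    print("%s -> %s" % (child, rel))
```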

Reduce/exclude - as time has moved on, more and more repos are of such
size and varying content, that using "--mirror" and "refs/*:refs/*" as a
"grab everything" approach simply doesn't scale.  So instead we allow
a selective clone/fetch of just what we need and exclude everything else.
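
As a rough illustration of the difference between "grab everything" and a
selective fetch, the Python sketch below builds the argument list for a
"git fetch" with explicit refspecs.  The function names and the "URL"
placeholder are hypothetical; this is not the code from the fetcher
patches, just the refspec idea:

```python
def build_refspecs(branches):
    """Map each wanted branch to an explicit one-to-one refspec."""
    return ["refs/heads/{0}:refs/heads/{0}".format(b) for b in branches]

def fetch_command(url, branches=()):
    """Build a 'git fetch' argument list for the given branches.

    With no branches given, fall back to the mirror-everything refspec.
    """
    specs = build_refspecs(branches) or ["refs/*:refs/*"]
    return ["git", "fetch", url] + specs

# Only the one stable branch we care about, instead of every ref.
print(" ".join(fetch_command("URL", ["linux-5.10.y"])))
```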

Reduce/garbage-collect - there are opportunities for "low hanging fruit"
in terms of getting rid of unused references/tags in linux-yocto[-dev] but
since I can't directly control what is in those repos (or tar mirrors of
them) we'll have to pursue that outside of this changeset.

Reduce/repack - most people don't realize it, but the size of the pack you
get from a server depends on how generous the server is with its CPU time,
and multiple packs can be significantly larger than one single pack.
While we can't control individual servers, we can (and should) consider
aggressively repacking any repo we put into service for a (pre)mirror.  That
includes any repo content we encourage compartmentalizing in this changeset.

Growth - The epoch --> v5.10 block of mainline commits is static and need
not ever change from the group of objects we have in its pack today.  The
v5.10 is the effective merge-base between linux-yocto and -dev currently,
and as such makes a sensible line in the sand for sharing.  We currently
have v5.10+ up to mainline/master coverage in the next repo in the chain,
but it is trivial to create/insert a new block of static content covering
v5.10 to v5.13 between v5.10 and master in the future to absorb new growth
in manageable chunk sizes.  We already add/use the same technology here 
to optionally "split" the v5.10 block for those who need sub-gigabyte
downloads (and repositories) for infrastructural reasons.

EOL - While the repo sizes for the stable and -rt chunks above may not
seem significant compared to the v5.10 baseline size, they are specific to
a particular baseline as leaf-node content.  As such we can simply unlink
them from our SRC_URI driven reference chains after which they won't
appear (or download) for new users/builds but they will remain in people's
old download dirs and build workspaces.

Fetcher - we add support for sharing in a "--reference"-like manner,
and enforce our relative path requirements there.  We also see a clear
need for selective clone/fetch as per above, so we allow clone args to
replace "--mirror" with "--single-branch" and an override of the fetch
args and their current default of everything via "refs/*:refs/*".
Finally, if we want to have the stable-5.4, the stable-5.10 and the
stable-5.12 as separate content for independent introduction and EOL,
then we also have to allow them to be in separate repos in the download
dir, even though they all were sourced from the same server/path/repo.
This is achieved by allowing an optional recipe specified download name.
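
To show why a download name override is needed, here is a small Python
sketch.  The default naming rule used here (host plus path with "/"
replaced by ".") is a simplification assumed for illustration, and the
function and the override names are hypothetical.  The point is only that
one upstream URL would otherwise always map to one directory:

```python
from urllib.parse import urlparse

def download_dir_name(url, name_override=None):
    """Pick a directory name in the download dir for a git URL.

    Assumed default: host + path with '/' turned into '.'; an explicit
    override, when given, wins.
    """
    if name_override:
        return name_override
    parsed = urlparse(url)
    return parsed.netloc + parsed.path.replace("/", ".")

stable = "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git"

# Without an override, every stable series lands in the same directory.
print(download_dir_name(stable))

# With overrides, each series gets its own repo to introduce and EOL.
for series in ("5.4", "5.10", "5.12"):
    print(download_dir_name(stable, "stable-" + series))
```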

I won't go into more detail here, since the fetcher commits all have
proper commit logs and make sense in their own right, independent of the
larger overall goals described above.  Similarly, all the kernel recipe
changes provide the working example/context of how all the fetcher changes
are used.  So even though the two groups are separate repos, I've chosen
to present it all together against the poky repo, at least initially.

Next Steps:
-----------
With this being a functional implementation, it seems like a good time to
get other people looking at it.  Ideally step #1 will be getting general
agreement that this is something we need, something that is overdue,
and that the implementation as shown here makes sense in the absence of
any similar effort from anyone that does the same but in a better way.

From there, we'll want more people not just looking at it, but testing it
as well.   I know I want to write a commit (script?) that will avoid any
"transition tax" by prepopulating new repos with "old" already downloaded
git objects where we can.  And to add/do tests with my own populated mirror
and NO_NETWORK, and also try to ensure nothing in BB_SHALLOW gets upset,
but I wasn't going to hold up starting a review of this any longer.

I suspect I can get some co-workers using/testing it too, but Yocto gets
used in a bunch of different ways by different groups, so we'll no doubt
have to do some additional fixups to ensure everybody gets the benefits of
this sharing.  But I'm hopeful that when people see the benefits above,
they'll pitch in to help take this the final mile by ensuring it works for
their use case as well.

I'm not too worried about pontificating out beyond that until we get past
the acceptance/testing hurdles outlined above.  So, please do have a read
of the commits, kick the tires, put on your bikeshedding clothes and grab
a brush, and let's see where it goes from here...

https://github.com/paulgortmaker/poky/compare/reference-RC1

---

Paul Gortmaker (21):
  bitbake: fetch2/git: allow override of clone args with GITCLONEARGS
  bitbake: fetch2/git: allow limiting upstream fetch refs to a subset
  bitbake: fetch2/git: allow optional git download name overrride
  bitbake: fetch2/git: allow specifying repos as static/unchanging
  bitbake: fetch2/git: ensure static repos have at least one refs/heads
  bitbake: fetch2/git: allow alt references within download dir
  bitbake: fetch2/git: append new altref line if/when SRC_URI changed value
  bitbake: fetch2/git: allow pack references within download dir
  bitbake: fetch2/git: use constant names for packs in static repos
  kernel: add basic boilerplate for fetch-only recipes
  kernel: add a fetch-only recipe for mainline v5.10 source
  kernel: allow splitting mainline v5.10 source download in two
  kernel: allow splitting mainline v5.10 source download in three
  kernel: allow splitting mainline v5.10 source download in four
  kernel: add recipe for linux-master (mainline latest)
  kernel: add stable fetch recipes for v5.4.x, v5.10.x and v5.12.x
  kernel: add preempt-rt fetch recipes for v5.4.x, v5.10.x and 5.12.x
  kernel: make v5.4.x Yocto recipes use shared source
  kernel: make v5.10.x Yocto recipes use shared source
  kernel: make linux-yocto-dev recipe use shared source
  kernel: disable (pre)mirror for linux-yocto and linux-yocto-dev

 .../bitbake-user-manual-fetching.rst          |  24 ++++
 bitbake/lib/bb/fetch2/git.py                  | 135 +++++++++++++++++-
 documentation/ref-manual/variables.rst        |  22 +++
 meta/recipes-kernel/linux/fetch-linux.inc     |  20 +++
 meta/recipes-kernel/linux/fetch-only.inc      |  20 +++
 meta/recipes-kernel/linux/fetch-rt.inc        |  25 ++++
 meta/recipes-kernel/linux/fetch-stable.inc    |  24 ++++
 meta/recipes-kernel/linux/linux-3.3.bb        |   9 ++
 meta/recipes-kernel/linux/linux-3.8.bb        |   9 ++
 meta/recipes-kernel/linux/linux-4.0.bb        |   9 ++
 meta/recipes-kernel/linux/linux-4.12.bb       |  10 ++
 meta/recipes-kernel/linux/linux-4.18.bb       |  10 ++
 meta/recipes-kernel/linux/linux-4.3.bb        |  10 ++
 meta/recipes-kernel/linux/linux-5.10.bb       |  38 +++++
 meta/recipes-kernel/linux/linux-master.bb     |  13 ++
 meta/recipes-kernel/linux/linux-rt-5.10.bb    |   9 ++
 meta/recipes-kernel/linux/linux-rt-5.12.bb    |  12 ++
 meta/recipes-kernel/linux/linux-rt-5.4.bb     |   9 ++
 meta/recipes-kernel/linux/linux-yocto-dev.bb  |  11 +-
 .../linux/linux-yocto-rt_5.10.bb              |   7 +-
 .../linux/linux-yocto-rt_5.4.bb               |   7 +-
 .../linux/linux-yocto-tiny_5.10.bb            |   7 +-
 .../linux/linux-yocto-tiny_5.4.bb             |   7 +-
 meta/recipes-kernel/linux/linux-yocto.inc     |  10 ++
 meta/recipes-kernel/linux/linux-yocto_5.10.bb |   7 +-
 meta/recipes-kernel/linux/linux-yocto_5.4.bb  |   7 +-
 meta/recipes-kernel/linux/stable-5.10.bb      |  10 ++
 meta/recipes-kernel/linux/stable-5.12.bb      |  16 +++
 meta/recipes-kernel/linux/stable-5.4.bb       |  11 ++
 29 files changed, 498 insertions(+), 10 deletions(-)
 create mode 100644 meta/recipes-kernel/linux/fetch-linux.inc
 create mode 100644 meta/recipes-kernel/linux/fetch-only.inc
 create mode 100644 meta/recipes-kernel/linux/fetch-rt.inc
 create mode 100644 meta/recipes-kernel/linux/fetch-stable.inc
 create mode 100644 meta/recipes-kernel/linux/linux-3.3.bb
 create mode 100644 meta/recipes-kernel/linux/linux-3.8.bb
 create mode 100644 meta/recipes-kernel/linux/linux-4.0.bb
 create mode 100644 meta/recipes-kernel/linux/linux-4.12.bb
 create mode 100644 meta/recipes-kernel/linux/linux-4.18.bb
 create mode 100644 meta/recipes-kernel/linux/linux-4.3.bb
 create mode 100644 meta/recipes-kernel/linux/linux-5.10.bb
 create mode 100644 meta/recipes-kernel/linux/linux-master.bb
 create mode 100644 meta/recipes-kernel/linux/linux-rt-5.10.bb
 create mode 100644 meta/recipes-kernel/linux/linux-rt-5.12.bb
 create mode 100644 meta/recipes-kernel/linux/linux-rt-5.4.bb
 create mode 100644 meta/recipes-kernel/linux/stable-5.10.bb
 create mode 100644 meta/recipes-kernel/linux/stable-5.12.bb
 create mode 100644 meta/recipes-kernel/linux/stable-5.4.bb

-- 
2.25.1

