* Proposing changes to the OpenBMC tree (to make upstreaming easier)
@ 2022-04-04 18:28 Ed Tanous
  2022-04-06  2:19 ` Andrew Jeffery
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ed Tanous @ 2022-04-04 18:28 UTC (permalink / raw)
  To: OpenBMC Maillist; +Cc: Andrew Jeffery, Brad Bishop

The OpenBMC development process as it stands is difficult for people
new to the project to understand, which severely limits our ability to
onboard new maintainers, developers, and groups which would otherwise
contribute major features to upstream, but don't have the technical
expertise to do so.  This initiative, much like others before it [1], is
attempting to reduce the toil and OpenBMC-specific processes of
passing changes amongst the community, and move things to being more
like other projects that have largely solved this problem already.

To that end, I'd like to propose a change to the way we structure our
repositories within the project: specifically, putting (almost) all of
the Linux Foundation OpenBMC owned code into a single repo that we can
version as a single entity, rather than spreading out amongst many
repos.  In practice, this would have some significant advantages:

- The tree would be easily shareable amongst the various people
working on OpenBMC, without having to rely on a single-source Gerrit
instance.  Git is designed to be distributed, but if our recipe files
point at other repositories, it largely defeats a lot of this
capability.  Today, if you want to share a tree that has a change in
it, you have to fork the main tree, then fork every single subproject
you've made modifications to, then update the main tree to point to
your forks.  This gets very onerous over time, especially for simple
commits.  Having maintained several different companies' forks
personally, and spoken to many others having problems with the same,
I've found that major features are difficult to test and rebase because of
this.  Moving the code to a single tree makes a lot of the toil of
tagging and modifying local trees a lot more manageable, as a series
of well-documented git commands (generally git rebase[2]).  It also
increases the likelihood that someone pulls down the fork to test it
if it's highly likely that they can apply it to their own tree in a
single command.

- There would be a reduction in reviews.  Today, anytime a person
wants to make a change that would involve any part of the tree,
there's at least 2 code reviews, one for the commit, and one for the
recipe bump.  Compared to a single tree, this at least doubles the
number of reviews we need to process.  For changes that want to make
any change to a few subsystems, as is the case when developing a
feature, they require 2 X <number of project changes> reviews, all of
which need to be synchronized.  There is a well documented problem
where we have no official way to synchronize merging of changes to
userspace applications within a bump without manual human
intervention.  This would largely render that problem moot.

- It would allow most developers to not need to understand Yocto at
all to do their day to day work on existing applications.  No more
"devtool modify", and related SRCREV bumps.  This will help most of
the new developers on the project with a lower mental load, which will
mean people are able to ramp up faster.

- It would give an opportunity for individuals and companies to "own"
well-supported public forks (e.g. Red Hat) of the codebase, which would
increase participation in the project overall.  This already happens
quite a bit, but in practice, the forks that do it squash history,
making it nearly impossible to get their changes upstreamed from an
outside entity.

- It would centralize the bug databases.  Today, bugs filed against
sub projects tend to not get answered.  Having all the bugs in
openbmc/openbmc would help in the future to avoid duplicating bugs
across projects.

- Would increase the likelihood that someone contributes a patch,
especially a patch written by someone else.  If contributing a patch
was just a matter of cherry-picking a tree of commits and submitting
it to gerrit, it's a lot more likely that people would do it.

- It would make statistics much easier to collect.
Questions like: How many patches were submitted last year?  How many
lines of code changed between commit A and commit B?  Where was this
regression introduced (e.g. git bisect)?  How much of our codebase is C++?
How many users of the D-Bus Sensor.Value interface are there?  All of
these are easily answered with one-liner git commands once this change
is done.

- New features would no longer require single-point-of-contact core
maintainer processes (e.g. creating a repo for changes, setting up
maintainer groups, etc.) and could just be submitted as a series of
patches to openbmc/openbmc.

- Tree-wide changes (C++ standard, Yocto updates, formatting, etc.) would be
much easier to accomplish in a small number of patches, or a series of
patches that is easy to pull and test as a unit.
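To make the statistics bullet above a bit more concrete, here is a minimal Python sketch (an illustration only, not part of the proposed tooling) for one of the listed questions: how many lines changed between commit A and commit B. It assumes the output format of `git diff --numstat A B`, which prints one `added<TAB>deleted<TAB>path` line per file and `-` for both counts on binary files. The function names and sample paths are hypothetical. In a combined tree a single invocation would cover every subproject; today the same question has to be asked once per repository.

```python
import subprocess
from typing import Tuple


def parse_numstat(output: str) -> Tuple[int, int]:
    """Sum the added/deleted line counts from `git diff --numstat` output."""
    added = deleted = 0
    for line in output.splitlines():
        if not line.strip():
            continue
        a, d, _path = line.split("\t", 2)
        # Binary files are reported as "-\t-\tpath" and carry no line counts.
        if a == "-" or d == "-":
            continue
        added += int(a)
        deleted += int(d)
    return added, deleted


def lines_changed(repo: str, rev_a: str, rev_b: str) -> Tuple[int, int]:
    """Run `git diff --numstat rev_a rev_b` inside `repo` and total the result."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--numstat", rev_a, rev_b],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_numstat(out)
```

For example, `lines_changed(".", "A", "B")` run in a checkout of the combined tree would return an `(added, deleted)` tuple; the parsing is split out so it can be checked without a repository.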

In terms of concretely how we would accomplish this, I've put together
what such a tree would look like, and I'm looking for input on how it
could be improved.  Some key points on what it represents:

- All history for both openbmc and sub projects will be retained.
Commits are interleaved based on the date on which they were submitted
using custom tooling that was built on top of git fast-export and
fast-import.  All previously available tags will have similar tags in
the new repository pointing at their equivalent commits in the new
repository.

- Inclusive guidelines: To make progress toward an unrelated but
important goal at the same time, I'm recommending that the
openbmc/master branch will be left as-is, and the newly-created sha1
will be pushed to the branch openbmc/openbmc:main, to retain people's
links to previous commits on master, and retain the exact project
history while at the same time moving the project to having more
inclusive naming, as has been documented previously[3].  At some point
in the future the master branch could be renamed and deprecated, but
this is considered out of scope for this specific change.

- Each individual sub-project will be given a folder within
openbmc/openbmc based on their current repository name.  While there
is an opportunity to reorganize in more specific ways (e.g. putting all
ipmi-oem handler repos in a folder), this proposal intentionally
doesn't, both to keep the scope limited and because, once this change
is made, any sort of folder rearranging will be much easier to
accomplish.

- Yocto recipes will be changed to point to their equivalent paths and
inherit the externalsrc bbclass [4].  This is exactly the mechanism
devtool uses to point to local repositories during a "devtool modify",
so it's unlikely we will have incremental build-consistency issues
with this approach, as was a concern in the past.

- Places where we've forked other well supported projects (u-boot,
kernel, etc.) will continue to point to the openbmc/<projectname> fork.
This is done to ensure that we don't inflict the same problem we're
attempting to solve in OpenBMC upon those working in the subproject
forks, and to reinforce to contributors that patches to these projects
should prefer submitting first to the relevant upstream.

- Subprojects that are intended to be reused outside of OpenBMC (e.g.
sdbusplus) will retain their existing repositories and history, such
that they are usable outside the project.  This is intended to make
sure that the code that should be reusable by others remains so.

- The above intentionally makes no changes to our subtree update
process, which would remain as it is today.  The
openbmc-specific autobump job in Jenkins would be disabled, since
it's no longer required in this approach.

- Most Gerrit patches would now be submitted to openbmc/openbmc.
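The date-based interleaving described in the first bullet above can be sketched in a few lines of Python. This is not the published combine tool [5], which works on git fast-export/fast-import streams; it only illustrates the ordering and relocation idea, assuming each repository's history has already been reduced to a date-sorted list of records. The `Commit` record, its fields, and the convention that the main openbmc tree keeps its layout while every other repository moves under a folder named after itself are assumptions made for this sketch.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class Commit(NamedTuple):
    timestamp: int          # commit date as a Unix timestamp (assumed field)
    repo: str               # source repository name, e.g. "bmcweb"
    subject: str            # first line of the commit message
    paths: Tuple[str, ...]  # touched paths as they appeared in the source repo


def relocate(commit: Commit) -> Commit:
    """Move a subproject's paths under a folder named after its repository."""
    if commit.repo == "openbmc":
        return commit  # the main tree keeps its existing layout
    return commit._replace(paths=tuple(f"{commit.repo}/{p}" for p in commit.paths))


def interleave(histories: Iterable[List[Commit]]) -> List[Commit]:
    """Merge per-repo histories (each already sorted by date) into one ordered list."""
    merged = heapq.merge(*histories, key=lambda c: c.timestamp)
    return [relocate(c) for c in merged]
```

A real conversion also has to give each rewritten commit a tree containing the latest state of every other subproject, plus correct parent pointers; this sketch deliberately ignores trees and parents and shows only the merge order.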

My proposed version of this tree is pushed to a github fork here, and
is based on the tree from a few weeks ago:
https://github.com/edtanous/openbmc

It implements all the above for the main branch.  This tree is based
on the output of the automated tooling, and in the case where this
proposal is accepted, the tooling would be re-run to capture the state
of the tree at the point where we chose to make this change.

The tool I wrote to generate this tree is also published, if you're
interested in how this tree was built, and is quite interesting in its
use of git fast-export/fast-import [5], but functionally, I would not expect
that tooling to survive after this transition is made.

Let me know what you think.

-Ed

[1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
[2] https://git-scm.com/docs/git-rebase
[3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
[4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
[5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-04 18:28 Proposing changes to the OpenBMC tree (to make upstreaming easier) Ed Tanous
@ 2022-04-06  2:19 ` Andrew Jeffery
  2022-04-06 15:54   ` Ed Tanous
  2022-05-19 21:12   ` Cody Smith
  2022-04-06 20:06 ` Patrick Williams
  2022-04-12  7:23 ` Heyi Guo
  2 siblings, 2 replies; 21+ messages in thread
From: Andrew Jeffery @ 2022-04-06  2:19 UTC (permalink / raw)
  To: Ed Tanous, OpenBMC Maillist; +Cc: Brad Bishop

Hi Ed,

I think what's below largely points to a bit of an identity crisis for
the project, on a couple of fronts. Fundamentally OpenBMC is a distro
(or as Yocto likes to point out, a meta-distro), and we can:

1. Identify as a traditional OSS distro: An integration of otherwise
   independent applications

2. Identify as an appliance distro: The distro and the
   applications are a monolith

You're proposing 2, while I think there exists some tension towards 1.

With the amount of custom userspace we've always kinda sat in-between.
I'd like to see libraries and applications that have use cases outside
of OpenBMC be accessible to people with those external use cases,
without being burdened by understanding the rest of the OpenBMC context.
I have a concern that by integrating things in the way you're proposing
it will lead to more inertia there (e.g. for implementations of
standards such as MCTP or PLDM (libmctp and libpldm)).

On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> The OpenBMC development process as it stands is difficult for people
> new to the project to understand, which severely limits our ability to
> onboard new maintainers, developers, and groups which would otherwise
> contribute major features to upstream, but don't have the technical
> expertise to do so.  This initiative, much like others before it [1], is
> attempting to reduce the toil and OpenBMC-specific processes of
> passing changes amongst the community, and move things to being more
> like other projects that have largely solved this problem already.

Can you be more specific about which projects here? Do you have links 
to examples?

>
> To that end, I'd like to propose a change to the way we structure our
> repositories within the project: specifically, putting (almost) all of
> the Linux Foundation OpenBMC owned code into a single repo that we can
> version as a single entity, rather than spreading out amongst many
> repos.  In practice, this would have some significant advantages:
>
> - The tree would be easily shareable amongst the various people
> working on OpenBMC, without having to rely on a single-source Gerrit
> instance.  Git is designed to be distributed, but if our recipe files
> point at other repositories, it largely defeats a lot of this
> capability.  Today, if you want to share a tree that has a change in
> it, you have to fork the main tree, then fork every single subproject
> you've made modifications to, then update the main tree to point to
> your forks. 

This isn't true, as you can add patches in the OpenBMC tree.

CI prevents these from being submitted, as it should, but there's nothing to
stop anyone from using `devtool modify ...` / `devtool finish ...` and
committing the result as a workflow to exchange state (I do this).

Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
It is at least the Yocto workflow.

> This gets very onerous over time, especially for simple
> commits.  Having maintained several different companies' forks
> personally, and spoken to many others having problems with the same,
> I've found that major features are difficult to test and rebase because of
> this.  Moving the code to a single tree makes a lot of the toil of
> tagging and modifying local trees a lot more manageable, as a series
> of well-documented git commands (generally git rebase[2]).  It also
> increases the likelihood that someone pulls down the fork to test it
> if it's highly likely that they can apply it to their own tree in a
> single command.

Again, this is moot if the patches are applied in-tree.

>
> - There would be a reduction in reviews.  Today, anytime a person
> wants to make a change that would involve any part of the tree,
> there's at least 2 code reviews, one for the commit, and one for the
> recipe bump.  Compared to a single tree, this at least doubles the
> number of reviews we need to process.

Is there more work? Yes.

Is it always double? No. Is it sometimes double? Yes.

Often bumps batch multiple application commits. I think this paragraph 
overstates the problem somewhat, but what it does get right is 
identifying that *some* overhead exists.

>  For changes that want to make
> any change to a few subsystems, as is the case when developing a
> feature, they require 2 X <number of project changes> reviews, all of
> which need to be synchronized.

Same issue as above here.

> There is a well documented problem
> where we have no official way to synchronize merging of changes to
> userspace applications within a bump without manual human
> intervention.  This would largely render that problem moot.

Right, this can be hard to handle.

It can be mitigated by versioning interfaces (which the D-Bus spec 
calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple 
interfaces for the transition period.

That said, that's also more work, and so needs to be considered in the 
set of trade-offs.

[6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
[7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
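As a small illustration of the interface versioning mentioned above: the D-Bus specification [6] requires interface names to consist of at least two dot-separated elements, each made of `[A-Za-z0-9_]` and not starting with a digit, with a total length of at most 255. A versioned name (for instance a hypothetical `xyz.openbmc_project.Sensor.Value2` alongside the existing `xyz.openbmc_project.Sensor.Value`) is just another valid name, so a daemon could in principle expose both during a transition period. The Python sketch below only checks those naming rules; the versioned name is an illustration, not an existing OpenBMC interface.

```python
import re

# One interface-name element: ASCII letters, digits or underscore,
# and it must not start with a digit.
_ELEMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
MAX_NAME_LENGTH = 255


def is_valid_interface_name(name: str) -> bool:
    """Check a D-Bus interface name against the spec's naming rules."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    elements = name.split(".")
    # At least two elements are required; empty elements fail the regex.
    if len(elements) < 2:
        return False
    return all(_ELEMENT.fullmatch(e) for e in elements)
```

Both the current name and a versioned successor pass this check, which is what makes serving them side by side possible; the extra work Andrew notes lies in keeping both implementations alive, not in the naming.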

>
> - It would allow most developers to not need to understand Yocto at
> all to do their day to day work on existing applications.  No more
> "devtool modify", and related SRCREV bumps.  This will help most of
> the new developers on the project with a lower mental load, which will
> mean people are able to ramp up faster.

Okay. So devtool is seen as an issue.

Can we improve its visibility and any education around it? Or is it a 
lost cause? If so, why?

Separately, I'm concerned this is an attempt to shield people from
skills that help them work with upstream Yocto. OpenBMC feels like it's
a bit of an on-ramp for open-source contributions for people who have
worked in what was previously quite a proprietary environment. We tried
shielding people in the past wrt kernel contributions, and that failed
pretty spectacularly. We (at least Joel and I) now encourage people to
work with upstream directly *and support them in the process of doing
that* rather than trying to mitigate some of the difficulties with
working upstream by avoiding them.

>
> - It would give an opportunity for individuals and companies to "own"
> well-supported public forks (e.g. Red Hat) of the codebase, which would
> increase participation in the project overall.  This already happens
> quite a bit, but in practice, the forks that do it squash history,
> making it nearly impossible to get their changes upstreamed from an
> outside entity.

Not sure this is something we want to encourage, even if it happens in 
practice.

>
> - It would centralize the bug databases.  Today, bugs filed against
> sub projects tend to not get answered. 

Do you have some numbers handy?

> Having all the bugs in
> openbmc/openbmc would help in the future to avoid duplicating bugs
> across projects.

Has this actually been a problem?

>
> - Would increase the likelihood that someone contributes a patch,
> especially a patch written by someone else.  If contributing a patch
> was just a matter of cherry-picking a tree of commits and submitting
> it to gerrit, it's a lot more likely that people would do it.

It sounds plausible, but again, some evidence for this would be helpful.

Why is this easier than submitting the patches to the application repo?

> My proposed version of this tree is pushed to a github fork here, and
> is based on the tree from a few weeks ago:
> https://github.com/edtanous/openbmc
>
> It implements all the above for the main branch.  This tree is based
> on the output of the automated tooling, and in the case where this
> proposal is accepted, the tooling would be re-run to capture the state
> of the tree at the point where we chose to make this change.
>
> The tool I wrote to generate this tree is also published, if you're
> interested in how this tree was built, and is quite interesting in its
> use of git fast-export/fast-import [5], but functionally, I would not expect
> that tooling to survive after this transition is made.

I think it would be good to capture the script in openbmc-tools if we 
choose to go ahead with this, mainly as a record of how we achieved it.

Andrew

>
> [1] 
> https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> [2] https://git-scm.com/docs/git-rebase
> [3] 
> https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> [4] 
> https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06  2:19 ` Andrew Jeffery
@ 2022-04-06 15:54   ` Ed Tanous
  2022-04-06 17:28     ` Patrick Williams
  2022-05-19 21:12   ` Cody Smith
  1 sibling, 1 reply; 21+ messages in thread
From: Ed Tanous @ 2022-04-06 15:54 UTC (permalink / raw)
  To: Andrew Jeffery; +Cc: OpenBMC Maillist, Brad Bishop

On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> Hi Ed,
>
> I think what's below largely points to a bit of an identity crisis for
> the project, on a couple of fronts. Fundamentally OpenBMC is a distro
> (or as Yocto likes to point out, a meta-distro), and we can:
>
> 1. Identify as a traditional OSS distro: An integration of otherwise
>    independent applications
>
> 2. Identify as an appliance distro: The distro and the
>    applications are a monolith
>
> You're proposing 2, while I think there exists some tension towards 1.

I didn't really think of it as a monolith in anything more than the
codebase itself.  Admittedly, having one repository could potentially
lead to lower friction in abusing interfaces, but that should be
pretty doable to safeguard against with some guidelines.  FWIW, the
fact that individual OpenBMC applications don't maintain version
numbers or version-to-version guarantees implies that we're already
treating it as a monolith for some portions of the codebase (not
saying this is good or bad).

>
> With the amount of custom userspace we've always kinda sat in-between.
> I'd like to see libraries and applications that have use cases outside
> of OpenBMC be accessible to people with those external use cases,
> without being burdened by understanding the rest of the OpenBMC context.
> I have a concern that by integrating things in the way you're proposing
> it will lead to more inertia there (e.g. for implementations of
> standards such as MCTP or PLDM (libmctp and libpldm)).


I had assumed that libmctp and libpldm fell into the "intended to be
used outside the project" category and would retain their own
repositories, given that they publish interfaces that are not OpenBMC
specific, but lots of things within the project are OpenBMC-specific,
including the daemons that attach both of those libraries to D-Bus.
The only real change here is that this proposal makes that distinction
explicit.

>
> On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > The OpenBMC development process as it stands is difficult for people
> > new to the project to understand, which severely limits our ability to
> > onboard new maintainers, developers, and groups which would otherwise
> > contribute major features to upstream, but don't have the technical
> > expertise to do so.  This initiative, much like others before it [1], is
> > attempting to reduce the toil and OpenBMC-specific processes of
> > passing changes amongst the community, and move things to being more
> > like other projects that have largely solved this problem already.
>
> Can you be more specific about which projects here? Do you have links
> to examples?

Linux is the primary example I think of, which hosts libraries within
it (libbpf, etc.) that are meant to be used elsewhere.  u-root and u-bmc
are other examples of firmware that put all of their application-specific
code in a single repository.  As a counterexample, OpenWrt
sticks with multiple repositories, but seems to have significantly
fewer repositories in total than we do, despite being a much older
project.

As a side note, one thing I find interesting is that OpenWrt hosts staging
branches for contributors/maintainers on its main project page.
That's a different model than I've seen elsewhere.  Unrelated to this
discussion, but interesting nonetheless.
https://git.openwrt.org/

>
> >
> > To that end, I'd like to propose a change to the way we structure our
> > repositories within the project: specifically, putting (almost) all of
> > the Linux Foundation OpenBMC owned code into a single repo that we can
> > version as a single entity, rather than spreading out amongst many
> > repos.  In practice, this would have some significant advantages:
> >
> > - The tree would be easily shareable amongst the various people
> > working on OpenBMC, without having to rely on a single-source Gerrit
> > instance.  Git is designed to be distributed, but if our recipe files
> > point at other repositories, it largely defeats a lot of this
> > capability.  Today, if you want to share a tree that has a change in
> > it, you have to fork the main tree, then fork every single subproject
> > you've made modifications to, then update the main tree to point to
> > your forks.
>
> This isn't true, as you can add patches in the OpenBMC tree.

As most people who have stacked patches can attest, managing patch
files in a meta layer over time is very difficult (unless you meant
something else).  Yes, I should not have said "have to", but a number
of the forks that I've seen have ended up resorting to that. Example:
(https://github.com/opencomputeproject/HWMgmt-MegaRAC-OpenEdition/tree/master/openbmc_modules)

>
> CI prevents these from being submitted, as it should, but there's nothing to
> stop anyone from using `devtool modify ...` / `devtool finish ...` and
> committing the result as a workflow to exchange state (I do this).
>
> Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
> It is at least the Yocto workflow.

devtool is just one source of friction; there are also a number
of cases where devtool modify and devtool finish fail in non-obvious
ways (usually due to some not-quite-optimal Yocto handling in a meta
layer, or patches being distributed across meta layers).  The biggest
issue is that it's yet another tool that seasoned firmware developers
have to learn to jump into our codebase.  Each tool adds some friction
compared to if it just didn't exist.  It also adds the "which recipe
do I need to devtool modify to change the webui?" type of trouble that
people have talked about many times.

>
> > This gets very onerous over time, especially for simple
> > commits.  Having maintained several different companies' forks
> > personally, and spoken to many others having problems with the same,
> > I've found that major features are difficult to test and rebase because of
> > this.  Moving the code to a single tree makes a lot of the toil of
> > tagging and modifying local trees a lot more manageable, as a series
> > of well-documented git commands (generally git rebase[2]).  It also
> > increases the likelihood that someone pulls down the fork to test it
> > if it's highly likely that they can apply it to their own tree in a
> > single command.
>
> Again, this is moot if the patches are applied in-tree.

Meta layer patch files in my experience tend to not layer well, and
require a good amount of maintenance.  They also have problems where
they're not versioned against a git base, so there's no guarantee of
where in the history the patches were forked from, and whether they
apply to your tree, or if they fail, what patches likely caused them
to fail.  Admittedly, tracking them in git isn't perfect either, but
at least it publishes "this is the source base these were based on" to
give some indication.  In practice, the public forks I've seen just
embed the custom meta layer within an openbmc tree to solve this
problem.
https://github.com/Intel-BMC/openbmc/tree/intel/meta-openbmc-mods
https://github.com/HewlettPackard/openbmc


>
> >
> > - There would be a reduction in reviews.  Today, anytime a person
> > wants to make a change that would involve any part of the tree,
> > there's at least 2 code reviews, one for the commit, and one for the
> > recipe bump.  Compared to a single tree, this at least doubles the
> > number of reviews we need to process.
>
> Is there more work? Yes.
>
> Is it always double? No. Is it sometimes double? Yes.
>
> Often bumps batch multiple application commits. I think this paragraph
> overstates the problem somewhat, but what it does get right is
> identifying that *some* overhead exists.

To be clear, I said it doubles the number of reviews, not the work;
completely agreed there.  The key point here is that there is work
that in this model would go to essentially zero.

Admittedly, not every commit gets easier, but there are a lot of
commits that would now synchronize better.  Just this morning I hit a
case of this in bmcweb; it happens a lot.  I also think that having
one or a small number of reviews would concentrate a lot of the
discussion when we make treewide changes (OWNERS files, etc.).  When
they get distributed among many reviews, in my experience it tends to
dilute the discussion a bit.
>
> >  For changes that want to make
> > any change to a few subsystems, as is the case when developing a
> > feature, they require 2 X <number of project changes> reviews, all of
> > which need to be synchronized.
>
> Same issue as above here.
>
> > There is a well documented problem
> > where we have no official way to synchronize merging of changes to
> > userspace applications within a bump without manual human
> > intervention.  This would largely render that problem moot.
>
> Right, this can be hard to handle.
>
> It can be mitigated by versioning interfaces (which the D-Bus spec
> calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple
> interfaces for the transition period.
>
> That said, that's also more work, and so needs to be considered in the
> set of trade-offs.

Totally agreed; avoiding "more work" is the point of this whole proposal.

>
> [6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
> [7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
>
> >
> > - It would allow most developers to not need to understand Yocto at
> > all to do their day to day work on existing applications.  No more
> > "devtool modify", and related SRCREV bumps.  This will help most of
> > the new developers on the project with a lower mental load, which will
> > mean people are able to ramp up faster.
>
> Okay. So devtool is seen as an issue.
>
> Can we improve its visibility and any education around it? Or is it a
> lost cause? If so, why?

Lots of experienced OpenBMC developers use devtool every day; I'm not
saying it's not useful, it's just one more tool.  I don't think "more
documentation" solves this, given that devtool is already well
documented in the multi-hundred-page mega manual; between the OpenBMC
docs, the Yocto docs, and the docs of the projects we pull in, there's
already more documentation than a developer can read when ramping up.
The best kind of documentation is the kind that doesn't need to
exist; the second best kind is where we can point to widely used,
well-maintained projects (e.g. git) and rely on their
documentation.

>
> Separately, I'm concerned this is an attempt to shield people from
> skills that help them work with upstream Yocto. OpenBMC feels like it's
> a bit of an on-ramp for open-source contributions for people who have
> worked in what was previously quite a proprietary environment. We tried
> shielding people in the past wrt kernel contributions, and that failed
> pretty spectacularly. We (at least Joel and I) now encourage people to
> work with upstream directly *and support them in the process of doing
> that* rather than trying to mitigate some of the difficulties with
> working upstream by avoiding them.

I'm not quite following the above, can you elaborate a little?  The
kernel, u-boot, and yocto would not be shielded in this model at all.
In a way they would be even less shielded, in that their repos would
look very different from the "normal" ones, which hopefully means people
would ask questions before immediately trying to modify them and push a
Gerrit review.

>
> >
> > - It would give an opportunity for individuals and companies to "own"
> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > increase participation in the project overall.  This already happens
> > quite a bit, but in practice, the forks that do it squash history,
> > making it nearly impossible to get their changes upstreamed from an
> > outside entity.
>
> Not sure this is something we want to encourage, even if it happens in
> practice.

I think when done properly, it would be a huge help to the project.
My main point is that this isn't something we can stop (companies and
individuals have done so and will continue to do so anyway), so wouldn't
we rather make their changes easier to ingest back into upstream?

>
> >
> > - It would centralize the bug databases.  Today, bugs filed against
> > sub projects tend to not get answered.
>
> Do you have some numbers handy?

I do not.  I can say that anecdotally the "you filed this bug against
the wrong project" happens quite often in the repositories I maintain,
and the lack of reasonable cross project "transfer the bug" semantics
makes this difficult (yes, admins can transfer bugs cross project, but
I'm pretty sure we don't want to call on core maintainers every time
we want to move things around.)  It would be quite helpful to the
project to have fewer than N bug trackers (might not necessarily be
one) to increase the odds that someone searches for and finds their
bug before filing a duplicate.

>
> > Having all the bugs in
> > openbmc/openbmc would help in the future to avoid duplicating bugs
> > across projects.
>
> Has this actually been a problem?

Duplication?  It happens from time to time.  Not being able to search
for a bug across the project happens a lot, and in our current model,
requires the user to know which component they are filing the bug
against.

>
> >
> > - Would increase the likelihood that someone contributes a patch,
> > especially a patch written by someone else.  If contributing a patch
> > was just a matter of cherry-picking a tree of commits and submitting
> > it to gerrit, it's a lot more likely that people would do it.
>
> It sounds plausible, but again, some evidence for this would be helpful.

The above is pretty subjective, I'm not sure how to collect evidence
aside from taking data after doing it.  Any kind of data you were
specifically looking for?

>
> Why is this easier than submitting the patches to the application repo?

Having one, non-subjective place to submit all the userspace code
would mean one setup, one tree, and only having to set up Gerrit once (not
once per devtool per project).  If you want to manage your workspace
via git branches, even if they contain changes to multiple places,
there's now one place to do that.

>
> > My proposed version of this tree is pushed to a github fork here, and
> > is based on the tree from a few weeks ago:
> > https://github.com/edtanous/openbmc
> >
> > It implements all the above for the main branch.  This tree is based
> > on the output of the automated tooling, and in the case where this
> > proposal is accepted, the tooling would be re-run to capture the state
> > of the tree at the point where we chose to make this change.
> >
> > The tool I wrote to generate this tree is also published, if you're
> > interested in how this tree was built, and is quite interesting in its
> > use of git export/import [5], but functionally, I would not expect
> > that tooling to survive after this transition is made.
>
> I think it would be good to capture the script in openbmc-tools if we
> choose to go ahead with this, mainly as a record of how we achieved it.

ACK.  Seems reasonable.


>
> Andrew
>
> >
> > [1]
> > https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> > [2] https://git-scm.com/docs/git-rebase
> > [3]
> > https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> > [4]
> > https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06 15:54   ` Ed Tanous
@ 2022-04-06 17:28     ` Patrick Williams
  2022-04-06 20:36       ` Benjamin Fair
  2022-04-07 15:39       ` Ed Tanous
  0 siblings, 2 replies; 21+ messages in thread
From: Patrick Williams @ 2022-04-06 17:28 UTC (permalink / raw)
  To: Ed Tanous; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

I'll likely respond to the original post with more thoughts later as
well...

On Wed, Apr 06, 2022 at 08:54:28AM -0700, Ed Tanous wrote:
> On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
 
> >
> > With the amount of custom userspace we've always kinda sat in-between.
> > I'd like to see libraries and applications that have use cases outside
> > of OpenBMC be accessible to people with those external use cases,
> > without being burdened by understanding the rest of the OpenBMC context.
> > I have a concern that by integrating things in the way you're proposing
> > it will lead to more inertia there (e.g. for implementations of
> > standards MCTP or PLDM (libmctp and libpldm)).
> 
> 
> I had assumed that libmctp and libpldm fell into the "intended to be
> used outside the project" category and would retain their own
> repositories, given that they publish interfaces that are not OpenBMC
> specific, but lots of things within the project are openbmc-specific,
> including the daemons that attach both of those libraries to dbus.
> The only real difference here is that it makes the difference
> explicit.

It wasn't long ago that the TOF discussed some of these libraries w.r.t.
"intended to be used outside the project" and we really had trouble
determining clear language on what classified as this and what did not.
Actually, neither of these libraries were mentioned, but it was a recipe
contribution by someone pointing at a non-openbmc github repository.  We
couldn't really come up with a clear definition, but we settled on this:
recipes "intended to be used outside the project" that also weren't in
the openbmc org needed to be submitted upstream to Yocto.

Are we going to be able to come up with a clear definition for this,
which is actually for code that is _within_ our org?  libpldm, for
instance, isn't even in a separate repository but covered as part of the
bigger PLDM with some special build flags for "library only".

> >
> > On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > > The OpenBMC development process as it stands is difficult for people
> > > new to the project to understand, which severely limits our ability to
> > > onboard new maintainers, developers, and groups which would otherwise
> > > contribute major features to upstream, but don't have the technical
> > > expertise to do so.  This initiative, much like others before it[1] is
> > > attempting to reduce the toil and OpenBMC-specific processes of
> > > passing changes amongst the community, and move things to being more
> > > like other projects that have largely solved this problem already.
> >
> > Can you be more specific about which projects here? Do you have links
> > to examples?
> 
> Linux is the primary example I think of, which hosts libraries within
> it (libbpf, etc.) that are meant to be used elsewhere.  u-root, u-bmc
> are other examples of firmware that put all of their application
> specific code in a single repository.  As a counter example, openwrt
> sticks with multiple repositories, but seems to have significantly
> fewer repositories in total than we do, despite being a much older
> project.
> 
> As a side note, one thing I find interesting is that they host staging
> branches for contributors/maintainers on their main project page.
> That's a different model than I've seen elsewhere.  Unrelated to this
> discussion, but interesting nonetheless.
> https://git.openwrt.org/

I was going to point to Android and OpenStack as two large open source
projects, which also use Gerrit, and seem to have no trouble with the
micro-repo model.

> > > To that end, I'd like to propose a change to the way we structure our
> > > repositories within the project: specifically, putting (almost) all of
> > > the Linux Foundation OpenBMC owned code into a single repo that we can
> > > version as a single entity, rather than spreading out amongst many
> > > repos.  In practice, this would have some significant advantages:
> > >
> > > - The tree would be easily shareable amongst the various people
> > > working on OpenBMC, without having to rely on a single-source Gerrit
> > > instance.  Git is designed to be distributed, but if our recipe files
> > > point at other repositories, it largely defeats a lot of this
> > > capability.  Today, if you want to share a tree that has a change in
> > > it, you have to fork the main tree, then fork every single subproject
> > > you've made modifications to, then update the main tree to point to
> > > your forks.
> >
> > This isn't true, as you can add patches in the OpenBMC tree.
> 
> As most people that have stacked patches can attest to, managing patch
> files in a meta layer over time is very difficult (unless you meant
> something else).  Yes, I should not have said "have to", but a number
> of the forks that I've seen have ended up resorting to that. Example:
> (https://github.com/opencomputeproject/HWMgmt-MegaRAC-OpenEdition/tree/master/openbmc_modules)

Why is managing patch files difficult?  Is it lack of documentation?

Upstream Yocto deals with patch files all the time.  The Facebook BMC
tree has machines in production that are still on Rocko-based Yocto
distributions and we have plenty of patch files we support there.

I wouldn't be surprised if Yocto already has tools to simplify the
"git-format-patch" -> "package.bbappend" workflow, but if it doesn't I
could probably write something.

For what it is worth, I'm currently doing a change for sdbusplus that
requires fixes across tens of repositories (as I wrote about recently).
I pretty quickly hacked up this shell function in order to automatically
update my OpenBMC tree with commits from another repo that I've pushed
to Gerrit already.

https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9

I don't see using 'git-format-patch' to create the patchfile instead as
too much additional effort.
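For illustration only, here is a minimal Python sketch of the
"git-format-patch" -> "package.bbappend" step: given the file names that
`git format-patch` writes, it emits the bbappend lines that add them to a
recipe's SRC_URI.  The helper name `make_bbappend` and the `files/`
directory next to the bbappend are assumptions, not an existing Yocto
tool, and it uses the `:prepend` override syntax of current Yocto releases.

```python
from pathlib import Path


def make_bbappend(patch_names, files_dir="files"):
    """Return bbappend text that applies the given patch files.

    patch_names: names as written by `git format-patch`, e.g.
    "0001-fix-sensor-scaling.patch" (hypothetical).
    files_dir: directory next to the .bbappend holding the patches
    (an assumed layout).
    """
    lines = [f'FILESEXTRAPATHS:prepend := "${{THISDIR}}/{files_dir}:"']
    # format-patch numbers patches 0001-, 0002-, ... so sorting by the
    # base name keeps them in series order.
    for name in sorted(patch_names, key=lambda n: Path(n).name):
        lines.append(f'SRC_URI += "file://{Path(name).name}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_bbappend(["0002-b.patch", "0001-a.patch"]), end="")
```

BitBake applies `.patch` entries in SRC_URI during do_patch, so the
output above would be the whole bbappend for a simple backport.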

> > CI prevents these from being submitted, as it should, but there's nothing to
> > stop anyone using the `devtool modify ...` / `devtool finish ...` and
> > committing the result as a workflow to exchange state (I do this)?
> >
> > Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
> > It is at least the Yocto workflow.
> 
> devtool provides just one form of friction;  There are also a number
> of cases where devtool modify and devtool finish fail in non obvious
> ways (usually due to some not-quite-optimal yocto handling in a meta
> layer, or patches being distributed across meta layers).  The biggest
> key is that it's yet another tool that seasoned firmware developers
> have to learn to jump into our codebase.  Each tool adds some friction
> compared to if it just didn't exist.  It also adds the "which recipe
> do I need to devtool to modify the webui?" type trouble that people
> have talked about many times.

Do we have pointers to when devtool fails?  The only times I've seen it
fail are for recipes that aren't in the rootfs image (kernel and u-boot),
and they've all been due to bugs in the image.bbclass on our part.  There
was actually a fix to one of the u-boot recipes very recently.

I personally don't use devtool all that much, but when I do I want it to
point at the central "workspace" of all the openbmc repos I already have
so I can get it to pick up code I already have in progress there.  Do we
need better documentation around those workflows?

(At one point there was a statement made that we didn't want tooling
written to assist with one workflow or another.  This was made somewhat
in reference to the `setup` script, but I think it has had the broader
effect that any workflow-related tools people have are hosted in their
own personal spaces and not talked about.  Maybe we need to change this
mentality.)

> > > This gets very onerous over time, especially for simple
> > > commits.  Having maintained several different companies forks
> > > personally, and spoken to many others having problems with the same,
> > > adding major features are difficult to test and rebase because of
> > > this.  Moving the code to a single tree makes a lot of the toil of
> > > tagging and modifying local trees a lot more manageable, as a series
> > > of well-documented git commands (generally git rebase[2]).  It also
> > > increases the likelihood that someone pulls down the fork to test it
> > > if it's highly likely that they can apply it to their own tree in a
> > > single command.
> >
> > Again, this is moot if the patches are applied in-tree.
> 
> Meta layer patch files in my experience tend to not layer well, and
> require a good amount of maintenance.  They also have problems where
> they're not versioned against a git base, so there's no guarantees of
> where in the history the patches were forked from, and whether they
> apply to your tree, or if they fail, what patches likely caused them
> to fail.  Admittedly, tracking them in git isn't perfect either, but
> at least it publishes "this is the source base these were based on" to
> give some indication.  In practice, the public forks I've seen just
> embed the custom meta layer within an openbmc tree to solve this
> problem.
> https://github.com/Intel-BMC/openbmc/tree/intel/meta-openbmc-mods
> https://github.com/HewlettPackard/openbmc

I could be entirely misunderstanding what problem you're pointing out here.
Why would it matter "where in the history the patches were forked
from"?  Aren't they forked from whatever the SRCREV in the recipe says?
It is on the maintainer of the meta-layer to ensure they apply to that
revision.

Having all the code in one repo doesn't give you any more visibility as
to where the code was "forked from".  It ends up being exactly the same
as a patch file except that the patch file has been "applied" already to
the code.  You still don't have visibility to the underlying upstream
commit number.  And, I would suspect it is going to be even worse
because you're going to end up with back and forth merge commits trying
to pick up the latest upstream code in these forks.  You're not going to
have a nice git-submodule number indicating openbmc/openbmc was from
here.

In my opinion, this is a problem with how people maintaining these trees
are doing it and not a problem with how our code is organized.  In the
facebook/openbmc tree we use git-submodules internally to hold the
upstream trees.  For our github-side we have a script that updates them,
but there isn't a strong reason we couldn't do the same git-submodule
layout there (I think it is due to a deficiency in the way our
internal<=>external mirror tool works).

    https://github.com/facebook/openbmc/blob/helium/yocto_repos.sh
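To make the "which upstream commit is this based on" point concrete: when
the upstream trees are held as submodules, the pinned commit can be read
straight out of `git submodule status`.  A minimal sketch of parsing that
output follows; the function name and sample paths are hypothetical, and
it assumes submodule paths contain no spaces.

```python
# Prefix character git prints before each submodule's SHA-1.
_STATES = {" ": "in-sync", "-": "uninitialized", "+": "modified", "U": "conflict"}


def parse_submodule_status(output):
    """Return {path: (sha1, state)} parsed from `git submodule status`.

    Each line is: one state character, a SHA-1, the path, and optionally
    a "(describe)" suffix.  Paths with spaces are not handled here.
    """
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, fields = line[0], line[1:].split()
        sha1, path = fields[0], fields[1]
        result[path] = (sha1, _STATES.get(flag, "unknown"))
    return result
```

A fork maintainer could log this alongside each internal release to
record exactly which upstream openbmc/openbmc commit it was built from.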

If people are not treating the openbmc/openbmc tree as an unchanged
blob, that is their fault and not ours.  Having all the source imported
in one repo doesn't really solve this either and in fact is likely to
make it worse because you now _can't_ treat openbmc/openbmc as an
unchanged blob because you're going to have to patch-in-tree any changes
to code you want to make.

> I also think that having
> one or a smaller number of reviews would concentrate a lot of the
> discussion when we make treewide changes.  (OWNERS files, etc.)  When
> they get distributed among many reviews, in my experience it tends to
> dilute the discussion a bit.

I would argue this is actually a bad thing.  We're going to more likely
end up with large cross-repository commits that are harder to review,
require more people to review them (a larger set of OWNERS), and are
harder to revert.

If there is larger discussion to be had that should probably happen on
the mailing list anyhow.

> > > - It would give an opportunity for individuals and companies to "own"
> > > well-supported public forks (ie Redhat) of the codebase, which would
> > > increase participation in the project overall.  This already happens
> > > quite a bit, but in practice, the forks that do it squash history,
> > > making it nearly impossible to get their changes upstreamed from an
> > > outside entity.
> >
> > Not sure this is something we want to encourage, even if it happens in
> > practice.
> 
> I think when done properly, it would be a huge help to the project.
> My main point is that this isn't something we can stop (companies and
> individuals have and will continue to do it anyway), so would we
> rather make their changes easier to ingest back to upstream?

In my experience the difficulties with upstreaming are not related to
logistics of sending patches to Gerrit.  They are related to the effort
involved with getting other people in the community bought into what you're
trying to do.  Having all the code in one place alleviates 1% of the
upstreaming effort while doing nothing for the remaining 99%.

> > > - It would centralize the bug databases.  Today, bugs filed against
> > > sub projects tend to not get answered.
> >
> > Do you have some numbers handy?
> 
> I do not.  I can say that anecdotally the "you filed this bug against
> the wrong project" happens quite often in the repositories I maintain,
> and the lack of reasonable cross project "transfer the bug" semantics
> makes this difficult (yes, admins can transfer bugs cross project, but
> I'm pretty sure we don't want to call on core maintainers every time
> we want to move things around.)  It would be quite helpful to the
> project to have less than N bug trackers (might not necessarily be
> one) to increase the odds that someone searches for and finds their
> bug before filing a duplicate.

"bugs filed against a sub project tend to not get answered" and
"bugs are filed against the wrong project" are different problems;
you've shifted the discussion.

The first is a problem with our maintainers.  Having all the issues
in one repository doesn't improve that situation; it actually makes
it worse, because you're going to shift that burden onto a few individuals
who pay attention to openbmc/openbmc issues while doing nothing to fix
the certain-maintainers-don't-respond issue.

> > > Having all the bugs in
> > > openbmc/openbmc would help in the future to avoid duplicating bugs
> > > across projects.
> >
> > Has this actually been a problem?
> 
> Duplication?  It happens from time to time.  Not being able to search
> for a bug across the project happens a lot, and in our current model,
> requires the user to know which component they are filing the bug
> against.

Why can't people search bugs or code at an org-level?  I do it all the time.
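
As a sketch of what an org-level search looks like programmatically, the
helper below builds a GitHub issue-search API URL restricted to one
organization with the `org:` and `is:issue` qualifiers.  The function name
is hypothetical; the same query string also works in the github.com
search box.

```python
from urllib.parse import urlencode


def org_issue_search_url(org, terms, state=None):
    """Build a GitHub search-API URL for issues across a whole org.

    `state` may be "open" or "closed"; None searches both.
    """
    query = [terms, f"org:{org}", "is:issue"]
    if state is not None:
        query.append(f"is:{state}")
    return "https://api.github.com/search/issues?" + urlencode({"q": " ".join(query)})
```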

-- 
Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-04 18:28 Proposing changes to the OpenBMC tree (to make upstreaming easier) Ed Tanous
  2022-04-06  2:19 ` Andrew Jeffery
@ 2022-04-06 20:06 ` Patrick Williams
  2022-05-23 16:53   ` Ed Tanous
  2022-04-12  7:23 ` Heyi Guo
  2 siblings, 1 reply; 21+ messages in thread
From: Patrick Williams @ 2022-04-06 20:06 UTC (permalink / raw)
  To: Ed Tanous; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Mon, Apr 04, 2022 at 11:28:06AM -0700, Ed Tanous wrote:
> The OpenBMC development process as it stands is difficult for people
> new to the project to understand, which severely limits our ability to
> onboard new maintainers, developers, and groups which would otherwise
> contribute major features to upstream, but don't have the technical
> expertise to do so.  

This is, to me, a rather bold and surprising statement.

You're saying there are people out there who can work on an embedded
system like the BMC at layers of the stack where they might be modifying
everything from the kernel up to REST APIs to "contribute major
features" and yet they can't figure out how to manipulate multiple git
repositories?

I know I've heard "Yocto has a big learning curve" over and over,
but you're not actually proposing moving away from Yocto, and I admit
that it is quite likely the "Yocto has a big learning curve" is actually a
conflation of some of these points that you're raising along with Yocto
itself, but still, this is worded rather boldly.

> This initiative, much like others before it[1] is
> attempting to reduce the toil and OpenBMC-specific processes of
> passing changes amongst the community, and move things to being more
> like other projects that have largely solved this problem already.

I'm not sure what is OpenBMC-specific about this process though.

There are some large successful projects that work in a mono-repo and some
which work in a micro-repo model.  Android and OpenStack are both micro-repo
models that use Gerrit, just like us, and have similar "how do you put the
dependencies together" problems, which they have solved: Android with `repo`
and OpenStack with `zuul`.  `repo` and `Yocto recipes` are pretty similar
mechanics to me (and I would like to see something more `zuul`-like in our
CI).

Many open source communities work in a micro-repo model; the challenge
is always about pulling dependencies together.  Newer languages like
Rust and JS make this easier, but again, there isn't much fundamentally
different than Yocto recipes out there.

The mechanics of how we maintain our dependencies might be different,
because we're built on Yocto, but I would argue that this proposal is
itself _more_ OpenBMC-specific and not less.  I say this because we are
deviating from how every other Yocto project I know of works and how
most other "Linux Distribution" projects work (which is what we are).

> - The tree would be easily shareable amongst the various people
> working on OpenBMC, without having to rely on a single-source Gerrit
> instance.  

What do you mean by "rely on a single-source Gerrit instance"?

> Git is designed to be distributed, but if our recipe files
> point at other repositories, it largely defeats a lot of this
> capability.  Today, if you want to share a tree that has a change in
> it, you have to fork the main tree, then fork every single subproject
> you've made modifications to, then update the main tree to point to
> your forks.  This gets very onerous over time, especially for simple
> commits.  Having maintained several different companies forks
> personally, and spoken to many others having problems with the same,
> adding major features are difficult to test and rebase because of
> this.  Moving the code to a single tree makes a lot of the toil of
> tagging and modifying local trees a lot more manageable, as a series
> of well-documented git commands (generally git rebase[2]).  It also
> increases the likelihood that someone pulls down the fork to test it
> if it's highly likely that they can apply it to their own tree in a
> single command.

I'm almost certain that nobody would do a git-rebase of their fork even
if everything were in one tree.

Why do we want to improve outside testing of forks?  I don't understand
why that is an advantage.

Most of what you're describing here is making it easier for people to
live without getting their code upstreamed, which doesn't seem like a
good thing to the project as a whole, and seems to actively work against
the title of "to make upstreaming easier".

If you're maintaining a few fixes in a backports tree, maintain them as
Yocto-supported patch files.  If you're maintaining whole features
elsewhere, I have no interest in making this easier for you (if it is
worse in other ways for the project).

> - There would be a reduction in reviews.  Today, anytime a person
> wants to make a change that would involve any part of the tree,
> there's at least 2 code reviews, one for the commit, and one for the
> recipe bump.  Compared to a single tree, this at least doubles the
> number of reviews we need to process.

The typical bump commit is trivial to review and nobody does it except
Andrew G or me anyhow.

You could argue that there is actually some velocity advantage of decoupling
the feature support in the code repo while figuring out the minor
implications at a recipe level separately.  Having to have every code
repo commit pass on hardware before it can get +1 Verified is going to
greatly increase our CI requirements, plus you're going to get lots more
"I don't understand why this failed" because of a flaky Yocto-wget-fetch
or Romulus QEMU failure or ...

We also would need to implement everything we have in code-repo level CI
in the recipe-level CI.  I've spoken to this later as well, but I think
while decreasing the number of reviews it will also decrease the
velocity of the reviews even more.  I don't think that the bump commits
are what are decreasing our velocity here.

> For changes that want to make
> any change to a few subsystems, as is the case when developing a
> feature, they require 2 X <number of project changes> reviews, all of
> which need to be synchronized.  

I think there is a fundamental problem of how we develop features here.
"All of which need to be synchronized" implies that every new feature we
develop has hard co-reqs between repositories.  If that is the case,
something is broken in our architecture.  We shouldn't patch around it
by manipulating the code layout.

> There is a well documented problem
> where we have no official way to synchronize merging of changes to
> userspace applications within a bump without manual human
> intervention.  This would largely render that problem moot.

This is a solved problem in projects with a similar layout to ours, such
as OpenStack.  I've previously offered to add topic-based testing[1] to our
CI framework and, maybe I misread the attitudes, but it didn't seem like
it was interesting to the project.
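
As a rough sketch of what topic-based testing would have to do first, the
Python below groups open Gerrit changes by topic so that CI could check
out every repository in a topic at the proposed revisions and build them
together.  It assumes the input looks like the ChangeInfo JSON Gerrit
returns for a query such as `status:open topic:<name>` with the
CURRENT_REVISION option (fields `project`, `topic`, `current_revision`);
the function name is hypothetical and this is not how our CI works today.

```python
from collections import defaultdict


def group_by_topic(changes):
    """Map each Gerrit topic to {project: [revisions]} for its changes.

    Changes without a topic are skipped; CI would test those on their own.
    A topic may hold a stack of several changes in one project, so each
    project maps to a list in query order; picking the top of the stack
    is left to the caller.
    """
    topics = defaultdict(dict)
    for change in changes:
        topic = change.get("topic")
        if not topic:
            continue
        revisions = topics[topic].setdefault(change["project"], [])
        revisions.append(change["current_revision"])
    return dict(topics)
```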

I _did_ add some amount of dependency-based testing to the repositories
that are included in our base Docker image already.  If you change one
of those repositories, they get included in the Docker image used for
testing your code as well.  This has identified problems in the past
because of, say, a change in sdbusplus that broke phosphor-logging's
compile.

> - It would allow most developers to not need to understand Yocto at
> all to do their day to day work on existing applications.  No more
> "devtool modify", and related SRCREV bumps.  This will help most of
> the new developers on the project with a lower mental load, which will
> mean people are able to ramp up faster.

I think there is something more fundamental going on here which I'll
speak about later in "## End-to-end features"

> - It would give an opportunity for individuals and companies to "own"
> well-supported public forks (ie Redhat) of the codebase, which would
> increase participation in the project overall.  This already happens
> quite a bit, but in practice, the forks that do it squash history,
> making it nearly impossible to get their changes upstreamed from an
> outside entity.

I don't see how this improves the "forks that squash history" situation.
Rebases are hard to pull off with how most companies handle their
internal processes, so the best case here is merge commits.

> - It would centralize the bug databases.  Today, bugs filed against
> sub projects tend to not get answered.  Having all the bugs in
> openbmc/openbmc would help in the future to avoid duplicating bugs
> across projects.

We could do this without changing any code.  Just turn off the issue
tracking on all our code repos[2].
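
For reference, a minimal Python sketch of that switch using the GitHub
REST API's "update a repository" endpoint (`PATCH /repos/{owner}/{repo}`
with `has_issues` set to false).  It only builds the request; the
function name and token handling are assumptions, the token would need
admin rights on each repository, and looping over the org's repositories
is left out.

```python
import json
from urllib.request import Request


def disable_issues_request(owner, repo, token):
    """Build (but do not send) a request turning off a repo's issues."""
    body = json.dumps({"has_issues": False}).encode()
    return Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)` per repo.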

> - Would increase the likelihood that someone contributes a patch,
> especially a patch written by someone else.  If contributing a patch
> was just a matter of cherry-picking a tree of commits and submitting
> it to gerrit, it's a lot more likely that people would do it.

I'm not following why this can't be done today.  Aren't you just
cherry-picking commits at a different level?  

I think there is likely legal apprehension about trying to take code
from a fork that isn't your own and trying to contribute it upstream.
With the current CLA structure, the SOB in "fork/repo" doesn't have the
same meaning legally as our SOB+CLA, so I cannot comfortably add my own
SOB on code I picked up from "fork/repo".  No change to the repository
layout will fix this.

> - Greatly increases the ease with which stats are collected.
> Questions like: How many patches were submitted last year?  How many
> lines of code changed between commit A and commit B?  Where was this
> regression injected (ie git bisect)?  How much of our codebase is C++?
> How many users of the dbus Sensor.Value interface are there?  Are all
> easily answered in one liner git commands once this change is done.

I'm not saying some of these aren't positives, but most of them can
already be answered today with existing tools.  Either github or grok
search or gerrit queries can answer almost all of these except "how many
lines of code changed between A and B", if you're expecting to count all
the code in subordinate repos, but I don't see that as a particularly
interesting question.

> - New features no longer require single-point-of-contact core
> maintainer processes (ie, creating a repo for changes, setting up
> maintainer groups, etc.) and can just be submitted as a series of
> patches to openbmc/openbmc.

If the single-point-of-contact is the issue, let's solve that.  I don't
think it is though.  I think the bulk of the issue with new repos is
disagreement on if something belongs in a new repo.  Submitting as a
series makes _that_ situation worse because it doesn't force the
discussion upfront and instead someone is upset they spent a bunch of
time working on a new daemon that is rejected.

> - Tree-wide changes (c++ standard, yocto updates, formatting, etc.) are
> much easier to accomplish in a small number of patches, or a series of
> patches that is easy to pull and test as a unit.

Patches that are also much more difficult to revert if they break one
particular area...

Why would we want these pulled together and tested as a unit anyhow?  If
I update the formatting or the C++ standard used of repo A, that doesn't
affect repo B.  

I've been involved in almost every difficult Yocto subtree update, and the only
case I can think of where we couldn't apply the changes to the older Yocto version
was the OpenSSL3 changes, which we had to wrap in #define checks based on the
OpenSSL version it exports.  Even if all the code were in one repo,
would we have wanted to cram all that into a single "Yocto update plus
fix all the code" commit?  I suspect not.  Doing the #define was
appropriate no matter how the code was laid out.

> - Inclusive guidelines: To make progress toward an unrelated but
> important goal at the same time, I'm recommending that the
> openbmc/master branch will be left as-is, and the newly-created sha1
> will be pushed to the branch openbmc/openbmc:main, to retain peoples
> links to previous commits on master, and retain the exact project
> history while at the same time moving the project to having more
> inclusive naming, as has been documented previously[3].  At some point
> in the future the master branch could be renamed and deprecated, but
> this is considered out of scope for this specific change.

This is a separate topic and should be tackled separately.  I guess it
is simpler if you're pushing all the code into one repo to only deal
with it there.  If this is something we want to emphasize now, I think
the Yocto bits are in place that we could just do it relatively quickly.
The only painful part would be all the existing commits in Gerrit that
are unmerged and targeting `refs/for/master`, but we'd have to tackle
that with this proposed move as well.

> - Each individual sub-project will be given a folder within
> openbmc/openbmc based on their current repository name.  While there
> is an opportunity to reorganize in more specific ways (ie, put all
> ipmi-oem handler repos in a folder) this proposal intentionally
> doesn't, under the proposition that once this change is made, any sort
> of folder rearranging will be much easier to accomplish, and to keep
> the scope limited.

At a minimum I'd like these all put into a subdirectory off the root.
It is bad enough with how many meta-layers we have, but we shouldn't
then add a hundred top-level subdirectories for the code.

> - Yocto recipes will be changed to point to their path equivalent, and
> inherit externalsrc bbclass[4].  This workflow is exactly the workflow
> devtool uses to point to local repositories during a "devtool modify",
> so it's unlikely we will have incremental build-consistency issues
> with this approach, as was a concern in the past.

Are you sure this works?  I thought externalsrc required the code to be
in an absolute directory and not a relative one?

    if externalsrc and not externalsrc.startswith("/"):
        bb.error("EXTERNALSRC must be an absolute path")
    if externalsrcbuild and not externalsrcbuild.startswith("/"):
        bb.error("EXTERNALSRC_BUILD must be an absolute path")
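
One possible way to satisfy that check, sketched here as an assumption
rather than something tested against the proposal: resolve each
subproject's tree-relative directory to an absolute path before it
reaches EXTERNALSRC (in a recipe that would mean building it from a
variable that is already absolute).  The helper name and directory layout
below are hypothetical; the final check mirrors the externalsrc.bbclass
code quoted above.

```python
import os


def externalsrc_for(tree_root, subdir):
    """Return an absolute EXTERNALSRC value for `subdir` under `tree_root`.

    Both arguments may be relative; they are resolved against the current
    working directory and normalized ("meta-x/../bmcweb" -> "bmcweb").
    """
    path = os.path.normpath(os.path.join(os.path.abspath(tree_root), subdir))
    # Same rule externalsrc.bbclass enforces (see the snippet above).
    if not path.startswith("/"):
        raise ValueError("EXTERNALSRC must be an absolute path")
    return path
```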

The original facebook/openbmc codebase kept all of the code in the repo
and we simply appended the directories to the SRC_URI.  It is somewhat
of a pain to maintain the SRC_URI lists, so maybe externalsrc is better
in that regard.  We also ran into lots of
pseudo-abort issues, as if Yocto didn't really support source-in-tree in
the latest code.  In order to avoid the pseudo-abort issues I had to do
this rather ugly hack in our code[3].  I don't know how we sanity test
your proposal to ensure it doesn't have this issue.

> - Subprojects that are intended to be reused outside of OpenBMC (ex
> sdbusplus) will retain their previous commit, history, and trees, such
> that they are usable outside the project.  This is intended to make
> sure that the code that should be reusable by others remains so.

How do we identify all of these?  (I spoke about this in another part of
the chain, so we don't need to expand on it here.)

> - The above intentionally makes no changes to our subtree update
> process, which would remain the same process as is currently.  The
> openbmc-specific autobump job in Jenkins would be disabled considering
> it's no longer required in this approach.

Wouldn't it still be handy for the above-mentioned repos?

> Let me know what you think.

In general, I'm not a fan of mono-repo style.  Both the mono-repo and
micro-repo style have issues.  I think we need to have an adequate
discussion of what the issues are that would be _introduced_ by moving
to a mono-repo in our case as well.  I'm not currently convinced that
this proposal is optimizing in the way that is most beneficial to the
project.

## Reviews

You've mentioned that this will make reviews easier and I think the
opposite is far more likely to be true.  OWNERS + Gerrit is not
sufficient in a mono-repo (and Github doesn't solve this either).

The biggest complaint, as I've heard it, has been the review cycle
velocity.  I myself have run into sub-repos where it takes weeks to get
a trivial change in because the maintainers just don't stay on top of
it.  This proposal will make the problem exponentially worse.

Our most proficient and active reviewers tend to be OWNERS higher up in
the "merge chain".  In 2021H2, I think the top 5 reviewers handled more
reviews than everyone else put together[4].  The only way I can stay somewhat on
top of what needs my attention is by having all my Gerrit notifications
going into a folder and deleting them as I've taken action.  With the
current OWNERS I'm already having to delete almost half of these just by
skimming the title (ex. "meta-not-my-company: ..." commits).  If you
start throwing every single commit in Gerrit at me because I matched in
OWNERS, I'll have absolutely no way to tell what needs my attention and
what doesn't.  How are you planning on handling this as you are a
top-level OWNER as well?
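To make the notification concern concrete, here is a deliberately simplified Python model. It is not the real OWNERS file format or Gerrit plugin behavior. Each entry maps a directory prefix to a set of owners, and the repository root (`""`) covers every path. Under that assumption, a root-level owner is notified of every change, including changes that only touch a machine layer. The directory and owner names are hypothetical.

```python
from typing import Dict, Iterable, Set


def covers(directory: str, path: str) -> bool:
    """True if an OWNERS entry at `directory` applies to file `path`.
    The root entry ("") applies to everything."""
    if directory == "":
        return True
    return path.startswith(directory.rstrip("/") + "/")


def notified_owners(owners_by_dir: Dict[str, Set[str]],
                    changed_files: Iterable[str]) -> Set[str]:
    """Union of the owners of every entry that covers at least one changed file."""
    files = list(changed_files)
    result: Set[str] = set()
    for directory, owners in owners_by_dir.items():
        if any(covers(directory, f) for f in files):
            result |= owners
    return result


# Hypothetical mono-repo OWNERS layout.
OWNERS = {
    "": {"top-level-owner"},
    "meta-acme": {"acme-owner"},
    "src/bmcweb": {"bmcweb-owner"},
}

if __name__ == "__main__":
    print(sorted(notified_owners(OWNERS, ["meta-acme/conf/machine/acme.conf"])))
    # -> ['acme-owner', 'top-level-owner']
```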

## Out-of-Yocto builds

The biggest impediment to my development tends to be the repositories
that are not using Meson and haven't properly supported Meson
subprojects, because it makes it almost impossible to make changes to
those without invoking Yocto which is a much slower process.  I've
written about this in the past[5].  If all the code is in a single
repository, I fear there will be almost no effort put into supporting
out-of-Yocto builds because at that point, why support them?  I wrote
about the time involved in that post, but you're taking activities that
take seconds with Meson subprojects and turning them into 5-minute ones.

If we are doing something that slows down the most active developers, we
need to make sure that the increase in contributions from other
developers is going to more than offset it.  Based on the kind of people
affected by the problems you're describing, I'm really not convinced it
will.

## End-to-End Features

You used the concept of an "end-to-end feature" a few times in this
proposal, but to mean different things.  I'm going to specifically
talk about a feature that requires changes across multiple existing
repositories.

A bold statement: nobody here actually implements end-to-end features.

I've never witnessed a single person implement a new feature in the
kernel, add userspace support to interact with that kernel work, add
the Redfish APIs to interact with the userspace support they added,
implement the WebUI to control the Redfish APIs, and then write system
integration tests in phosphor-test-automation.  We draw somewhat
arbitrary boundaries already and call something "end-to-end" because it
happens to reside within those boundaries.  For a typical feature, for
many developers, that boundary seems to be "Redfish + some other
userspace app(s)".  Admittedly, this proposal does help (a little) with
_that_ particular arbitrary boundary, but it doesn't help with anything
outside of it.

There is a bigger problem that this is exposing though, in my mind at
least: many developers can't or don't work at the component level.

Let's consider a "simple" change that requires:
    - Adding a new DBus API.
    - Adding support in App-A for said DBus API.
    - Adding Redfish support for said DBus API in bmcweb.

You're suggesting that this is hard to accomplish today, end-to-end, and maybe
it is.  But the fact that anyone is attempting to solve it end-to-end
means that the resulting product is pretty terrible from a future
development and maintenance stand-point:

   * Repositories aren't using Meson subproject wrap files, which would
     make developing against a change to phosphor-dbus-interfaces
     trivially easy.

   * Developers are not developing the changes to App-A with unit-tests.

   * Developers are not confirming that their changes to App-A are sound
     at a dbus-level, so the code is tightly coupled with however
     they've changed bmcweb to interact with it and often crumbles when
     another application interacts with it (ex. PLDM).

   * The changes to bmcweb have no unit testing and/or mocking of the DBus APIs.

   * There is not a single integration test added to
     phosphor-test-automation to make sure nobody breaks this feature in
     the future.

Combining all the code together completely throws away the necessity to treat
the software as components, which certainly doesn't improve all of the
above.

Your proposal seems to be optimizing for the "I need to hack at code
across a few repos and throw it all together into a BMC binary image so
I can test drive it" case, but I would suggest this case should rarely even
be done by most of our developers.  The fact that this is even a "regular
development case" to begin with is a problem.

Everyone should be able to add a new DBus API and at least 95% of the support in
App-A without ever touching a BMC.  You then have the other 5% that
maybe needs some confirmation on hardware (you mocked out how you
_think_ the hardware behaves already, right?).  Once that is done you
should be able to develop the whole bmcweb feature without touching
hardware as that is _completely_ software-based (mock out all the dbus
interfaces for your new feature please, and please add a test case).  If
all these pieces are working independently, you can, only then, spend a
little time throwing it all into a single image with something like [6]
and test-driving it, but better yet is if you also add code to
phosphor-test-automation.
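The "mock out all the dbus interfaces" step can be illustrated independently of language (bmcweb itself is C++). In this small Python sketch, the code that turns D-Bus properties into a Redfish-style response receives its property lookup as a parameter. That lets a unit test substitute a dictionary for the bus. The service, object path, interface, and property names are all hypothetical and are not an existing phosphor-dbus-interfaces definition.

```python
from typing import Callable, Dict, Tuple

# (service, object path, interface, property name) -> value
PropertyKey = Tuple[str, str, str, str]
GetProperty = Callable[[str, str, str, str], object]

# Hypothetical names for the new D-Bus API.
SERVICE = "xyz.openbmc_project.ExampleApp"
PATH = "/xyz/openbmc_project/example/widget0"
IFACE = "xyz.openbmc_project.Example.Widget"


def widget_resource(get_property: GetProperty) -> Dict[str, object]:
    """Build a Redfish-style JSON fragment from the hypothetical Widget
    interface.  D-Bus access is injected, so no BMC is needed to test it."""
    enabled = bool(get_property(SERVICE, PATH, IFACE, "Enabled"))
    speed = get_property(SERVICE, PATH, IFACE, "Speed")
    return {
        "Id": "widget0",
        "Status": {"State": "Enabled" if enabled else "Disabled"},
        "Speed": speed,
    }


def fake_bus(props: Dict[PropertyKey, object]) -> GetProperty:
    """Dict-backed stand-in for the bus; a missing property raises KeyError,
    which a test can use to exercise error handling."""
    def get_property(service: str, path: str, iface: str, name: str) -> object:
        return props[(service, path, iface, name)]
    return get_property


if __name__ == "__main__":
    bus = fake_bus({
        (SERVICE, PATH, IFACE, "Enabled"): True,
        (SERVICE, PATH, IFACE, "Speed"): 1200,
    })
    print(widget_resource(bus))
```

The production code would pass a lookup that actually calls the bus, while the tests never need hardware.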

If there are pieces of what I just wrote above that are difficult today,
let's fix them, but combining all the code into a mono-repo is, to me,
just a band-aid over these problems.

## Tightly coupled software

I've worked on a large BMC-like product that had a mono-repo.  The
result was a tightly-coupled mess of code that was impossible to work
in, so there ended up being arbitrary "component boundaries" put in
place and nobody worked outside their "component boundary" silo.  We
already have a bit of this "silo" mentality here but at least we don't
have the tightly-coupled aspect of it.  We have an architecture that
allows the pieces to, mostly, move independent from each other.  How
would we ensure that a mono-repo doesn't devolve into tightly-coupled
code because the frictions that stop it are removed?

I think a very likely outcome of this proposal is we end up with more
"utility" libraries that are going to increase the coupling and
library-dependency structure, which is worse for performance.

## Reverting and large reviews

I already quite often see large commits, even within a single repository,
which are difficult to review and difficult to revert.  It is currently
on the maintainers to push back on these and request smaller commits.
Having the code in a mono-repo seems to encourage cross-package changes
(and in fact was listed as a selling point here), which makes it more
likely that a bug introduced by one small piece of the change will force
the whole change to be reverted.

This proposal is likely to increase the average size of a commit since
it is more likely to include cross-package changes.  That means we also
need more people to give feedback on an individual change and as a
reviewer I have to sift through all the pieces that aren't even relevant
to me.  Both of these slow the review process down even more rather than speed it up.

## "To make upstreaming easier"

You started with the topic of making upstreaming easier and I'm really
not convinced that our repo structure is the major impediment to
upstreaming.  Most of the advantages you talked about seemed to be around
_development_ and not _upstreaming_.  Which is it we are solving?  Do we
really have data that indicates upstreaming is hard because we have
multiple repositories?  I don't think I've ever heard this.  I have
heard that development is hard[er].

---

Some of these issues I've raised could certainly be solved by more strongly
worded "guidelines" than we currently have in place (and somehow
ensuring they are followed).  I am worried about the overall code-smell that
will come from a monorepo (based on my past experiences with them).

The biggest concern I have is the [negative] impact on code reviews, and
I don't really have any way to solve it except by completely changing our
code review process at the same time.  One option is something like the
Linux model of maintainer-owned subtrees, where a maintainer owns a fork for
their part of the tree and we bless them at a higher level, but that doesn't
sound appealing.  Maybe there is some better Gerrit tooling than what we're
currently getting from OWNERS, or maybe some other review tool.

[1]: https://lore.kernel.org/openbmc/20191119003509.GA80304@patrickw3-mbp.dhcp.thefacebook.com/
[2]: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/disabling-issues
[3]: https://github.com/facebook/openbmc/commit/e418bfbcf382185fdffb768a012e762a4600ae63#diff-b0fea89439c1004ad5229cb7058b4740ee1c542b32cfb0d2165788f9020b5da0R1
[4]: https://github.com/williamspatrick/openbmc-tof-election-data/blob/995f0d73184db7c25446284261cc023af611e7c4/2021H2/data/report.json#L1
[5]: https://www.stwcx.xyz/blog/2021/04/18/meson-subprojects.html
[6]: https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9
-- 
Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06 17:28     ` Patrick Williams
@ 2022-04-06 20:36       ` Benjamin Fair
  2022-04-07  3:26         ` Patrick Williams
  2022-04-07 15:39       ` Ed Tanous
  1 sibling, 1 reply; 21+ messages in thread
From: Benjamin Fair @ 2022-04-06 20:36 UTC (permalink / raw)
  To: Patrick Williams; +Cc: Andrew Jeffery, Ed Tanous, Brad Bishop, OpenBMC Maillist

On Wed, 6 Apr 2022 at 10:29, Patrick Williams <patrick@stwcx.xyz> wrote:
>
> I'll likely respond to the original post with more thoughts later as
> well...
>
> On Wed, Apr 06, 2022 at 08:54:28AM -0700, Ed Tanous wrote:
> > On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> > >
> > > With the amount of custom userspace we've always kinda sat in-between.
> > > I'd like to see libraries and applications that have use cases outside
> > > of OpenBMC be accessible to people with those external use cases,
> > > without being burdened by understanding the rest of the OpenBMC context.
> > > I have a concern that by integrating things in the way you're proposing
> > > it will lead to more inertia there (e.g. for implementations of
> > > standards MCTP or PLDM (libmctp and libpldm)).
> >
> >
> > I had assumed that libmctp and libpldm fell into the "intended to be
> > used outside the project" category and would retain their own
> > repositories, given that they publish interfaces that are not OpenBMC
> > specific, but lots of things within the project are openbmc-specific,
> > including the daemons that attach both of those libraries to dbus.
> > The only real difference here is that it makes the difference
> > explicit.
>
> It wasn't long ago that the TOF discussed some of these libraries w.r.t.
> "intended to be used outside the project" and we really had trouble
> determining clear language on what classified as this and what did not.

For these repos, we could always retain the source-of-truth as inside
openbmc/openbmc and have automation to subtree sync changes out to
read-only mirrors.

This would make it slightly more difficult for external users of these
libraries to upstream their changes to us, but that's likely worth it to
reduce the friction for OpenBMC community members contributing to the
libraries (who are the main contributors anyway).
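Here is one way that one-way sync could look, sketched in Python. It assumes the mirrored library lives under a hypothetical `src/libmctp` directory and that the read-only mirror is configured as a hypothetical git remote named `libmctp-mirror`. `git subtree split --prefix=<dir> -b <branch>` produces a local branch containing only the history of that directory, rewritten so the directory is the root. That branch is then pushed to the mirror. By default the sketch only prints the commands.

```python
import shlex
import subprocess
from typing import List


def mirror_commands(prefix: str, mirror_remote: str,
                    split_branch: str = "mirror-split",
                    mirror_branch: str = "master") -> List[List[str]]:
    """Commands that publish one mono-repo subdirectory to a read-only mirror."""
    return [
        # Synthesize a history containing only commits that touch `prefix`,
        # with `prefix` as the new root, on a local branch.
        ["git", "subtree", "split", f"--prefix={prefix}", "-b", split_branch],
        # Publish that branch to the mirror repository.
        ["git", "push", mirror_remote, f"{split_branch}:{mirror_branch}"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (default) or execute them, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(mirror_commands("src/libmctp", "libmctp-mirror"))
```

Running something like this from CI after each merge, and rejecting direct pushes to the mirror, would presumably keep openbmc/openbmc as the single source of truth.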

> Actually, neither of these libraries were mentioned, but it was a recipe
> contribution by someone pointing at a non-openbmc github repository.  We
> couldn't really come up with a clear definition but we settled on
> "intended to be used outside the project" recipes that also weren't in
> the openbmc org needed to be submitted upstream to Yocto.
>
> Are we going to be able to come up with a clear definition for this,
> which is actually for code that is _within_ our org?  libpldm, for
> instance, isn't even in a separate repository but covered as part of the
> bigger PLDM with some special build flags for "library only".
>
> > >
> > > On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > > > The OpenBMC development process as it stands is difficult for people
> > > > new to the project to understand, which severely limits our ability to
> > > > onboard new maintainers, developers, and groups which would otherwise
> > > > contribute major features to upstream, but don't have the technical
> > > > expertise to do so.  This initiative, much like others before it[1] is
> > > > attempting to reduce the toil and OpenBMC-specific processes of
> > > > passing changes amongst the community, and move things to being more
> > > > like other projects that have largely solved this problem already.
> > >
> > > Can you be more specific about which projects here? Do you have links
> > > to examples?
> >
> > Linux is the primary example I think of, which hosts libraries within
> > it (libbpf, etc.) that are meant to be used elsewhere.  u-root, u-bmc
> > are other examples of firmware that put all of their application
> > specific code in a single repository.  As a counter example, openwrt
> > sticks with multiple repositories, but seems to have significantly
> > fewer repositories in total than we do, despite being a much older
> > project.
> >
> > As a side note, one thing I find interesting is that they host staging
> > branches for contributors/maintainers on their main project page.
> > That's a different model than I've seen elsewhere.  Unrelated to this
> > discussion, but interesting nonetheless.
> > https://git.openwrt.org/
>
> I was going to point to Android and OpenStack as two large open source
> projects, which also use Gerrit, and seem to have no trouble with the
> micro-repo model.

When I contributed to Android in the past, their process was even more
frustrating and difficult to figure out than ours, requiring lots of
purpose-built infrastructure and tools (treehugger bot, repo tool,
etc.). I don't think their model is a good direction to move the
project towards.

>
> > > > To that end, I'd like to propose a change to the way we structure our
> > > > repositories within the project: specifically, putting (almost) all of
> > > > the Linux Foundation OpenBMC owned code into a single repo that we can
> > > > version as a single entity, rather than spreading out amongst many
> > > > repos.  In practice, this would have some significant advantages:
> > > >
> > > > - The tree would be easily shareable amongst the various people
> > > > working on OpenBMC, without having to rely on a single-source Gerrit
> > > > instance.  Git is designed to be distributed, but if our recipe files
> > > > point at other repositories, it largely defeats a lot of this
> > > > capability.  Today, if you want to share a tree that has a change in
> > > > it, you have to fork the main tree, then fork every single subproject
> > > > you've made modifications to, then update the main tree to point to
> > > > your forks.
> > >
> > > This isn't true, as you can add patches in the OpenBMC tree.
> >
> > As most people that have stacked patches can attest to, managing patch
> > files in a meta layer over time is very difficult (unless you meant
> > something else).  Yes, I should not have said "have to", but a number
> > of the forks that I've seen have ended up resorting to that. Example:
> > (https://github.com/opencomputeproject/HWMgmt-MegaRAC-OpenEdition/tree/master/openbmc_modules)
>
> Why is managing patch files difficult?  Is it lack of documentation?

Managing a few patch files for a single machine in a single meta-layer
doesn't have much overhead, but the complexity scales superlinearly
with more machines, distro features that may be on or off, other
meta-layers trying to add patches, etc. The usual "solution" to this
that I see is just avoiding rebasing to newer versions of OpenBMC,
which makes upstreaming patches even more difficult in a vicious
cycle.

>
> Upstream Yocto deals with patch files all the time.  The Facebook BMC
> tree has machines in production that are still on Rocko-based Yocto
> distributions and we have plenty of patch files we support there.

Supporting patches for older versions of upstream isn't as difficult,
but the project would benefit more if we make it easier for downstream
users to stay as close to upstream as possible. This would make them
more likely to contribute these patches back upstream to us too.

>
> I wouldn't be surprised if Yocto doesn't already have tools to simplify
> "git-format-patch" -> "package.bbappend" workflow, but if they don't I
> could probably write something.

"devtool modify" -> "devtool finish" does this workflow, but I've seen
it fail in subtle, difficult-to-debug ways many times before (although
to be fair, it has gotten more reliable in the last year or so).

>
> For what it is worth, I'm currently doing a change for sdbusplus that
> requires fixes across tens of repositories (as I wrote about recently).
> I pretty quickly hacked up this shell function in order to automatically
> update my OpenBMC tree with commits from another repo that I've pushed
> to Gerrit already.
>
> https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9
>
> I don't see using 'git-format-patch' to create the patchfile instead as
> too much additional effort.
>
> > > CI prevents these from being submitted, as it should, but there's nothing to
> > > stop anyone using the `devtool modify ...` / `devtool finish ...` and
> > > committing the result as a workflow to exchange state (I do this)?
> > >
> > > Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
> > > It is at least the Yocto workflow.
> >
> > devtool provides just one form of friction;  There are also a number
> > of cases where devtool modify and devtool finish fail in non obvious
> > ways (usually due to some not-quite-optimal yocto handling in a meta
> > layer, or patches being distributed across meta layers).  The biggest
> > key is that it's yet another tool that seasoned firmware developers
> > have to learn to jump into our codebase.  Each tool adds some friction
> > compared to if it just didn't exist.  It also adds the "which recipe
> > do I need to devtool to modify the webui?" type trouble that people
> > have talked about many times.
>
> Do we have pointers to when devtool fails?  The only time I've seen it
> are for recipes that aren't in the rootfs image: kernel and u-boot and
> they've all been due to bugs in the image.bbclass on our part.  There
> was actually a fix to one of the u-boot recipes very recently.
>
> I personally don't use devtool all that much, but when I do I want it to
> point at the central "workspace" of all the openbmc repos I already have
> so I can get it to pick up code I already have in progress there.  Do we
> need better documentation around those workflows?

That's the usual way I use devtool too, but note that using it this
way prevents "devtool finish" from working properly.

Also, cloning all the OpenBMC repos in a centralized workspace and then
pointing the OpenBMC recipes at them is the exact workflow that Ed is
proposing, just in an automated fashion and without the toil of having
to sync and generate patch files from these repos individually.

>
> (At one point there was a statement made that we didn't want tooling
> written to assist with one workflow or another.  This was somewhat made
> in reference to the `setup` script, but I think it had extensions that
> made it that any workflow-related tools people have are hosted in their
> own personal spaces and not talked about.  Maybe we need to change this
> mentality.)

Agreed. I think we should at least have a small number of
well-supported workflows documented that people can choose from - Ed's
proposal here creates an obvious choice of recommended workflow
though.

>
> > > > This gets very onerous over time, especially for simple
> > > > commits.  Having maintained several different companies forks
> > > > personally, and spoken to many others having problems with the same,
> > > > adding major features are difficult to test and rebase because of
> > > > this.  Moving the code to a single tree makes a lot of the toil of
> > > > tagging and modifying local trees a lot more manageable, as a series
> > > > of well-documented git commands (generally git rebase[2]).  It also
> > > > increases the likelihood that someone pulls down the fork to test it
> > > > if it's highly likely that they can apply it to their own tree in a
> > > > single command.
> > >
> > > Again, this is moot if the patches are applied in-tree.
> >
> > Meta layer patch files in my experience tend to not layer well, and
> > require a good amount of maintenance.  They also have problems where
> > they're not versioned against a git base, so there's no guarantees of
> > where in the history the patches were forked from, and whether they
> > apply to your tree, or if they fail, what patches likely caused them
> > to fail.  Admittedly, tracking them in git isn't perfect either, but
> > at least it publishes "this is the source base these were based on" to
> > give some indication.  In practice, the public forks I've seen just
> > embed the custom meta layer within an openbmc tree to solve this
> > problem.
> > https://github.com/Intel-BMC/openbmc/tree/intel/meta-openbmc-mods
> > https://github.com/HewlettPackard/openbmc
>
> I could be entirely misunderstanding what problem you're pointing out here.
> Why would it matter "where in the history the patches were forked
> from"?  Aren't they forked from whatever the SRCREV in the recipe says?
> It is on the maintainer of the meta-layer to ensure they apply to that
> revision.

.bbappend files don't list the target SRCREV for their patch files
(they can, of course, but this is well known to be an anti-pattern
that is extremely difficult to maintain).

>
> Having all the code in one repo doesn't give you any more visibility as
> to where the code was "forked from".  It ends up being exactly the same
> as a patch file except that the patch file has been "applied" already to
> the code.  You still don't have visibility to the underlying upstream
> commit number.  And, I would suspect it is going to be even worse
> because you're going to end up with back and forth merge commits trying
> to pick up the latest upstream code in these forks.  You're not going to
> have a nice git-submodule number indicating openbmc/openbmc was from
> here.

"git merge-base" shows this information trivially.

In my experience, forks usually use "rebase" rather than "merge", so
they wouldn't have lots of extraneous merge commits.
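For readers less familiar with it, `git merge-base A B` reports the best common ancestor of two commits, which is the point where a fork's branch left upstream. The Python sketch below computes the same thing on a small hypothetical in-memory history, with each commit mapped to its parents. It is quadratic and only illustrative. It is not how git implements the command.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit maps to its parent commits.
# Upstream: u1 <- u2 <- u3 <- u4; a fork branched at u2: f1 <- f2.
HISTORY: Dict[str, List[str]] = {
    "u1": [],
    "u2": ["u1"],
    "u3": ["u2"],
    "u4": ["u3"],
    "f1": ["u2"],
    "f2": ["f1"],
}


def ancestors(graph: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def merge_bases(graph: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors that are not themselves
    ancestors of another common ancestor."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(other != c and c in ancestors(graph, other) for other in common)
    }


if __name__ == "__main__":
    print(merge_bases(HISTORY, "f2", "u4"))  # -> {'u2'}
```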

>
> In my opinion, this is a problem with how people maintaining these trees
> are doing it and not a problem with how our code is organized.  In the
> facebook/openbmc tree we use git-submodules internally to hold the
> upstream trees.  For our github-side we have a script that updates them,
> but there isn't a strong reason we couldn't do the same git-submodule
> layout there (I think it is due to a deficiency in the way our
> internal<=>external mirror tool works).
>
>     https://github.com/facebook/openbmc/blob/helium/yocto_repos.sh
>
> If people are not treating the openbmc/openbmc tree as an unchanged
> blob, that is their fault and not ours.  Having all the source imported
> in one repo doesn't really solve this either and in fact is likely to
> make it worse because you now _can't_ treat openbmc/openbmc as an
> unchanged blob because you're going to have to patch-in-tree any changes
> to code you want to make.

I don't quite follow the exact workflow you're using here, but it
seems like a single "git rebase" command could handle this in a
simpler way, and be more likely to automatically resolve merge
conflicts.

>
> > I also think that having
> > one or a smaller number of reviews would concentrate a lot of the
> > discussion when we make treewide changes.  (OWNERS files, etc.)  When
> > they get distributed among many reviews, in my experience it tends to
> > dilute the discussion a bit.
>
> I would argue this is actually a bad thing.  We're going to more likely
> end up with large cross-repository commits that are harder to review,
> require more people to review them (a larger set of OWNERS), and are
> harder to revert.

Once a large-scale change has been agreed on, the exact implementation
of this change on the individual repos shouldn't need much review or
discussion. Owners for individual repos would be able to approve it
pretty easily, and a project-level owner could always give approval
too.

>
> If there is larger discussion to be had that should probably happen on
> the mailing list anyhow.
>
> > > > - It would give an opportunity for individuals and companies to "own"
> > > > well-supported public forks (ie Redhat) of the codebase, which would
> > > > increase participation in the project overall.  This already happens
> > > > quite a bit, but in practice, the forks that do it squash history,
> > > > making it nearly impossible to get their changes upstreamed from an
> > > > outside entity.
> > >
> > > Not sure this is something we want to encourage, even if it happens in
> > > practice.
> >
> > I think when done properly, it would be a huge help to the project.
> > My main point is that this isn't something we can stop (companies and
> > individuals have and will continue to do it anyway), so would we
> > rather make their changes easier to ingest back to upstream?
>
> In my experience the difficulties with upstreaming are not related to
> logistics of sending patches to Gerrit.  They are related to the effort
> involved with getting other people in the community bought into what you're
> trying to do.  Having all the code in one place alleviates 1% of the
> upstreaming effort while doing nothing for the remaining 99%.

I strongly disagree with this apportionment of effort. For people
like us that are familiar with open source processes and procedures,
the logistics of sending patches seems minimal and we focus instead on
the challenges of forming consensus. But for people who have never
participated in an open source community before, just figuring out
where to push code for review, how exactly to use Gerrit, etc. are
significant blockers. Anything we can do to reduce this friction would
make it more likely that new people will contribute.

>
> > > > - It would centralize the bug databases.  Today, bugs filed against
> > > > sub projects tend to not get answered.
> > >
> > > Do you have some numbers handy?
> >
> > I do not.  I can say that anecdotally the "you filed this bug against
> > the wrong project" happens quite often in the repositories I maintain,
> > and the lack of reasonable cross project "transfer the bug" semantics
> > makes this difficult (yes, admins can transfer bugs cross project, but
> > I'm pretty sure we don't want to call on core maintainers every time
> > we want to move things around.)  It would be quite helpful to the
> > project to have less than N bug trackers (might not necessarily be
> > one) to increase the odds that someone searches for and finds their
> > bug before filing a duplicate.
>
> "bugs filed against a sub project tend to not get answered" and
> "bugs are filed against the wrong project" are different problems;
> you've shifted the discussion.
>
> The first is a problem against our maintainers.  Having all the issues
> in one repository doesn't improve that situation, but it actually makes
> it worse because you're going to shift that burden on a few individuals
> who pay attention to openbmc/openbmc issues while doing nothing to fix
> the certain-maintainers-don't-respond issue.

As a maintainer or interested community member, it's much easier to
look through the open issues on a single repo than when distributed
across many other less-active repos.

It's also not always clear exactly which repo is at fault for a bug,
and limiting the discussion to people looking at a particular repo may
miss out on the relevant knowledge of people looking at other repos
instead.

>
> > > > Having all the bugs in
> > > > openbmc/openbmc would help in the future to avoid duplicating bugs
> > > > across projects.
> > >
> > > Has this actually been a problem?
> >
> > Duplication?  It happens from time to time.  Not being able to search
> > for a bug across the project happens a lot, and in our current model,
> > requires the user to know which component they are filing the bug
> > against.
>
> Why can't people search bugs or code at an org-level?  I do it all the time.

GitHub search isn't the best. I'd much rather be able to easily search
through all the code in the project from the command line using
something like `git grep`.

This is especially true if you're looking at historical commits, since
you could check out a single interesting SRCREV of openbmc/openbmc and
have the corresponding versions of all the userspace daemons available
to grep through.

>
> --
> Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06 20:36       ` Benjamin Fair
@ 2022-04-07  3:26         ` Patrick Williams
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Williams @ 2022-04-07  3:26 UTC (permalink / raw)
  To: Benjamin Fair; +Cc: Andrew Jeffery, Ed Tanous, Brad Bishop, OpenBMC Maillist

On Wed, Apr 06, 2022 at 01:36:26PM -0700, Benjamin Fair wrote:
> On Wed, 6 Apr 2022 at 10:29, Patrick Williams <patrick@stwcx.xyz> wrote:

> > I was going to point to Android and OpenStack as two large open source
> > projects, which also use Gerrit, and seem to have no trouble with the
> > micro-repo model.
> 
> When I contributed to Android in the past, their process was even more
> frustrating and difficult to figure out than ours, requiring lots of
> purpose-built infrastructure and tools (treehugger bot, repo tool,
> etc.). I don't think their model is a good direction to move the
> project towards.

I'm not versed in their process enough to know if it is "good" or "bad".
All I'm saying is that they seem to be a pretty successful project while
not going to a monorepo, so maybe there is something we can learn from
them?

> > Why is managing patch files difficult?  Is it lack of documentation?
> 
> Managing a few patch files for a single machine in a single meta-layer
> doesn't have much overhead, but the complexity scales superlinearly
> with more machines, distro features that may be on or off, other
> meta-layers trying to add patches, etc. The usual "solution" to this
> that I see is just avoiding rebasing to newer versions of OpenBMC,
> which makes upstreaming patches even more difficult in a vicious
> cycle.

I'm still missing why it is considered hard.  I say this as someone who
helps maintain this fork:

    https://github.com/facebook/openbmc/tree/helium/meta-facebook/meta-backports

We have 29 machines across 4 different distro versions.  We try very
hard to avoid making per-machine patches and when we do we try to
enumerate them based on a system feature and not a machine type (i.e.
multihost and not Yv2).

How does having code in a monorepo help unless you're making
per-machine branches?  If you aren't doing per-machine branches you then
have code changes that affect all other machines and they must all be
stuck on the same distro version?  If you're making per-machine branches,
then I don't see how the "complexity scales superlinearly" because it
literally scales linearly per machine.

> > Upstream Yocto deals with patch files all the time.  The Facebook BMC
> > tree has machines in production that are still on Rocko-based Yocto
> > distributions and we have plenty of patch files we support there.
> 
> Supporting patches for older versions of upstream isn't as difficult,
> but the project would benefit more if we make it easier for downstream
> users to stay as close to upstream as possible. This would make them
> more likely to contribute these patches back upstream to us too.

I agree it is slightly more work to rebase your patches at the same time
you update your distro, but only slightly so.  If the code is in a
monorepo, your "patch" is going to fail the rebase and you need to fix
it up.  If the code is a patch-file it is going to fail the build and
you need to fix it up.  It is a relatively straight-forward script to take
all your `SRC_URI += "foo.patch"` bbappend files, apply them into the
current checkout at `SRCREV`, rebase to the new `SRCREV`, and then
`git-format-patch` to regenerate "foo.patch".  The rebase conflicts are
exactly the same as a monorepo too.

(I agree this is work for someone to do if it is interesting to them and
I'm not trying to diminish that.  I just don't think we should rip up
our whole development process to satisfy this use-case which _could_ be
solved in a shell script).
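The core of that script can be sketched for a single patch. The sketch assumes the patch was produced by `git format-patch`, so `git am` can turn it back into a commit, and that a local clone of the recipe's source repository is at hand. `refresh_patch` and its `refresh-patch` scratch branch are hypothetical names. Reading `SRC_URI` out of the bbappend files and looping over every patch is left out:

```shell
# refresh_patch: carry one recipe patch from an old SRCREV to a new one.
#   $1  path to a local clone of the recipe's source repository
#   $2  old SRCREV the patch was written against
#   $3  new SRCREV the recipe is moving to
#   $4  patch file in git-format-patch (mbox) form; rewritten in place
# The body runs in a subshell, so the caller's directory and variables are
# untouched. On a rebase conflict it stops with the clone mid-rebase.
refresh_patch() (
    repo=$1 old_rev=$2 new_rev=$3
    # Resolve the patch path before changing directory.
    patch=$(cd "$(dirname "$4")" && pwd)/$(basename "$4")
    cd "$repo" || exit 1
    # Scratch branch at the revision the patch was written against.
    git checkout -q -B refresh-patch "$old_rev" || exit 1
    # Turn the patch back into a commit.
    if ! git am -q "$patch"; then
        git am --abort
        exit 1
    fi
    # Replay that commit on top of the new SRCREV.
    git rebase -q "$new_rev" || exit 1
    # Regenerate the patch file from the rebased commit.
    git format-patch -1 --stdout HEAD > "$patch"
)
```

A conflict stops the function with the clone mid-rebase, at the same point a monorepo rebase would stop. After fixing the conflict and running `git rebase --continue`, the final `git format-patch` step can be rerun by hand.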

> > I wouldn't be surprised if Yocto doesn't already have tools to simplify
> > "git-format-patch" -> "package.bbappend" workflow, but if they don't I
> > could probably write something.
> 
> "devtool modify" -> "devtool finish" does this workflow, but I've seen
> it fail in subtle, difficult to debug ways many times before (although
> to be fair, it has gotten more reliable in the last year or so).

Right.  I honestly don't use devtool tons partially because I work on
multiple machines and purposefully have separate build directories for
each.  devtool keeps the state in the build directory, so I can't apply
a chain across N machines and build them all easily.  Patch files or
`nobranch=1` works better for my typical workflow and I like being able
to `git status` to see all my hacks for testing.

> > I personally don't use devtool all that much, but when I do I want it to
> > point at the central "workspace" of all the openbmc repos I already have
> > so I can get it to pick up code I already have in progress there.  Do we
> > need better documentation around those workflows?
> 
> That's the usual way I use devtool too, but note that using it this
> way prevents "devtool finish" from working properly.

Interesting.  Is this something we should work upstream to fix?

> Also cloning all the OpenBMC repos in a centralized workspace and then
> pointing the OpenBMC recipes at them is the exact workflow that Ed is
> proposing, just in an automated fashion and without the toil of having
> to sync and generate patch files from these repos individually.

Fair, but again, why rip up our entire development process for what is 3
lines in a shell alias?

    https://github.com/williamspatrick/dotfiles/blob/master/env/30_linux/lfopenbmc.zsh#L26

> > I could entirely be misunderstanding what problem you're pointing out here.
> > Why would it matter "where in this history the patches were forked
> > from"?  Aren't they forked from whatever the SRCREV in the recipe says?
> > It is on the maintainer of the meta-layer to ensure they apply to that
> > revision.
> 
> .bbappend files don't list their target SRCREV for the patch files
> (they can of course, but this is well known to be an anti-pattern
> which is extremely difficult to maintain)

Agreed, but the .bb file does.  You have the original .bb file
somewhere, right?  And then you rebase against openbmc and you get
new ones.  The SRCREV in ORIG_HEAD is what the patches were based on.
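That lookup can be done without checking anything out. `git rebase` records the pre-rebase tip as `ORIG_HEAD`, so reading the recipe at that revision shows the SRCREV the patches were written against. A minimal sketch; `srcrev_at` is a hypothetical helper and only handles the simplest form of SRCREV assignment:

```shell
# srcrev_at: print the SRCREV a recipe file pinned at one revision of the
# layer repository, e.g. ORIG_HEAD right after `git rebase`.
#   $1  revision of the layer repository
#   $2  recipe (.bb or included .inc) path relative to the repository root
# Only plain `SRCREV = "<hash>"` lines are recognised; `?=`, overrides and
# per-source SRCREV_<name> forms are deliberately ignored in this sketch.
srcrev_at() {
    git show "$1:$2" | sed -n 's/^SRCREV[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, after rebasing a fork onto a newer openbmc/openbmc, `srcrev_at ORIG_HEAD path/to/recipe.bb` would print the SRCREV the fork's patches were based on. The recipe path here is a placeholder.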

> > Having all the code in one repo doesn't give you any more visibility as
> > to where the code was "forked from".  It ends up being exactly the same
> > as a patch file except that the patch file has been "applied" already to
> > the code.  You still don't have visibility to the underlying upstream
> > commit number.  And, I would suspect it is going to be even worse
> > because you're going to end up with back and forth merge commits trying
> > to pick up the latest upstream code in these forks.  You're not going to
> > have a nice git-submodule number indicating openbmc/openbmc was from
> > here.
> 
> "git merge-base" shows this information trivially
> 
> In my experience, forks usually use "rebase" rather than "merge", so
> they wouldn't have lots of extraneous merge commits.

I've never seen a fork use `rebase`.  Can you point to one that does?

People don't usually do that with their fork because:

    - they have hundreds of commits which went through a review in their
      fork's review system

    - doing a rebase of a branch is effectively the same as a history
      re-write

    - after the rebase they've "fixed" any number of these commits due
      to conflicts and have no way to send them back through review.

Every fork I've run across either squashes upstream content into a
single commit or does a merge commit of upstream with all their conflict
resolution done in the squash or merge commit itself.

> > In my opinion, this is a problem with how people maintaining these trees
> > are doing it and not a problem with how our code is organized.  In the
> > facebook/openbmc tree we use git-submodules internally to hold the
> > upstream trees.  For our github-side we have a script that updates them,
> > but there isn't a strong reason we couldn't do the same git-submodule
> > layout there (I think it is due to a deficiency in the way our
> > internal<=>external mirror tool works).
> >
> >     https://github.com/facebook/openbmc/blob/helium/yocto_repos.sh
> >
> > If people are not treating the openbmc/openbmc tree as an unchanged
> > blob, that is their fault and not ours.  Having all the source imported
> > in one repo doesn't really solve this either and in fact is likely to
> > make it worse because you now _can't_ treat openbmc/openbmc as an
> > unchanged blob because you're going to have to patch-in-tree any changes
> > to code you want to make.
> 
> I don't quite follow the exact workflow you're using here, but it
> seems like a single "git rebase" command could handle this in a
> simpler way, and be more likely to automatically resolve merge
> conflicts.

As I mentioned above, we have 29 machines targeting 4 distro versions all
in a single branch.  When we make a fix to the common bits they go into
all 29 machines.  When we make a fix for one distro version it affects
all the machines targeting that version.  When we add a new distro version,
we test the existing machines on it and flip a switch to move them to the
new one individually.

    https://github.com/facebook/openbmc/blob/helium/openbmc-init-build-env#L16

> > In my experience the difficulties with upstreaming are not related to
> > logistics of sending patches to Gerrit.  They are related to the effort
> > involved with getting other people in the community bought into what you're
> > trying to do.  Having all the code in one place alleviates 1% of the
> > upstreaming effort while doing nothing for the remaining 99%.
> 
> I strongly disagree about this apportionment of effort. For people
> like us that are familiar with open source processes and procedures,
> the logistics of sending patches seems minimal and we focus instead on
> the challenges of forming consensus. But for people who have never
> participated in an open source community before, just figuring out
> where to push code for review, how exactly to use Gerrit, etc. are
> significant blockers. Anything we can do to reduce this friction would
> make it more likely that new people will contribute.

I was certainly putting a distinction between MyFirstCommit and "I'm
trying to get feature X upstreamed" and my 99/1% was for the latter.

Don't we have a few MyFirstCommit tutorials already?  There are very few
people contributing that aren't covered by a CCLA, which means in
addition to the public tutorials we have they probably have an internal
one at their company and an on-boarding buddy that can mentor them
through MyFirstCommit.  

Even with the documentation we have, people still make PRs on GitHub and
try to send us code without any intention of signing a CLA.  With the
CLA requirement, I don't see how it is likely for us to ever get
"drive-by" commits.

Again, changing our whole process to solve MyFirstCommit is overkill.

Going back to the title of this email chain it was "... to make
upstreaming easier".  I completely agree that the micro-repo model makes
_development_ harder as you have multiple repos in play compared to the
mono-repo model.  In order to develop your code you had to work this
out.  Once we've set that as a baseline, upstreaming is simply "sending
my code to Gerrit and dealing with the code review fallout".  I don't
see how running `scp openbmc.gerrit:hooks/commit-msg $(git rev-parse \
--show-toplevel)/.git/hooks/` on one or more repos is so much more work.
As far as upstreaming goes this is nearly the only difference between the
two models (and communicating your co-reqs requirements to the
maintainers is probably the only other difference).  I still stand by my
roughly 99/1% worth of effort breakdown.

> > "bugs filed against a sub project tend to not get answered" and
> > "bugs are filed against the wrong project" are different problems;
> > you've shifted the discussion.
> >
> > The first is a problem against our maintainers.  Having all the issues
> > in one repository doesn't improve that situation, but it actually makes
> > it worse because you're going to shift that burden on a few individuals
> > who pay attention to openbmc/openbmc issues while doing nothing to fix
> > the certain-maintainers-don't-respond issue.
> 
> As a maintainer or interested community member, it's much easier to
> look through the open issues on a single repo than when distributed
> across many other less-active repos.
> 
> It's also not always clear exactly which repo is at fault for a bug,
> and limiting the discussion to people looking at a particular repo may
> miss out on the relevant knowledge of people looking at other repos
> instead.

Understood.  We could turn off issues on most/all code repos if that is
really a hangup.  I'd have no problem with that*.

(*) There are probably a few repos we might NOT want to turn off issues
    on because they're pretty noisy and leverage issues a lot, or
    because they are the "used outside OpenBMC" repos, but that is an
    easy discussion to have.

-- 
Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06 17:28     ` Patrick Williams
  2022-04-06 20:36       ` Benjamin Fair
@ 2022-04-07 15:39       ` Ed Tanous
  2022-04-08 21:36         ` Patrick Williams
  1 sibling, 1 reply; 21+ messages in thread
From: Ed Tanous @ 2022-04-07 15:39 UTC (permalink / raw)
  To: Patrick Williams; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Wed, Apr 6, 2022 at 10:28 AM Patrick Williams <patrick@stwcx.xyz> wrote:
>
> I'll likely respond to the original post with more thoughts later as
> well...
>
> On Wed, Apr 06, 2022 at 08:54:28AM -0700, Ed Tanous wrote:
> > On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> > >
> > > With the amount of custom userspace we've always kinda sat in-between.
> > > I'd like to see libraries and applications that have use cases outside
> > > of OpenBMC be accessible to people with those external use cases,
> > > without being burdened by understanding the rest of the OpenBMC context.
> > > I have a concern that by integrating things in the way you're proposing
> > > it will lead to more inertia there (e.g. for implementations of
> > > standards MCTP or PLDM (libmctp and libpldm)).
> >
> >
> > I had assumed that libmctp and libpldm fell into the "intended to be
> > used outside the project" category and would retain their own
> > repositories, given that they publish interfaces that are not OpenBMC
> > specific, but lots of things within the project are openbmc-specific,
> > including the daemons that attach both of those libraries to dbus.
> > The only real difference here is that it makes the difference
> > explicit.
>
> It wasn't long ago that the TOF discussed some of these libraries w.r.t.
> "intended to be used outside the project" and we really had trouble
> determining clear language on what classified as this and what did not.
> Actually, neither of these libraries were mentioned, but it was a recipe
> contribution by someone pointing at a non-openbmc github repository.  We
> couldn't really come up with a clear definition but we settled on
> "intended to be used outside the project" recipes that also weren't in
> the openbmc org needed to be submitted upstream to Yocto.
>
> Are we going to be able to come up with a clear definition for this,
> which is actually for code that is _within_ our org?  libpldm, for
> instance, isn't even in a separate repository but covered as part of the
> bigger PLDM with some special build flags for "library only".

Absolutely we should have clearer definitions.  From my perspective it
comes down to, "are the maintainers willing to maintain this as a
generically usable library?"  Agreed, libpldm (assuming it's intended to
be used outside the project) being within a repo with openbmc-specific
stuff isn't a great model, and we should try to improve in that
regard.  Had we implemented this proposal originally, I suspect that
libpldm would not have evolved that way in the first place.


>
> > >
> > > On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > > > The OpenBMC development process as it stands is difficult for people
> > > > new to the project to understand, which severely limits our ability to
> > > > onboard new maintainers, developers, and groups which would otherwise
> > > > contribute major features to upstream, but don't have the technical
> > > > expertise to do so.  This initiative, much like others before it[1] is
> > > > attempting to reduce the toil and OpenBMC-specific processes of
> > > > passing changes amongst the community, and move things to being more
> > > > like other projects that have largely solved this problem already.
> > >
> > > Can you be more specific about which projects here? Do you have links
> > > to examples?
> >
> > Linux is the primary example I think of, which hosts libraries within
> > it (libbpf, etc.) that are meant to be used elsewhere.  u-root, u-bmc
> > are other examples of firmware that put all of their application
> > specific code in a single repository.  As a counter example, openwrt
> > sticks with multiple repositories, but seems to have significantly
> > fewer repositories in total than we do, despite being a much older
> > project.
> >
> > As a side note, one thing I find interesting is that they host staging
> > branches for contributors/maintainers on their main project page.
> > That's a different model than I've seen elsewhere.  Unrelated to this
> > discussion, but interesting nonetheless.
> > https://git.openwrt.org/
>
> I was going to point to Android and OpenStack as two large open source
> projects, which also use Gerrit, and seem to have no trouble with the
> micro-repo model.

Android had to invent their own tool (repo
https://gerrit.googlesource.com/git-repo/) to handle this for that
level of scale.  As a possible pivot on my proposal, if the suggestion
is to use repo at the openbmc/openbmc level to check out all the
repositories, and enforce some structure on related-reviews, I'm open
to the possibility.

FWIW, we (Google BMC) use the repo tool internally for managing our
meta layers in the current model.  While it does make things better,
it adds one more layer of inconsistent dev experience (if doing X, use
repo; if doing Y, use devtool; if doing Z, use Gerrit directly).  It
might be a possible solution, but it feels like a half measure to me.
One important thing that I think we lose if we go to
repo is the idea of "I'm doing something new, but I'm not far enough
along to request a repository, where should I push the code?".  This
question has blocked the upstreaming of many (intentionally unnamed to
protect the innocent) projects, some of which eventually did make it
into upstream, but late, and in a less open way than would've been
possible if there was an easily accessible, dev-pushable answer.

I don't have a lot of knowledge of the OpenStack dev process.  Is it
similar to Android's?


>
> > > > To that end, I'd like to propose a change to the way we structure our
> > > > repositories within the project: specifically, putting (almost) all of
> > > > the Linux Foundation OpenBMC owned code into a single repo that we can
> > > > version as a single entity, rather than spreading out amongst many
> > > > repos.  In practice, this would have some significant advantages:
> > > >
> > > > - The tree would be easily shareable amongst the various people
> > > > working on OpenBMC, without having to rely on a single-source Gerrit
> > > > instance.  Git is designed to be distributed, but if our recipe files
> > > > point at other repositories, it largely defeats a lot of this
> > > > capability.  Today, if you want to share a tree that has a change in
> > > > it, you have to fork the main tree, then fork every single subproject
> > > > you've made modifications to, then update the main tree to point to
> > > > your forks.
> > >
> > > This isn't true, as you can add patches in the OpenBMC tree.
> >
> > As most people that have stacked patches can attest to, managing patch
> > files in a meta layer over time is very difficult (unless you meant
> > something else).  Yes, I should not have said "have to", but a number
> > of the forks that I've seen have ended up resorting to that. Example:
> > (https://github.com/opencomputeproject/HWMgmt-MegaRAC-OpenEdition/tree/master/openbmc_modules)
>
> Why is managing patch files difficult?  Is it lack of documentation?

A few reasons:
1. When patches conflict, resolving the rebase conflicts is a manual,
difficult process.
2. Many devs aren't familiar with the available tools (given the
overflow of documentation), so they sometimes resort to rewriting patch
files by hand.
3. Patches sometimes apply to one system, sometimes apply to all
systems, and generally cause a complex tree of dependencies.  If Yocto
has a patch file, then openbmc has the same patch file, then a private
meta layer has the same patch file, there's no mechanism for "patch
already applied to upstream" like git rebase has.  If certain systems
have pinned a given version of a subproject because of a regression,
there's no easy way to apply different versions of patch files pre and
post regression.
4. Patches could be applied at vendor-level, platform-level, SOC
level, or distro-level, and explaining all of these concepts to new
engineers is difficult.  In practice, most will opt for checking it
into the lowest friction place.

No, I don't believe it's lack of documentation.  Yocto documents
workflows very well, there's just way too much of it to expect anyone
to read it in that level of depth.
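For the third point, git can at least tell whether a patch file's change is already present in a checkout. If the patch applies only in reverse, the change is already there. Below is a minimal sketch of that check, run from inside a git checkout of the patched source. `patch_state` is a hypothetical helper. It only inspects the working tree, not which layer the patch came from:

```shell
# patch_state: classify a patch file against the current working tree.
#   applies          the patch applies cleanly as-is
#   already-applied  only the reverse applies, so the change is already there
#   conflicts        neither direction applies; manual work is needed
patch_state() {
    if git apply --check "$1" 2>/dev/null; then
        echo applies
    elif git apply --check --reverse "$1" 2>/dev/null; then
        echo already-applied
    else
        echo conflicts
    fi
}
```

`git rebase` gets a similar effect automatically by skipping commits whose changes are already upstream. Patch files in a layer get no such check unless someone runs one.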


>
> Upstream Yocto deals with patch files all the time.  The Facebook BMC
> tree has machines in production that are still on Rocko-based Yocto
> distributions and we have plenty of patch files we support there.

As a design pattern, I don't think we can recommend everyone stay on a
5-year-old Yocto + kernel release.  I get that it works for FB, and that's
their prerogative (seriously, no judgement at all, I understand), but
there are features in the newer kernels (SOC support, kernel support)
that the project would like to rely on, not to mention security
improvements.

Also, to some degree this proves my point: if an OpenBMC
application wanted to rely on a feature that's only present in the 5.1
kernel, but there are systems on 4.7 in the tree, we'd need to
backport the feature's patches, and that gets more onerous.

>
> I wouldn't be surprised if Yocto doesn't already have tools to simplify
> "git-format-patch" -> "package.bbappend" workflow, but if they don't I
> could probably write something.

I think it's supposed to be `devtool finish`, but in our experience it
doesn't handle scale very well.

If we write another tool, then we're back to "tool that you need to
know exists to use".  git is a tool that already exists and has these
features; we should avoid writing OpenBMC-specific tools where we can.


>
> For what it is worth, I'm currently doing a change for sdbusplus that
> requires fixes across tens of repositories (as I wrote about recently).
> I pretty quickly hacked up this shell function in order to automatically
> update my OpenBMC tree with commits from another repo that I've pushed
> to Gerrit already.

Yes, you, me, or the other maintainers can hack up shell scripts to do
what we need in short order.  This doesn't help more junior developers
for which coding doesn't flow as naturally.  Also, if we, the core
maintainers, are having to hack up shell scripts every time we want to
do some kind of analysis/changes, that's not an efficient use of our
time either.

>
> https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9
>
> I don't see using 'git-format-patch' to create the patchfile instead as
> too much additional effort.
>
> > > CI prevents these from being submitted, as it should, but there's nothing to
> > > stop anyone using the `devtool modify ...` / `devtool finish ...` and
> > > committing the result as a workflow to exchange state (I do this)?
> > >
> > > Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
> > > It is at least the Yocto workflow.
> >
> > devtool provides just one form of friction;  There are also a number
> > of cases where devtool modify and devtool finish fail in non obvious
> > ways (usually due to some not-quite-optimal yocto handling in a meta
> > layer, or patches being distributed across meta layers).  The biggest
> > key is that it's yet another tool that seasoned firmware developers
> > have to learn to jump into our codebase.  Each tool adds some friction
> > compared to if it just didn't exist.  It also adds the "which recipe
> > do I need to devtool to modify the webui?" type trouble that people
> > have talked about many times.
>
> Do we have pointers to when devtool fails?  The only time I've seen it
> are for recipes that aren't in the rootfs image: kernel and u-boot and
> they've all been due to bugs in the image.bbclass on our part.  There
> was actually a fix to one of the u-boot recipes very recently.

We don't have patch files upstream (by and large), so we don't tend to
hit these failure modes in public settings.  I can try to come up with
some contrived examples if you want, but they generally involve some
form of patch conflict, which devtool can't resolve so it crashes out.

>
> I personally don't use devtool all that much, but when I do I want it to
> point at the central "workspace" of all the openbmc repos I already have
> so I can get it to pick up code I already have in progress there.  Do we
> need better documentation around those workflows?

When you do this, how do you manage if you're working on more than one
cross repo feature at a time?  Maybe there's some workflow that I'm
not aware of?

FWIW, the central workspace idea doesn't work too well for me,
though I make do; maybe that's just a difference in style.

>
> (At one point there was a statement made that we didn't want tooling
> written to assist with one workflow or another.  This was somewhat made
> in reference to the `setup` script, but I think it had extensions that
> made it that any workflow-related tools people have are hosted in their
> own personal spaces and not talked about.  Maybe we need to change this
> mentality.)

Might be reasonable, although I would point out that workflow scripts
still require the user to know they exist.  If we're talking about
workflow scripts around "devtool all openbmc recipes" and "roll up
openbmc changes into review and send", that gets us closer in the "how
do I open reviews" regard, but I'm not sure that's the right approach.

>
> > > > This gets very onerous over time, especially for simple
> > > > commits.  Having maintained several different companies forks
> > > > personally, and spoken to many others having problems with the same,
> > > > adding major features are difficult to test and rebase because of
> > > > this.  Moving the code to a single tree makes a lot of the toil of
> > > > tagging and modifying local trees a lot more manageable, as a series
> > > > of well-documented git commands (generally git rebase[2]).  It also
> > > > increases the likelihood that someone pulls down the fork to test it
> > > > if it's highly likely that they can apply it to their own tree in a
> > > > single command.
> > >
> > > Again, this is moot if the patches are applied in-tree.
> >
> > Meta layer patch files in my experience tend to not layer well, and
> > require a good amount of maintenance.  They also have problems where
> > they're not versioned against a git base, so there's no guarantees of
> > where in the history the patches were forked from, and whether they
> > apply to your tree, or if they fail, what patches likely caused them
> > to fail.  Admittedly, tracking them in git isn't perfect either, but
> > at least it publishes "this is the source base these were based on" to
> > give some indication.  In practice, the public forks I've seen just
> > embed the custom meta layer within an openbmc tree to solve this
> > problem.
> > https://github.com/Intel-BMC/openbmc/tree/intel/meta-openbmc-mods
> > https://github.com/HewlettPackard/openbmc
>
> I could entirely be misunderstanding what problem you're pointing out here.
> Why would it matter "where in this history the patches were forked
> from"?

Because the patches were applied and tested against a particular
version of the tree.  If you change the tree they're based on, you
generally invalidate your testing, which hopefully pops up in a merge
conflict, but doesn't always.

>  Aren't they forked from whatever the SRCREV in the recipe says?
> It is on the maintainer of the meta-layer to ensure they apply to that
> revision.
>

I'm not quite following; SRCREV would be in the upstream openbmc .bb
file.  If meta-layer bbappends now needed to pin every version of
every package they patch, we're back to a lot of toil to maintain.
Technically to get a consistent set, you'd have to put in the meta
layer patch every SRCREV of every openbmc package, considering that
changes get made often that have cross project consequences.

> Having all the code in one repo doesn't give you any more visibility as
> to where the code was "forked from".  It ends up being exactly the same
> as a patch file except that the patch file has been "applied" already to
> the code.

I'm not sure I understand your statement here.  Say I had a meta
layer:

meta-ed/
meta-ed/meta-phosphor/bmcweb_%.bbappend
meta-ed/meta-phosphor/bmcweb/0001-GreatestEver.patch

How would I know which version of openbmc that patch applies to and
was tested against?


>  You still don't have visibility to the underlying upstream
> commit number.  And, I would suspect it is going to be even worse
> because you're going to end up with back and forth merge commits trying
> to pick up the latest upstream code in these forks.  You're not going to
> have a nice git-submodule number indicating openbmc/openbmc was from
> here.

In what I'm proposing, you would have the whole repository, with those
changes patched on top and it would be a "git merge-base" command to
determine where it was forked from mainline, same as you would for the
kernel.
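Under the proposed layout, that check might look like this. The sketch assumes the fork and a remote-tracking branch of upstream live in the same clone. `fork_report` is a hypothetical helper; `git merge-base` and `git rev-list --count` do the actual work:

```shell
# fork_report: show where a fork branched off upstream and how many
# commits it carries on top of that point.
#   $1  upstream ref (e.g. a remote-tracking branch of openbmc/openbmc)
#   $2  fork ref
# The body runs in a subshell so its variables do not leak to the caller.
fork_report() (
    base=$(git merge-base "$1" "$2") || exit 1
    ahead=$(git rev-list --count "$base..$2")
    printf 'forked from %s (%s local commits)\n' "$base" "$ahead"
)
```

The same two commands work unchanged on a kernel fork, which is the comparison being drawn here.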

>
> In my opinion, this is a problem with how people maintaining these trees
> are doing it and not a problem with how our code is organized.

Can you point to any public facing LF-openbmc tree that has done it
"right"?  If nobody on the project can figure out the "right" way
(myself included), including a lot of reasonably smart people, that
somewhat proves my point, that there's complexities in the way these
things are managed.

>  In the
> facebook/openbmc tree we use git-submodules internally to hold the
> upstream trees.  For our github-side we have a script that updates them,
> but there isn't a strong reason we couldn't do the same git-submodule
> layout there (I think it is due to a deficiency in the way our
> internal<=>external mirror tool works).

I'm not in love with the idea of submodules; I've never worked in a
repository where they've served their purpose or been easy to use
(I've heard they cause problems similar to the ones we're facing), but
can you elaborate on what you're talking about?

Do you think they could handle the volume of patches openbmc fields?

>
>     https://github.com/facebook/openbmc/blob/helium/yocto_repos.sh
>
> If people are not treating the openbmc/openbmc tree as an unchanged
> blob, that is their fault and not ours.  Having all the source imported
> in one repo doesn't really solve this either and in fact is likely to
> make it worse because you now _can't_ treat openbmc/openbmc as an
> unchanged blob because you're going to have to patch-in-tree any changes
> to code you want to make.
>
> > I also think that having
> > one or a smaller number of reviews would concentrate a lot of the
> > discussion when we make treewide changes.  (OWNERS files, etc.)  When
> > they get distributed among many reviews, in my experience it tends to
> > dilute the discussion a bit.
>
> I would argue this is actually a bad thing.  We're going to more likely
> end up with large cross-repository commits that are harder to review,
> require more people to review them (a larger set of OWNERS), and are
> harder to revert.

Nothing about what I'm proposing would prevent breaking them up still
and using good judgement, but for things like "turn on c++20" it would
be really nice if we could have one commit, determine where the
problems are, and organize the solutions, instead of breaking it up
across many repositories.

>
> If there is larger discussion to be had that should probably happen on
> the mailing list anyhow.

Our mailing list can't merge code, which makes technical discussions
that need to reference a particular change more difficult when you
need to discuss a diff.  This is a big difference between our
community and, say, the kernel.

>
> > > > - It would give an opportunity for individuals and companies to "own"
> > > > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > > > increase participation in the project overall.  This already happens
> > > > quite a bit, but in practice, the forks that do it squash history,
> > > > making it nearly impossible to get their changes upstreamed from an
> > > > outside entity.
> > >
> > > Not sure this is something we want to encourage, even if it happens in
> > > practice.
> >
> > I think when done properly, it would be a huge help to the project.
> > My main point is that this isn't something we can stop (companies and
> > individuals have done it and will continue to do it anyway), so wouldn't
> > we rather make their changes easier to ingest back upstream?
>
> In my experience the difficulties with upstreaming are not related to
> logistics of sending patches to Gerrit.  They are related to the effort
> involved with getting other people in the community bought into what you're
> trying to do.  Having all the code in one place alleviates 1% of the
> upstreaming effort while doing nothing for the remaining 99%.

We've clearly had different experiences (which is fine).  I've had
multiple developers describe the review process itself as complicated,
unstructured, and hard.  This was mentioned explicitly separate from
community buy-in, and as someone that goes through the process almost
daily, I agree with them that it's more complex than it should be.
For the major initiatives that I've had a hand in bringing online in
the project, having a clear concise way to talk about the engineering
(i.e. the code) was a much better way of getting consensus than any of
our docs processes.  In most cases, it involved pushing to a public,
unrelated openbmc fork, waiting for momentum to build, then merging
the result when people in the community wanted the capabilities it
provided.  Not having a way to do that within the project adds
friction.

>
> > > > - It would centralize the bug databases.  Today, bugs filed against
> > > > sub projects tend to not get answered.
> > >
> > > Do you have some numbers handy?
> >
> > I do not.  I can say that anecdotally the "you filed this bug against
> > the wrong project" happens quite often in the repositories I maintain,
> > and the lack of reasonable cross-project "transfer the bug" semantics
> > makes this difficult (yes, admins can transfer bugs across projects, but
> > I'm pretty sure we don't want to call on core maintainers every time
> > we want to move things around.)  It would be quite helpful to the
> > project to have fewer than N bug trackers (might not necessarily be
> > one) to increase the odds that someone searches for and finds their
> > bug before filing a duplicate.
>
> "bugs filed against a sub project tend to not get answered" and
> "bugs are filed against the wrong project" are different problems;
> you've shifted the discussion.
>
> The first is a problem with our maintainers.  Having all the issues
> in one repository doesn't improve that situation; it actually makes
> it worse, because you're going to shift that burden onto a few individuals
> who pay attention to openbmc/openbmc issues while doing nothing to fix
> the certain-maintainers-don't-respond issue.

That presumes that only a few people have permission to resolve bugs.
Ideally all maintainers would have bug-closure permissions, so they
could triage bugs for their own projects.  Tags or assignments could
be used to convey where the bug itself lies.

I retract my previous statement; you're right, I don't have numbers.
I propose the following as a replacement: "A single GitHub bug
tracker against the main repo would be a better community driving tool
than distributing it across many sub repositories."
I will also admit that there are probably options where we could do
that anyway, outside of this proposal.

>
> > > > Having all the bugs in
> > > > openbmc/openbmc would help in the future to avoid duplicating bugs
> > > > across projects.
> > >
> > > Has this actually been a problem?
> >
> > Duplication?  It happens from time to time.  Not being able to search
> > for a bug across the project happens a lot, and in our current model,
> > requires the user to know which component they are filing the bug
> > against.
>
> Why can't people search bugs or code at an org-level?  I do it all the time.

If there's a way to get a single listing of bugs for all repos in the
org, I'm not aware of it, and a quick Google search doesn't turn up
anything.  Maybe my Google-fu is off?



>
> --
> Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-07 15:39       ` Ed Tanous
@ 2022-04-08 21:36         ` Patrick Williams
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick Williams @ 2022-04-08 21:36 UTC (permalink / raw)
  To: Ed Tanous; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Thu, Apr 07, 2022 at 08:39:54AM -0700, Ed Tanous wrote:
> On Wed, Apr 6, 2022 at 10:28 AM Patrick Williams <patrick@stwcx.xyz> wrote:
> > I was going to point to Android and OpenStack as two large open source
> > projects, which also use Gerrit, and seem to have no trouble with the
> > micro-repo model.
> 
> Android had to invent their own tool (repo
> https://gerrit.googlesource.com/git-repo/) to handle this for that
> level of scale.  As a possible pivot on my proposal, if the suggestion
> is to use repo at the openbmc/openbmc level to check out all the
> repositories, and enforce some structure on related-reviews, I'm open
> to the possibility.

I don't necessarily think that repo is appropriate for a Yocto-based
distro.  Effectively, what is in the bitbake recipes is nearly the same
as the repo metadata, and there is no point in duplicating it.

> One important thing that I think we lose if we go to
> repo is the idea of "I'm doing something new, but I'm not far enough
> along to request a repository, where should I push the code?".  This
> question has blocked the upstreaming of many (intentionally unnamed to
> protect the innocent) projects, some of which eventually did make it
> into upstream, but late, and in a less open way than would've been
> possible if there was an easily accessible, dev-pushable answer.

We should probably tackle this separately either way.  I think many
people are doing this as barely-advertised repositories in their own GitHub accounts.

> I don't have a lot of knowledge of the openstack dev process.  Is it
> similar to Android's?

Based on this, I'm not sure it is much different than ours:
    https://docs.openstack.org/devstack/latest/development.html

They have "Devstack" as a test bundle of all the stuff together, which
is analogous to our bitbake image, and then they manually copy artifacts
in and replace them.

> > Why is managing patch files difficult?  Is it lack of documentation?
> 
> A few reasons:
> 1. When patches conflict, resolving the rebase conflicts is a manual,
> difficult process.
> 2. Many devs aren't familiar with the available tools (given the
> overflow of documentation) and can sometimes resort to rewriting patch
> files by hand because they're not familiar with the relevant tooling.

I'm still not convinced.  Maybe we need a tutorial on how to do this.

If you have a set of patches, you need to `git am` them into the target
repo, then rebase, then `git format-patch` to recreate them.  The
`am/format-patch` pieces are obviously different vs the mono-repo you're
proposing, but everything else is identical.  Resolving the merge
conflicts is _exactly_ the same process.
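
A minimal sketch of that am/rebase/format-patch cycle, written as a
Python helper that only assembles the git commands and runs them when
asked.  The revisions, branch name, and paths are hypothetical
placeholders: the old revision would be the SRCREV the patches were
tested against, and the new one the SRCREV you are moving to.

```python
import subprocess


def refresh_plan(old_rev, new_rev, patches, out_dir, branch="refresh"):
    """Build the git commands for refreshing a set of patch files.

    old_rev: revision the patches currently apply to (hypothetical old SRCREV)
    new_rev: revision to move the patches onto (hypothetical new SRCREV)
    patches: absolute paths to the existing .patch files
    """
    return [
        # Start from the tree the patches were written against.
        ["git", "checkout", "-b", branch, old_rev],
        # Turn the patch files back into commits; fall back to a 3-way
        # merge if a patch doesn't apply cleanly.
        ["git", "am", "--3way", *patches],
        # Replay those commits onto the new base; conflicts are resolved
        # here, exactly as in any other git rebase.
        ["git", "rebase", new_rev],
        # Regenerate one patch file per commit on top of the new base.
        ["git", "format-patch", "-o", out_dir, f"{new_rev}..{branch}"],
    ]


def run_plan(plan, repo, dry_run=True):
    """Print the commands, or execute them inside `repo`.

    With check=True a failing step (e.g. a rebase conflict) raises
    CalledProcessError so it can be fixed by hand before continuing.
    """
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, cwd=repo, check=True)
```

Running it with dry_run=True first shows the exact commands; the
regenerated files in out_dir would then replace the old patches next
to the bbappend.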

> 3. Patches sometimes apply to one system, sometimes apply to all
> systems, and generally cause a complex tree of dependencies.  If Yocto
> has a patch file, then openbmc has the same patch file, then a private
> meta layer has the same patch file, there's no mechanism for "patch
> already applied to upstream" like git rebase has.  If certain systems
> have pinned a given version of a subproject because of a regression,
> there's no easy way to apply different versions of patch files pre and
> post regression.
> 4. Patches could be applied at vendor-level, platform-level, SoC-level,
> or distro-level, and explaining all of these concepts to new
> engineers is difficult.  In practice, most will opt for checking it
> into the lowest friction place.

This sounds like complication on your part, but I don't understand how a
monorepo solves this complexity.  If you have two different machines
with two different sets of patches either:

1. You need to create separate branches for these two machines under the
   monorepo model.

2. You need to ensure all your patches apply to all machines built in the
   monorepo.

#1 is handled just as fine with a patch model as well.
#2 is policy you should follow in the patch model too.

Regarding "no mechanism for 'patch already applied'", there is when you
do the `git am` import + `git rebase` just the same.
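
The reason this works: git rebase skips a local commit when an
equivalent change already exists upstream, comparing changes by a
patch ID that ignores line numbers and whitespace (see `git patch-id`).
The Python below is a rough approximation of that idea, not git's
exact algorithm; the function names are hypothetical.

```python
import hashlib
import re

# Hunk headers like "@@ -10,3 +10,4 @@ func" carry line numbers that shift
# between trees, so they are normalized away.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def approx_patch_id(diff: str) -> str:
    """Hash a unified diff while ignoring line numbers and whitespace."""
    h = hashlib.sha1()
    for line in diff.splitlines():
        if line.startswith("index "):
            continue  # blob hashes differ between trees; skip them
        if HUNK_RE.match(line):
            line = "@@"  # keep the hunk boundary, drop the offsets
        h.update("".join(line.split()).encode() + b"\n")
    return h.hexdigest()


def drop_already_upstream(local_diffs, upstream_diffs):
    """Keep only the local diffs whose change is not already upstream."""
    upstream_ids = {approx_patch_id(d) for d in upstream_diffs}
    return [d for d in local_diffs if approx_patch_id(d) not in upstream_ids]
```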

> No, I don't believe it's lack of documentation.  Yocto documents
> workflows very well, there's just way too much of it to expect anyone
> to read it in that level of depth.

It seems to me like it is a deficiency in tooling to make "I am a fork
maintainer" more livable using the fork model you've chosen.  Maybe this
is something we as a community can improve, but I'm slightly worried
that we'll just end up with "well this isn't _my_ fork model"
discussions unless we start with a best-practices on doing forks.

> > Upstream Yocto deals with patch files all the time.  The Facebook BMC
> > tree has machines in production that are still on Rocko-based Yocto
> > distributions and we have plenty of patch files we support there.
> 
> As a design pattern, I don't think we can recommend everyone stay on a
> 5-year-old Yocto + kernel release.  I get that it works for FB, and that's
> their prerogative (seriously, no judgement at all, I understand), but
> there are features in the newer kernels (SOC support, kernel support)
> that the project would like to rely on, not to mention security
> improvements.

Some of these older machines have been in production for a long time and
it isn't worth investing in them to the point of bringing them up to date.  We
backport security fixes as necessary and otherwise leave them alone.

> Also to some degree this is proving my point, if an OpenBMC
> application wanted to rely on a feature that's only present in the 5.1
> kernel, but there are systems on 4.7 in the tree, we'd need to
> backport patches of the feature;  That gets more onerous.

I think this is why we use separate distribution versions.  We don't
want to go through our full system test cycles every two weeks when the
community refreshes Yocto.  So, we pin a machine back on a distro
version (previously Yocto, but openbmc/openbmc releases starting with
Dunfell).

> > I wouldn't be surprised if Yocto already has tools to simplify the
> > "git-format-patch" -> "package.bbappend" workflow, but if it doesn't I
> > could probably write something.
> 
> I think it's supposed to be bitbake finish, but from our experience it
> doesn't handle scale very well.

`s/bitbake/devtool/`, I think you meant, but I understand.

Can you define "scale" in this context?  I don't see how a tool that
targets a single repository / recipe is intended to "scale" or how it
would fall apart "at scale".

> If we write another tool, then we're back to "tool that you need to
> know exists to use".  git is a tool that exists, and has these
> features already;  We should avoid writing openbmc-specific tools
> where we can.

I agree in the sense that I'd rather document a workflow using tools
other people already use than write our own.  No debate there.

> > Do we have pointers to when devtool fails?  The only time I've seen it
> > are for recipes that aren't in the rootfs image: kernel and u-boot and
> > they've all been due to bugs in the image.bbclass on our part.  There
> > was actually a fix to one of the u-boot recipes very recently.
> 
> We don't have patch files upstream (by and large), so we don't tend to
> hit these failure modes in public settings.  I can try to come up with
> some contrived examples if you want, but they generally involve some
> form of patch conflict, which devtool can't resolve so it crashes out.

Thanks.  I think this is related to "scale" above.

I am really not understanding why you have "patch conflicts" though at a
devtool level.  Something seems wrong with how you're working to me if
you are.

In our internal tree I have some nasty openssh patches, related to some
custom logging we do, that I keep having to rebase.  I never work with
devtool for this.  This is the only bad patch experience I've personally
had and the monorepo wouldn't have made my life any easier because the
code changes themselves are pretty invasive in openssh (and I just do
the am/rebase/format-patch workflow I mentioned above).

> > I personally don't use devtool all that much, but when I do I want it to
> > point at the central "workspace" of all the openbmc repos I already have
> > so I can get it to pick up code I already have in progress there.  Do we
> > need better documentation around those workflows?
> 
> When you do this, how do you manage if you're working on more than one
> cross repo feature at a time?  Maybe there's some workflow that I'm
> not aware of?

I think I've mentioned in the thread elsewhere, but I always defer the
"combine it all together in Yocto" pretty late in the development
process and I'd tend to use the same branch name across all the repos
for the feature.  You'd need to make sure all the recipes modified with
devtool are on the same branch.

That doesn't help if you're trying to make changes in a single repo for
one feature while you're waiting for bitbake to compile another feature
in the same repo.  If I get to that point, I'd probably `devtool finish`
to get the patches applied and then build.

> > (At one point there was a statement made that we didn't want tooling
> > written to assist with one workflow or another.  This was somewhat made
> > in reference to the `setup` script, but I think it had the side effect
> > that any workflow-related tools people have are hosted in their
> > own personal spaces and not talked about.  Maybe we need to change this
> > mentality.)
> 
> Might be reasonable, although I would point out that workflow scripts
> still require the user to know they exist.  If we're talking about
> workflow scripts around "devtool all openbmc recipes" and "roll up
> openbmc changes into review and send", that gets us closer in the "how
> do I open reviews" regard, but I'm not sure that's the right approach.

Glancing at the OpenStack website they seem to have decent tutorials for
common workflows.  We could do the same and document the scripts.

> > I could be entirely misunderstanding what problem you're pointing out here.
> > Why would it matter "where in this history the patches were forked
> > from"?
> 
> Because the patches were applied and tested against a particular
> version of the tree.  If you change the tree they're based on, you
> generally invalidate your testing, which hopefully pops up in a merge
> conflict, but doesn't always.

I don't really know how a monorepo solves this problem though.  If you
rebase you maybe have visibility into merge conflicts (which, as I've
said earlier, are the same as if you did rebase in the code repo), but
you would "invalidate your testing" just as much as not having a
monorepo.

> >  Aren't they forked from whatever the SRCREV in the recipe says?
> > It is on the maintainer of the meta-layer to ensure they apply to that
> > revision.
> >
> 
> I'm not quite following, SRCREV would be in the upstream openbmc bb
> file.  If meta layers bbappend now needed to pin every version of
> every package they patch, we're back to a lot of toil to maintain.
> Technically to get a consistent set, you'd have to put in the meta
> layer patch every SRCREV of every openbmc package, considering that
> changes get made often that have cross project consequences.

We must have a very different working model of how forks work?  I don't
understand why the bbappend needs the SRCREV...

> > Having all the code in one repo doesn't give you any more visibility as
> > to where the code was "forked from".  It ends up being exactly the same
> > as a patch file except that the patch file has been "applied" already to
> > the code.
> 
> I'm not quite following your statement here, so I'm not sure I
> understand it.  If I had a meta layer
> 
> meta-ed/
> meta-ed/meta-phosphor/bmcweb_%.bbappend
> meta-ed/meta-phosphor/bmcweb/0001-GreatestEver.patch
> 
> How would I know what version of openbmc that patch applies to and was tested against?

You never _just_ have a meta-layer.  Meta-layers don't build into
anything on their own.  You have _your_ meta-layer plus some snapshot of
openbmc/openbmc that you're working against, right?  You can do this
with submodules (like Meta does) or you can do it with a subtree merge
of some sort (which it sounds like you do, and it appears many other forks
do).

If you had working code at point A (meta-layer + openbmc X), that means
all your patches apply against "openbmc X" and you can easily derive
which SRCREVs all your bbappends were targeting.  Yes, upgrading to
"openbmc Y" is a little harder than just a rebase (and dealing with the
merge conflicts) because it requires N rebases for however many repos
you decided to patch, but that difference from the monorepo approach
is very scriptable.
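
As a rough illustration of deriving which SRCREVs the bbappends were
targeting: given the bbappend file names in a meta-layer and the recipe
text from the openbmc/openbmc snapshot it was tested with, a few lines
of Python can pull out the pinned SRCREV for each patched recipe.  This
assumes each recipe sets SRCREV with a plain assignment in the text
passed in (a recipe that sets it in an included .inc file would need
that file's text instead); the helper names and sample values are
hypothetical.

```python
import re

# Matches plain assignments such as: SRCREV = "0123abcd" (also ?= and ??=).
SRCREV_RE = re.compile(r'^\s*SRCREV\s*\??\??=\s*"([0-9a-fA-F]+)"', re.MULTILINE)


def pn_from_bbappend(filename: str) -> str:
    """'meta-ed/meta-phosphor/bmcweb_%.bbappend' -> 'bmcweb'."""
    stem = filename.rsplit("/", 1)[-1].removesuffix(".bbappend")
    return stem.split("_", 1)[0]  # drop the version part after '_'


def targeted_srcrevs(bbappends, recipes):
    """Map each patched recipe name to the SRCREV pinned in the snapshot.

    bbappends: bbappend file names from the downstream meta-layer
    recipes: {recipe name: recipe text} taken from the openbmc snapshot
    """
    result = {}
    for name in bbappends:
        pn = pn_from_bbappend(name)
        match = SRCREV_RE.search(recipes.get(pn, ""))
        if match:
            result[pn] = match.group(1)
    return result
```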

And I get that this is more work and a potential negative for the micro-repo
model, but again, this whole "forks are difficult" is:

- Not the same as "upstreaming is difficult".
- Not necessarily the problem of the community.

I do think the micro-repo has massive benefits for how Meta does our
downstream maintenance because with your monorepo model we're going to end up
with (machine)*(distro-version) subtrees with all the code and somehow have to
figure out if patches got applied properly across all these subtrees (or
else create a per-machine branch in our repository and similarly hope we
applied fixes everywhere).

This previous paragraph is related to my lack of understanding of how
you're going to deal with different machines having different patches in
the monorepo model.  You _can_ deal with that in a patch model but, as
you said, it is complexity you have to work through.

> >  You still don't have visibility to the underlying upstream
> > commit number.  And, I would suspect it is going to be even worse
> > because you're going to end up with back and forth merge commits trying
> > to pick up the latest upstream code in these forks.  You're not going to
> > have a nice git-submodule number indicating openbmc/openbmc was from
> > here.
> 
> In what I'm proposing, you would have the whole repository, with those
> changes patched on top and it would be a "git merge-base" command to
> determine where it was forked from mainline, same as you would for the
> kernel.

Got it.
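
For readers less familiar with what `git merge-base` reports: it finds
the best common ancestor of two commits, i.e. the point where a fork
diverged from mainline.  A toy Python version over a hand-written
parent map (hypothetical commit names, no real git objects) shows the
idea.

```python
def ancestors(parents: dict[str, list[str]], commit: str) -> set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def merge_base(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Best common ancestors of a and b (conceptually `git merge-base --all`).

    A common ancestor is dropped if it is itself an ancestor of another
    common ancestor, since that other one is a "better" fork point.
    """
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }
```

With mainline m1 <- m2 <- m3 and a fork f1 <- f2 branched from m2,
merge_base(parents, "m3", "f2") returns {"m2"}.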

> > In my opinion, this is a problem with how people maintaining these trees
> > are doing it and not a problem with how our code is organized.
> 
> Can you point to any public-facing LF-openbmc tree that has done it
> "right"?  If nobody on the project can figure out the "right" way
> (myself included), including a lot of reasonably smart people, that
> somewhat proves my point that there are complexities in the way these
> things are managed.

I'm not going to go as far as saying that ours is "right" but I haven't
really had much trouble maintaining patches for Bletchley.  We do require
that all the code has at least been sent to Gerrit and we only import
the patch file with the corresponding Gerrit Change-Id in it.

    https://github.com/facebook/openbmc/tree/helium/meta-facebook/meta-bletchley

IBM has a lot more experience maintaining their release-fork.  It would probably
be good to hear from them on what they observe as the pain points and
what works well.

> >  In the
> > facebook/openbmc tree we use git-submodules internally to hold the
> > upstream trees.  For our github-side we have a script that updates them,
> > but there isn't a strong reason we couldn't do the same git-submodule
> > layout there (I think it is due to a deficiency in the way our
> > internal<=>external mirror tool works).
> 
> I'm not in love with the idea of submodules; I've never worked in a
> repository where they've served their purpose or been easy to use
> (I've heard they cause problems similar to the ones we're facing), but
> can you elaborate on what you're talking about?

We explicitly use pristine submodules of the openbmc/openbmc tree here.
It isn't much different from how openbmc/openbmc handles the Yocto
subtrees, except that we use submodules to maintain the pointers rather
than squash-importing the commits.

They're just a pointer to "the latest tested version of Dunfell" that
we're building our code against.  If we want to upgrade openbmc/openbmc
we move the submodule pointer to a later commit.
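
In a submodule layout like that, the pinned openbmc/openbmc revision is
recorded in the superproject as a gitlink entry (mode 160000), which
`git ls-tree HEAD` lists with type "commit".  A small parser over that
output, fed a made-up sample, sketches how a script could report which
upstream commit each submodule is pinned to; the function name is
hypothetical.

```python
def submodule_pins(ls_tree_output: str) -> dict[str, str]:
    """Return {path: commit} for every gitlink in `git ls-tree` output.

    Each line has the form: <mode> <type> <object>\t<path>
    """
    pins = {}
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, _, path = line.partition("\t")
        mode, obj_type, sha = meta.split()
        # Submodules appear as mode 160000 entries of type "commit".
        if mode == "160000" and obj_type == "commit":
            pins[path] = sha
    return pins
```

In practice the input would come from something like
subprocess.run(["git", "ls-tree", "HEAD"], capture_output=True,
text=True).stdout, run in the superproject.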

> Do you think they could handle the volume of patches openbmc fields?

We don't patch (modify) the submodules.  Any changes are done in the
meta-facebook layer.

> Nothing about what I'm proposing would prevent breaking them up still
> and using good judgement, but for things like "turn on c++20" it would
> be really nice if we could have one commit, determine where the
> problems are, and organize the solutions, instead of breaking it up
> across many repositories.

I don't really understand why this is positive, but it isn't necessarily
related to monorepos.  We already push them up as a topic, but requiring
them to "all be done at once" would slow the process down and reduce the
parallelization of effort.

> > If there is larger discussion to be had that should probably happen on
> > the mailing list anyhow.
> 
> Our mailing list can't merge code, which makes technical discussions
> that need to reference a particular change more difficult when you
> need to discuss a diff.  This is a big difference between our
> community and, say, the kernel.

I meant talking about direction or snags we've run into, not code.

This is an example thread from the mailing list:
    https://lore.kernel.org/openbmc/YQ1FD5q8KbhbXVBK@heinlein/

As we are working through pervasive changes it is good to keep everyone
apprised of the issues you're running into so they can also be fixed by
people in other areas of the tree.

> > In my experience the difficulties with upstreaming are not related to
> > logistics of sending patches to Gerrit.  They are related to the effort
> > involved with getting other people in the community bought into what you're
> > trying to do.  Having all the code in one place alleviates 1% of the
> > upstreaming effort while doing nothing for the remaining 99%.
> 
> We've clearly had different experiences (which is fine).  I've had
> multiple developers describe the review process itself as complicated,
> unstructured, and hard.  This was mentioned explicitly separate from
> community buy-in, and as someone that goes through the process almost
> daily, I agree with them that it's more complex than it should be.

I'm not really sure what is being alluded to here.  I do agree it can be
difficult in some areas to get code in, even if it is small and reasonable.
I'm having trouble mentally connecting that with "monorepo will
solve this".

> For the major initiatives that I've had a hand in bringing online in
> the project, having a clear concise way to talk about the engineering
> (i.e. the code) was a much better way of getting consensus than any of
> our docs processes.  In most cases, it involved pushing to a public,
> unrelated openbmc fork, waiting for momentum to build, then merging
> the result when people in the community wanted the capabilities it
> provided.  Not having a way to do that within the project adds
> friction.

Can you point to examples of this?  I don't think I've ever pushed code
to my own fork of openbmc for any development and I don't recall any
being advertised on here in a very long time (except for this monorepo
proposal, haha).

> I retract my previous statement; you're right, I don't have numbers.
> I propose the following as a replacement: "A single GitHub bug
> tracker against the main repo would be a better community driving tool
> than distributing it across many sub repositories."
> I will also admit that there are probably options where we could do
> that anyway, outside of this proposal.

I mentioned elsewhere I don't have any major issues with disabling the
sub-repo issues database for most of our repos.  We do have to figure
out how to manage who has access to dispose of issues and I think we
need to solve that whatever we do with the code.

> > Why can't people search bugs or code at an org-level?  I do it all the time.
> 
> If there's a way to get a single listing of bugs for all repos in the
> org, I'm not aware of it, and a quick Google search doesn't turn up
> anything.  Maybe my Google-fu is off?

https://github.com/issues?q=is%3Aopen+is%3Aissue+org%3Aopenbmc+archived%3Afalse+
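
The same org-wide query can be built programmatically, e.g. for a
bookmark or a triage script.  A small sketch using Python's
urllib.parse; the helper name is hypothetical, and the search
qualifiers are the ones in the link above.

```python
from urllib.parse import urlencode


def org_issue_search_url(org: str, state: str = "open",
                         include_archived: bool = False) -> str:
    """Build a GitHub issue-search URL covering every repo in `org`."""
    terms = [f"is:{state}", "is:issue", f"org:{org}"]
    if not include_archived:
        terms.append("archived:false")  # skip repos that are archived
    # urlencode quotes ':' as %3A and turns spaces into '+'.
    return "https://github.com/issues?" + urlencode({"q": " ".join(terms)})
```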

-- 
Patrick Williams

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-04 18:28 Proposing changes to the OpenBMC tree (to make upstreaming easier) Ed Tanous
  2022-04-06  2:19 ` Andrew Jeffery
  2022-04-06 20:06 ` Patrick Williams
@ 2022-04-12  7:23 ` Heyi Guo
  2022-05-23 16:27   ` Ed Tanous
  2 siblings, 1 reply; 21+ messages in thread
From: Heyi Guo @ 2022-04-12  7:23 UTC (permalink / raw)
  To: Ed Tanous, OpenBMC Maillist; +Cc: Andrew Jeffery, Brad Bishop

I like the idea, since we don't use additional tools like repo to
maintain the code, and it should make it easier for us to maintain
multiple internal branches.

Thanks,

Heyi

On 2022/4/5 2:28 AM, Ed Tanous wrote:
> The OpenBMC development process as it stands is difficult for people
> new to the project to understand, which severely limits our ability to
> onboard new maintainers, developers, and groups which would otherwise
> contribute major features to upstream, but don't have the technical
> expertise to do so.  This initiative, much like others before it[1], is
> attempting to reduce the toil and OpenBMC-specific processes of
> passing changes amongst the community, and move things to being more
> like other projects that have largely solved this problem already.
>
> To that end, I'd like to propose a change to the way we structure our
> repositories within the project: specifically, putting (almost) all of
> the Linux Foundation OpenBMC owned code into a single repo that we can
> version as a single entity, rather than spreading out amongst many
> repos.  In practice, this would have some significant advantages:
>
> - The tree would be easily shareable amongst the various people
> working on OpenBMC, without having to rely on a single-source Gerrit
> instance.  Git is designed to be distributed, but if our recipe files
> point at other repositories, it largely defeats a lot of this
> capability.  Today, if you want to share a tree that has a change in
> it, you have to fork the main tree, then fork every single subproject
> you've made modifications to, then update the main tree to point to
> your forks.  This gets very onerous over time, especially for simple
> commits.  Having maintained several different companies' forks
> personally, and spoken to many others having problems with the same,
> I've found that major features are difficult to test and rebase because of
> this.  Moving the code to a single tree makes a lot of the toil of
> tagging and modifying local trees a lot more manageable, as a series
> of well-documented git commands (generally git rebase[2]).  It also
> increases the likelihood that someone pulls down the fork to test it
> if it's highly likely that they can apply it to their own tree in a
> single command.
>
> - There would be a reduction in reviews.  Today, anytime a person
> wants to make a change that would involve any part of the tree,
> there are at least 2 code reviews, one for the commit, and one for the
> recipe bump.  Compared to a single tree, this at least doubles the
> number of reviews we need to process.  For changes that want to make
> any change to a few subsystems, as is the case when developing a
> feature, they require 2 X <number of project changes> reviews, all of
> which need to be synchronized.  There is a well documented problem
> where we have no official way to synchronize merging of changes to
> userspace applications within a bump without manual human
> intervention.  This would largely render that problem moot.
>
> - It would allow most developers to not need to understand Yocto at
> all to do their day to day work on existing applications.  No more
> "devtool modify", and related SRCREV bumps.  This will help most of
> the new developers on the project with a lower mental load, which will
> mean people are able to ramp up faster.
>
> - It would give an opportunity for individuals and companies to "own"
> well-supported public forks (e.g. Red Hat) of the codebase, which would
> increase participation in the project overall.  This already happens
> quite a bit, but in practice, the forks that do it squash history,
> making it nearly impossible to get their changes upstreamed from an
> outside entity.
>
> - It would centralize the bug databases.  Today, bugs filed against
> sub projects tend to not get answered.  Having all the bugs in
> openbmc/openbmc would help in the future to avoid duplicating bugs
> across projects.
>
> - Would increase the likelihood that someone contributes a patch,
> especially a patch written by someone else.  If contributing a patch
> was just a matter of cherry-picking a tree of commits and submitting
> it to gerrit, it's a lot more likely that people would do it.
>
> - Greatly increases the ease with which stats are collected.
> Questions like: How many patches were submitted last year?  How many
> lines of code changed between commit A and commit B?  Where was this
> regression injected (i.e. git bisect)?  How much of our codebase is C++?
> How many users of the dbus Sensor.Value interface are there?  All of these
> are easily answered with one-liner git commands once this change is done.
>
> - New features no longer require single-point-of-contact core
> maintainer processes (e.g. creating a repo for changes, setting up
> maintainer groups, etc.) and can just be submitted as a series of
> patches to openbmc/openbmc.
>
> - Tree-wide changes (C++ standard, Yocto updates, formatting, etc.) are
> much easier to accomplish in a small number of patches, or a series of
> patches that is easy to pull and test as a unit.
>
> In terms of concretely how we would accomplish this, I've put together
> what such a tree would look like, and I'm looking for input on how it
> could be improved.  Some key points on what it represents:
>
> - All history for both openbmc and sub projects will be retained.
> Commits are interleaved based on the date on which they were submitted
> using custom tooling that was built on top of git fast-export and
> fast-import.  All previously available tags will have similar tags in
> the new repository pointing at their equivalent commits in the new
> repository.
>
> - Inclusive guidelines: To make progress toward an unrelated but
> important goal at the same time, I'm recommending that the
> openbmc/master branch will be left as-is, and the newly-created sha1
> will be pushed to the branch openbmc/openbmc:main, to retain people's
> links to previous commits on master, and retain the exact project
> history while at the same time moving the project to having more
> inclusive naming, as has been documented previously[3].  At some point
> in the future the master branch could be renamed and deprecated, but
> this is considered out of scope for this specific change.
>
> - Each individual sub-project will be given a folder within
> openbmc/openbmc based on their current repository name.  While there
> is an opportunity to reorganize in more specific ways (e.g. put all
> ipmi-oem handler repos in a folder) this proposal intentionally
> doesn't, under the proposition that once this change is made, any sort
> of folder rearranging will be much easier to accomplish, and to keep
> the scope limited.
>
> - Yocto recipes will be changed to point to their path equivalent, and
> inherit the externalsrc bbclass[4].  This workflow is exactly the workflow
> devtool uses to point to local repositories during a "devtool modify",
> so it's unlikely we will have incremental build-consistency issues
> with this approach, as was a concern in the past.
>
> - Places where we've forked other well supported projects (u-boot,
> kernel, etc.) will continue to point to the openbmc/<projectname> fork.
> This is done to ensure that we don't inflict the same problem we're
> attempting to solve in OpenBMC upon those working in the subproject
> forks, and to reinforce to contributors that patches to these projects
> should prefer submitting first to the relevant upstream.
>
> - Subprojects that are intended to be reused outside of OpenBMC (e.g.
> sdbusplus) will retain their previous commit, history, and trees, such
> that they are usable outside the project.  This is intended to make
> sure that the code that should be reusable by others remains so.
>
> - The above intentionally makes no changes to our subtree update
> process, which would remain as it is today.  The openbmc-specific
> autobump job in Jenkins would be disabled, since it would no longer be
> required in this approach.
>
> - Most Gerrit patches would now be submitted to openbmc/openbmc.
>
> My proposed version of this tree is pushed to a github fork here, and
> is based on the tree from a few weeks ago:
> https://github.com/edtanous/openbmc
>
> It implements all the above for the main branch.  This tree is based
> on the output of the automated tooling, and in the case where this
> proposal is accepted, the tooling would be re-run to capture the state
> of the tree at the point where we chose to make this change.
>
> The tool I wrote to generate this tree is also published, if you're
> interested in how this tree was built, and is quite interesting in its
> use of git export/import [5], but functionally, I would not expect
> that tooling to survive after this transition is made.
>
> Let me know what you think.
>
> -Ed
>
> [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> [2] https://git-scm.com/docs/git-rebase
> [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine
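
A rough Python sketch of the date-based interleaving described above, under stated assumptions: the `Commit` record and the `interleave_by_date` helper are hypothetical, each repository's history is assumed to be already extracted oldest-first (for example with `git log --reverse`), and the real combine tool[5] works on `git fast-export`/`git fast-import` streams and also has to rewrite parents, paths and tags, all of which this ignores.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record (a real tool would parse these
    from a `git fast-export` stream)."""
    repo: str       # source repository, e.g. "bmcweb"
    sha: str        # commit id in the source repository
    timestamp: int  # commit date as a Unix timestamp


def interleave_by_date(histories: Iterable[List[Commit]]) -> List[Commit]:
    """Merge per-repository histories into one date-ordered sequence.

    heapq.merge only ever advances one input at a time, so each
    repository's own commit order is preserved even if its dates are not
    strictly increasing; ties between repositories keep input order.
    """
    return list(heapq.merge(*histories, key=lambda c: c.timestamp))


if __name__ == "__main__":
    bmcweb = [Commit("bmcweb", "a1", 100), Commit("bmcweb", "a2", 300)]
    meta = [Commit("openbmc", "b1", 200)]
    print([c.sha for c in interleave_by_date([bmcweb, meta])])
    # prints ['a1', 'b1', 'a2']
```

A k-way merge is used here rather than a global sort so that no repository's history is ever reordered relative to itself, which would matter if committer dates are not monotonic (for example after rebases).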


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06  2:19 ` Andrew Jeffery
  2022-04-06 15:54   ` Ed Tanous
@ 2022-05-19 21:12   ` Cody Smith
  2022-05-23 16:37     ` Ed Tanous
  1 sibling, 1 reply; 21+ messages in thread
From: Cody Smith @ 2022-05-19 21:12 UTC (permalink / raw)
  To: OpenBMC Maillist; +Cc: Ed Tanous, Brad Bishop


I don't seem to have the original message, so this may get added to
Andrew's branch of this thread. Sorry about that in advance.

In general I support moving to a monorepo. We at Google do this, and my
significant other at Airbnb also utilizes a monorepo. The advantages are
significant: the world gets a lot less siloed, and changes that would
have spanned multiple repos now only span the monorepo. This is
particularly useful when feature X requires changes to repos A, B and C,
and the changes break things on their own but are fine when shipped
together. To be honest, I don't even really know how such a feature
gets shipped today.

The other thing that tends to happen with monorepos is a lot more
conformity, as reviews are carried out by a larger set of people.
Suddenly `bmcweb`, to pick a bad example, is being reviewed by people
who may not have previously cared about or touched that part of the
codebase. At a minimum, more people will have eyes on the changes
happening.

I also think that a monorepo keeps any one maintainer from "lording" over
a repo. It happens: the +2ers can end up playing the role of the bridge
troll, and when repo X only has one or two +2ers, this can be a real
problem. A monorepo with 10+ +2ers will force the +2ers to engage in
debate when they disagree with each other, instead of lording over their
own kingdoms with no influence over other kingdoms, so to speak.

I haven't made a great set of arguments here, but in general I feel like
a change like this would help from an organizational perspective, and
with that better organization in place maybe we can begin addressing
some of the other issues we need to address.

Cody Smith
System Software Engineer
Google Cloud Platform Core Services Team
scody@google.com
720-515-6105



On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:

> Hi Ed,
>
> I think what's below largely points to a bit of an identity crisis for
> the project, on a couple of fronts. Fundamentally OpenBMC is a distro
> (or as Yocto likes to point out, a meta-distro), and we can:
>
> 1. Identify as a traditional OSS distro: An integration of otherwise
>    independent applications
>
> 2. Identify as an appliance distro: The distro and the
>    applications are a monolith
>
> You're proposing 2, while I think there exists some tension towards 1.
>
> With the amount of custom userspace we've always kinda sat in-between.
> I'd like to see libraries and applications that have use cases outside
> of OpenBMC be accessible to people with those external use cases,
> without being burdened by understanding the rest of the OpenBMC context.
> I have a concern that by integrating things in the way you're proposing
> it will lead to more inertia there (e.g. for implementations of
> standards such as MCTP or PLDM (libmctp and libpldm)).
>
> On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > The OpenBMC development process as it stands is difficult for people
> > new to the project to understand, which severely limits our ability to
> > onboard new maintainers, developers, and groups which would otherwise
> > contribute major features to upstream, but don't have the technical
> > expertise to do so.  This initiative, much like others before it[1] is
> > attempting to reduce the toil and OpenBMC-specific processes of
> > passing changes amongst the community, and move things to being more
> > like other projects that have largely solved this problem already.
>
> Can you be more specific about which projects here? Do you have links
> to examples?
>
> >
> > To that end, I'd like to propose a change to the way we structure our
> > repositories within the project: specifically, putting (almost) all of
> > the Linux Foundation OpenBMC owned code into a single repo that we can
> > version as a single entity, rather than spreading out amongst many
> > repos.  In practice, this would have some significant advantages:
> >
> > - The tree would be easily shareable amongst the various people
> > working on OpenBMC, without having to rely on a single-source Gerrit
> > instance.  Git is designed to be distributed, but if our recipe files
> > point at other repositories, it largely defeats a lot of this
> > capability.  Today, if you want to share a tree that has a change in
> > it, you have to fork the main tree, then fork every single subproject
> > you've made modifications to, then update the main tree to point to
> > your forks.
>
> This isn't true, as you can add patches in the OpenBMC tree.
>
> CI prevents these from being submitted, as it should, but there's nothing
> to stop anyone from using `devtool modify ...` / `devtool finish ...` and
> committing the result as a workflow to exchange state (I do this).
>
> Is the issue instead with devtool? Is it bad? Is the learning curve too
> steep? It is at least the Yocto workflow.
>
> > This gets very onerous over time, especially for simple
> > commits.  Having maintained several different companies' forks
> > personally, and spoken to many others having the same problems, I've
> > found that major features are difficult to test and rebase because of
> > this.  Moving the code to a single tree makes a lot of the toil of
> > tagging and modifying local trees a lot more manageable, as a series
> > of well-documented git commands (generally git rebase[2]).  It also
> > increases the likelihood that someone pulls down the fork to test it
> > if it's highly likely that they can apply it to their own tree in a
> > single command.
>
> Again, this is moot if the patches are applied in-tree.
>
> >
> > - There would be a reduction in reviews.  Today, anytime a person
> > wants to make a change that would involve any part of the tree,
> > there's at least 2 code reviews, one for the commit, and one for the
> > recipe bump.  Compared to a single tree, this at least doubles the
> > number of reviews we need to process.
>
> Is there more work? Yes.
>
> Is it always double? No. Is it sometimes double? Yes.
>
> Often bumps batch multiple application commits. I think this paragraph
> overstates the problem somewhat, but what it does get right is
> identifying that *some* overhead exists.
>
> >  For changes that want to make
> > any change to a few subsystems, as is the case when developing a
> > feature, they require 2 X <number of project changes> reviews, all of
> > which need to be synchronized.
>
> Same issue as above here.
>
> > There is a well documented problem
> > where we have no official way to synchronize merging of changes to
> > userspace applications within a bump without manual human
> > intervention.  This would largely render that problem moot.
>
> Right, this can be hard to handle.
>
> It can be mitigated by versioning interfaces (which the D-Bus spec
> calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple
> interfaces for the transition period.
>
> That said, that's also more work, and so needs to be considered in the
> set of trade-offs.
>
> [6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
> [7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
>
> >
> > - It would allow most developers to not need to understand Yocto at
> > all to do their day to day work on existing applications.  No more
> > "devtool modify", and related SRCREV bumps.  This will help most of
> > the new developers on the project with a lower mental load, which will
> > mean people are able to ramp up faster.
>
> Okay. So devtool is seen as an issue.
>
> Can we improve its visibility and any education around it? Or is it a
> lost cause? If so, why?
>
> Separately, I'm concerned this is an attempt to shield people from
> skills that help them work with upstream Yocto. OpenBMC feels like it's
> a bit of an on-ramp for open-source contributions for people who have
> worked in what was previously quite a proprietary environment. We tried
> shielding people in the past wrt kernel contributions, and that failed
> pretty spectacularly. We (at least Joel and I) now encourage people to
> work with upstream directly *and support them in the process of doing
> that* rather than trying to mitigate some of the difficulties with
> working upstream by avoiding them.
>
> >
> > - It would give an opportunity for individuals and companies to "own"
> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > increase participation in the project overall.  This already happens
> > quite a bit, but in practice, the forks that do it squash history,
> > making it nearly impossible to get their changes upstreamed from an
> > outside entity.
>
> Not sure this is something we want to encourage, even if it happens in
> practice.
>
> >
> > - It would centralize the bug databases.  Today, bugs filed against
> > sub projects tend to not get answered.
>
> Do you have some numbers handy?
>
> > Having all the bugs in
> > openbmc/openbmc would help in the future to avoid duplicating bugs
> > across projects.
>
> Has this actually been a problem?
>
> >
> > - Would increase the likelihood that someone contributes a patch,
> > especially a patch written by someone else.  If contributing a patch
> > was just a matter of cherry-picking a tree of commits and submitting
> > it to gerrit, it's a lot more likely that people would do it.
>
> It sounds plausible, but again, some evidence for this would be helpful.
>
> Why is this easier than submitting the patches to the application repo?
>
> > My proposed version of this tree is pushed to a github fork here, and
> > is based on the tree from a few weeks ago:
> > https://github.com/edtanous/openbmc
> >
> > It implements all the above for the main branch.  This tree is based
> > on the output of the automated tooling, and in the case where this
> > proposal is accepted, the tooling would be re-run to capture the state
> > of the tree at the point where we chose to make this change.
> >
> > The tool I wrote to generate this tree is also published, if you're
> > interested in how this tree was built, and is quite interesting in its
> > use of git export/import [5], but functionally, I would not expect
> > that tooling to survive after this transition is made.
>
> I think it would be good to capture the script in openbmc-tools if we
> choose to go ahead with this, mainly as a record of how we achieved it.
>
> Andrew
>
> >
> > [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> > [2] https://git-scm.com/docs/git-rebase
> > [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> > [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine
>
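
Andrew's suggestion of versioning D-Bus interfaces can be illustrated with a small Python sketch. The validity check follows the interface-name rules in the D-Bus specification (two or more dot-separated elements of ASCII letters, digits and underscores, no element starting with a digit, at most 255 characters), and the spec recommends putting the major version in the name, as in its `com.example.MusicPlayer1` example. `next_major_version` and its "numeric suffix on the last element" rule are assumptions made for illustration, not an existing OpenBMC, sdbusplus or phosphor-dbus-interfaces convention.

```python
import re

# One element of a D-Bus interface name: ASCII letters, digits and
# underscores, and it must not start with a digit.
_ELEMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits a final element such as "MusicPlayer12" into ("MusicPlayer", "12").
_VERSIONED = re.compile(r"(.*?)(\d*)")


def is_valid_interface_name(name: str) -> bool:
    """Check the interface-name rules from the D-Bus specification."""
    if not name or len(name) > 255:
        return False
    elements = name.split(".")
    # At least two non-empty elements, each matching _ELEMENT.
    return len(elements) >= 2 and all(_ELEMENT.fullmatch(e) for e in elements)


def next_major_version(name: str) -> str:
    """Name for the next incompatible revision of an interface.

    Hypothetical convention: the major version is a numeric suffix on the
    last element, and a missing suffix counts as version 1.
    """
    if not is_valid_interface_name(name):
        raise ValueError(f"invalid D-Bus interface name: {name!r}")
    *prefix, last = name.split(".")
    match = _VERSIONED.fullmatch(last)
    base, digits = match.group(1), match.group(2)
    version = int(digits) if digits else 1
    return ".".join(prefix + [f"{base}{version + 1}"])
```

For example, `next_major_version("com.example.MusicPlayer1")` returns `"com.example.MusicPlayer2"`. In the transition scheme Andrew describes, a service would export both the old and the new interface until all consumers have moved over, which is the extra work he notes has to be weighed in the trade-offs.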



* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-12  7:23 ` Heyi Guo
@ 2022-05-23 16:27   ` Ed Tanous
  2022-05-25 13:31     ` Heyi Guo
  0 siblings, 1 reply; 21+ messages in thread
From: Ed Tanous @ 2022-05-23 16:27 UTC (permalink / raw)
  To: Heyi Guo; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Tue, Apr 12, 2022 at 12:23 AM Heyi Guo <guoheyi@linux.alibaba.com> wrote:
>
> I like the idea, since we don't use additional tools like repo to
> maintain the code, and it should make it easier for us to maintain
> multiple internal branches.
>

Hi Heyi,
Glad to see you on the project.  Do you think you could elaborate a
little on how you're hoping to use OpenBMC and its review process,
and whether any of the changes being proposed here would help you?


> Thanks,
>
> Heyi
>
> On 2022/4/5 2:28 AM, Ed Tanous wrote:
> > The OpenBMC development process as it stands is difficult for people
> > new to the project to understand, which severely limits our ability to
> > onboard new maintainers, developers, and groups which would otherwise
> > contribute major features to upstream, but don't have the technical
> > expertise to do so.  This initiative, much like others before it[1] is
> > attempting to reduce the toil and OpenBMC-specific processes of
> > passing changes amongst the community, and move things to being more
> > like other projects that have largely solved this problem already.
> >
> > To that end, I'd like to propose a change to the way we structure our
> > repositories within the project: specifically, putting (almost) all of
> > the Linux Foundation OpenBMC owned code into a single repo that we can
> > version as a single entity, rather than spreading out amongst many
> > repos.  In practice, this would have some significant advantages:
> >
> > - The tree would be easily shareable amongst the various people
> > working on OpenBMC, without having to rely on a single-source Gerrit
> > instance.  Git is designed to be distributed, but if our recipe files
> > point at other repositories, it largely defeats a lot of this
> > capability.  Today, if you want to share a tree that has a change in
> > it, you have to fork the main tree, then fork every single subproject
> > you've made modifications to, then update the main tree to point to
> > your forks.  This gets very onerous over time, especially for simple
> > commits.  Having maintained several different companies' forks
> > personally, and spoken to many others having the same problems, I've
> > found that major features are difficult to test and rebase because of
> > this.  Moving the code to a single tree makes a lot of the toil of
> > tagging and modifying local trees a lot more manageable, as a series
> > of well-documented git commands (generally git rebase[2]).  It also
> > increases the likelihood that someone pulls down the fork to test it
> > if it's highly likely that they can apply it to their own tree in a
> > single command.
> >
> > - There would be a reduction in reviews.  Today, anytime a person
> > wants to make a change that would involve any part of the tree,
> > there's at least 2 code reviews, one for the commit, and one for the
> > recipe bump.  Compared to a single tree, this at least doubles the
> > number of reviews we need to process.  For changes that want to make
> > any change to a few subsystems, as is the case when developing a
> > feature, they require 2 X <number of project changes> reviews, all of
> > which need to be synchronized.  There is a well documented problem
> > where we have no official way to synchronize merging of changes to
> > userspace applications within a bump without manual human
> > intervention.  This would largely render that problem moot.
> >
> > - It would allow most developers to not need to understand Yocto at
> > all to do their day to day work on existing applications.  No more
> > "devtool modify", and related SRCREV bumps.  This will help most of
> > the new developers on the project with a lower mental load, which will
> > mean people are able to ramp up faster.
> >
> > - It would give an opportunity for individuals and companies to "own"
> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > increase participation in the project overall.  This already happens
> > quite a bit, but in practice, the forks that do it squash history,
> > making it nearly impossible to get their changes upstreamed from an
> > outside entity.
> >
> > - It would centralize the bug databases.  Today, bugs filed against
> > sub projects tend to not get answered.  Having all the bugs in
> > openbmc/openbmc would help in the future to avoid duplicating bugs
> > across projects.
> >
> > - It would increase the likelihood that someone contributes a patch,
> > especially a patch written by someone else.  If contributing a patch
> > was just a matter of cherry-picking a tree of commits and submitting
> > it to Gerrit, it's a lot more likely that people would do it.
> >
> > - It would greatly increase the ease with which stats are collected.
> > Questions like "How many patches were submitted last year?", "How many
> > lines of code changed between commit A and commit B?", "Where was this
> > regression introduced (i.e. git bisect)?", "How much of our codebase is
> > C++?" and "How many users of the D-Bus Sensor.Value interface are
> > there?" could all be answered with one-liner git commands once this
> > change is done.
> >
> > - New features no longer require single-point-of-contact core
> > maintainer processes (i.e. creating a repo for changes, setting up
> > maintainer groups, etc.) and can just be submitted as a series of
> > patches to openbmc/openbmc.
> >
> > - Tree-wide changes (C++ standard, Yocto updates, formatting, etc.) are
> > much easier to accomplish in a small number of patches, or a series of
> > patches that is easy to pull and test as a unit.
> >
> > In terms of concretely how we would accomplish this, I've put together
> > what such a tree would look like, and I'm looking for input on how it
> > could be improved.  Some key points on what it represents:
> >
> > - All history for both openbmc and sub projects will be retained.
> > Commits are interleaved based on the date on which they were submitted
> > using custom tooling that was built on top of git fast-export and
> > fast-import.  All previously available tags will have similar tags in
> > the new repository pointing at their equivalent commits in the new
> > repository.
> >
> > - Inclusive guidelines: To make progress toward an unrelated but
> > important goal at the same time, I'm recommending that the
> > openbmc/master branch will be left as-is, and the newly-created sha1
> > will be pushed to the branch openbmc/openbmc:main, to retain people's
> > links to previous commits on master, and retain the exact project
> > history while at the same time moving the project to having more
> > inclusive naming, as has been documented previously[3].  At some point
> > in the future the master branch could be renamed and deprecated, but
> > this is considered out of scope for this specific change.
> >
> > - Each individual sub-project will be given a folder within
> > openbmc/openbmc based on their current repository name.  While there
> > is an opportunity to reorganize in more specific ways (e.g. put all
> > ipmi-oem handler repos in a folder) this proposal intentionally
> > doesn't, under the proposition that once this change is made, any sort
> > of folder rearranging will be much easier to accomplish, and to keep
> > the scope limited.
> >
> > - Yocto recipes will be changed to point to their path equivalent, and
> > inherit the externalsrc bbclass[4].  This workflow is exactly the workflow
> > devtool uses to point to local repositories during a "devtool modify",
> > so it's unlikely we will have incremental build-consistency issues
> > with this approach, as was a concern in the past.
> >
> > - Places where we've forked other well supported projects (u-boot,
> > kernel, etc.) will continue to point to the openbmc/<projectname> fork.
> > This is done to ensure that we don't inflict the same problem we're
> > attempting to solve in OpenBMC upon those working in the subproject
> > forks, and to reinforce to contributors that patches to these projects
> > should prefer submitting first to the relevant upstream.
> >
> > - Subprojects that are intended to be reused outside of OpenBMC (e.g.
> > sdbusplus) will retain their previous commit, history, and trees, such
> > that they are usable outside the project.  This is intended to make
> > sure that the code that should be reusable by others remains so.
> >
> > - The above intentionally makes no changes to our subtree update
> > process, which would remain as it is today.  The openbmc-specific
> > autobump job in Jenkins would be disabled, since it would no longer be
> > required in this approach.
> >
> > - Most Gerrit patches would now be submitted to openbmc/openbmc.
> >
> > My proposed version of this tree is pushed to a github fork here, and
> > is based on the tree from a few weeks ago:
> > https://github.com/edtanous/openbmc
> >
> > It implements all the above for the main branch.  This tree is based
> > on the output of the automated tooling, and in the case where this
> > proposal is accepted, the tooling would be re-run to capture the state
> > of the tree at the point where we chose to make this change.
> >
> > The tool I wrote to generate this tree is also published, if you're
> > interested in how this tree was built, and is quite interesting in its
> > use of git export/import [5], but functionally, I would not expect
> > that tooling to survive after this transition is made.
> >
> > Let me know what you think.
> >
> > -Ed
> >
> > [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> > [2] https://git-scm.com/docs/git-rebase
> > [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> > [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine
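
As an example of the kind of one-liner statistic mentioned above: in a consolidated tree, `git diff --numstat A B` reports per-file added and deleted line counts across every sub-project at once, and grouping those counts by top-level folder gives a per-sub-project breakdown. The Python helper below is a hypothetical sketch that parses that output; the sample paths are invented, binary files (reported as "-") are skipped, and renamed paths (shown as "old => new") are not specially handled.

```python
from typing import Dict, Tuple


def lines_changed_by_directory(numstat_output: str) -> Dict[str, Tuple[int, int]]:
    """Total (added, deleted) lines per top-level directory.

    Expects the text printed by `git diff --numstat A B`: one
    tab-separated "added deleted path" line per file.  Files at the top
    of the tree are grouped under ".".
    """
    totals: Dict[str, Tuple[int, int]] = {}
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            # Binary files report "-" instead of line counts; skip them.
            continue
        top = path.split("/", 1)[0] if "/" in path else "."
        prev_added, prev_deleted = totals.get(top, (0, 0))
        totals[top] = (prev_added + int(added), prev_deleted + int(deleted))
    return totals


if __name__ == "__main__":
    # Invented sample output for illustration only.
    sample = (
        "12\t3\tbmcweb/redfish-core/lib/sensors.hpp\n"
        "4\t0\tbmcweb/meson.build\n"
        "-\t-\tdbus-sensors/docs/diagram.png\n"
        "1\t1\tREADME.md\n"
    )
    print(lines_changed_by_directory(sample))
    # prints {'bmcweb': (16, 3), '.': (1, 1)}
```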


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-19 21:12   ` Cody Smith
@ 2022-05-23 16:37     ` Ed Tanous
  2022-05-23 21:07       ` John Broadbent
  0 siblings, 1 reply; 21+ messages in thread
From: Ed Tanous @ 2022-05-23 16:37 UTC (permalink / raw)
  To: Cody Smith; +Cc: OpenBMC Maillist, Brad Bishop

On Thu, May 19, 2022 at 2:12 PM Cody Smith <scody@google.com> wrote:
>
> I don't seem to have the original message, so this may get added to Andrew's branch of this thread. Sorry about that in advance.

The original message got caught in a lot of people's spam filters; I'm
hoping that explains some of the lack of replies to the initial
proposal.

>
> In general I support moving to a monorepo. We at Google do this, and my significant other at Airbnb also utilizes a monorepo. The advantages are significant: the world gets a lot less siloed, and changes that would have spanned multiple repos now only span the monorepo. This is particularly useful when feature X requires changes to repos A, B and C, and the changes break things on their own but are fine when shipped together. To be honest, I don't even really know how such a feature gets shipped today.

I agree with your general sentiment, with a couple of nitpicks: what I
propose above isn't a pure "monorepo", and is more analogous to
"consolidate a lot of the repos".  FWIW, although I really think it's
the right thing to do, "other companies do it for other things" isn't
the best argument we can make for this.  There are plenty of
counterexamples of companies with much more entrenched command chains
that use multiple repos, and the creation of repos as a form of project
management, to great effect.

>
> The other thing that tends to happen with monorepos is a lot more conformity, as reviews are carried out by a larger set of people.

+1.  Applying consistent clang-format to the codebase, for example,
would be a lot more trivial.

> Suddenly `bmcweb`, to pick a bad example, is being reviewed by people who may not have previously cared about or touched that part of the codebase. At a minimum, more people will have eyes on the changes happening.
>
> I also think that a monorepo keeps any one maintainer from "lording" over a repo. It happens: the +2ers can end up playing the role of the bridge troll, and when repo X only has one or two +2ers, this can be a real problem. A monorepo with 10+ +2ers will force the +2ers to engage in debate when they disagree with each other, instead of lording over their own kingdoms with no influence over other kingdoms, so to speak.

In what I propose, I don't really think this changes given that the
existing OWNERS files would still be largely the same, although I
agree, more +2er debate would be a good thing if it was the result.
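
To illustrate how existing OWNERS files could keep applying after consolidation, here is a hypothetical Python sketch of "nearest OWNERS file wins" resolution: the owners of the deepest directory that has an OWNERS entry cover a changed file, and a commit touching several sub-projects needs the union of their owners. The mapping, the directory layout and the maintainer names are invented, and the way OpenBMC tooling actually reads OWNERS files may resolve ownership differently.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set


def owners_for_path(path: str, owners: Dict[str, List[str]]) -> List[str]:
    """Owners from the deepest directory of `path` that has an entry.

    `owners` maps a directory ("" for the repository root) to the owners
    listed in that directory's OWNERS file (hypothetical data model).
    """
    parent = PurePosixPath(path).parent
    # Walk from the file's own directory up to the repository root.
    for directory in (parent, *parent.parents):
        key = "" if str(directory) == "." else str(directory)
        if key in owners:
            return owners[key]
    return []


def reviewers_for_change(paths: Iterable[str], owners: Dict[str, List[str]]) -> Set[str]:
    """Union of the owners of every file touched by a single commit."""
    reviewers: Set[str] = set()
    for path in paths:
        reviewers.update(owners_for_path(path, owners))
    return reviewers


if __name__ == "__main__":
    owners = {
        "": ["root-maintainer"],
        "bmcweb": ["bmcweb-maintainer"],
        "dbus-sensors": ["sensors-maintainer"],
    }
    change = ["bmcweb/src/webserver_main.cpp", "dbus-sensors/src/ADCSensor.cpp"]
    print(sorted(reviewers_for_change(change, owners)))
    # prints ['bmcweb-maintainer', 'sensors-maintainer']
```

In this model a commit spanning two sub-projects still needs sign-off from both sets of owners, so per-folder ownership is unchanged; what changes is that the work lands as one commit rather than separate commits plus recipe bumps.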

>
> I haven't made a great set of arguments here, but in general I feel like a change like this would help from an organizational perspective, and with that better organization in place maybe we can begin addressing some of the other issues we need to address.

Thanks for your input.

PS: plain text is generally preferred on this ML, given that it diffs
better in the tools.  (Click the triple dot in the lower right of Gmail,
then check "Plain text mode".)

>
> Cody Smith
> System Software Engineer
> Google Cloud Platform Core Services Team
> scody@google.com
> 720-515-6105
>
>
>
> On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>>
>> Hi Ed,
>>
>> I think what's below largely points to a bit of an identity crisis for
>> the project, on a couple of fronts. Fundamentally OpenBMC is a distro
>> (or as Yocto likes to point out, a meta-distro), and we can:
>>
>> 1. Identify as a traditional OSS distro: An integration of otherwise
>>    independent applications
>>
>> 2. Identify as an appliance distro: The distro and the
>>    applications are a monolith
>>
>> You're proposing 2, while I think there exists some tension towards 1.
>>
>> With the amount of custom userspace we've always kinda sat in-between.
>> I'd like to see libraries and applications that have use cases outside
>> of OpenBMC be accessible to people with those external use cases,
>> without being burdened by understanding the rest of the OpenBMC context.
>> I have a concern that by integrating things in the way you're proposing
>> it will lead to more inertia there (e.g. for implementations of
>> standards such as MCTP or PLDM (libmctp and libpldm)).
>>
>> On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
>> > The OpenBMC development process as it stands is difficult for people
>> > new to the project to understand, which severely limits our ability to
>> > onboard new maintainers, developers, and groups which would otherwise
>> > contribute major features to upstream, but don't have the technical
>> > expertise to do so.  This initiative, much like others before it[1] is
>> > attempting to reduce the toil and OpenBMC-specific processes of
>> > passing changes amongst the community, and move things to being more
>> > like other projects that have largely solved this problem already.
>>
>> Can you be more specific about which projects here? Do you have links
>> to examples?
>>
>> >
>> > To that end, I'd like to propose a change to the way we structure our
>> > repositories within the project: specifically, putting (almost) all of
>> > the Linux Foundation OpenBMC owned code into a single repo that we can
>> > version as a single entity, rather than spreading out amongst many
>> > repos.  In practice, this would have some significant advantages:
>> >
>> > - The tree would be easily shareable amongst the various people
>> > working on OpenBMC, without having to rely on a single-source Gerrit
>> > instance.  Git is designed to be distributed, but if our recipe files
>> > point at other repositories, it largely defeats a lot of this
>> > capability.  Today, if you want to share a tree that has a change in
>> > it, you have to fork the main tree, then fork every single subproject
>> > you've made modifications to, then update the main tree to point to
>> > your forks.
>>
>> This isn't true, as you can add patches in the OpenBMC tree.
>>
>> CI prevents these from being submitted, as it should, but there's nothing to
>> stop anyone from using `devtool modify ...` / `devtool finish ...` and
>> committing the result as a workflow to exchange state (I do this).
>>
>> Is the issue instead with devtool? Is it bad? Is the learning curve too steep?
>> It is at least the Yocto workflow.
>>
>> > This gets very onerous over time, especially for simple
>> > commits.  Having maintained several different companies' forks
>> > personally, and spoken to many others having the same problems, I've
>> > found that major features are difficult to test and rebase because of
>> > this.  Moving the code to a single tree makes a lot of the toil of
>> > tagging and modifying local trees a lot more manageable, as a series
>> > of well-documented git commands (generally git rebase[2]).  It also
>> > increases the likelihood that someone pulls down the fork to test it
>> > if it's highly likely that they can apply it to their own tree in a
>> > single command.
>>
>> Again, this is moot if the patches are applied in-tree.
>>
>> >
>> > - There would be a reduction in reviews.  Today, anytime a person
>> > wants to make a change that would involve any part of the tree,
>> > there's at least 2 code reviews, one for the commit, and one for the
>> > recipe bump.  Compared to a single tree, this at least doubles the
>> > number of reviews we need to process.
>>
>> Is there more work? Yes.
>>
>> Is it always double? No. Is it sometimes double? Yes.
>>
>> Often bumps batch multiple application commits. I think this paragraph
>> overstates the problem somewhat, but what it does get right is
>> identifying that *some* overhead exists.
>>
>> >  For changes that want to make
>> > any change to a few subsystems, as is the case when developing a
>> > feature, they require 2 X <number of project changes> reviews, all of
>> > which need to be synchronized.
>>
>> Same issue as above here.
>>
>> > There is a well documented problem
>> > where we have no official way to synchronize merging of changes to
>> > userspace applications within a bump without manual human
>> > intervention.  This would largely render that problem moot.
>>
>> Right, this can be hard to handle.
>>
>> It can be mitigated by versioning interfaces (which the D-Bus spec
>> calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple
>> interfaces for the transition period.
>>
>> That said, that's also more work, and so needs to be considered in the
>> set of trade-offs.
>>
>> [6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
>> [7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
>>
>> >
>> > - It would allow most developers to not need to understand Yocto at
>> > all to do their day to day work on existing applications.  No more
>> > "devtool modify", and related SRCREV bumps.  This will help most of
>> > the new developers on the project with a lower mental load, which will
>> > mean people are able to ramp up faster.
>>
>> Okay. So devtool is seen as an issue.
>>
>> Can we improve its visibility and any education around it? Or is it a
>> lost cause? If so, why?
>>
>> Separately, I'm concerned this is an attempt to shield people from
>> skills that help them work with upstream Yocto. OpenBMC feels like it's
>> a bit of an on-ramp for open-source contributions for people who have
>> worked in what was previously quite a proprietary environment. We tried
>> shielding people in the past wrt kernel contributions, and that failed
>> pretty spectacularly. We (at least Joel and I) now encourage people to
>> work with upstream directly *and support them in the process of doing
>> that* rather than trying to mitigate some of the difficulties with
>> working upstream by avoiding them.
>>
>> >
>> > - It would give an opportunity for individuals and companies to "own"
>> > well-supported public forks (e.g. Red Hat) of the codebase, which would
>> > increase participation in the project overall.  This already happens
>> > quite a bit, but in practice, the forks that do it squash history,
>> > making it nearly impossible to get their changes upstreamed from an
>> > outside entity.
>>
>> Not sure this is something we want to encourage, even if it happens in
>> practice.
>>
>> >
>> > - It would centralize the bug databases.  Today, bugs filed against
>> > subprojects tend not to get answered.
>>
>> Do you have some numbers handy?
>>
>> > Having all the bugs in
>> > openbmc/openbmc would help in the future to avoid duplicating bugs
>> > across projects.
>>
>> Has this actually been a problem?
>>
>> >
>> > - It would increase the likelihood that someone contributes a patch,
>> > especially a patch written by someone else.  If contributing a patch
>> > was just a matter of cherry-picking a tree of commits and submitting
>> > it to gerrit, it's a lot more likely that people would do it.
>>
>> It sounds plausible, but again, some evidence for this would be helpful.
>>
>> Why is this easier than submitting the patches to the application repo?
>>
>> > My proposed version of this tree is pushed to a github fork here, and
>> > is based on the tree from a few weeks ago:
>> > https://github.com/edtanous/openbmc
>> >
>> > It implements all the above for the main branch.  This tree is based
>> > on the output of the automated tooling, and in the case where this
>> > proposal is accepted, the tooling would be re-run to capture the state
>> > of the tree at the point where we chose to make this change.
>> >
>> > The tool I wrote to generate this tree is also published, if you're
>> > interested in how this tree was built, and is quite interesting in its
>> > use of git export/import [5], but functionally, I would not expect
>> > that tooling to survive after this transition is made.
>>
>> I think it would be good to capture the script in openbmc-tools if we
>> choose to go ahead with this, mainly as a record of how we achieved it.
>>
>> Andrew
>>
>> >
>> > [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
>> > [2] https://git-scm.com/docs/git-rebase
>> > [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
>> > [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
>> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-04-06 20:06 ` Patrick Williams
@ 2022-05-23 16:53   ` Ed Tanous
  0 siblings, 0 replies; 21+ messages in thread
From: Ed Tanous @ 2022-05-23 16:53 UTC
  To: Patrick Williams; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Wed, Apr 6, 2022 at 1:06 PM Patrick Williams <patrick@stwcx.xyz> wrote:
>
> On Mon, Apr 04, 2022 at 11:28:06AM -0700, Ed Tanous wrote:
> > The OpenBMC development process as it stands is difficult for people
> > new to the project to understand, which severely limits our ability to
> > onboard new maintainers, developers, and groups which would otherwise
> > contribute major features to upstream, but don't have the technical
> > expertise to do so.
>
> This is, to me, a rather bold and surprising statement.
>
> You're saying there are people out there who can work on an embedded
> system like the BMC at layers of the stack where they might be modifying
> everything from the kernel up to REST APIs to "contribute major
> features" and yet they can't figure out how to manipulate multiple git
> repositories?

Yes, myself included, but the primary friction here is Yocto's
management of those repositories (pinning and revving), not necessarily
just "manipulate multiple git repos".

>
> I get that I've heard "Yocto has a big learning curve" over and over,
> but you're not actually proposing moving away from Yocto and I admit
> that it is quite likely the "Yocto has a big learning curve" is actually a
> conflation of some of these points that you're raising along with Yocto
> itself, but still, this is worded rather boldly.

I'm not saying we move away from Yocto; I'm saying that we move it to a
point where, unless you're implementing a totally new daemon, you
don't need to know Yocto is the build system; it's just a command you
run to produce a binary of the code you have checked out.

>
> > This initiative, much like others before it[1], is
> > attempting to reduce the toil and OpenBMC-specific processes of
> > passing changes amongst the community, and move things to being more
> > like other projects that have largely solved this problem already.
>
> I'm not sure what is OpenBMC-specific about this process though.
>
> There are some large successful projects that work in a mono-repo and some
> which work in a micro-repo model.  Android and OpenStack are both micro-repo
> models that use Gerrit, just like us, and have similar "how do you put the
> dependencies together" problems, which they have solved: Android with `repo`
> and OpenStack with `zuul`.  `repo` and `Yocto recipes` are pretty similar
> mechanics to me (and I would like to see something more `zuul`-like in our
> CI).

It's quite possible some of those tools could be good solutions to
this instead, especially considering repo integrates well with Gerrit.
I'm open to the possibilities, but in the case of repo, it would look
very similar to my proposal (the whole tree gets checked out at once).
Compared to combining the repos, I'm not sure how it would be much
different workflow-wise from what I propose, aside from having to
learn another tool, which isn't great, but I'm happy to entertain it
as a different class of solution.

>
> Many open source communities work in a micro-repo model; the challenge
> is always about pulling dependencies together.  Newer languages like
> Rust and JS make this easier, but again, there isn't much fundamentally
> different than Yocto recipes out there.
>
> The mechanics of how we maintain our dependencies might be different,
> because we're built on Yocto, but I would argue that this proposal is
> itself _more_ OpenBMC-specific and not less.  I say this because we are
> deviating from how every other Yocto project I know of works and how
> most other "Linux Distribution" projects work (which is what we are).

Examples?  I listed a few examples of firmware projects that work from
a single repository.
I think the difference here is that most Linux distributions don't generate
a single system-specific binary at the end, or if they do, it's a
subset of the packages built and has things like installers attached,
and has the nicety of being able to load one image on many classes
of system (Ubuntu, for example, produces amd64 and arm64 installers, and
maybe a few more).  In the case of firmware, we're generating one
final image to be loaded onto hardware that is hardware specific.
That changes the problem space a bit IMO.

>
> > - The tree would be easily shareable amongst the various people
> > working on OpenBMC, without having to rely on a single-source Gerrit
> > instance.
>
> What do you mean by "rely on a single-source Gerrit instance"?

http://gerrit.openbmc-project.xyz/

The main point is that if I wanted to push a version of my tree for
others to see (like I did for this proposal) it's pretty trivial to do
using tools in the project, and encourages the community to experiment
and improve on it.  In the current model, if I wanted to push a
version of my tree, I would have to fork every application I changed
separately.

>
> > Git is designed to be distributed, but if our recipe files
> > point at other repositories, it largely defeats a lot of this
> > capability.  Today, if you want to share a tree that has a change in
> > it, you have to fork the main tree, then fork every single subproject
> > you've made modifications to, then update the main tree to point to
> > your forks.  This gets very onerous over time, especially for simple
> > commits.  Having maintained several different companies' forks
> > personally, and spoken to many others having problems with the same,
> > I've found that major features are difficult to test and rebase because of
> > this.  Moving the code to a single tree makes a lot of the toil of
> > tagging and modifying local trees a lot more manageable, as a series
> > of well-documented git commands (generally git rebase[2]).  It also
> > increases the likelihood that someone pulls down the fork to test it
> > if it's highly likely that they can apply it to their own tree in a
> > single command.
>
> I'm almost certain that nobody would do a git-rebase of their fork even
> if everything were in one tree.

My team certainly would rebase.  There are prominent forks that
already rebase (Intel-BMC) and just live with the need to create
side repositories for every new feature, or rely on patch files.

>
> Why do we want to improve outside testing of forks?  I don't understand
> why that is an advantage.

Because it's a way to get consensus, to keep master more
stable, and to make sure that features are used and useful before they
merge.  Ideally we would do this within the master tree as well, but
I don't think that requires this proposal.

>
> Most of what you're describing here is making it easier for people to
> live without getting their code upstreamed, which doesn't seem like a
> good thing to the project as a whole, and seems to actively work against
> the title of "to make upstreaming easier".

If rebases are easy, and patches that others wrote can be directly
applied and sent for upstream review, it makes upstreaming and
community testing easier.  I'm reading the above as "what we have
today is fine".  FWIW, our build system does this every day, by
pulling pre-built distros from Ubuntu instead of pulling directly from
torvalds/linux and building ourselves.  This reduces friction on test
teams.

>
> If you're maintaining a few fixes in a backports tree, maintain them as
> Yocto-supported patch files.  If you're maintaining whole features
> elsewhere, I have no interest in making this easier for you (if it is
> worse in other ways for the project).

If we don't make this easy, it makes it a lot less likely that these
features can be contributed back to the mainline by ANYONE, not just
the individual making the feature.  This is bad for everyone, and
leads to a belief that companies can't rely on OpenBMC upstream
directly, that "upstreaming is hard", and that openbmc/openbmc is some
ivory tower that we put big walls around.  We should be reducing the
friction on integrating with upstream (including rebasing) to improve
OpenBMC security, and to expand our community.

>
> > - There would be a reduction in reviews.  Today, anytime a person
> > wants to make a change that would involve any part of the tree,
> > there's at least 2 code reviews, one for the commit, and one for the
> > recipe bump.  Compared to a single tree, this at least doubles the
> > number of reviews we need to process.
>
> The typical bump commit is trivial to review and nobody does it except
> Andrew G or me anyhow.

No disagreement, but it's yet another thing that individuals have to
wait for after their patchsets merge, and is a manual, non-distributed
(i.e. only two people do it) process.  As a rule, we should
reduce the number of "human-speed" processes we have for things that a
CI system can do.

>
> You could argue that there is actually some velocity advantage of decoupling
> the feature support in the code repo while figuring out the minor
> implications at a recipe level separately.  Having to have every code
> repo commit pass on hardware before it can get +1 Verified is going to
> greatly increase our CI requirements, plus you're going to get lots more
> "I don't understand why this failed" because of a flaky Yocto-wget-fetch
> or Romulus QEMU failure or ...
>
> We also would need to implement everything we have in code-repo level CI
> in the recipe-level CI.  I've spoken to this later as well, but I think
> while decreasing the number of reviews it will also decrease the
> velocity of the reviews even more.  I don't think that the bump commits
> are what are decreasing our velocity here.


This hasn't been my experience, and there are many cases of
individuals forgetting to update the recipes, or the recipe updates
not going in at the same time.

I never asked for all commits to get hardware +1 before they merge
(although if we had resources for it, it would help considerably in
cutting down manual "git bisect" time).  Hardware CI would still run
to the limits of its capability, but would now have the OPTION to run
on every commit if it could.

From my perspective, bump commits increase the lag time between "this
got merged to master" and reports of failures coming in, because if
the bump takes 2 days to merge, that's 2 days where the community
at-large isn't testing, and things get stale.  There's an argument to
be made for "we should wait for hardware CI before things merge to
master", but I think that can be solved in either case.

>
> > For changes that want to make
> > any change to a few subsystems, as is the case when developing a
> > feature, they require 2 X <number of project changes> reviews, all of
> > which need to be synchronized.
>
> I think there is a fundamental problem of how we develop features here.
> "All of which need to be synchronized" implies that every new feature we
> develop has hard co-reqs between repositories.  If that is the case,
> something is broken in our architecture.  We shouldn't patch around it
> by manipulating the code layout.

Open to ideas on how to change the architecture, but as you know, to
add a new capability to OpenBMC, you need at minimum:
1. A design doc in the docs repo
2. A change to phosphor-dbus-interfaces to add the new interface.
3. A change to add the producer of said new feature. (possibly a new
repo at the same time).
4. 1 or 2 changes to the consuming daemons (bmcweb/ipmi) to use said interfaces.

If that's a "broken" architecture, that's fine, although it does seem
to function.  From where we are today, how would we get the above down
to a single patch series?

>
> > There is a well documented problem
> > where we have no official way to synchronize merging of changes to
> > userspace applications within a bump without manual human
> > intervention.  This would largely render that problem moot.
>
> This is a solved problem in projects with a similar layout to ours, such
> as OpenStack.  I've previously offered to add topic-based testing[1] to our
> CI framework and, maybe I misread the attitudes, but it didn't seem like
> it was interesting to the project.
>
> I _did_ add some amount of dependency-based testing to the repositories
> that are included in our base Docker image already.  If you change one
> of those repositories, they get included in the Docker image used for
> testing your code as well.  This has identified problems in the past
> because of, say, a change in sdbusplus that broke phosphor-logging's
> compile.

Again, if it's high friction and requires OpenBMC-specific knowledge
to execute, it doesn't solve the problem, but agreed, that would be a
way to solve this one specific friction point.  FWIW, having
everything in a single repo would mean that breakage would get caught at
review time, not after it was merged to master, which overall I think
is better.
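For the narrower "synchronize a cross-repo feature" problem, the core of
a topic or Depends-On style CI job (the zuul model mentioned above) is
an ordering step: test and merge each change only after the changes it
depends on.  A minimal sketch of just that step, assuming each change
declares its dependencies explicitly; the change labels are
hypothetical, and this is not how zuul itself is implemented:

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def merge_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order changes so every change comes after everything it depends on.

    depends_on maps a change label to the set of change labels it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical feature spanning several repos (labels are illustrative,
# not real Gerrit change IDs).
feature = {
    "phosphor-dbus-interfaces: add interface": set(),
    "producer-daemon: implement interface": {"phosphor-dbus-interfaces: add interface"},
    "bmcweb: expose new property": {"phosphor-dbus-interfaces: add interface"},
    "openbmc: bump SRCREVs": {
        "producer-daemon: implement interface",
        "bmcweb: expose new property",
    },
}

order = merge_order(feature)
print(order)  # the interface change comes first, the SRCREV bump last
```

A cycle (two changes that each claim to need the other) raises
CycleError, which is the case a human still has to untangle either way.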

>
> > - It would allow most developers to not need to understand Yocto at
> > all to do their day to day work on existing applications.  No more
> > "devtool modify", and related SRCREV bumps.  This will help most of
> > the new developers on the project with a lower mental load, which will
> > mean people are able to ramp up faster.
>
> I think there is something more fundamental going on here which I'll
> speak about later in "## End-to-end features"
>
> > - It would give an opportunity for individuals and companies to "own"
> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > increase participation in the project overall.  This already happens
> > quite a bit, but in practice, the forks that do it squash history,
> > making it nearly impossible to get their changes upstreamed from an
> > outside entity.
>
> I don't see how this improves the "forks that squash history" situation.
> Rebases are hard to pull off with how most companies handle their
> internal processes, so the best case here is merge commits.

Sure, merge commits or rebases are roughly equivalent in this context.
The point is both are now available to users.

>
> > - It would centralize the bug databases.  Today, bugs filed against
> > subprojects tend not to get answered.  Having all the bugs in
> > openbmc/openbmc would help in the future to avoid duplicating bugs
> > across projects.
>
> We could do this without changing any code.  Just turn off the issue
> tracking on all our code repos[2].

Agreed.

>
> > - It would increase the likelihood that someone contributes a patch,
> > especially a patch written by someone else.  If contributing a patch
> > was just a matter of cherry-picking a tree of commits and submitting
> > it to gerrit, it's a lot more likely that people would do it.
>
> I'm not following why this can't be done today.  Aren't you just
> cherry-picking commits at a different level?

Which commits do you cherry-pick?  If the feature requires recipe
changes, a tweak to a related daemon, a per-system configuration file,
and a user-facing daemon (ipmi/bmcweb) change, it's not a simple
cherry-pick to make the feature work.

>
> I think there is likely legal apprehension about trying to take code
> from a fork that isn't your own and trying to contribute it upstream.
> With the current CLA structure, the SOB in "fork/repo" doesn't have the
> same meaning legally as our SOB+CLA, so I cannot comfortably add my own
> SOB on code I picked up from "fork/repo".  No change to the repository
> layout will fix this.

Sure, if there are no sign-offs in the source repo, that's a problem.
Many of the patches I see in forks have sign-offs, and I stress
heavily that if you're pushing public forks, they should still contain
sign-offs for exactly this reason.

>
> > - Greatly increases the ease with which stats are collected.
> > Questions like: How many patches were submitted last year?  How many
> > lines of code changed between commit A and commit B?  Where was this
> > regression injected (ie git bisect)?  How much of our codebase is C++?
> > How many users of the dbus Sensor.Value interface are there?  Are all
> > easily answered in one liner git commands once this change is done.
>
> I'm not saying some of these aren't positives, but most of them can
> already be answered today with existing tools.  Either github or grok
> search or gerrit queries can answer almost all of these except "how many
> lines of code changed between A and B", if you're expecting to count all
> the code in subordinate repos, but I don't see that as a particularly
> interesting question.

Maybe I'm missing something?  Can you give some examples of how, say,
I would bisect a regression down to a single commit across repos using
existing tools?  I can bisect a regression down to an autobump, which
helps some, but so far as I'm aware that doesn't work cross repo.
Some of those stats aren't interesting to you, agreed, but they are
interesting to others, and they are stats that large codebases like
ours tend to track.
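To make the cross-repo bisect point concrete: if each autobump in
openbmc/openbmc is expanded into the application commits it pulled in,
you get one linear list, and the binary search that git bisect performs
can run over it.  A minimal sketch, assuming a hypothetical is_bad()
test and that a regression, once introduced, stays present; it also
glosses over the fact that the intermediate cross-repo combinations
inside a bump may never have been built together, which is exactly the
gap a single tree closes:

```python
from typing import Callable, Sequence


def expand_bumps(history: Sequence[tuple[str, Sequence[str]]]) -> list[str]:
    """Flatten (bump, app commits pulled in by that bump) pairs, oldest
    first, into one linear list of application commits."""
    flat: list[str] = []
    for _bump, app_commits in history:
        flat.extend(app_commits)
    return flat


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first commit where is_bad() is True.

    Assumes the last commit is bad and that every commit after a bad one
    is also bad; takes about log2(len(commits)) calls to is_bad().
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # culprit is mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[lo]


# Hypothetical history: two autobumps, each pulling in several app commits.
history = [
    ("bump-1", ["bmcweb-a", "bmcweb-b"]),
    ("bump-2", ["sdbusplus-c", "bmcweb-d", "phosphor-logging-e"]),
]
commits = expand_bumps(history)
broken_from = commits.index("bmcweb-d")  # pretend bmcweb-d introduced the bug
print(first_bad(commits, lambda c: commits.index(c) >= broken_from))  # bmcweb-d
```

Bisecting at the bump level alone would stop at bump-2, which is the
limitation described above.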

>
> > - New features no longer require single-point-of-contact core
> > maintainer processes (ie, creating a repo for changes, setting up
> > maintainer groups, etc.) and can just be submitted as a series of
> > patches to openbmc/openbmc.
>
> If the single-point-of-contact is the issue, let's solve that.  I don't
> think it is though.  I think the bulk of the issue with new repos is
> disagreement on if something belongs in a new repo.  Submitting as a
> series makes _that_ situation worse because it doesn't force the
> discussion upfront and instead someone is upset they spent a bunch of
> time working on a new daemon that is rejected.

Completely disagree on this point.  Many times pushing the code is the
example of "here's why this can't go in an existing repo" that is
really hard to convey in text; it also makes it clear that the
problem is solved.  In addition, if you have the code, and it
duplicates a lot of functions, it's really nice to be able to point
to it and ask, "Here's code that basically does the same thing; can
this be merged with it?", rather than just making the blanket "don't
duplicate functions" statement.

>
> > - Tree-wide changes (C++ standard, Yocto updates, formatting, etc.) are
> > much easier to accomplish in a small number of patches, or a series of
> > patches that is easy to pull and test as a unit.
>
> Patches that are also much more difficult to revert if they break one
> particular area...

No disagreement there, but I think having the option is important, and
we should use good judgement when making these kinds of changes.

>
> Why would we want these pulled together and tested as a unit anyhow?  If
> I update the formatting or the C++ standard used of repo A, that doesn't
> affect repo B.

Because it's much easier to test one commit than test 100 individual
commits, basically the same as above where you were pointing out
limitations in hardware CI.

>
> I've been involved in almost every difficult Yocto subtree update and the only
> case I can think where we couldn't apply the changes to the older Yocto version
> was the OpenSSL3 changes, which we had to #define check around based on an
> OpenSSL version they export.  Even if all the code were in one repo
> would we have wanted to cram all that into a single "Yocto update plus
> fix all the code" commit?  I suspect not.  Doing the #define was
> appropriate no matter how the code was laid out.

No disagreements, but having the flexibility to, say, resolve the same
OpenSSL problem across the tree in one commit would be pretty useful.

>
> > - Inclusive guidelines: To make progress toward an unrelated but
> > important goal at the same time, I'm recommending that the
> > openbmc/master branch will be left as-is, and the newly-created sha1
> > will be pushed to the branch openbmc/openbmc:main, to retain people's
> > links to previous commits on master, and retain the exact project
> > history while at the same time moving the project to having more
> > inclusive naming, as has been documented previously[3].  At some point
> > in the future the master branch could be renamed and deprecated, but
> > this is considered out of scope for this specific change.
>
> This is a separate topic and should be tackled separately.  I guess it
> is simpler if you're pushing all the code into one repo to only deal
> with it there.  If this is something we want to emphasize now, I think
> the Yocto bits are in place that we could just do it relatively quickly.
> The only painful part would be all the existing commits in Gerrit that
> are unmerged and targeting `refs/for/master` but we'd have to tackle
> that with this proposed move as well.

Hence the statement "unrelated but important goal".  Agreed, this
could be tackled elsewhere and in other ways; if we're rebuilding the
tree, this just seemed like an opportune time to do it.

>
> > - Each individual sub-project will be given a folder within
> > openbmc/openbmc based on their current repository name.  While there
> > is an opportunity to reorganize in more specific ways (ie, put all
> > ipmi-oem handler repos in a folder) this proposal intentionally
> > doesn't, under the proposition that once this change is made, any sort
> > of folder rearranging will be much easier to accomplish, and to keep
> > the scope limited.
>
> At a minimum I'd like these all put into a subdirectory off the root.
> It is bad enough with how many meta-layers we have, but we shouldn't
> then add a hundred top-level subdirectories for the code.

Seems reasonable, and is in line with what other projects have done
when they've done this.

>
> > - Yocto recipes will be changed to point to their path equivalent, and
> > inherit externalsrc bbclass[4].  This workflow is exactly the workflow
> > devtool uses to point to local repositories during a "devtool modify",
> > so it's unlikely we will have incremental build-consistency issues
> > with this approach, as was a concern in the past.
>
> Are you sure this works?  I thought externalsrc required the code to be
> in an absolute directory and not a relative one?
>
>     if externalsrc and not externalsrc.startswith("/"):
>         bb.error("EXTERNALSRC must be an absolute path")
>     if externalsrcbuild and not externalsrcbuild.startswith("/"):
>         bb.error("EXTERNALSRC_BUILD must be an absolute path")

I wasn't aware of that limitation, but the code I have does build.  If
you look at the change made in the repository, it does end up building
a full path using Yocto's TOPDIR (I've since figured out there's a
better variable to use that wouldn't require the `..`).

https://github.com/edtanous/openbmc/blob/f84413b320249dbc3c97c2eb7abab4a3bfb3f147/meta-phosphor/recipes-phosphor/interfaces/rest-dbus_git.bb#L30
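For reference, the check quoted above only rejects values that don't
start with "/", so an in-tree path works as long as the recipe anchors
it to an absolute root first.  A minimal sketch of that resolution in
plain Python (not BitBake), assuming a hypothetical build directory
standing in for TOPDIR and a hypothetical layout where the sources sit
next to it:

```python
import posixpath


def externalsrc_path(topdir: str, relative_src: str) -> str:
    """Resolve an in-tree source directory against the build directory.

    topdir stands in for BitBake's TOPDIR and must already be absolute;
    normpath() collapses the ".." so the result is a clean absolute path.
    """
    if not topdir.startswith("/"):
        raise ValueError("topdir must be absolute")
    return posixpath.normpath(posixpath.join(topdir, relative_src))


def check_externalsrc(externalsrc: str) -> None:
    """Same condition as the externalsrc.bbclass check quoted above."""
    if externalsrc and not externalsrc.startswith("/"):
        raise ValueError("EXTERNALSRC must be an absolute path")


# Hypothetical layout: build dir /work/openbmc/build, sources /work/openbmc/rest-dbus.
src = externalsrc_path("/work/openbmc/build", "../rest-dbus")
check_externalsrc(src)  # passes: the path is absolute
print(src)  # /work/openbmc/rest-dbus
```

Passing the relative "../rest-dbus" straight to check_externalsrc()
fails the same condition the bbclass reports as an error; a variable
that already points at the tree root would avoid the ".." entirely.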

>
> The original facebook/openbmc codebase kept all of the code in the repo
> and we simply appended the directories to the SRC_URI.  It is somewhat
> of a pain to maintain the SRC_URI lists, so maybe externalsrc is better
> in that regard.  We also ran into issues with getting lots of
> pseudo-abort issues, as if Yocto didn't really support source-in-tree in
> the latest code.  In order to avoid the pseudo-abort issues I had to do
> this rather ugly hack in our code[3].  I don't know how we sanity test
> your proposal to ensure it doesn't have this issue.
>
> > - Subprojects that are intended to be reused outside of OpenBMC (ex
> > sdbusplus) will retain their previous commit, history, and trees, such
> > that they are usable outside the project.  This is intended to make
> > sure that the code that should be reusable by others remains so.
>
> How do we identify all of these?  (I spoke about this in another part of
> the chain, so we don't need to expand on it here.)

ACK.

>
> > - The above intentionally makes no changes to our subtree update
> > process, which would remain the same process as is currently.  The
> > openbmc-specific autobump job in Jenkins would be disabled considering
> > it's no longer required in this approach.
>
> Wouldn't it still be handy for the above-mentioned repos?

Yes, fair point, autobump would still be used for the projects that
are meant to be used outside of openbmc.

>
> > Let me know what you think.
>
> In general, I'm not a fan of mono-repo style.  Both the mono-repo and
> micro-repo style have issues.  I think we need to have an adequate
> discussion of what the issues are that would be _introduced_ by moving
> to a mono-repo in our case as well.  I'm not currently convinced that
> this proposal is optimizing in the way that is most beneficial to the
> project.
>
> ## Reviews
>
> You've mentioned that this will make reviews easier and I think the
> opposite is far more likely to be true.  OWNERS + Gerrit is not
> sufficient in a mono-repo (and Github doesn't solve this either).
>
> The biggest complaint, as I've heard it, has been the review cycle
> velocity.  I myself have run into sub-repos where it takes weeks to get
> a trivial change in because the maintainers just don't stay on top of
> it.  This proposal will make the problem exponentially worse.

I'm not following how this is related.  Yes, review velocity is a
concern, but how does a mono-repo change that?

>
> Our most proficient and active reviewers tend to be OWNERS higher up in
> the "merge chain".  In 2022H2, I think the top 5 reviewers handled more
> reviews than everyone else put together.  The only way I can stay somewhat on
> top of what needs my attention is by having all my Gerrit notifications
> going into a folder and deleting them as I've taken action.  With the
> current OWNERS I'm already having to delete almost half of these just by
> skimming the title (ex. "meta-not-my-company: ..." commits).  If you
> start throwing every single commit in Gerrit at me because I matched in
> OWNERS, I'll have absolutely no idea how to know what needs my attention and
> what doesn't.  How are you planning on handling this as you are a
> top-level OWNER as well?

I don't have a solution pre-baked, but my plan was some combination of
Gerrit and Gmail filtering; I've used this workflow in the past for
other large private Gerrit codebases.  FWIW, I don't rely on email as
much as you do, so, agreed, we definitely need a solution to this that
works for the "email-based maintainers" workflow.


>
> ## Out-of-Yocto builds
>
> The biggest impediment to my development tends to be the repositories
> that are not using Meson and haven't properly supported Meson
> subprojects, because it makes it almost impossible to make changes to
> those without invoking Yocto which is a much slower process.  I've
> written about this in the past[5].  If all the code is in a single
> repository, I fear there will be almost no effort put into supporting
> out-of-Yocto builds because at that point, why support them?  I wrote
> about the time involved in that post, but you're taking activities that
> take seconds with Meson subprojects and turning them into 5 minutes.
>
> If we are doing something that slows down the most active developers, we
> need to make sure that the increase in contributions from other
> developers is going to more than offset it.  Based on the kind of people
> affected by the problems you're describing, I'm really not convinced it
> will.

I'm not following.  Couldn't the Meson build point at subprojects within
the tree instead of at individual git repositories?  Then, from a
given tree, everything would still "just build" in isolation.  I'm
also not quite following where the 5 minutes comes from.  In general,
if I want to build a given project in Yocto, it's maybe an extra 30
seconds (parsing recipes) on top of doing a Meson build.

Agreed, this is something we should apply effort to if we go the mono
repo approach, and I'm happy to look into solutions, even if I don't
have one fully baked just yet.  This is why I started this discussion,
so we can tease out the sharp edges.

FWIW, if both paths result in the same number of total contributions,
I'd rather it be more distributed among the community, even if it
means you and I contribute less code overall.  For the things I
maintain (user-facing APIs specifically) it would likely increase my
ability to contribute, by removing a lot of the toil (bisecting
breakages, asking "which commits are dependent on this one", etc.).  I
realize that maintaining sdbusplus and maintaining bmcweb are two
different ends of the userspace stack, so I can certainly see how we
have disagreements here.

>
> ## End-to-End Features
>
> You used the concept of an "end-to-end feature" a few times in this
> proposal, but to mean different things.  I'm going to specifically
> talk about a feature that requires changes across multiple existing
> repositories.
>
> A bold statement: nobody here actually implements end-to-end features.
>
> I've never witnessed a single person implement a new feature in the
> kernel, add userspace support to interact with that kernel work, add
> the Redfish APIs to interact with the userspace support they added,
> implement the WebUI to control the Redfish APIs, and then write system
> integration tests in phosphor-test-automation.  We draw somewhat
> arbitrary boundaries already and call something "end-to-end" because it
> happens to reside within those boundaries.  For a typical feature, for
> many developers, that boundary seems to be "Redfish + some other
> userspace app(s)".  Admittedly, this proposal does help (a little) with
> _that_ particular arbitrary boundary, but it doesn't help with anything
> outside of it.

In this case, it never has to be a single person.  If you exclude the
kernel from the above (as it's only required if you're directly
accessing hardware), I've definitely seen groups of individuals
implement a feature end to end, and there are lots of examples of
features that started in forks, were developed end to end, and then
merged into the project.  Agreed, we need a better definition than "end
to end", as it's arbitrary where the ends are, but it sounds like you
agree that for a class of features this would improve things.

>
> There is a bigger problem that this is exposing though, in my mind at
> least: many developers can't or don't work at the component level.
>
> Let's consider a "simple" change that requires:
>     - Adding a new DBus API.
>     - Adding support in App-A for said DBus API.
>     - Adding Redfish support for said DBus API in bmcweb.
>
> You're suggesting that this is hard to accomplish today, end-to-end, and maybe
> it is.  But, the fact that anyone is attempting to solve it end-to-end
> means that the resulting product is pretty terrible from a future
> development and maintenance stand-point:
>
>    * Repositories aren't using Meson subproject wrap files, which would
>      make developing against a change to phosphor-dbus-interfaces
>      trivially easy.

I'm not following this statement.  bmcweb uses subproject wraps, as do
many of the applications that would need to change for a feature like this.

>
>    * Developers are not developing the changes to App-A with unit-tests.

This is a totally fair statement, but for better or worse, we don't
have an official stance on unit tests, nor on unit test coverage.  FWIW,
in recent history I've been asking the team members that I lead to
submit unit tests (because I believe it's in our self-interest), but
this is far from universal within the project, and probably deserves a
discussion of its own.  Unit testing something with few dependencies
might be reasonable, but I don't think we have common patterns for
dependency injection and all the other things that are required if we're
actually going to start requiring reasonable unit test coverage for
features.

>
>    * Developers are not confirming that their changes to App-A are sound
>      at a dbus-level, so the code is tightly coupled with however
>      they've changed bmcweb to interact with it and often crumbles when
>      another application interacts with it (ex. PLDM).

This has not been my experience.  Most developers I've seen follow the
pattern of "implement the dbus API, check the dbus api, then implement
the bmcweb changes".

>
>    * The changes to bmcweb have no unit testing and/or mocking of the DBus APIs.

I would fully support a model where sdbusplus were mockable, and
(spoiler alert) there are efforts in the works on our side to improve
this in the project, which are in review.  Agreed, unit testing is a
huge problem for the project, but I really think it is a distraction here.
Unit tests will never cover 100% of all use cases, nor be
representative of a real system to the point where, if the unit tests
pass, you can reasonably assume that the system as a whole functions.
Is there an example of another project where that unit testing model
solves this?

>
>    * There is not a single integration test added to
>      phosphor-test-automation to make sure nobody breaks this feature in
>      the future.

Agreed.

>
> Combining all the code together completely throws away the necessity to treat
> the software as components, which certainly doesn't improve all of the
> above.

I don't agree with that.  There's no reason we can't keep a
"component" architecture with everything in one repository.  Linux
maintains, say, the driver interfaces in a single repository without
breaking the abstraction.

>
> Your proposal seems to be optimizing for the "I need to hack at code
> across a few repos and throw it all together into a BMC binary image so
> I can test drive it" case,

Although I disagree with the dramatized way you phrased that,
yes, I want to encourage experimentation and testing in a way that
people can try stuff, fail, succeed, and in the cases they succeed, we
can reasonably get the successful things onto master.

> but I would suggest this case should rarely even
> be done by most of our developers.  The fact that this is even a "regular
> development case" to begin with is a problem.

To quote something Joel said (in a completely different context)
"There's very marginal benefits to creating a monoculture."  Everyone
develops differently, and we should support as many people's
workflows as we can.

>
> Everyone should be able to add a new DBus API and at least 95% of the support in
> App-A without ever touching a BMC.  You then have the other 5% that
> maybe needs some confirmation on hardware (you mocked out how you
> _think_ the hardware behaves already, right?).

Can you give an example or two of a feature that exists on master that
does that?  I agree, that sounds nice, but I feel like the reality
doesn't match with that.

>  Once that is done you
> should be able to develop the whole bmcweb feature without touching
> hardware as that is _completely_ software-based (mock out all the dbus
> interfaces for your new feature please, and please add a test case).  If
> all these pieces are working independently, you can, only then, spend a
> little time throwing it all into a single image with something like [6]
> and test-driving it, but better yet is if you also add code to
> phosphor-test-automation.

This seems like a great model, but I suspect the realities of
development would be that enforcing the above as a dev process would
cause more friction, and cause more forks, not fewer.

>
> If there are pieces of what I just wrote above that are difficult today,
> let's fix them, but combining all the code into a mono-repo is, to me,
> just a band-aid over these problems.
>
> ## Tightly coupled software
>
> I've worked on a large BMC-like product that had a mono-repo.

Me too! :)

>  The
> result was a tightly-coupled mess of code that was impossible to work
> in, so there ended up being arbitrary "component boundaries" put in
> place and nobody worked outside their "component boundary" silo.

This was in no way my experience, although it did require work from
the owners to "hold the line" on the component boundaries.  The
benefits of being able to interact with multiple portions of the tree
in a patch series far outweighed the effort that was required to hold
the component boundaries.

>  We
> already have a bit of this "silo" mentality here but at least we don't
> have the tightly-coupled aspect of it.  We have an architecture that
> allows the pieces to, mostly, move independently of each other.  How
> would we ensure that a mono-repo doesn't devolve into tightly-coupled
> code because the frictions that stop it are removed?

Good maintainership and documented intent.

>
> I think a very likely outcome of this proposal is we end up with more
> "utility" libraries that are going to increase the coupling and
> library-dependency structure, which is worse for performance.

I guess I hadn't considered that, but as it stands, utility libraries
are already being created today.  Is it that you think they will be
created at a greater rate?  I think shared code between implementations
is overall a good thing in a lot of cases.  Can you expand on "worse for
performance"?  I'm not following.

>
> ## Reverting and large reviews
>
> I already quite often see large commits, even within a single repository,
> which are difficult to review and difficult to revert.  It is currently
> on the maintainers to push back on these and request smaller commits.
> Having the code in a mono-repo seems to encourage cross-package changes
> (and in fact was listed as a selling point here), which makes it more
> likely that a bug introduced by one small piece of the change requires
> reverting the whole change.

Yes.  IMO, this is a positive, especially when the changes require
incompatible changes to multiple repositories.  Ideally we'd
still separate our patches into a series, such that individual patches
could still be reverted, but that doesn't change in the mono repo
model.  "With great power comes great responsibility."  This doesn't
mean that we should avoid the power for the things that it can
improve; it just means we need to wield it responsibly.

>
> This proposal is likely to increase the average size of a commit since
> it is more likely to include cross-package changes.

Only if we (the maintainers) let it.  For what it's worth, I assumed I
personally would maintain the same stance on "split up the change into
multiple patchsets" with the added advantage that patch series could
be chained together outside of a single application, which helps
considerably in getting patches understood faster, and when reverts
need to occur cross-application, making them more obvious to
accomplish.

>  That means we also
> need more people to give feedback on an individual change and as a
> reviewer I have to sift through all the pieces that aren't even relevant
> to me.  Both of these slow the review process down even more, not speed it up.

This is a fair point, but I'm not following.  Is it difficult to look
at the list of changed files for the subfolders you maintain?
For my workflow, I imagine it would help, because it's fewer clicks to
see a whole set of changes as a whole (i.e. "change made to Application
A to do X, change made to bmcweb to pick up X"), even if they're
separate, but in a series.

>
> ## "To make upstreaming easier"
>
> You started with the topic of making upstreaming easier and I'm really
> not convinced that our repo structure is the major impediment to
> upstreaming.

Clearly we disagree, so it'd be great to get other community input on
this from some of the maintainers of the forks that exist.

>  Most of the advantages you talked about seemed to be around
> _development_ and not _upstreaming_.  Which is it we are solving?

Sure, we can nitpick on the title, but development velocity and
upstreaming velocity aren't unrelated, and debating the difference in
this context IMO isn't helpful.  The goal is to improve the project.
If I were able to change the title of this thread, I'd be happy to
change it to "To make development easier".

>  Do we
> really have data that indicates upstreaming is hard because we have
> multiple repositories?  I don't think I've ever heard this.  I have
> heard that development is hard[er].

I have heard this from multiple people, companies, fork owners, and
individual developers.  Without naming names, if you're one of these
people, and you're reading this, it would help a lot if you could add
your experience to this thread.

>
> ---
>
> Some of these issues I've raised could certainly be solved by stronger
> worded "guidelines" than we currently have in place (and somehow
> ensuring they are followed).  I am worried about the overall code-smell that
> will come from a monorepo (based on my past experiences with them).
>
> The biggest concern I have is with the [negative] impact to code reviews and
> I don't really have any way to solve it except by completely changing our
> code review process at the same time.  Something like the Linux
> maintainer-owned subtree model, maybe, where a maintainer owns a fork for
> their part of the tree and we bless them at a higher level?  That doesn't
> sound appealing.  Maybe there is some better Gerrit tooling than what we're
> currently getting from OWNERS, or maybe some other review tool.

Thank you for taking your time to make a very detailed response.
Regardless of the outcome of this, I do appreciate it.





>
> [1]: https://lore.kernel.org/openbmc/20191119003509.GA80304@patrickw3-mbp.dhcp.thefacebook.com/
> [2]: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/enabling-features-for-your-repository/disabling-issues
> [3]: https://github.com/facebook/openbmc/commit/e418bfbcf382185fdffb768a012e762a4600ae63#diff-b0fea89439c1004ad5229cb7058b4740ee1c542b32cfb0d2165788f9020b5da0R1
> [4]: https://github.com/williamspatrick/openbmc-tof-election-data/blob/995f0d73184db7c25446284261cc023af611e7c4/2021H2/data/report.json#L1
> [5]: https://www.stwcx.xyz/blog/2021/04/18/meson-subprojects.html
> [6]: https://github.com/williamspatrick/dotfiles/commit/df180ac2b74f2b7fcb6ae91302f0211bc49cb2e9
> --
> Patrick Williams


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-23 16:37     ` Ed Tanous
@ 2022-05-23 21:07       ` John Broadbent
  2022-05-23 23:48         ` Brad Bishop
  0 siblings, 1 reply; 21+ messages in thread
From: John Broadbent @ 2022-05-23 21:07 UTC (permalink / raw)
  To: Ed Tanous; +Cc: OpenBMC Maillist, Brad Bishop, Cody Smith

[-- Attachment #1: Type: text/plain, Size: 14136 bytes --]

Main thoughts:

We have several patches that we apply to the project. When we update those
patches, we see a diff of the patches, and it can be difficult to review a
diff of a diff.
I believe this new repo system would allow us to apply those patches to a
source tree, and manage/maintain the patches better.
>  "I have no interest in making this easier for you (if it is worse in
> other ways for the project)."   - referring to downstream-only features.

This is the wrong way to view features the community does not want, and
features we would not be allowed to share. There is a layer of complexity
that we use to integrate with our data center services that only we need.
A better model would allow OpenBMC to be flexible enough to enable
downstream features.

Other thoughts:

   - I suppose it would make it easy for others to fork the project, but I
   don't think that is a strong enough reason to prevent consolidation.
   - The consolidation would make it easier to bring new people up to
   speed. (The system we have works fine, but I suspect the consolidation
   will be an improvement.)
   - We are not changing OWNERS in this change.
   - Applications vs. distribution: I have always viewed OpenBMC as a
   collection of application services/applications, combined with a distro.



On Mon, May 23, 2022 at 9:38 AM Ed Tanous <edtanous@google.com> wrote:

> On Thu, May 19, 2022 at 2:12 PM Cody Smith <scody@google.com> wrote:
> >
> > I don't seem to have the original message, so this may get added to
> > Andrew's branch of this thread. Sorry about that in advance.
>
> The original message got caught in a lot of people's spam filters; I'm
> hoping that explains some of the lack of reply to the initial
> proposal.
>
> >
> > In general I support moving to a monorepo. We at Google do this, and my
> > significant other at Airbnb also works with a monorepo. The advantages
> > are significant: the world gets a lot less siloed, and changes that
> > would have spanned multiple repos now only span the monorepo. This is
> > particularly useful when feature X requires changes to repos A, B and C,
> > and the changes break things on their own but are fine when shipped
> > together. To be honest, I don't even really know how such a feature
> > gets shipped today.
>
> I agree with your general sentiment, with a couple of nitpicks: what
> I propose above isn't a pure "monorepo"; it's more analogous to
> "consolidate a lot of the repos".  FWIW, although I really think it's
> the right thing to do, "other companies do it for other things" isn't
> the best argument we can make for this.  There are plenty of
> counterexamples of companies with much more entrenched command chains
> that use multiple repos and the creation of repos as a form of project
> management to great effect.
>
> >
> > The other thing that tends to happen with monorepos is a lot more
> > conformity, as reviews are carried out by a larger set of people.
>
> +1.  Applying consistent clang-format to the codebase, for example,
> would be a lot simpler.
>
> > Suddenly `bmcweb`, to pick a bad example, is being reviewed by people
> > who may not have previously cared about or touched that part of the
> > codebase. At a minimum, more people will have eyes on the changes happening.
> >
> > I also think that a monorepo avoids one maintainer "lording" over a
> > repo. It happens: the +2ers can end up playing the role of bridge troll,
> > and when repo X only has 1-2 +2ers, this can be a real problem. A
> > monorepo with 10+ +2ers will force the +2ers to engage in debate when
> > they disagree with each other, instead of lording over their own
> > kingdoms and having no influence over other kingdoms, so to speak.
>
> In what I propose, I don't really think this changes, given that the
> existing OWNERS files would still be largely the same, although I
> agree, more +2er debate would be a good thing if it was the result.
>
> >
> > I haven't made a great set of arguments here, but in general I feel
> > like a change like this would help from an organizational perspective,
> > and with that better organization in place, maybe we can begin
> > addressing some of the other issues we need to address.
>
> Thanks for your input.
>
> PS, plaintext is generally preferred on this ML, given that it diffs
> better in the tools.  (Click the triple dot in the lower right of Gmail,
> then check "plain text mode".)
>
> >
> > Cody Smith
> > System Software Engineer
> > Google Cloud Platform Core Services Team
> > scody@google.com
> > 720-515-6105
> >
> >
> >
> > On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> >>
> >> Hi Ed,
> >>
> >> I think what's below largely points to a bit of an identity crisis for
> >> the project, on a couple of fronts. Fundamentally OpenBMC is a distro
> >> (or as Yocto likes to point out, a meta-distro), and we can:
> >>
> >> 1. Identify as a traditional OSS distro: An integration of otherwise
> >>    independent applications
> >>
> >> 2. Identify as an appliance distro: The distro and the
> >>    applications are a monolith
> >>
> >> You're proposing 2, while I think there exists some tension towards 1.
> >>
> >> With the amount of custom userspace we've always kinda sat in-between.
> >> I'd like to see libraries and applications that have use cases outside
> >> of OpenBMC be accessible to people with those external use cases,
> >> without being burdened by understanding the rest of the OpenBMC context.
> >> I have a concern that integrating things in the way you're proposing
> >> will lead to more inertia there (e.g. for implementations of
> >> standards such as MCTP or PLDM, i.e. libmctp and libpldm).
> >>
> >> On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> >> > The OpenBMC development process as it stands is difficult for people
> >> > new to the project to understand, which severely limits our ability to
> >> > onboard new maintainers, developers, and groups which would otherwise
> >> > contribute major features to upstream, but don't have the technical
> >> > expertise to do so.  This initiative, much like others before it[1] is
> >> > attempting to reduce the toil and OpenBMC-specific processes of
> >> > passing changes amongst the community, and move things to being more
> >> > like other projects that have largely solved this problem already.
> >>
> >> Can you be more specific about which projects here? Do you have links
> >> to examples?
> >>
> >> >
> >> > To that end, I'd like to propose a change to the way we structure our
> >> > repositories within the project: specifically, putting (almost) all of
> >> > the Linux Foundation OpenBMC owned code into a single repo that we can
> >> > version as a single entity, rather than spreading out amongst many
> >> > repos.  In practice, this would have some significant advantages:
> >> >
> >> > - The tree would be easily shareable amongst the various people
> >> > working on OpenBMC, without having to rely on a single-source Gerrit
> >> > instance.  Git is designed to be distributed, but if our recipe files
> >> > point at other repositories, it largely defeats a lot of this
> >> > capability.  Today, if you want to share a tree that has a change in
> >> > it, you have to fork the main tree, then fork every single subproject
> >> > you've made modifications to, then update the main tree to point to
> >> > your forks.
> >>
> >> This isn't true, as you can add patches in the OpenBMC tree.
> >>
> >> CI prevents these from being submitted, as it should, but there's
> >> nothing to stop anyone using `devtool modify ...` / `devtool finish ...`
> >> and committing the result as a workflow to exchange state (I do this).
> >>
> >> Is the issue instead with devtool? Is it bad? Is the learning curve too
> >> steep?
> >> It is at least the Yocto workflow.
> >>
> >> > This gets very onerous over time, especially for simple
> >> > commits.  Having maintained several different companies' forks
> >> > personally, and spoken to many others having problems with the same,
> >> > I've found that major features are difficult to test and rebase because of
> >> > this.  Moving the code to a single tree makes a lot of the toil of
> >> > tagging and modifying local trees a lot more manageable, as a series
> >> > of well-documented git commands (generally git rebase[2]).  It also
> >> > increases the likelihood that someone pulls down the fork to test it
> >> > if it's highly likely that they can apply it to their own tree in a
> >> > single command.
> >>
> >> Again, this is moot if the patches are applied in-tree.
> >>
> >> >
> >> > - There would be a reduction in reviews.  Today, anytime a person
> >> > wants to make a change that would involve any part of the tree,
> >> > there's at least 2 code reviews, one for the commit, and one for the
> >> > recipe bump.  Compared to a single tree, this at least doubles the
> >> > number of reviews we need to process.
> >>
> >> Is there more work? Yes.
> >>
> >> Is it always double? No. Is it sometimes double? Yes.
> >>
> >> Often bumps batch multiple application commits. I think this paragraph
> >> overstates the problem somewhat, but what it does get right is
> >> identifying that *some* overhead exists.
> >>
> >> >  For changes that want to make
> >> > any change to a few subsystems, as is the case when developing a
> >> > feature, they require 2 X <number of project changes> reviews, all of
> >> > which need to be synchronized.
> >>
> >> Same issue as above here.
> >>
> >> > There is a well documented problem
> >> > where we have no official way to synchronize merging of changes to
> >> > userspace applications within a bump without manual human
> >> > intervention.  This would largely render that problem moot.
> >>
> >> Right, this can be hard to handle.
> >>
> >> It can be mitigated by versioning interfaces (which the D-Bus spec
> >> calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple
> >> interfaces for the transition period.
> >>
> >> That said, that's also more work, and so needs to be considered in the
> >> set of trade-offs.
> >>
> >> [6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
> >> [7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
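A minimal sketch of the interface-versioning mitigation described above,
assuming the D-Bus specification's suggestion of carrying the interface's
major version in its name, so that one object can implement several
versions side by side during a transition. The interface names
(org.example.Widget1, org.example.Widget2) and the helpers split_version()
and pick_interface() are hypothetical; they are not OpenBMC or sdbusplus
APIs, and no bus connection is made. The service publishes both versions,
and each client picks the newest one it understands:

```python
"""Sketch: picking among versioned D-Bus interface names during a transition.

All interface names are hypothetical; no real D-Bus connection is made.
"""

import re
from typing import List, Optional, Set, Tuple

# A trailing major version on an interface name, e.g.
# "org.example.Widget2" -> base "org.example.Widget", version 2.
_VERSIONED = re.compile(r"^(?P<base>.*?)(?P<version>\d+)$")


def split_version(interface: str) -> Tuple[str, int]:
    """Split an interface name into (base name, major version)."""
    match = _VERSIONED.match(interface)
    if match is None:
        raise ValueError(f"no version suffix in interface name: {interface}")
    return match.group("base"), int(match.group("version"))


def pick_interface(published: List[str], base: str,
                   supported: Set[int]) -> Optional[str]:
    """Return the newest published version of `base` the client supports.

    While a service publishes both the old and the new version, old clients
    keep using the old one and updated clients can move to the new one, so
    the two sides do not have to change in lockstep.
    """
    candidates = []
    for name in published:
        try:
            name_base, version = split_version(name)
        except ValueError:
            continue  # skip interfaces that carry no version suffix
        if name_base == base and version in supported:
            candidates.append((version, name))
    if not candidates:
        return None
    # Tuples compare by version first, so max() picks the highest version.
    return max(candidates)[1]


if __name__ == "__main__":
    published = [
        "org.freedesktop.DBus.Properties",
        "org.example.Widget1",
        "org.example.Widget2",
    ]
    print(pick_interface(published, "org.example.Widget", {1}))     # org.example.Widget1
    print(pick_interface(published, "org.example.Widget", {1, 2}))  # org.example.Widget2
```

Once every client supports version 2, the service can stop publishing
Widget1. The cost, as noted above, is that the service carries both
implementations for the length of the transition.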
> >>
> >> >
> >> > - It would allow most developers to not need to understand Yocto at
> >> > all to do their day to day work on existing applications.  No more
> >> > "devtool modify", and related SRCREV bumps.  This will help most of
> >> > the new developers on the project with a lower mental load, which will
> >> > mean people are able to ramp up faster.
> >>
> >> Okay. So devtool is seen as an issue.
> >>
> >> Can we improve its visibility and any education around it? Or is it a
> >> lost cause? If so, why?
> >>
> >> Separately, I'm concerned this is an attempt to shield people from
> >> skills that help them work with upstream Yocto. OpenBMC feels like it's
> >> a bit of an on-ramp for open-source contributions for people who have
> >> worked in what was previously quite a proprietary environment. We tried
> >> shielding people in the past wrt kernel contributions, and that failed
> >> pretty spectacularly. We (at least Joel and I) now encourage people to
> >> work with upstream directly *and support them in the process of doing
> >> that* rather than trying to mitigate some of the difficulties with
> >> working upstream by avoiding them.
> >>
> >> >
> >> > - It would give an opportunity for individuals and companies to "own"
> >> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> >> > increase participation in the project overall.  This already happens
> >> > quite a bit, but in practice, the forks that do it squash history,
> >> > making it nearly impossible to get their changes upstreamed from an
> >> > outside entity.
> >>
> >> Not sure this is something we want to encourage, even if it happens in
> >> practice.
> >>
> >> >
> >> > - It would centralize the bug databases.  Today, bugs filed against
> >> > sub projects tend to not get answered.
> >>
> >> Do you have some numbers handy?
> >>
> >> > Having all the bugs in
> >> > openbmc/openbmc would help in the future to avoid duplicating bugs
> >> > across projects.
> >>
> >> Has this actually been a problem?
> >>
> >> >
> >> > - Would increase the likelihood that someone contributes a patch,
> >> > especially a patch written by someone else.  If contributing a patch
> >> > was just a matter of cherry-picking a tree of commits and submitting
> >> > it to gerrit, it's a lot more likely that people would do it.
> >>
> >> It sounds plausible, but again, some evidence for this would be helpful.
> >>
> >> Why is this easier than submitting the patches to the application repo?
> >>
> >> > My proposed version of this tree is pushed to a github fork here, and
> >> > is based on the tree from a few weeks ago:
> >> > https://github.com/edtanous/openbmc
> >> >
> >> > It implements all the above for the main branch.  This tree is based
> >> > on the output of the automated tooling, and in the case where this
> >> > proposal is accepted, the tooling would be re-run to capture the state
> >> > of the tree at the point where we chose to make this change.
> >> >
> >> > The tool I wrote to generate this tree is also published, if you're
> >> > interested in how this tree was built, and is quite interesting in its
> >> > use of git export/import [5], but functionally, I would not expect
> >> > that tooling to survive after this transition is made.
> >>
> >> I think it would be good to capture the script in openbmc-tools if we
> >> choose to go ahead with this, mainly as a record of how we achieved it.
> >>
> >> Andrew
> >>
> >> >
> >> > [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> >> > [2] https://git-scm.com/docs/git-rebase
> >> > [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> >> > [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> >> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine
>

[-- Attachment #2: Type: text/html, Size: 18150 bytes --]


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-23 21:07       ` John Broadbent
@ 2022-05-23 23:48         ` Brad Bishop
  2022-05-24  3:54           ` John Broadbent
  0 siblings, 1 reply; 21+ messages in thread
From: Brad Bishop @ 2022-05-23 23:48 UTC (permalink / raw)
  To: John Broadbent; +Cc: Ed Tanous, Cody Smith, OpenBMC Maillist

On Mon, May 23, 2022 at 02:07:55PM -0700, John Broadbent wrote:

>>  "I have no interest in making this easier for you (if it is worse in
>other ways for the project)."   - referring to downstream only features.

>This is the wrong way to view features the community does not want, 

Can you talk about what features the community does not want?  If I pick 
on Google a little bit, there is already a google-misc repo where Google 
puts whatever features it wants.  There is the meta-google layer that 
doesn't actually have any platforms in it.  There is the newly approved 
Google SMM logging feature/repo.  There is an OEM Google REST API in 
upstream bmcweb.  There are multiple Google OEM IPMI repositories.  And 
to be fair, Google isn't alone here - IBM has an API in bmcweb and 
layers without platforms too.  Where is the external (community) 
pushback on features?  The only one I am aware of is a feature IBM wanted to 
contribute (which for the record, I am not convinced rejecting it was 
appropriate):
https://lore.kernel.org/openbmc/CAMhqiMoFAHcUk0nO_xoOubcZqF_dPDFweqsttTULRJK38o1Ung@mail.gmail.com/

My point is, I am having trouble accepting that community pushback is 
what causes downstream patches.

> and features we would not be allowed to share. 

This I can accept as a generator of downstream patches.  I actually 
support the monorepo concept for the most part, but not with this as 
motivation.  If IBM's pay-for-access feature (see the thread I 
linked above if that doesn't make sense) was counter to the spirit of 
open source (again, I don't think it is), adding this kind of thinking 
to our decision process is even more so.

>There is a layer of complexity
>that we use to integrate with our data centers services that only we need.
>A better model would allow openbmc to be flexible enough to enable
>downstream features.

And an even better model would be one where there is a path to getting 
all features upstream?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-23 23:48         ` Brad Bishop
@ 2022-05-24  3:54           ` John Broadbent
  2022-05-24 11:32             ` Brad Bishop
  0 siblings, 1 reply; 21+ messages in thread
From: John Broadbent @ 2022-05-24  3:54 UTC
  To: Brad Bishop; +Cc: Ed Tanous, Cody Smith, OpenBMC Maillist


> My point is, I am having trouble accepting that community pushback is
> what causes downstream patches.

Could you give me some insight on that? Why does that surprise you?

I don't want to call out any concrete examples without talking to the
change owner first.
(I don't want to put them on 'blast')

But we can glance at my work.
https://gerrit.openbmc.org/c/openbmc/bmcweb/+/53563/8
https://gerrit.openbmc.org/c/openbmc/bmcweb/+/53325

I have been trying to get these two changes in for the last 19
calendar days. If they get held up by
https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/53676,
I might have to carry a downstream patch to make my deadline.

The ask in PDI (phosphor-dbus-interfaces) will take real time to
negotiate, maybe months (the last attempt took 5 months, and still
failed). My schedule gave me 3 weeks to make this change.

Again, I would rather not talk about others' patches, but if you have
insight, please feel free to join in.

Thanks





* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-24  3:54           ` John Broadbent
@ 2022-05-24 11:32             ` Brad Bishop
  0 siblings, 0 replies; 21+ messages in thread
From: Brad Bishop @ 2022-05-24 11:32 UTC
  To: John Broadbent; +Cc: Ed Tanous, Cody Smith, OpenBMC Maillist

Hi John

On Mon, May 23, 2022 at 08:54:06PM -0700, John Broadbent wrote:
>> My point is, I am having trouble accepting that community pushback is
>> what causes downstream patches.
>
>Could you give me some insight on that? Why does that surprise you?

I thought I did - I listed several examples where the community has 
embraced and accepted Google-only features.

>
>I don't want to call out any concrete examples without talking to the
>change owner first.
>(I don't want to put them on 'blast')
>
>But we can glance at my work.
>https://gerrit.openbmc.org/c/openbmc/bmcweb/+/53563/8
>https://gerrit.openbmc.org/c/openbmc/bmcweb/+/53325
>
>I have been trying to get these two changes in for the last 19
>calendar days. If it gets heldup by
>https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/53676.
>I might have to patch to make my deadline.
>
>The ask in PDI will take real time to negotiate, maybe months (as the last
>attempt took 5 months, and still failed)
>My schedule says I had 3 weeks to make this change?

Ah.  These aren't examples of the community rejecting your patches 
because your features are unwanted.  These look like the normal 
consensus-building process, which, you are absolutely right, can 
take a long time (although five months seems a bit long) and be a 
generator of downstream patches.

Thanks,
Brad


* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-23 16:27   ` Ed Tanous
@ 2022-05-25 13:31     ` Heyi Guo
  2022-05-25 15:02       ` Ed Tanous
  0 siblings, 1 reply; 21+ messages in thread
From: Heyi Guo @ 2022-05-25 13:31 UTC
  To: Ed Tanous; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop


On 2022/5/24 12:27 AM, Ed Tanous wrote:
> On Tue, Apr 12, 2022 at 12:23 AM Heyi Guo<guoheyi@linux.alibaba.com>  wrote:
>> I like the idea, for we don't utilize additional tools like repo to
>> maintain the code, and it should make it easier for us to maintain
>> multiple internal branches.
>>
> Hi Heyi,
> Glad to see you on the project.  Do you think you could elaborate a
> little about how you're hoping to use OpenBMC and its review process,
> and if any of the changes being proposed here would help you?

Hi Ed,

The background is that our team uses basic git commands to manage the 
OpenBMC repositories, so the current multi-repository structure costs us 
extra effort in code maintenance, including:

1. A single change normally requires two commits, one in the component 
repo and one in openbmc, and since our internal releases are more 
frequent, those fixes must be merged ASAP. We also created a script to 
check whether openbmc includes the latest commits of all component repos.

2. It is not easy to maintain stable branches, since each one requires 
branches in openbmc and in every integrated component.

3. It is not easy to search code across all the component repos. I'd like 
to use "git grep" to search for a keyword in a single repo, but that 
doesn't work here. It is also not easy to make a generic fix for all 
repos, as you said.
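A check like the one mentioned in item 1 could look roughly like the sketch below. It is not the script referred to above. It makes two assumptions: each component is pinned with a literal `SRCREV = "<sha>"` line in its recipe, and the newest commit of each component has already been collected, for example with `git ls-remote <url> HEAD`. The function names and the input layout are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches a literal pin such as:  SRCREV = "0123abcd..."  (also ?=, ??= and :=).
# Assumption: one plain SRCREV per recipe; per-name overrides such as
# SRCREV_<name> or SRCREV:pn-<recipe> are deliberately not handled here.
SRCREV_RE = re.compile(
    r'^\s*SRCREV\s*(?:\?\?|\?|:)?=\s*"([0-9a-fA-F]{7,40})"', re.MULTILINE
)


def pinned_srcrev(recipe_text: str) -> Optional[str]:
    """Return the first SRCREV pinned in a recipe's text (lowercased), or None."""
    match = SRCREV_RE.search(recipe_text)
    return match.group(1).lower() if match else None


def stale_components(
    recipes: Dict[str, str], latest_heads: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (component, pinned, latest) for every pin that lags its repo.

    recipes:      component name -> recipe text (hypothetical layout)
    latest_heads: component name -> newest commit sha of that component
    """
    stale = []
    for name in sorted(recipes):
        pinned = pinned_srcrev(recipes[name])
        head = latest_heads.get(name)
        if pinned is None or head is None:
            continue  # nothing to compare for this component
        # startswith() tolerates abbreviated SRCREVs
        if not head.lower().startswith(pinned):
            stale.append((name, pinned, head.lower()))
    return stale


if __name__ == "__main__":
    recipes = {
        "bmcweb": 'SRCREV = "1111111aaaa"\n',
        "phosphor-logging": 'SRCREV = "2222222bbbb"\n',
    }
    heads = {"bmcweb": "1111111aaaa", "phosphor-logging": "3333333cccc"}
    print(stale_components(recipes, heads))
    # [('phosphor-logging', '2222222bbbb', '3333333cccc')]
```

Under the proposed monorepo layout, recipes point at in-tree sources through externalsrc instead of a pinned SRCREV. This kind of cross-repo check would then presumably no longer be needed.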

I think a monorepo will help improve the situation, and it may help 
prevent division in the community.

The code review process is not difficult for us, because reviewers are 
chosen automatically by Gerrit.

If you know of better practices for the current multi-repo structure, 
please advise and help us improve :)

Thanks,

Heyi



* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
  2022-05-25 13:31     ` Heyi Guo
@ 2022-05-25 15:02       ` Ed Tanous
  0 siblings, 0 replies; 21+ messages in thread
From: Ed Tanous @ 2022-05-25 15:02 UTC
  To: Heyi Guo; +Cc: Andrew Jeffery, OpenBMC Maillist, Brad Bishop

On Wed, May 25, 2022 at 6:32 AM Heyi Guo <guoheyi@linux.alibaba.com> wrote:
>
>
> On 2022/5/24 12:27 AM, Ed Tanous wrote:
> > On Tue, Apr 12, 2022 at 12:23 AM Heyi Guo<guoheyi@linux.alibaba.com>  wrote:
> >> I like the idea, for we don't utilize additional tools like repo to
> >> maintain the code, and it should make it easier for us to maintain
> >> multiple internal branches.
> >>
> > Hi Heyi,
> > Glad to see you on the project.  Do you think you could elaborate a
> > little about how you're hoping to use OpenBMC and its review process,
> > and if any of the changes being proposed here would help you?
>
> Hi Ed,
>
> The background is our team uses basic git commands to manage the
> repositories of openbmc, so the current multi-repositories structure
> costs extra effort for our code maintenance, including:
>
> 1. Normally two commits are required for one single change, one for the
> component repo and one for openbmc, for our internal release versions
> are more frequent and the fixes are required to be merged ASAP. We also
> created a script to check if openbmc has included the latest commits of
> all component repos.
>
> 2. Not easy to maintain stable branches, which require to have branches
> for openbmc and the integrated components.
>
> 3. Not easy to search code across all the component repos; I'd like to
> use "git grep" to search keyword in a single repo, but it doesn't work
> here; and it is not easy to make generic fix for all repos, as you said.
>
> I think monorepo will help to improve the situation, and it may help
> prevent the division of the community.
>
> The code review process is not difficult for us, for reviewers are
> chosen automatically by gerrit.
>
> If you also have better practice for the current multi-repo structure,
> please advise and help us improve :)
>
> Thanks,
>
> Heyi

Thank you for your feedback.




* Re: Proposing changes to the OpenBMC tree (to make upstreaming easier)
@ 2022-05-23 23:27 Nan Zhou
  0 siblings, 0 replies; 21+ messages in thread
From: Nan Zhou @ 2022-05-23 23:27 UTC
  To: John Broadbent; +Cc: OpenBMC Maillist, Brad Bishop, Cody Smith


This is an email thread with a lot of information. We have discussed this
internally at Google, too.

One advantage of the monorepo that I like is that it makes patches easier
to maintain. For example, Google puts a lot of attention on Redfish
these days. With the monorepo, Google can maintain its own BMCWeb branch
and continuously rebase it on master. And there's only one branch for all
these customizations: Yocto, core C++ repos, etc. +1 to "A better model
would allow openbmc to be flexible enough to enable downstream features."
I believe this will make OpenBMC even more popular.

Best,
Nan Zhou

> Message: 2
> Date: Mon, 23 May 2022 14:07:55 -0700
> From: John Broadbent <jebr@google.com>
> To: Ed Tanous <edtanous@google.com>
> Cc: Cody Smith <scody@google.com>, OpenBMC Maillist
>         <openbmc@lists.ozlabs.org>, Brad Bishop <bradleyb@fuzziesquirrel.com>
> Subject: Re: Proposing changes to the OpenBMC tree (to make
>         upstreaming easier)
> Message-ID:
>         <CAPw1Ef_dMf43e567LLAfMZp6khWWQAm=i63LHfOwWkyiSe-MFA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> Main thoughts:
> We have several patches that we apply to the project. When we update those
> patches we see a diff of the patches, and it can be difficult to review a
> diff of a diff.
> I believe this new repo system would allow us to apply those patches to a
> source tree, and manage/maintain the patches better.
> > "I have no interest in making this easier for you (if it is worse in
> > other ways for the project)."
> (Referring to downstream-only features.)
> This is the wrong way to view features the community does not want, and
> features we would not be allowed to share. There is a layer of complexity
> that we use to integrate with our data center services that only we need.
> A better model would allow openbmc to be flexible enough to enable
> downstream features.
> Other thoughts:
>    - I suppose it would make it easy for others to fork the project, but I
>    don't think that is a strong enough reason to prevent consolidation.
>    - The consolidation would make it easier to bring new people up to
>    speed. (The system we have works fine, but I suspect the consolidation
>    will be an improvement.)
>    - We are not changing OWNERS in the change.
>    - Applications vs. distribution: I have always viewed openbmc as a
>    collection of application services/applications, combined with a
>    distro.
> On Mon, May 23, 2022 at 9:38 AM Ed Tanous <edtanous@google.com> wrote:
> > On Thu, May 19, 2022 at 2:12 PM Cody Smith <scody@google.com> wrote:
> > >
> > > I don't seem to have the original message, so this may get added to
> > Andrew's branch of this thread. Sorry about that in advance.
> >
> > The original message got caught in a lot of people's spam filters; I'm
> > hoping that explains some of the lack of reply to the initial
> > proposal.
> >
> > >
> > > In general I support moving to a monorepo. We at Google do this, and my
> > > significant other at Airbnb also uses a monorepo. The advantages are
> > > significant, as the world gets a lot less siloed, and changes that
> > > would have spanned multiple repos now span only the monorepo. This
> > > is particularly useful when feature X requires changes to repos A, B
> > > and C, and the changes break things on their own but are fine when
> > > shipped together. I don't even really know how such a feature gets
> > > shipped today, to be honest.
> >
> > I agree with your general sentiment, with a couple of nitpicks: what
> > I propose above isn't a pure "monorepo"; it's more analogous to
> > "consolidate a lot of the repos".  FWIW, although I really think it's
> > the right thing to do, "other companies do it for other things" isn't
> > the best argument we can make for this.  There are plenty of
> > counterexamples of companies with much more entrenched command chains
> > that use multiple repos, and the creation of repos as a form of project
> > management, to great effect.
> >
> > >
> > > The other thing that tends to happen with monorepos is a lot more
> > > conformity, as reviews are carried out by a larger set of people.
> >
> > +1.  Applying consistent clang-format to the codebase for example
> > would be a lot more trivial.
> >
> > > Suddenly `bmcweb` (to pick a bad example) is being reviewed by people
> > > who may not have previously cared about or touched that part of the
> > > codebase. At a minimum, more people will have eyes on the changes
> > > happening.
> > >
> > > I also think that a monorepo avoids one maintainer "lording" over a
> > > repo. It happens: the +2ers kind of play the role of the bridge troll,
> > > and when repo X only has 1-2 +2ers, this can be a real problem. A
> > > monorepo with 10+ +2ers will force the +2ers to engage in debate when
> > > they disagree with each other, instead of lording over their own
> > > kingdoms and having no influence over other kingdoms, so to speak.
> >
> > In what I propose, I don't really think this changes given that the
> > existing OWNERS files would still be largely the same, although I
> > agree, more +2er debate would be a good thing if it was the result.
> >
> > >
> > > I haven't made a great set of arguments here, but in general I feel
> > > like a change like this would help from an organizational perspective,
> > > and maybe with that better organization in place we can begin
> > > addressing some of the other issues we need to address.
> >
> > Thanks for your input.
> >
> > PS: plain text is generally preferred on this ML, given that it diffs
> > better in the tools.  (Click the triple dot in the lower right of Gmail,
> > then check "Plain text mode".)
> >
> > >
> > > Cody Smith
> > > System Software Engineer
> > > Google Cloud Platform Core Services Team
> > > scody@google.com
> > > 720-515-6105
> > >
> > >
> > >
> > > On Tue, Apr 5, 2022 at 7:22 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> > >>
> > >> Hi Ed,
> > >>
> > >> I think what's below largely points to a bit of an identity crisis for
> > >> the project, on a couple of fronts. Fundamentally OpenBMC is a distro
> > >> (or as Yocto likes to point out, a meta-distro), and we can:
> > >>
> > >> 1. Identify as a traditional OSS distro: An integration of otherwise
> > >>    independent applications
> > >>
> > >> 2. Identify as an appliance distro: The distro and the
> > >>    applications are a monolith
> > >>
> > >> You're proposing 2, while I think there exists some tension towards 1.
> > >>
> > >> With the amount of custom userspace, we've always kinda sat in-between.
> > >> I'd like to see libraries and applications that have use cases outside
> > >> of OpenBMC be accessible to people with those external use cases,
> > >> without being burdened by understanding the rest of the OpenBMC
> > >> context. I have a concern that integrating things in the way you're
> > >> proposing will lead to more inertia there (e.g. for implementations of
> > >> standards such as MCTP or PLDM (libmctp and libpldm)).
> > >>
> > >> On Tue, 5 Apr 2022, at 03:58, Ed Tanous wrote:
> > >> > The OpenBMC development process as it stands is difficult for people
> > >> > new to the project to understand, which severely limits our ability to
> > >> > onboard new maintainers, developers, and groups which would otherwise
> > >> > contribute major features to upstream, but don't have the technical
> > >> > expertise to do so.  This initiative, much like others before it[1], is
> > >> > attempting to reduce the toil and OpenBMC-specific processes of
> > >> > passing changes amongst the community, and move things to being more
> > >> > like other projects that have largely solved this problem already.
> > >>
> > >> Can you be more specific about which projects here? Do you have links
> > >> to examples?
> > >>
> > >> >
> > >> > To that end, I'd like to propose a change to the way we structure our
> > >> > repositories within the project: specifically, putting (almost) all of
> > >> > the Linux Foundation OpenBMC owned code into a single repo that we can
> > >> > version as a single entity, rather than spreading out amongst many
> > >> > repos.  In practice, this would have some significant advantages:
> > >> >
> > >> > - The tree would be easily shareable amongst the various people
> > >> > working on OpenBMC, without having to rely on a single-source Gerrit
> > >> > instance.  Git is designed to be distributed, but if our recipe files
> > >> > point at other repositories, it largely defeats a lot of this
> > >> > capability.  Today, if you want to share a tree that has a change in
> > >> > it, you have to fork the main tree, then fork every single subproject
> > >> > you've made modifications to, then update the main tree to point to
> > >> > your forks.
> > >>
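Ed's point above is that sharing a cross-repo change means forking each modified subproject and then editing the main tree so that each affected recipe fetches from the fork. As a rough illustration of that last step, here is a minimal Python sketch. It assumes a recipe that fetches from `git://github.com/openbmc/<repo>` and pins a single `SRCREV` line; the `point_recipe_at_fork` helper, the `example-user` fork owner and the commit hashes are hypothetical, and real recipes (bbappends, multiple `SRC_URI` entries) vary.

```python
import re


def point_recipe_at_fork(recipe: str, fork_owner: str, srcrev: str) -> str:
    """Make a recipe fetch from a fork instead of github.com/openbmc.

    Hypothetical helper: assumes a git:// SRC_URI under github.com/openbmc/
    and a single `SRCREV = "<sha>"` line, which real recipes may not have.
    """
    # Swap the repository owner in the git fetcher URL.
    recipe = re.sub(
        r"(git://github\.com/)openbmc(/)",
        lambda m: f"{m.group(1)}{fork_owner}{m.group(2)}",
        recipe,
    )
    # Pin the commit that exists on the fork.
    return re.sub(
        r'^SRCREV\s*=\s*"[0-9a-f]+"',
        lambda m: f'SRCREV = "{srcrev}"',
        recipe,
        flags=re.MULTILINE,
    )


# Illustrative recipe fragment; the SUMMARY and hashes are made up.
recipe = """SUMMARY = "Example web server"
SRC_URI = "git://github.com/openbmc/bmcweb.git;branch=master;protocol=https"
SRCREV = "1111111111111111111111111111111111111111"
"""

print(point_recipe_at_fork(recipe, "example-user", "2" * 40))
```

The edit has to be repeated, alongside pushing a fork, for every subproject the change touches, which is the per-fork toil described above.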
> > >> This isn't true, as you can add patches in the OpenBMC tree.
> > >>
> > >> CI prevents these from being submitted, as it should, but there's nothing
> > >> to stop anyone using the `devtool modify ...` / `devtool finish ...` and
> > >> committing the result as a workflow to exchange state (I do this)?
> > >>
> > >> Is the issue instead with devtool? Is it bad? Is the learning curve too
> > >> steep? It is at least the Yocto workflow.
> > >>
> > >> > This gets very onerous over time, especially for simple
> > >> > commits.  Having maintained several different companies' forks
> > >> > personally, and having spoken to many others with the same problem,
> > >> > I've seen that major features are difficult to test and rebase
> > >> > because of this.  Moving the code to a single tree makes a lot of the
> > >> > toil of tagging and modifying local trees a lot more manageable, as a
> > >> > series of well-documented git commands (generally git rebase[2]).  It
> > >> > also increases the likelihood that someone pulls down the fork to test
> > >> > it if it's highly likely that they can apply it to their own tree in a
> > >> > single command.
> > >>
> > >> Again, this is moot if the patches are applied in-tree.
> > >>
> > >> >
> > >> > - There would be a reduction in reviews.  Today, anytime a person
> > >> > wants to make a change that would involve any part of the tree,
> > >> > there are at least 2 code reviews, one for the commit, and one for the
> > >> > recipe bump.  Compared to a single tree, this at least doubles the
> > >> > number of reviews we need to process.
> > >>
> > >> Is there more work? Yes.
> > >>
> > >> Is it always double? No. Is it sometimes double? Yes.
> > >>
> > >> Often bumps batch multiple application commits. I think this paragraph
> > >> overstates the problem somewhat, but what it does get right is
> > >> identifying that *some* overhead exists.
> > >>
> > >> > For changes that touch a few subsystems, as is the case when
> > >> > developing a feature, they require 2 X <number of project changes>
> > >> > reviews, all of which need to be synchronized.
> > >>
> > >> Same issue as above here.
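Ed's "2 X <number of project changes>" figure and Andrew's point that bumps often batch several commits can both be checked with simple arithmetic. The minimal Python sketch below assumes one review per application commit plus one review per SRCREV bump, with a hypothetical `commits_per_bump` batching factor; the repo names are only examples, and the sketch ignores re-reviews and the cost of synchronizing dependent bumps.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RepoChange:
    """The commits a feature needs in one application repository."""

    repo: str
    commits: int


def reviews_split_repos(changes: list[RepoChange], commits_per_bump: int) -> int:
    """Reviews when each commit lands in its own repo and is then pulled into
    openbmc/openbmc by SRCREV bumps carrying up to `commits_per_bump` commits."""
    total = 0
    for change in changes:
        bumps = -(-change.commits // commits_per_bump)  # ceiling division
        total += change.commits + bumps
    return total


def reviews_single_tree(changes: list[RepoChange]) -> int:
    """Reviews when every commit lands directly in one consolidated tree."""
    return sum(change.commits for change in changes)


# Hypothetical feature touching three repos with one commit each.
feature = [
    RepoChange("bmcweb", 1),
    RepoChange("phosphor-dbus-interfaces", 1),
    RepoChange("entity-manager", 1),
]
print(reviews_split_repos(feature, commits_per_bump=1), reviews_single_tree(feature))  # 6 3

# Same repos, three commits each, batched into one bump per repo.
batched = [RepoChange(c.repo, 3) for c in feature]
print(reviews_split_repos(batched, commits_per_bump=3), reviews_single_tree(batched))  # 12 9
```

The ratio is exactly 2 only when every bump carries a single commit, which is Ed's worst case; batching pushes it toward 1, which is Andrew's point.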
> > >>
> > >> > There is a well documented problem
> > >> > where we have no official way to synchronize merging of changes to
> > >> > userspace applications within a bump without manual human
> > >> > intervention.  This would largely render that problem moot.
> > >>
> > >> Right, this can be hard to handle.
> > >>
> > >> It can be mitigated by versioning interfaces (which the D-Bus spec
> > >> calls out[6][7] but OpenBMC fails to do (?)) and supporting multiple
> > >> interfaces for the transition period.
> > >>
> > >> That said, that's also more work, and so needs to be considered in the
> > >> set of trade-offs.
> > >>
> > >> [6] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-interface
> > >> [7] https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
> > >>
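Andrew's mitigation can be sketched concretely. The D-Bus specification sections he cites recommend including a version number in names, in the style of `com.example.MusicPlayer1`, so a service could expose the old and new interface side by side while clients migrate. The minimal Python sketch below checks names against the specification's interface-name rules and lets a client pick the newest version both sides understand. The helper names and the `negotiate` policy are hypothetical, only trailing version numbers are handled, and this is not how current OpenBMC daemons behave.

```python
from __future__ import annotations

import re
from typing import Optional

# Interface-name rules from the D-Bus specification: two or more
# dot-separated elements, each made of [A-Za-z0-9_] and not starting with a
# digit, and at most 255 characters in total.
_ELEMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_MAX_NAME_LENGTH = 255


def is_valid_interface_name(name: str) -> bool:
    if not name or len(name) > _MAX_NAME_LENGTH:
        return False
    elements = name.split(".")
    return len(elements) >= 2 and all(_ELEMENT.fullmatch(e) for e in elements)


def split_version(name: str) -> tuple[str, int]:
    """'com.example.MusicPlayer2' -> ('com.example.MusicPlayer', 2).

    Only a trailing version number is recognized; unversioned names get 0.
    """
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        return name, 0
    return match.group(1), int(match.group(2))


def negotiate(base: str, offered: list[str], supported: set[int]) -> Optional[str]:
    """Return the newest version of `base` that the service offers and the
    client supports, or None if they share no version."""
    candidates = []
    for name in offered:
        if not is_valid_interface_name(name):
            continue
        stem, version = split_version(name)
        if stem == base and version in supported:
            candidates.append((version, name))
    return max(candidates)[1] if candidates else None


# During a transition the service exposes both the old and new interface.
offered = [
    "com.example.MusicPlayer1",
    "com.example.MusicPlayer2",
    "org.freedesktop.DBus.Properties",
]
print(negotiate("com.example.MusicPlayer", offered, supported={1}))     # com.example.MusicPlayer1
print(negotiate("com.example.MusicPlayer", offered, supported={1, 2}))  # com.example.MusicPlayer2
```

An old client keeps working against the version-1 interface while updated clients move to version 2; the extra cost Andrew mentions is keeping both implementations alive for the transition period.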
> > >> >
> > >> > - It would allow most developers to not need to understand Yocto at
> > >> > all to do their day-to-day work on existing applications.  No more
> > >> > "devtool modify", and related SRCREV bumps.  This will help most of
> > >> > the new developers on the project with a lower mental load, which will
> > >> > mean people are able to ramp up faster.
> > >>
> > >> Okay. So devtool is seen as an issue.
> > >>
> > >> Can we improve its visibility and any education around it? Or is it a
> > >> lost cause? If so, why?
> > >>
> > >> Separately, I'm concerned this is an attempt to shield people from
> > >> skills that help them work with upstream Yocto. OpenBMC feels like it's
> > >> a bit of an on-ramp for open-source contributions for people who have
> > >> worked in what was previously quite a proprietary environment. We tried
> > >> shielding people in the past wrt kernel contributions, and that failed
> > >> pretty spectacularly. We (at least Joel and I) now encourage people to
> > >> work with upstream directly *and support them in the process of doing
> > >> that* rather than trying to mitigate some of the difficulties with
> > >> working upstream by avoiding them.
> > >>
> > >> >
> > >> > - It would give an opportunity for individuals and companies to "own"
> > >> > well-supported public forks (e.g. Red Hat) of the codebase, which would
> > >> > increase participation in the project overall.  This already happens
> > >> > quite a bit, but in practice, the forks that do it squash history,
> > >> > making it nearly impossible to get their changes upstreamed from an
> > >> > outside entity.
> > >>
> > >> Not sure this is something we want to encourage, even if it happens in
> > >> practice.
> > >>
> > >> >
> > >> > - It would centralize the bug databases.  Today, bugs filed against
> > >> > subprojects tend to not get answered.
> > >>
> > >> Do you have some numbers handy?
> > >>
> > >> > Having all the bugs in
> > >> > openbmc/openbmc would help in the future to avoid duplicating bugs
> > >> > across projects.
> > >>
> > >> Has this actually been a problem?
> > >>
> > >> >
> > >> > - It would increase the likelihood that someone contributes a patch,
> > >> > especially a patch written by someone else.  If contributing a patch
> > >> > were just a matter of cherry-picking a tree of commits and submitting
> > >> > it to Gerrit, it would be a lot more likely that people would do it.
> > >>
> > >> It sounds plausible, but again, some evidence for this would be helpful.
> > >>
> > >> Why is this easier than submitting the patches to the application repo?
> > >>
> > >> > My proposed version of this tree is pushed to a GitHub fork here, and
> > >> > is based on the tree from a few weeks ago:
> > >> > https://github.com/edtanous/openbmc
> > >> >
> > >> > It implements all the above for the main branch.  This tree is based
> > >> > on the output of the automated tooling, and if this proposal is
> > >> > accepted, the tooling would be re-run to capture the state of the
> > >> > tree at the point where we choose to make this change.
> > >> >
> > >> > The tool I wrote to generate this tree is also published, if you're
> > >> > interested in how this tree was built, and is quite interesting in its
> > >> > use of git export/import [5], but functionally, I would not expect
> > >> > that tooling to survive after this transition is made.
> > >>
> > >> I think it would be good to capture the script in openbmc-tools if we
> > >> choose to go ahead with this, mainly as a record of how we achieved it.
> > >>
> > >> Andrew
> > >>
> > >> >
> > >> > [1] https://lore.kernel.org/openbmc/CACWQX821ADQCrekLj_bGAu=1SSLCv5pTee7jaoVo2Zs6havgnA@mail.gmail.com/
> > >> > [2] https://git-scm.com/docs/git-rebase
> > >> > [3] https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#inclusive-naming
> > >> > [4] https://www.yoctoproject.org/docs/1.8/ref-manual/ref-manual.html#ref-classes-externalsrc
> > >> > [5] https://github.com/edtanous/obmc-repo-combine/blob/main/combine
> >

Thread overview: 21+ messages
2022-04-04 18:28 Proposing changes to the OpenBMC tree (to make upstreaming easier) Ed Tanous
2022-04-06  2:19 ` Andrew Jeffery
2022-04-06 15:54   ` Ed Tanous
2022-04-06 17:28     ` Patrick Williams
2022-04-06 20:36       ` Benjamin Fair
2022-04-07  3:26         ` Patrick Williams
2022-04-07 15:39       ` Ed Tanous
2022-04-08 21:36         ` Patrick Williams
2022-05-19 21:12   ` Cody Smith
2022-05-23 16:37     ` Ed Tanous
2022-05-23 21:07       ` John Broadbent
2022-05-23 23:48         ` Brad Bishop
2022-05-24  3:54           ` John Broadbent
2022-05-24 11:32             ` Brad Bishop
2022-04-06 20:06 ` Patrick Williams
2022-05-23 16:53   ` Ed Tanous
2022-04-12  7:23 ` Heyi Guo
2022-05-23 16:27   ` Ed Tanous
2022-05-25 13:31     ` Heyi Guo
2022-05-25 15:02       ` Ed Tanous
2022-05-23 23:27 Nan Zhou
