* [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
@ 2010-03-03 17:09 Chris Larson
  2010-03-03 17:17 ` Koen Kooi
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Chris Larson @ 2010-03-03 17:09 UTC (permalink / raw)
  To: openembedded-devel

On Wed, Mar 3, 2010 at 9:30 AM, Tom Rini <tom_rini@mentor.com> wrote:

> As many people know, there's a lot of "odd" internal things that OE
> does, that if we had it to do over, we would do differently.  What I
> would like to propose is that in time for the next stable branch we:
> 1: Define a set of DISTROs/MACHINEs/build targets that need to stay
> working
> 2: In a separate branch (per big change), land one of these big,
> going-to-leave-some-stuff-broken changes
> 3: Define / document what needs to be done before these branches can be
> merged back (something like #1 is working still, and if applicable a
> guide to the common problems/how to fix people are going to run into).
>
> What I'm getting at is that this would let us do things like rework the
> "this is where we place things that we build other recipes with" concept
> so that sysroot just works (and otherwise makes more sense again).  Or
> "let us have more consistency in build with compared to what could end
> up on device".  And so on.
>
> What do people think?  And what would people work on?
>

I think this is a great idea overall, and I have many ideas for large
changes to be considered.  Here's my first (note, have mercy, I haven't had
much need to write anything longer than a paragraph in some time; I will
need to work on my writing more :):

Proposal for the Revamp of "Packaged Staging"

Goals:
- Simple implementation
- Managed staging area
- "Build" from cached/prebuilt binaries
- Reduce behavioral differences between the prebuilt and from scratch cases
- Intrinsic to the system, no longer opt-in

For those who aren't familiar with it, "packaged staging" is what it sounds
like: it utilizes package management to manage the OpenEmbedded staging
area.  This provides a number of benefits, such as easy uninstallation of
staging files, switching to different versions of staging files without
leaving remnants around, etc.  I believe these benefits are obvious, and
good enough on their own merits (given that people have hit the issues it
fixes on multiple occasions, even today) to argue for its inclusion as
default OpenEmbedded behavior.  But there are other features as well,
including caching of build output to speed up builds, and easier checking
for missing dependencies by using this build output.

Currently, one opts into its usage by setting INHERIT += "packaged-staging".
This results in a number of behavioral changes in a typical build.
'do_setscene', an early task run for all recipes, starts to do more work.
Note that this component is not part of the package management of staging
itself; rather, it was added to use the packaged staging files as a cache
of previous build output, to speed up build time.  The setscene task checks
whether the packaged staging package for this recipe already exists, and if
so, it extracts it, including the stamps for the recipe, so that most of
the subsequent tasks will be skipped if the package is valid.  It performs
some sanity checks to determine this validity.
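To make that flow concrete, here is a rough model of the cache-check logic
(a simplified sketch, not the actual packaged-staging.bbclass code; the
package naming and the validity callback are illustrative assumptions):

```python
import os
import tarfile

def setscene(pstage_dir, pkg_name, extract_to, is_valid):
    """Rough model of do_setscene with packaged staging enabled.

    If a prebuilt staging package exists and passes the sanity
    checks, extract it (stamps included) so that the subsequent
    tasks are skipped.  Returns True when the cache was used.
    """
    pkg = os.path.join(pstage_dir, pkg_name + ".tar.gz")
    if not os.path.exists(pkg):
        return False  # no cached package: fall through to a normal build
    if not is_valid(pkg):
        return False  # failed sanity checks: rebuild from scratch
    with tarfile.open(pkg) as tf:
        tf.extractall(extract_to)  # restores output *and* task stamps
    return True
```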

In addition to the do_setscene behavioral change, there are the pieces of
code which produce the packaged staging package, and which install and
uninstall pstage packages from the staging area.

I would like to propose an alternative to the current implementation, which
I believe will alleviate some headaches (for example, those caused by the
stagefile bits, which are functionality that slips beyond the original
intent of package managed staging), make it easier to add more traceability
to the builds, reduce behavioral differences between the use of cached
binaries and building from scratch, and help to prepare for some possible
moves in the future.

To summarize, I propose the creation of an archive/package which acts as the
primary artifact to come out of the build of a recipe.  By capturing *all*
output of a recipe into a single place, we reduce confusion and make things
easier to track.  Every subsequent task by the recipe (or by other
recipes) will operate based on this archive (or on cached, extracted
contents of this
archive, for performance).  Builds from cached binaries would operate based
on this archive, so that the execution of the subsequent tasks would be
identical between the prebuilt and from scratch cases, and it makes it clear
that this is not just "packaged staging" in concept or intent.

I intend this archive to essentially contain the output of do_install, and
to add the necessary bits into the build process to skip all tasks that
do_install depends upon if the archive already exists, without capturing the
stamps themselves into the archive, to reduce implementation headaches and
complexity.  Staging will be populated based on this archive (and new style
staging is already based upon do_install, so this is no great stretch),
likely using a generated package with a package manager under the hood for
staging alone, not all of TMPDIR, and not in a way that's at all user
visible.  Note that doing it this way also moves the packaging process:
use of prebuilts will no longer mean that packaging has already been
completed.  This makes it easier for us to change the packaging of a recipe
without having to rebuild the thing, and makes it easier for us to move in
a direction of separation of responsibilities at some point (packaging is
more distro policy than recipe responsibility), if we choose to do so.
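A minimal sketch of the idea, with tar standing in for whatever archive
format would actually be picked (all names here are illustrative, not a
proposed API):

```python
import os
import tarfile

def capture_install_output(destdir, archive_path):
    """Capture everything do_install wrote to its destination dir
    (${D}) into a single archive: the recipe's primary artifact."""
    with tarfile.open(archive_path, "w:gz") as tf:
        tf.add(destdir, arcname=".")
    return archive_path

def populate_sysroot(archive_path, sysroot):
    """Staging (and, later, packaging) is populated from the
    archive, so the prebuilt and from-scratch cases take the same
    code path through all subsequent tasks."""
    with tarfile.open(archive_path) as tf:
        tf.extractall(sysroot)
```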

While I haven't fully worked out the logistics of the implementation of
this, I did, in the past, create a prototype of a "private staging areas"
implementation which also implemented this primary archive as a side effect,
and utilized it for the prototype.  Does anyone have any thoughts on this,
ideas for improvement, arguments either for or against a pstage revamp, or
alternative ideas for a revamp?  I'd love to hear what people think about
this possibility.
-- 
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 17:09 [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable) Chris Larson
@ 2010-03-03 17:17 ` Koen Kooi
  2010-03-04 10:15   ` Richard Purdie
  2010-03-03 17:43 ` Richard Purdie
  2010-03-05 15:38 ` Phil Blundell
  2 siblings, 1 reply; 8+ messages in thread
From: Koen Kooi @ 2010-03-03 17:17 UTC (permalink / raw)
  To: openembedded-devel

On 03-03-10 18:09, Chris Larson wrote:

> To summarize, I propose the creation of an archive/package which acts as the
> primary artifact to come out of the build of a recipe. 

That sounds like a good way to do packaged-staging without making my
head explode :)

[snip]

> While I haven't fully worked out the logistics of the implementation of
> this, I did, in the past, create a prototype of a "private staging areas"
> implementation which also implemented this primary archive as a side effect,
> and utilized it for the prototype.  Does anyone have any thoughts on this,
> ideas for improvement, arguments either for or against a pstage revamp, or
> alternative ideas for a revamp?  I'd love to hear what people think about
> this possibility.

I think the 'private staging' approach is the way to go; it makes the
build deterministic instead of "might pick up extras from staging".  And I
think it will also cure the mysterious "every python recipe breaks when
some, as yet unknown, recipe is built" problem.

regards,

Koen

* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 17:09 [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable) Chris Larson
  2010-03-03 17:17 ` Koen Kooi
@ 2010-03-03 17:43 ` Richard Purdie
  2010-03-03 18:28   ` Chris Larson
  2010-03-05 15:38 ` Phil Blundell
  2 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2010-03-03 17:43 UTC (permalink / raw)
  To: openembedded-devel

On Wed, 2010-03-03 at 10:09 -0700, Chris Larson wrote:
> Proposal for the Revamp of "Packaged Staging"
> 
> Goals:
> - Simple implementation
> - Managed staging area
> - "Build" from cached/prebuilt binaries
> - Reduce behavioral differences between the prebuilt and from scratch cases
> - Intrinsic to the system, no longer opt-in

I had to smile when I read this, as you make it sound as if this isn't the
direction packaged-staging is already moving in :).  The things you
describe are all things I've had in mind; it's just the practicalities of
the real world that mean we're not there yet.

Basically, what you describe is what I also want to see packaged staging
and OE in general doing.  You're right to point out that what we're
trying to achieve is beyond the scope of plain "put staging under the
control of a package manager".

Where we might differ is in exactly how to do it technically.  I really
dislike some of the ways packaged-staging works, but it's all done that
way for a reason.  The reasons most likely become apparent when you try to
find an alternative to what it does.

> I would like to propose an alternative to the current implementation, which
> I believe will alleviate some headaches (for example, those caused by the
> stagefile bits, which is more functionality that slips beyond the original
> intent of package managed staging), make it easier to add more traceability
> to the builds, reduce behavioral differences between the use of cached
> binaries and building from scratch, and should help to prepare for some
> possible moves in the future.
> 
> To summarize, I propose the creation of an archive/package which acts as the
> primary artifact to come out of the build of a recipe.

My view on this is a kind of hybrid.  Firstly, we need to adopt some kind
of checksum system which represents staging packages.  If the checksum
doesn't match what we want, the staging package is invalid.

Secondly, I agree we need to capture all the output of a task; we do that
already, just badly.  I like the idea of creating structures under
WORKDIR where these things are put, like the output of do_install, the
output of do_package (the split-up do_install output plus some package
data), as well as the output of the package generation step.

Your use case is too focused on your specific problems and on do_install,
though.  Why is do_install special?  I'd go one step further and allow the
"staging package" output of a recipe to be multiple packages, each one
representing a task.

We could go as far as mandating only output under WORKDIR should be made
(in specified directories per task). bitbake could then have a
postprocessing task defined which looks at an output directory and
generates a corresponding "staging" package and also applies it to a
core sysroot directory / wherever.

So, if you build an rpm based image, it sees all the package_write_rpm
prebuilds and just makes sure they're installed, nothing else. This
approach seems extensible and generic, things which serve us well.
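That postprocessing idea could be modeled roughly like this (the
output.&lt;task&gt; directory and package naming are illustrative
assumptions, not existing bitbake behavior):

```python
import os
import tarfile

def postprocess_task(workdir, task, pkgdir, sysroot):
    """Rough model of the proposed bitbake postprocessing step:
    pick up a task's output directory under WORKDIR, generate a
    "staging" package from it, and apply that package to the
    shared sysroot."""
    outdir = os.path.join(workdir, "output.%s" % task)
    pkg = os.path.join(pkgdir, "staging-%s.tar.gz" % task)
    with tarfile.open(pkg, "w:gz") as tf:
        tf.add(outdir, arcname=".")      # one package per task
    with tarfile.open(pkg) as tf:
        tf.extractall(sysroot)           # apply to the core sysroot
    return pkg
```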

The pitfalls of this are (random brainstorming):

1. Stamp file handling - needs a total rethink really. Not sure how to 
   do it but I have given it thought before.
2. staging package covering tmpdir - we did this to cover pkgdata, 
   cross, stamps, deploy as well as staging.

   I like pb_'s idea of merging cross into the native sysroot and that 
   would remove one reason for using tmpdir, not the sysroot. We'd also 
   need to ensure packages only write to one sysroot. Not possible with 
   the gcc recipe for example.

   Stamp handling may be handled with 1. above. 

   pkgdata/deploy? Above ideas may help...

3. Optional packaged staging - should be made mandatory to simplify code
4. Logistics of doing it. We can't even get packaged staging merged 
   into OE :(

So yes, I'd support the idea but the devil is in the details as always.

Cheers,

Richard







* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 17:43 ` Richard Purdie
@ 2010-03-03 18:28   ` Chris Larson
  2010-03-03 19:50     ` Frans Meulenbroeks
  2010-03-04 10:09     ` Richard Purdie
  0 siblings, 2 replies; 8+ messages in thread
From: Chris Larson @ 2010-03-03 18:28 UTC (permalink / raw)
  To: openembedded-devel

On Wed, Mar 3, 2010 at 10:43 AM, Richard Purdie <rpurdie@rpsys.net> wrote:

> On Wed, 2010-03-03 at 10:09 -0700, Chris Larson wrote:
> > Proposal for the Revamp of "Packaged Staging"
> >
> > Goals:
> > - Simple implementation
> > - Managed staging area
> > - "Build" from cached/prebuilt binaries
> > - Reduce behavioral differences between the prebuilt and from scratch cases
> > - Intrinsic to the system, no longer opt-in
>
> I had to smile when I read this, as you make it sound as if this isn't the
> direction packaged-staging is already moving in :). The things you
> describe are all things I've had in mind, just the practicalities of the
> real world mean we're not there yet.
>

No, I think I didn't make this clear enough.  These goals are for the
entire implementation, not the diff of the current method against the new
one.  They are the end goals, the needs we want this entire notion of
binary caching and package managed staging to solve.  I didn't intend for
it to sound like pstage wasn't moving in that direction.  I just believe it
is a good time to step back and consider what we're trying to accomplish,
and how best to get there.

> Basically, what you describe is what I also want to see packaged staging
> and OE in general doing. You're right to point out that what we're
> trying to achieve is beyond the scope of plain "put staging under the
> control of a package manager".
>

I'm glad to hear we want to move in similar directions, that avoids problems
in making this happen, and keeps the TSC out of it ;)

> Where we might differ is in exactly how to do it technically.  I really
> dislike some of the ways packaged-staging works, but it's all done that
> way for a reason.  The reasons most likely become apparent when you try
> to find an alternative to what it does.
>

Yes, I know, as I say toward the end of the email, I implemented this idea
in a prototype of private staging, so I ran into at least some of the
reasons behind the current work.  I readily admit you must have more
experience with the pstage quirks, since you wrote the thing, so I welcome
as much input as you're willing to give on the subject.


> > I would like to propose an alternative to the current implementation,
> > which I believe will alleviate some headaches (for example, those
> > caused by the stagefile bits, which are functionality that slips beyond
> > the original intent of package managed staging), make it easier to add
> > more traceability to the builds, reduce behavioral differences between
> > the use of cached binaries and building from scratch, and should help
> > to prepare for some possible moves in the future.
> >
> > To summarize, I propose the creation of an archive/package which acts
> > as the primary artifact to come out of the build of a recipe.
>
> My view on this is a kind of hybrid.  Firstly, we need to adopt some
> kind of checksum system which represents staging packages.  If the
> checksum doesn't match what we want, the staging package is invalid.
>

Yes, I agree that we need this, but I believe that's a secondary issue.  In
order to implement that properly, we need to track the *input* into the
build more fully as well, not just the output; otherwise there's no good
way to determine how to invalidate.  If we start naively, we could capture
only the variables that are already captured in PSTAGE_PKGPATH and the
like into a signature, coupled with a hash of the SRC_URI contents, as the
input, and an associated hash for do_install as the output of the
operation.  Hmm.
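The naive scheme could look something like this (the variable selection
and hash choice are purely illustrative):

```python
import hashlib

def input_signature(variables, src_uri_contents):
    """Naive input signature as floated above: hash the metadata
    variables that already feed PSTAGE_PKGPATH together with a
    hash of the fetched SRC_URI contents (bytes)."""
    h = hashlib.sha256()
    for name in sorted(variables):       # sort to fix an ordering
        h.update(("%s=%s\n" % (name, variables[name])).encode())
    h.update(hashlib.sha256(src_uri_contents).digest())
    return h.hexdigest()
```

Any change to a captured variable or to the fetched sources then yields a
different signature, invalidating the cached do_install output.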

> Secondly, I agree we need to capture all the output of a task; we do that
> already, just badly. I like the idea of creating structures under
> WORKDIR where these things are put, like the output of do_install, the
> output of do_package (split up do_install and some package data) as well
> as the output of the package generation step.
>
> Your use case is too focused on your specific problems and on do_install
> though. Why is do_install special? I'd go one step further and allow the
> "staging package" output of a recipe to be multiple packages each one
> representing a task.
>

As I mentioned on IRC, do_install *is* special, at least in my opinion,
because it's the final output of the upstream build/buildsystem.  It is what
we want/need from them.  Everything else we do can come from that, and all
the tasks before it are intermediate steps whose results are of limited
usefulness, other than for traceability (which I agree we need, just don't
necessarily think we need that *now*).  I have a prototype of using git to
track changes to WORKDIR through the tasks, with automatic commits of the
task output and corresponding tags for each task.  I think that kind of
thing would be extremely useful, but I think pursuing that route would be
better done as a subsequent task.

> We could go as far as mandating only output under WORKDIR should be made
> (in specified directories per task). bitbake could then have a
> postprocessing task defined which looks at an output directory and
> generates a corresponding "staging" package and also applies it to a
> core sysroot directory / wherever.
>

This is what I already suggested in my email: the archive from do_install
is the primary artifact; *everything* comes from / is generated from that,
including the package for use with package managed staging.

> So, if you build an rpm based image, it sees all the package_write_rpm
> prebuilds and just makes sure they're installed, nothing else. This
> approach seems extensible and generic, things which serve us well.
>
> The pitfalls of this are (random brainstorming):
>
> 1. Stamp file handling - needs a total rethink really. Not sure how to
>   do it but I have given it thought before.
>

This isn't an issue if you look at it the way I did in my proposal, which is
that this artifact/archive is the primary result of a build of a recipe, and
all the tasks that lead up to do_install (not those that may run earlier,
just those that do_install depends upon directly or indirectly) are
intermediate steps, and can be skipped.  Setscene can certainly generate
that, rather than extracting it in the form of stamps from the pstage
package.  You've obviously had more experience in working out stamp madness
than I have, and maybe I'm making this simpler than it is, but maybe it's
simpler than you think as well.


> 2. staging package covering tmpdir - we did this to cover pkgdata,
>   cross, stamps, deploy as well as staging.
>

cross is going away if we go the toolchain-desuck route, which I think we
should.  stamps aren't a serious problem other than corner cases, if you
approach it the way I suggest, and deploy and staging would both come from
the aforementioned archive (or archives/repository, as in your suggestion).

> 3. Optional packaged staging - should be made mandatory to simplify code
> 4. Logistics of doing it. We can't even get packaged staging merged
>   into OE :(
>

I've found that most of the time it's just a matter of someone sitting down
and implementing it.  Many things I've wanted to see in OE since it was
started were just a matter of sitting down for 48 hours and coding them up.
If someone did a patch to make the current pstage mandatory, I suspect we
could get it in, but I feel this is a good opportunity for us to take a
step back, rather than just removing conditionals.

So, to summarize: you disagree with the notion of the 'make install'
output being the primary artifact of the recipe, and instead want deep
tracking of the output of every task, with caching at that level.  I like
that idea as a
means of adding traceability, as I mention above with the prototype of git
task tracking, but I don't necessarily see it as being something that has to
be either/or.  If we can agree that everything *up to* do_install is an
intermediate step, and not necessary for binary caching (though yes, useful
for traceability), I think we can build what you want for tasks *after*
do_install on top of what I suggest, rather than as an alternative to what I
suggest.  Thoughts on this?  I'd like to find a compromise that can satisfy
both of us for the future, but which allows me to get to the coding of this
piece of it immediately.
-- 
Christopher Larson
clarson at kergoth dot com
Founder - BitBake, OpenEmbedded, OpenZaurus
Maintainer - Tslib
Senior Software Engineer, Mentor Graphics



* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 18:28   ` Chris Larson
@ 2010-03-03 19:50     ` Frans Meulenbroeks
  2010-03-04 10:09     ` Richard Purdie
  1 sibling, 0 replies; 8+ messages in thread
From: Frans Meulenbroeks @ 2010-03-03 19:50 UTC (permalink / raw)
  To: openembedded-devel

Tom: +1 for the initiative

Having had my share of trouble with packaged staging, and having fixed a
bug or two as a result, I welcome the idea of improving it.

Wrt the ideas & discussion above:
I consider the output from do_install to be the most important; that is
the deliverable of the package.  Staging intermediate steps seems less
useful to me.
However, if it is decided to do that, please consider making it optional.
I don't think I'd appreciate having staged packages with all the
intermediate object files.
Also, if something changes, typically all of the package needs to be
rebuilt.  E.g. if a provider changes, it could be that configure gives a
different result, so there is no point in keeping that.

Wrt the stamps: I think the current stamps are useful when it comes to
deciding what is built and what has to be rebuilt (e.g. if do_compile
fails).
For packaged staging it would be nice if one could also put the packages
on a feed, so others could build from prebuilt packages (if they desire).
(This could also be internal to e.g. a company.)
The current time-based stamps are not too handy here, as they might cause
problems due to clock skew between systems, and people mixing up local
time and GMT, etc.

Instead of the current stamps, a signature could be better.
A very simplistic proposal would be to use the chain of the child
signatures, followed by the PR of the package (only the digit part, not
the 'r').
Basically, in the end this delivers a sequence of all PR values of all
underlying recipes.  If an underlying recipe is bumped, the signature
generated while parsing does not match any more, and dependent packages
need to be rebuilt.

Unfortunately, things are probably not that simple.  We might need to
decide on an ordering of packages (the way they appear in DEPENDS?
Alphabetically by package name?).
Also, I can imagine that signatures (especially for images) become very
long.  We might be able to use a hash function to reduce the size of the
signatures and only store the hash.
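A toy version of this scheme, with the ordering fixed alphabetically (one
of the options mentioned above) and a hash keeping the signature short
(names and hash choice are illustrative):

```python
import hashlib

def recipe_signature(pr, dep_signatures):
    """Sketch of the signature scheme above: the recipe's own PR
    (digit part only) chained with the signatures of its depends,
    sorted by name, then hashed so signatures stay short."""
    h = hashlib.sha256()
    h.update(("PR:%d\n" % pr).encode())
    for name in sorted(dep_signatures):
        h.update(("%s:%s\n" % (name, dep_signatures[name])).encode())
    return h.hexdigest()

def signature_tree(recipes, name):
    """Recursively build a recipe's signature from its depends.
    recipes maps name -> (pr, [list of depends])."""
    pr, deps = recipes[name]
    return recipe_signature(pr, {d: signature_tree(recipes, d) for d in deps})
```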

What I am not sure of is whether it is desirable to build a package that
you do not (yet) want to be staged.
One of the issues I face every once in a while is that I modified a
recipe and bumped PR, but in the end the change turned out not to be good
and I want to revert my work.
I guess it might be nice to be able to inhibit the staging of recipes
that are not yet committed (maybe user configurable in e.g. local.conf).

BTW, in the past I've been pondering another scenario: give each package
its own subdir in staging, and control which packages are actually used
through -I and -L flags.
That would avoid having to unpack a lot of packages.
(I'm assuming that we want to start with an empty staging area when a
build starts, and that it is populated as needed; otherwise you can still
inadvertently pick up remains from another package.  But I am fearing the
performance penalty; I can imagine that it makes sense to have some core
packages always present (e.g. libc).)
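A concrete reading of that scenario (the per-package path layout is purely
illustrative):

```python
import os

def staging_flags(staging_root, depends):
    """Sketch of the per-package staging idea above: each package
    lives in its own subdir of staging, and only the declared
    dependencies are made visible to the compiler and linker via
    -I/-L flags, so nothing is picked up inadvertently."""
    cflags, ldflags = [], []
    for pkg in depends:
        pkgdir = os.path.join(staging_root, pkg)
        cflags.append("-I" + os.path.join(pkgdir, "usr", "include"))
        ldflags.append("-L" + os.path.join(pkgdir, "usr", "lib"))
    return " ".join(cflags), " ".join(ldflags)
```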

Thinking of it, do we also want to apply this for native, or should we
assume that native always contains all native packages (so implicitly
making all recipes depend on all -native recipes)?
After all, native is the build environment.

I guess there are plenty of things I overlooked or forgot; I'll add more
if new things pop up.

Frans

PS: if desired I am willing to contribute, but note that my pythonese
is marginal.




* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 18:28   ` Chris Larson
  2010-03-03 19:50     ` Frans Meulenbroeks
@ 2010-03-04 10:09     ` Richard Purdie
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Purdie @ 2010-03-04 10:09 UTC (permalink / raw)
  To: openembedded-devel

On Wed, 2010-03-03 at 11:28 -0700, Chris Larson wrote:
> No, I think I didn't make this clear enough.  These goals are for the entire
> implementation, not the diff of the current method against the new one.
> They are the end goals, the needs we want this entire notion of binary
> caching and package managed staging to solve.  I didn't intend for it to
> sound like pstage wasn't moving in that direction.  I just believe it is
> a good time to
> step back and consider what we're trying to accomplish, and how best to get
> there.

Ultimately I think we do want to achieve fundamentally the same thing,
ignoring a few of the implementation details.

The current pstage approach suffers a lot for a few reasons:

a) It had to be optional and opt-in
b) It has to cope with legacy staging mode
c) Discoveries were made during its implementation which required 
   fixes, some of which were more hacky than I'd have liked.
d) It had to provide a smooth, continual migration path

I guess I will be frustrated if we now decide that the hoops I've jumped
through so far to move things forward were unnecessary and that taking a
shortcut is in fact OK.  It's been made clear to me in the past that
changes need to be incremental.

Whilst I do strongly dislike the current code, I do not think it's beyond
redemption, and I would also prefer to do a migration to an improvement
based on the existing code, rather than engineering from scratch.

> I'm glad to hear we want to move in similar directions, that avoids problems
> in making this happen, and keeps the TSC out of it ;)

Well, we do seem to fundamentally disagree on the approach to this, and I
think the TSC does need to approve a change on this scale, particularly
if it's as much of a change in direction as you want to take.

> Yes, I know, as I say toward the end of the email, I implemented this idea
> in a prototype of private staging, so I ran into at least some of the
> reasons behind the current work.  I readily admit you must have more
> experience with the pstage quirks, since you wrote the thing, so I welcome
> as much input as you're willing to give on the subject.

I will do my best to provide it...

> > My view on this is a kind of hybrid.  Firstly, we need to adopt some
> > kind of checksum system which represents staging packages.  If the
> > checksum doesn't match what we want, the staging package is invalid.
> 
> Yes, I agree that we need this, but I believe that's a secondary issue.

First time around, we decided several things would be "later", and we have
the partial implementation we have now.  We're doing the painful work in
OE regarding the do_stage functions and whilst that happens there is no
hurry to write a new solution since all solutions pretty much depend on
that work being completed.

Since we have time whilst that happens I'd like to see a complete
coherent proposal which covers all the issues. It may or may not be
obvious but I'm doing certain development work on Poky and staging is
one of the targets. As an example of this we've massively improved the
mirror handling within bitbake and allowed fetching of staging packages
from multiple sources.  This was done by taking a step back and fixing
all levels of the code to do what actually makes sense.

For Poky I have a list of requirements and have plans laid out on how to
address them. They're based on incremental steps on existing code slowly
evolving to the end goal as agreed by the former OE core team, as
discussed at OEDEM and touched on in the TSC meetings. To list the plans
I have in mind which address some of the requirements I have:

a) Remove legacy code path (now never used in Poky)
b) Make the code default not optional (not done in Poky but only so OE 
   can keep in sync easily)
c) Add fetching of staging packages (done in Poky including better 
   mirror handling code for bitbake)
d) Add checksum support (not implemented yet)
e) Look at the general stamp problem (most likely need to change 
   bitbake)
f) Rename staging to sysroots (done in Poky, not OE yet)
g) Move cross into native sysroot (OE has patches in progress)
h) Make cross and native staging packages path independent (partially 
   complete in Poky)
i) Split the deploy/ipk, deploy/rpm, deploy/deb data into separate 
   staging packages
j) Fix staging package architecture issues
k) General cleanup having achieved all the above

It's been commented that I'm being defensive of the current code/approach
and taking it personally.  I think it's fair to say I've spent a lot of
time getting to where we are now, and I also have plans in mind for the
future.  Being told that actually we need to take a step back, do it from
scratch and so on implies we're on the wrong path, so it's only natural
that there will be a defensive element to my position.

>   In
> order to implement that properly, we need to more fully track the *input*
> into the build as well, not just the output, otherwise there's no good way
> to determine how to invalidate.  If we start naively, we could capture only
> the variables that are already captured in the PSTAGE_PKGPATH & the like
> into a signature, coupled with a hash of the SRC_URI contents, as the input,
> and an associated hash for do_install as the output of the operation.  Hmm.

I agree this is not a simple problem, but that doesn't mean we shouldn't
do something about it.  The solution I envisage would capture a lot more
variables (probably *unexpanded* versions) into the checksum.  I even
have ideas about including things like the do_xyz function contents in
the checksum.  How efficiently we can do this I'm as yet unsure, but I
think it's possible to make it work really well.

From an implementation standpoint, I'd see a hash being generated at
finalise time after parsing. A task's hash would be this hash combined
with the hashes of its dependencies.

I can see the checksums pretty much removing the need for the current
stamp handling if we get it right. The stamp files caused me no end of
grief last time around and if the checksums work we simply don't need
them anymore.

So if we can remove the problems with stamps by using strong checksums,
perhaps even at the bitbake level, I would not call this issue
secondary ;-).

> As I mentioned on IRC, do_install *is* special, at least in my opinion,
> because it's the final output of the upstream build/buildsystem.

On this we fundamentally disagree. do_install is one form of output and
is an intermediate step on the way to many of our other outputs.
Different users will find those different outputs of differing value.

If I'm building an image, I do not want to have to run the
do_package_write_ipk functions for every staging package I've just
downloaded simply in order to build the image.

I see no reason why I shouldn't be able to add some other task which
processes the compiled source and generates some other kind of output
either.

Yes, my view is more complex, but it's this kind of attention to detail
which makes OE what it is and why it wins compared to a lot of other
build systems.

In my view, *any* solution which requires do_package_write_ipk to have
to run again in order to build an image is broken and inferior to what
we have now.

>   It is what
> we want/need from them.  Everything else we do can come from that, and all
> the tasks before it are intermediate steps whose results are of limited
> usefulness, other than for traceability (which I agree we need, just don't
> necessarily think we need that *now*).

You have a very narrow focus. How about things like build speed and
efficiency and having an architecture which is suitably generic and
powerful?

>   I have a prototype of using git to
> track changes to WORKDIR through the tasks, with automatic commits of the
> task output and corresponding tags for each task.  I think that kind of
> thing would be extremely useful, but I think pursuing that route would be
> better done as a subsequent task.

I think all these things are intertwined.

> > We could go as far as mandating only output under WORKDIR should be made
> > (in specified directories per task). bitbake could then have a
> > postprocessing task defined which looks at an output directory and
> > generates a corresponding "staging" package and also applies it to a
> > core sysroot directory / wherever.
> 
> This is what I already suggested in my email  — the archive from do_install
> is the primary artifact, *everything* comes from / is generated from that,
> including the package for use with package managed staging.

The two paragraphs above are not equal. My proposal is:

do_install           - creates ${WORKDIR}/install
do_package           - creates ${WORKDIR}/install-split
do_package_write_ipk - creates ${WORKDIR}/ipk
do_deploy            - creates ${WORKDIR}/deploy

so that, by convention, tasks stop writing outside WORKDIR. We can then
add something to bitbake's function handling which allows the definition
of a post-process routine, which could probably also double as a staging
package install routine. The directories would be set using flags:

do_deploy[outputdir] = "${WORKDIR}/deploy"
do_deploy[postprocess] = "do_deploy_postprocess"
do_deploy_postprocess () {
	cp -r ${WORKDIR}/deploy/* ${DEPLOY_DIR}
}


so then, if a build needs the ipks, say to build an image, it only has
to install the do_package_write_ipk task output package.
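A minimal sketch of how a task runner might honour such
[outputdir]/[postprocess] flags (hypothetical helper and variable names,
not the real bitbake API):

```python
import os
import shutil
import tarfile
import tempfile

def run_task_with_postprocess(task_name, flags, funcs):
    """Run a task, archive its declared output directory as a staging
    package, then invoke its declared postprocess routine."""
    funcs[task_name]()
    outdir = flags.get("outputdir")
    if outdir:
        # Archive the task output so a later build can reuse it as a
        # prebuilt staging package instead of re-running the task.
        with tarfile.open(outdir.rstrip("/") + ".tar.gz", "w:gz") as tar:
            tar.add(outdir, arcname=".")
    post = flags.get("postprocess")
    if post:
        funcs[post]()

# Toy demonstration mirroring the do_deploy example above.
workdir = tempfile.mkdtemp()
deploy = os.path.join(workdir, "deploy")
deploy_dir = os.path.join(workdir, "images")  # stands in for DEPLOY_DIR
os.makedirs(deploy_dir)

def do_deploy():
    os.makedirs(deploy)
    with open(os.path.join(deploy, "uImage"), "w") as f:
        f.write("kernel")

def do_deploy_postprocess():
    for name in os.listdir(deploy):
        shutil.copy(os.path.join(deploy, name), deploy_dir)

run_task_with_postprocess(
    "do_deploy",
    {"outputdir": deploy, "postprocess": "do_deploy_postprocess"},
    {"do_deploy": do_deploy, "do_deploy_postprocess": do_deploy_postprocess},
)
```

The same archiving step is what would later let setscene install the
task's output package instead of running the task at all.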

Note some interesting ways my proposal could work: if you change FILES,
it would invalidate the stamps for do_package but not do_install
(assuming we do task-level checksums), so the do_install package could
be installed and then simply repackaged. The first system to do this
could share those results into the staging package pool, so other
machines could then build the image from the _ipk packages again.


Your proposal is that do_install is the output package and everything
beyond that *always* has to be rebuilt.

I don't think people will be happy with having to rerun do_package
(including the QA) and the package_write_* tasks every time compared to
an accelerated build from staging packages.


> > 1. Stamp file handling - needs a total rethink really. Not sure how to
> >   do it but I have given it thought before.
> 
> This isn't an issue if you look at it the way I did in my proposal, which is
> that this artifact/archive is the primary result of a build of a recipe, and
> all the tasks that lead up to do_install (not those that may run earlier,
> just those that do_install depends upon directly or indirectly) are
> intermediate steps, and can be skipped.  Setscene can certainly generate
> that, rather than extracting it in the form of stamps from the pstage
> package.  You've obviously had more experience in working out stamp madness
> than I have, and maybe I'm making this simpler than it is, but maybe it's
> simpler than you think as well.

It depends on what level of assumptions and hardcoding you want in
setscene. Any time someone adds a new task, are you going to need to
update the setscene function?

The current solution is generic, which was painful to achieve, but we
don't get weird bug reports when people add tasks or change the task
order around.

> > 2. staging package covering tmpdir - we did this to cover pkgdata,
> >   cross, stamps, deploy as well as staging.
> 
> cross is going away if we go the toolchain-desuck route, which I think we
> should.

Yes, I was just highlighting this was a dependency.

>   stamps aren't a serious problem other than corner cases, if you
> approach it the way I suggest,

I wish I was so sure about that. My conclusion is we'll end up replacing
them with something containing a hash.

>  and deploy and staging would both come from
> the aforementioned archive (or archives/repository, as in your suggestion).

The deploy functions are going to need a rewrite.

> > 3. Optional packages staging - should be made mandatory to simplify code
> > 4. Logistics of doing it. We can't even get packaged staging merged
> >   into OE :(
> 
> I've found that most of the time it's just a matter of someone sitting down
> and implementing it.  Many things I've wanted to see in OE since it was
> started were just a matter of sitting down for 48 hours and coding it up.

Many things are, but this one is a major architectural change depending
on changes to every recipe. We've started that process; I'd like to see
it through.

>  If someone did a patch to make the current pstage mandatory, I suspect we
> could get it in, but I feel this is a good opportunity for us to take a step
> back, rather than just removing conditionals..

Now, yes, I think we could get that in. A year ago - no chance. See
above on why "taking a step back" grates a bit, particularly if the
proposal doesn't cover the things we need to cover and is a regression
in some ways.

> So, to summarize, you disagree with the notion of the 'make install' being
> the primary artifact of the recipe, and want instead deep tracking of the
> output of every task, with caching at that level.

Correct.

>   I like that idea as a
> means of adding traceability, as I mention above with the prototype of git
> task tracking, but I don't necessarily see it as being something that has to
> be either/or.  If we can agree that everything *up to* do_install is an
> intermediate step, and not necessary for binary caching (though yes, useful
> for traceability),

I agree with this; the output of those tasks is not useful in a
packaged format.

>  I think we can build what you want for tasks *after*
> do_install on top of what I suggest, rather than as an alternative to what I
> suggest.  Thoughts on this?  

but I disagree with the leap of logic here. I don't like the "on top
of" part of the proposal; I think we need something generic, flexible
and in keeping with the rest of the core.

> I'd like to find a compromise that can satisfy
> both of us for the future, but which allows me to get to the coding of this
> piece of it immediately.

How about coding things incrementally upon the foundations we already
have?

The trouble is that you have a very focused view of what you want to
achieve. I have a much bigger picture in mind and feel that replacing
pstage with what you describe is partly a step backwards and will hamper
certain future developments which we need just as badly as a pstage
cleanup (which is what this amounts to).

Cheers,

Richard





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 17:17 ` Koen Kooi
@ 2010-03-04 10:15   ` Richard Purdie
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Purdie @ 2010-03-04 10:15 UTC (permalink / raw)
  To: openembedded-devel

On Wed, 2010-03-03 at 18:17 +0100, Koen Kooi wrote:
> On 03-03-10 18:09, Chris Larson wrote:
> > To summarize, I propose the creation of an archive/package which acts as the
> > primary artifact to come out of the build of a recipe. 
> 
> That sounds like a good way to do packaged-staging without making my
> head explode :)

This is mainly due to the constraints that were placed upon its
development. If we can relax some of the constraints, we can make it
simpler.

> I think the 'private staging' approach is the way to go, it makes the
> build determistic instead of "might pick up extras from staging". And I
> think it will also cure the mysterious "every python recipe breaks when
> some, yet unknown, recipe is built"

Yes, private staging areas are something we need, and I don't think
anyone believes otherwise. It isn't something either proposal offers
directly; it's only something that will become easier to develop once we
have staging packages working well.

Cheers,

Richard





* Re: [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable)
  2010-03-03 17:09 [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable) Chris Larson
  2010-03-03 17:17 ` Koen Kooi
  2010-03-03 17:43 ` Richard Purdie
@ 2010-03-05 15:38 ` Phil Blundell
  2 siblings, 0 replies; 8+ messages in thread
From: Phil Blundell @ 2010-03-05 15:38 UTC (permalink / raw)
  To: openembedded-devel

On Wed, 2010-03-03 at 10:09 -0700, Chris Larson wrote:
> To summarize, I propose the creation of an archive/package which acts as the
> primary artifact to come out of the build of a recipe.  By capturing *all*
> output of a recipe into a single place, we reduce confusion and make things
> easier to track.  Every subsequent task by the recipe (or other recipes)
> will go based on this archive (or cached, extracted contents of this
> archive, for performance).  Builds from cached binaries would operate based
> on this archive, so that the execution of the subsequent tasks would be
> identical between the prebuilt and from scratch cases, and it makes it clear
> that this is not just "packaged staging" in concept or intent.

Yes, this is pretty much what I have been thinking of as well.  I think
the only real difference between what you have suggested here and what I
have had in mind is that I was originally planning to re-use the existing
binary package output (i.e. what you get from do_package) to populate
the staging directory.  The nice thing about that approach is that it
guarantees that you will get the same result for an in-tree OE build
compared to an on-target build against the installed -dev packages,
which would help to avoid some of the issues we have had in the past
with the -dev packages being defective through lack of testing.

Indeed, one of the main motivations for the work that I have been doing
on the toolchain-desuck branch is to remove the special-casing of the
toolchain recipes so that they just generate output packages which you
can then install in the same way as anything else.

Introducing a new archive format which captures the entirety of the
build output up to and including do_install is indeed quite tempting as
well, though.

p.





end of thread, other threads:[~2010-03-05 15:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-03 17:09 [RFC] Initial Proposal for Packaged Staging Revamp (was [RFC] Make some big changes right after next stable) Chris Larson
2010-03-03 17:17 ` Koen Kooi
2010-03-04 10:15   ` Richard Purdie
2010-03-03 17:43 ` Richard Purdie
2010-03-03 18:28   ` Chris Larson
2010-03-03 19:50     ` Frans Meulenbroeks
2010-03-04 10:09     ` Richard Purdie
2010-03-05 15:38 ` Phil Blundell
