* Quality of meta-oe metadata
@ 2014-03-30  1:31 Martin Jansa
  2014-03-30  5:33   ` [OE-core] " Trevor Woerner
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Martin Jansa @ 2014-03-30  1:31 UTC (permalink / raw)
  To: openembedded-core, openembedded-devel

Hi, sorry for the long e-mail. This is one of the topics I would like to
discuss at OEDAM (http://openembedded.org/wiki/OEDAM), but having some
feedback and thoughts in advance would be very useful.

As people may have noticed from my "State of bitbake world" e-mails or
http://www.openembedded.org/wiki/Bitbake_World_Status
we have never had "green" builds. There are always 20+ failed tasks in those
big builds, and just reading the numbers isn't a good indicator of quality:
the sooner you break something in the dependency tree, the fewer recipes
are actually tested, so fewer failed tasks often means that something
important is broken.

There are IMHO at least 3 reasons for this depressing state:

1) Nobody is paid/dedicated to fix them. There is no company behind the
   meta-oe layers, no SWAT team, not even dedicated maintainers to whom I
   could send the error from the latest build and ask them to fix it before
   the end of the week/month.

   Kudos to people who are sometimes sending patches for such issues!

2) There are a lot of changes and component upgrades in oe-core which
   sometimes aren't very straight-forward to adapt to and issues stay in
   meta-oe for months.

   I don't mean it's oe-core's fault or that changes to oe-core should slow
   down just because of meta-oe, especially when we cannot guarantee when it
   will be prepared for them (because of 1)).

   oe-core is trying to track the latest stable versions, but meta-oe often
   contains very old versions, and people upgrade to the latest stable only
   the recipes they really care about, so it's not so surprising that a
   2-year-old version of something isn't compatible with the latest freetype
   or some library like that.

3) OE releases work great and don't invalidate sstate signatures so often, so my
   feeling is that most developers and projects are just using releases and
   fewer and fewer people do CI. People will start complaining that something
   is broken in meta-oe only when they upgrade their project from 1.5 to
   1.6 after 1.6 is released, and that could be too late for fixing meta-oe
   issues.

What I'm trying to do about it:

a) sending those e-mails and updating the wiki, so that people can easily
   find out whether some build failure is common or something which happens
   only for them (something like what the oestats-client.bbclass page
   provided in oe-classic). It also includes a log of QA issues, which are
   usually easy to fix and a great way for new people to learn something
   about OE.
b) trying to refuse all patches which cause a new world issue (or a new QA
   warn/err) - these are sometimes missed in the logs, because a new issue
   is often "hidden" by some other issue and it's hard to compare 40 issues
   from the previous build with 38 from the current one.
   Also, the issues are often triggered later by changes somewhere else...
c) fixing build/QA issues in recipes I've never used or don't even have
   the hardware to test - just based on the assumption that something which
   builds is better than a broken build, even when it may have some issues
   at runtime.
d) contacting people who added the recipe which is now failing, often
   without a reply for months even when I try multiple times :/
e) moving the recipe to a "nonworking" directory to mark it as
   "known-to-be-broken", a last resort for recipes where the fix is
   complicated and it's not known if someone is actually using them
   (because they were broken for months and nobody replied).
   + easy to find them, because they are still in the repository (instead
     of git rm + revert when someone fixes it)
   - the layer index probably doesn't find them, because the "nonworking"
     directory level isn't in BBFILES, so maybe meta-broken or
     meta-nonworking would be better
   ? some recipes are "broken" just because their dependency is broken; what
     to do with such a recipe? I usually just say that in the commit message
     when I'm moving them to "nonworking" with their broken dep.
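
The comparison described in b) can be sketched as a set difference. This is
only a minimal illustration, not the tooling used for the world builds: the
task names and the diff_failures helper are hypothetical, and real input
would have to be parsed from the build logs.

```python
def diff_failures(previous, current):
    """Return (new, no_longer_seen, still_failing) as sorted lists."""
    prev, cur = set(previous), set(current)
    return sorted(cur - prev), sorted(prev - cur), sorted(cur & prev)


# Hypothetical failed-task lists from two consecutive world builds.
previous_build = ["foo:do_compile", "bar:do_configure", "baz:do_package_qa"]
current_build = ["foo:do_compile", "qux:do_compile"]

new, gone, still = diff_failures(previous_build, current_build)
print("new failures:  ", new)
print("no longer seen:", gone)
print("still failing: ", still)
```

A task that drops out of the list is reported as "no longer seen" rather
than "fixed", because it may simply not have run after one of its
dependencies failed earlier in the build.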

What can we do better? How can we motivate more people to do CI and send
fixes? Once we get to a "green" state it will be easier to quickly spot new
issues and easier to fix them, because it will be clear what's causing them.


* Re: Quality of meta-oe metadata
  2014-03-30  1:31 Quality of meta-oe metadata Martin Jansa
@ 2014-03-30  5:33   ` Trevor Woerner
  2014-03-30 14:48   ` [OE-core] " Paul Barker
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Trevor Woerner @ 2014-03-30  5:33 UTC (permalink / raw)
  To: Martin Jansa, openembedded-core, openembedded-devel

Hello Martin,

Excellent, excellent post!

On 03/29/14 21:31, Martin Jansa wrote:
> 2) There are a lot of changes and component upgrades in oe-core which
>    sometimes aren't very straight-forward to adapt to and issues stay in
>    meta-oe for months.

Critical bugfixes aside, I think the current system of unrestrained,
perpetual package bumps is an issue. If dylan uses version 1 of package
XYZ and dora uses version 5, is there any need for the 3 intervening
package bumps which occurred between the release of dylan and dora?

If nothing else, these "irrelevant updates" increase the chance of
causing problems ;-)

> 3) OE releases work great and don't invalidate sstate signatures so often, so my
>    feeling is that most developers and projects are just using releases and
>    less and less people do CI.

Would users prefer a better-tested more-likely-to-work release
containing package versions which were several months old, or is staying
on the bleeding edge more important? There is always exponentially more
work/cost required to be an early adopter.


* Re: Quality of meta-oe metadata
  2014-03-30  1:31 Quality of meta-oe metadata Martin Jansa
@ 2014-03-30 14:48   ` Paul Barker
  2014-03-30 14:48   ` [OE-core] " Paul Barker
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Paul Barker @ 2014-03-30 14:48 UTC (permalink / raw)
  To: Martin Jansa; +Cc: Openembedded Discussion, openembedded-core

Are you discussing meta-oe alone here or all layers in meta-openembedded?

I've got a few ideas thinking slightly outside the box, so these may
or may not be workable:

- It might help to try to split the layers down further and reduce the
size of meta-oe (for example, create a meta-python layer for all the
python libraries that aren't direct dependencies of something
non-python in meta-oe) and then try to get new maintainers for these
sub-layers so that the workload is spread better.

- We could create a new layer for unstable recipes which are 'use at
your own risk'. That may be a good place for recipes which don't work
on the jenkins builds but do work for some people and are in use (and
so probably don't belong in nonworking). This is similar to the
meta-broken or meta-nonworking layer idea but would take slightly more
recipes in.

- It may be worth taking some aggressive action in sync with the
upcoming oe-core release to get us to a green build, possibly by
throwing things into meta-nonworking and meta-unstable layers. That
may break a few people's builds, but the fix for them should just be
to add the meta-unstable layer. If they're building against the master
branch that should be tolerable and it won't affect anyone building
from a released/stable branch until the next oe-core release in 6
months' time. Once we have a green build it'll be much easier to QA
patches and reject those which break the build.

I don't have much time to give to fixing this myself as I'm busy with
other projects. I do have idle computer time though so could run an
automated build regularly (probably just a script and a cron job
rather than buildbot/jenkins/etc). I won't be able to do a world build
for every layer for multiple machines, but I should be able to do some
subset. I may also be able to commandeer a spare server over the next
few weeks. Is there any particular config which would be beneficial to
build regularly?
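
As a rough idea of what such a script could look like, here is a minimal
sketch that only assembles the command line and a log file name. The helper
names and the log naming scheme are assumptions made for illustration; the
only BitBake detail relied on is the -k option, which keeps going after a
failed task. The script deliberately does not launch the build itself.

```python
import shlex
from datetime import datetime


def build_command(targets, keep_going=True):
    """Return the bitbake argv for the given targets."""
    cmd = ["bitbake"]
    if keep_going:
        cmd.append("-k")  # continue as far as possible after a failure
    return cmd + list(targets)


def log_name(now, prefix="world-build"):
    """Return a timestamped log file name (naming scheme is an assumption)."""
    return f"{prefix}-{now:%Y%m%d-%H%M}.log"


cmd = build_command(["world"])
# Print what a cron job would run; a real job would pass cmd to subprocess.
print(shlex.join(cmd), ">", log_name(datetime(2014, 3, 30, 15, 0)))
```

A cron job would call something like this from a shell in which the OE
build environment has already been set up, since bitbake is not on the
PATH otherwise.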

Thanks,

-- 
Paul Barker

Email: paul@paulbarker.me.uk
http://www.paulbarker.me.uk


* Re: Quality of meta-oe metadata
  2014-03-30 14:48   ` [OE-core] " Paul Barker
@ 2014-03-30 15:14     ` Martin Jansa
  -1 siblings, 0 replies; 16+ messages in thread
From: Martin Jansa @ 2014-03-30 15:14 UTC (permalink / raw)
  To: Paul Barker; +Cc: Openembedded Discussion, openembedded-core

On Sun, Mar 30, 2014 at 03:48:09PM +0100, Paul Barker wrote:
> Are you discussing meta-oe alone here or all layers in meta-openembedded?

Basically all of the layers which are included in my world builds, so that
includes all layers in meta-openembedded, most from meta-smartphone,
meta-browser, meta-qt5...

> I've got a few ideas thinking slightly outside the box, so these may
> or may not be workable:
> 
> - It might help to try to split the layers down further and reduce the
> size of meta-oe (for example, create a meta-python layer for all the
> python libraries that aren't direct dependencies of something
> non-python in meta-oe) and then try to get new maintainers for these
> sub-layers so that the workload is spread better.

Agreed. I would be happy to accept patches creating a meta-python layer,
ideally sent by someone who will volunteer to maintain it.

> - We could create a new layer for unstable recipes which are 'use at
> your own risk'. That may be a good place for recipes which don't work
> on the jenkins builds but do work for some people and are in use (and
> so probably don't belong in nonworking). This is similar to the
> meta-broken or meta-nonworking layer idea but would take slightly more
> recipes in.

The difference between meta-broken and meta-unstable from my POV is
that "broken" should be just a temporary place and someone should
eventually fix the recipe, but if we move stuff to meta-unstable and stop
testing it, then I fear that it will become just a junkyard.

If some recipe works fine for someone when building for arm, but
triggers "jenkins/world" build issues for x86*, then let's restrict it
to arm only, with a comment which shows the error - that's better than
moving it to "broken" or "unstable".
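
One way to express such a restriction in a recipe is the COMPATIBLE_HOST
variable, which as far as I know is treated as a regular expression matched
against the host system string. The sketch below only imitates that matching
in plain Python; the pattern, the host strings and the is_compatible helper
are illustrative assumptions, not taken from any real recipe.

```python
import re

# Hypothetical recipe fragment, with the build error kept next to it:
#   # fails in world builds for x86/x86-64: <error message from the log>
#   COMPATIBLE_HOST = "arm.*-linux"
COMPATIBLE_HOST = r"arm.*-linux"


def is_compatible(host_sys, pattern=COMPATIBLE_HOST):
    """True if the host string matches the pattern from its start."""
    return re.match(pattern, host_sys) is not None


# Illustrative host strings for three target architectures.
for host in ["arm-oe-linux-gnueabi", "x86_64-oe-linux", "i586-oe-linux"]:
    print(host, "->", "built" if is_compatible(host) else "skipped")
```

With a restriction like this the recipe is skipped for the architectures
where it is known to fail, so it stays tested where it works and no longer
shows up as a failure elsewhere.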
 
> - It may be worth taking some aggressive action in sync with the
> upcoming oe-core release to get us to a green build, possibly by
> throwing things into meta-nonworking and meta-unstable layers. That
> may break a few people's builds, but the fix for them should just be
> to add the meta-unstable layer. If they're building against the master
> branch that should be tolerable and it won't affect anyone building
> from a released/stable branch until the next oe-core release in 6
> months' time. Once we have a green build it'll be much easier to QA
> patches and reject those which break the build.

Do you mean being aggressive before or after creating the "next-release"
branch? I would prefer to be aggressive before, to have green builds in
the release branch.

> I don't have much time to give to fixing this myself as I'm busy with
> other projects. I do have idle computer time though so could run an
> automated build regularly (probably just a script and a cron job
> rather than buildbot/jenkins/etc). I won't be able to do a world build
> for every layer for multiple machines, but I should be able to do some
> subset. I may also be able to commandeer a spare server over the next
> few weeks. Is there any particular config which would be beneficial to
> build regularly?

Doing "big" builds in different setups is of course useful, but right
now I think we already have "more logs than what we're processing".

So I think that building not the whole world, but those recipes which are
regularly failing, would be a good start. If people cannot reproduce some
issues which are shown in my jenkins builds, we should compare the builds
and narrow down the possible reasons (e.g. failing only with dash, failing
only with some PACKAGECONFIG enabled or disabled).
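
Comparing two builds in this way amounts to listing the settings that
differ between the failing and the passing setup. A minimal sketch, assuming
each setup has already been reduced to a flat dictionary; the keys and
values below are made up for illustration.

```python
def config_diff(failing, passing):
    """Return {key: (failing_value, passing_value)} for differing keys."""
    keys = set(failing) | set(passing)
    return {
        key: (failing.get(key), passing.get(key))
        for key in sorted(keys)
        if failing.get(key) != passing.get(key)
    }


# Hypothetical summaries of a failing jenkins build and a passing local one.
jenkins = {"/bin/sh": "dash", "PACKAGECONFIG_pn-foo": "gnutls", "linker": "gold"}
local = {"/bin/sh": "bash", "PACKAGECONFIG_pn-foo": "gnutls", "linker": "bfd"}

for key, (bad, good) in config_diff(jenkins, local).items():
    print(f"{key}: failing={bad} passing={good}")
```

Each remaining difference is a candidate cause that can then be toggled one
at a time in the passing setup.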

I'm trying to stay close to a distroless config, but some tweaks are
needed in order to have bigger test coverage (e.g. enabling some
PACKAGECONFIGs which are disabled by default but which something requires,
changing P_V to a newer version because something else needs it, or
enabling gold, because it's more strict so it can catch more issues).

Jenkins builds are running almost non-stop (because a build usually takes
around 24 hours per architecture), so often when I want to debug an
issue directly on that server in the WORKDIR where it failed, it's already
gone from tmpfs and the workspace is already "blocked" by the next build.

Regards,

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

* Re: Quality of meta-oe metadata
  2014-03-30 15:14     ` [OE-core] " Martin Jansa
@ 2014-03-30 15:56       ` Paul Barker
  -1 siblings, 0 replies; 16+ messages in thread
From: Paul Barker @ 2014-03-30 15:56 UTC (permalink / raw)
  To: Martin Jansa; +Cc: Openembedded Discussion, openembedded-core

On 30 March 2014 16:14, Martin Jansa <martin.jansa@gmail.com> wrote:
> On Sun, Mar 30, 2014 at 03:48:09PM +0100, Paul Barker wrote:
>> - We could create a new layer for unstable recipes which are 'use at
>> your own risk'. That may be a good place for recipes which don't work
>> on the jenkins builds but do work for some people and are in use (and
>> so probably don't belong in nonworking). This is similar to the
>> meta-broken or meta-nonworking layer idea but would take slightly more
>> recipes in.
>
> The difference between meta-broken and meta-unstable from my POV is
> that "broken" should be just a temporary place and someone should
> eventually fix the recipe, but if we move stuff to meta-unstable and stop
> testing it, then I fear that it will become just a junkyard.
>
> If some recipe works fine for someone when building for arm, but
> triggers "jenkins/world" build issues for x86*, then let's restrict it
> to arm only, with a comment which shows the error - that's better than
> moving it to "broken" or "unstable".
>

I agree with the worry that this could turn into a junkyard. The
problem is, the junk is currently mixed in with the good recipes.

>> - It may be worth taking some aggressive action in sync with the
>> upcoming oe-core release to get us to a green build, possibly by
>> throwing things into meta-nonworking and meta-unstable layers. That
>> may break a few people's builds, but the fix for them should just be
>> to add the meta-unstable layer. If they're building against the master
>> branch that should be tolerable and it won't affect anyone building
>> from a released/stable branch until the next oe-core release in 6
>> months time. Once we have a green build it'll be much easier to QA
>> patches and reject those which break the build.
>
> you mean being aggressive before or after creating the "next-release"
> branch? I would prefer to be aggressive before to have green builds in
> release branch.

That would probably make more sense.

>
>> I don't have much time to give to fixing this myself as I'm busy with
>> other projects. I do have idle computer time though so could run an
>> automated build regularly (probably just a script and a cron job
>> rather than buildbot/jenkins/etc). I won't be able to do a world build
>> for every layer for multiple machines, but I should be able to do some
>> subset. I may also be able to commandeer a spare server over the next
>> few weeks. Is there any particular config which would be beneficial to
>> build regularly?
>
> Doing "big" builds in different setups is of course useful, but right
> now I think we already have "more logs than what we're processing".
>
> So I think that building not whole world, but those recipes which are
> regularly failing would be good start, if people cannot reproduce some
> issues which are shown in my jenkins builds we should compare the builds
> and narrow possible reasons (e.g. failing only with dash, failing only
> with some PACKAGECONFIG enabled or disabled).
>
> I'm trying to stay close to distroless config, but some tweaks are
> needed in order to have bigger test coverage (e.g. enabling some
> PACKAGECONFIGs which are disabled by default, but something requires
> them, or changing P_V to newer again because something else needs it,
> enabling gold, because it's more strict so it can catch more issues..)
>
> Jenkins builds are running almost non-stop (because it usually takes
> around 24 hours per architecture), so often when I want to debug the
> issue directly on that server in WORKDIR where it failed it's already
> gone from tmpfs and the workspace is already "blocked" by next build.
>

Having a more focussed build would help but I can't really give any
ability to debug it on the machine I run builds on and I don't really
have time to debug much myself. It would literally just be a status
and a set of logs for a different config.

I think we need to prioritise what needs fixing first. I think doing a
build of meta-oe only with no PACKAGECONFIG changes, disabling
anything that requires PACKAGECONFIG changes, for qemuarm and qemux86
(and qemux86_64 if I have time) would be a good start to see what
fails in that case. Then step out to further layers once meta-oe is
green.

-- 
Paul Barker

Email: paul@paulbarker.me.uk
http://www.paulbarker.me.uk


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Quality of meta-oe metadata
  2014-03-30  1:31 Quality of meta-oe metadata Martin Jansa
@ 2014-03-30 16:09   ` Paul Barker
  2014-03-30 14:48   ` [OE-core] " Paul Barker
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Paul Barker @ 2014-03-30 16:09 UTC (permalink / raw)
  To: Martin Jansa; +Cc: Openembedded Discussion, openembedded-core

On 30 March 2014 02:31, Martin Jansa <martin.jansa@gmail.com> wrote:
> There are always 20+ failed tasks in those
> big builds and just reading the numbers isn't good indicator of quality,
> because sooner you break something in dependency tree, fewer recipes will
> be actually tested, so fewer failed tasks often means that something
> important is broken.
>

Sorry to double reply but had a thought on this one thing in particular.

Is there any way to list recipes (or tasks) which weren't executed
because a dependency failed? I know we see this at the start of the
build if dependencies can't be satisfied; I'm thinking of tasks whose
dependencies were satisfied up front but then failed during the build.

Being able to see that chain for each build may help us prioritise. It
would also show whether a failure disappearing was due to the issue
being fixed or the recipe no longer being attempted due to a new
failure somewhere else.
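
To make the idea concrete, here is a rough sketch in plain Python. It assumes
the task dependency graph and the set of failed tasks have already been
exported somehow; the dictionary layout, the function names and the
recipe/task names are all made up for illustration, and none of this is an
existing BitBake interface.

```python
from collections import deque


def blocked_by_failures(deps, failed):
    """Map each never-attempted task to the failed tasks that block it.

    deps   -- hypothetical mapping: task -> list of tasks it depends on
    failed -- set of tasks that ran and failed
    """
    # Invert the graph so we can walk from a failure to its dependents.
    rdeps = {}
    for task, needs in deps.items():
        for need in needs:
            rdeps.setdefault(need, set()).add(task)

    blocked = {}
    for root in failed:
        queue = deque([root])
        seen = {root}
        while queue:
            current = queue.popleft()
            for dependent in rdeps.get(current, ()):
                if dependent in seen:
                    continue
                seen.add(dependent)
                blocked.setdefault(dependent, set()).add(root)
                queue.append(dependent)
    return blocked


def classify_disappeared(prev_failed, cur_failed, cur_blocked):
    """Label failures that vanished between two builds."""
    result = {}
    for task in sorted(prev_failed - cur_failed):
        if task in cur_blocked:
            result[task] = "not attempted"
        else:
            result[task] = "possibly fixed"
    return result


if __name__ == "__main__":
    # Made-up example graph: bar depends on libfoo, baz is independent.
    deps = {
        "libfoo:do_compile": [],
        "bar:do_configure": ["libfoo:do_compile"],
        "bar:do_compile": ["bar:do_configure"],
        "baz:do_compile": [],
    }
    blocked = blocked_by_failures(deps, {"libfoo:do_compile"})
    for task, roots in sorted(blocked.items()):
        print(task, "blocked by", ", ".join(sorted(roots)))
```

With something like this, a failure that vanished from the report only counts
as "possibly fixed" when the task was not sitting behind a new failure
elsewhere in the tree.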

Thanks,

-- 
Paul Barker

Email: paul@paulbarker.me.uk
http://www.paulbarker.me.uk


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Quality of meta-oe metadata
  2014-03-30  5:33   ` [OE-core] " Trevor Woerner
@ 2014-04-01 11:21     ` Richard Purdie
  -1 siblings, 0 replies; 16+ messages in thread
From: Richard Purdie @ 2014-04-01 11:21 UTC (permalink / raw)
  To: Trevor Woerner; +Cc: openembedded-devel, openembedded-core

On Sun, 2014-03-30 at 02:33 -0400, Trevor Woerner wrote:
> Hello Martin,
> 
> Excellent, excellent post!
> 
> On 03/29/14 21:31, Martin Jansa wrote:
> > 2) There are a lot of changes and component upgrades in oe-core which
> >    sometimes aren't very straight-forward to adapt to and issues stay in
> >    meta-oe for months.
> 
> Critical bugfixes aside, I think the current system of unrestrained,
> perpetual package bumps is an issue. If dylan uses version 1 of package
> XYZ and dora uses version 5, is there any need for the 3 intervening
> package bumps which occurred between the release of dylan and dora?

To put the other side of the argument to this, if you leave things and
just update once, you bring in more change and it usually ends up being
progressively harder to debug any issues since more things changed and
you don't know which change caused which problem.

OE-Core has therefore gone for the "update regularly" philosophy where
we can. This means we get to know about issues earlier and have more
time to fix them; they're also less likely to get confused with other
bugs.

> If nothing else, these "irrelevant updates" increase the chance of
> causing problems ;-)

I'd disagree :)

> > 3) OE releases work great and don't invalidate sstate signatures so often, so my
> >    feeling is that most developers and projects are just using releases and
> >    less and less people do CI.
> 
> Would users prefer a better-tested more-likely-to-work release
> containing package versions which were several months old, or is staying
> on the bleeding edge more important? There is always exponentially more
> work/cost required to be an early adopter.

At least for OE-Core, there is a strong pressure to try and keep up to
date. The stability is found in the release branches once a set of
versions is locked in.

If we run into problems and talk to upstreams, the invariable question
is "does this reproduce with the last release (or head of SCM)?". If
we're up to date, it makes it easier for us to talk to upstreams in that
regard too.

Cheers,

Richard







^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Quality of meta-oe metadata
  2014-03-30  1:31 Quality of meta-oe metadata Martin Jansa
                   ` (2 preceding siblings ...)
  2014-03-30 16:09   ` [OE-core] " Paul Barker
@ 2014-04-01 17:12 ` Mark Hatle
  2014-04-01 17:40   ` Martin Jansa
  3 siblings, 1 reply; 16+ messages in thread
From: Mark Hatle @ 2014-04-01 17:12 UTC (permalink / raw)
  To: openembedded-core

On 3/29/14, 8:31 PM, Martin Jansa wrote:
> Hi, sorry for longer e-mail, this is one of topic I would like to discuss
> on OEDAM (http://openembedded.org/wiki/OEDAM), but having some feedback and
> thoughts in advance will be very useful.
>
> As people can notice from my "State of bitbake world" e-mails or
> http://www.openembedded.org/wiki/Bitbake_World_Status
> we never had "green" builds. There are always 20+ failed tasks in those
> big builds and just reading the numbers isn't good indicator of quality,
> because sooner you break something in dependency tree, fewer recipes will
> be actually tested, so fewer failed tasks often means that something
> important is broken.

...

> 3) OE releases work great and don't invalidate sstate signatures so often, so my
>     feeling is that most developers and projects are just using releases and
>     less and less people do CI. People will start complaining that something
>     is broken in meta-oe only when they are upgrading their project from 1.5 to
>     1.6 when 1.6 is released and that could be too late for fixing meta-oe
>     issues.

I agree, the success of what we're doing is certainly causing us 'different' 
problems.  :)

> What I'm trying to do with it:
>
> a) sending those e-mails and updating wiki, so that people can easily find
>     if some build failure is common or something which happens only for them
>     (something like oestats-client.bbclass page was providing in oe-classic)
>     It also includes log of QA issues which are usually easy to fix and great
>     way for new people to learn something about OE.
> b) trying to refuse all patches which cause new world issue (or new QA
>     warn/err) - sometimes missed in logs, because it's often "hidden" by some
>     other issue and hard to compare 40 issues from previous build with 38
>     from current.
>     Also the issues are often triggered later by changes somewhere else...
> c) fixing build/qa issues in recipes I've never used or don't even have
>     hardware to test - just based on assumption that something which builds
>     is better than broken build, even when it can have some issues in runtime.
> d) contacting people who added the recipe which is now failing, often
>     without reply for months even when I try it multiple times :/

I agree with all of the above.  In fact I suspect you are going above and beyond 
what you really need to.  Kudos for that BTW.

> e) moving to "nonworking" directory to mark it as "known-to-be-broken",
>     last resort for recipes where the fix is complicated and it's not known
>     if someone is actually using it (because it was broken for months and
>     nobody replied).
>     + easy to find them, because they are still in repository (instead of
>       git rm + revert when someone fixes it)
>     - layer index probably doesn't find them, because "nonworking" directory
>       level isn't in BBFILES, so maybe meta-broken or meta-nonworking would be
>       better
>     ? some recipes are "broken" just because their dependency is broken, what
>       to do with such recipe, I usually just say that in commit message when
>       I'm moving them to "nonworking" with their broken dep.

Have you considered using the blacklist system for this?

You could do something like:

conf/layer.conf:
include ${LAYERDIR}/conf/broken.inc

conf/broken.inc:

<can we ensure the blacklist system is in the system>

BROKENMSG_layername = "The recipe is disabled due to a build failure.  If you \
need this recipe, or have gotten it to work, please submit patches to <path>.  \
Otherwise this recipe will be removed in the future."

# Recipe FOO is broken as of 2014-03-14, see ...
PNBLACKLIST[FOO] = "${BROKENMSG_layername}"

# Recipe BAR is broken as of 2013-06-13, see ...
PNBLACKLIST[BAR] = "${BROKENMSG_layername}"


Then after a given amount of time on the broken list (say one year?) we can
remove the items.

If the format of the comments is such that it can be easily parsed, then we can 
even automate tracking of these things.
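
As a rough illustration of that automation, the sketch below is plain Python
that scans a broken.inc-style file for comments in the form used above
("# Recipe FOO is broken as of YYYY-MM-DD") and lists entries older than a
cut-off. The comment format, the one-year default and the function names are
assumptions for the example, not an existing tool.

```python
import re
from datetime import date

# Assumed comment format: "# Recipe <name> is broken as of <YYYY-MM-DD>"
ENTRY_RE = re.compile(
    r"#\s*Recipe\s+(?P<recipe>\S+)\s+is broken as of\s+(?P<since>\d{4}-\d{2}-\d{2})"
)


def parse_broken_entries(text):
    """Return a list of (recipe, date) pairs found in the comments."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            since = date.fromisoformat(match.group("since"))
            entries.append((match.group("recipe"), since))
    return entries


def removal_candidates(entries, today, max_age_days=365):
    """Return recipes that have been on the broken list longer than the limit."""
    return [
        recipe
        for recipe, since in entries
        if (today - since).days > max_age_days
    ]


if __name__ == "__main__":
    sample = """
# Recipe FOO is broken as of 2014-03-14, see ...
PNBLACKLIST[FOO] = "${BROKENMSG_layername}"

# Recipe BAR is broken as of 2013-06-13, see ...
PNBLACKLIST[BAR] = "${BROKENMSG_layername}"
"""
    entries = parse_broken_entries(sample)
    print(removal_candidates(entries, today=date(2014, 7, 1)))
```

Keeping the date in the comment rather than in the message string means the
message variable can stay shared across all entries.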

(In cases where dependencies are causing the breakage, the message could be
augmented with that information as well...)

The advantage of the blacklist system is that if a user tries to use the recipe 
they will hopefully see the blacklist message, it prevents having to git mv 
recipes, and should be easier for people to find/fix the bad code via a simple 
patch.  (And hopefully easier to remove old cruft!)

--Mark

> What can we do better? How to motivate more people to do CI and send fixes?
> When we get to "green" state it will be easier to quickly spot new issues and
> easier to fix them, because it will be clear what's causing them.
>
>
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Quality of meta-oe metadata
  2014-04-01 17:12 ` Mark Hatle
@ 2014-04-01 17:40   ` Martin Jansa
  2014-04-01 17:50     ` Mark Hatle
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Jansa @ 2014-04-01 17:40 UTC (permalink / raw)
  To: Mark Hatle; +Cc: openembedded-core

[-- Attachment #1: Type: text/plain, Size: 6470 bytes --]

On Tue, Apr 01, 2014 at 12:12:58PM -0500, Mark Hatle wrote:
> On 3/29/14, 8:31 PM, Martin Jansa wrote:
> > Hi, sorry for longer e-mail, this is one of topic I would like to discuss
> > on OEDAM (http://openembedded.org/wiki/OEDAM), but having some feedback and
> > thoughts in advance will be very useful.
> >
> > As people can notice from my "State of bitbake world" e-mails or
> > http://www.openembedded.org/wiki/Bitbake_World_Status
> > we never had "green" builds. There are always 20+ failed tasks in those
> > big builds and just reading the numbers isn't good indicator of quality,
> > because sooner you break something in dependency tree, fewer recipes will
> > be actually tested, so fewer failed tasks often means that something
> > important is broken.
> 
> ...
> 
> > 3) OE releases work great and don't invalidate sstate signatures so often, so my
> >     feeling is that most developers and projects are just using releases and
> >     less and less people do CI. People will start complaining that something
> >     is broken in meta-oe only when they are upgrading their project from 1.5 to
> >     1.6 when 1.6 is released and that could be too late for fixing meta-oe
> >     issues.
> 
> I agree, the success of what we're doing is certainly causing us 'different' 
> problems.  :)
> 
> > What I'm trying to do with it:
> >
> > a) sending those e-mails and updating wiki, so that people can easily find
> >     if some build failure is common or something which happens only for them
> >     (something like oestats-client.bbclass page was providing in oe-classic)
> >     It also includes log of QA issues which are usually easy to fix and great
> >     way for new people to learn something about OE.
> > b) trying to refuse all patches which cause new world issue (or new QA
> >     warn/err) - sometimes missed in logs, because it's often "hidden" by some
> >     other issue and hard to compare 40 issues from previous build with 38
> >     from current.
> >     Also the issues are often triggered later by changes somewhere else...
> > c) fixing build/qa issues in recipes I've never used or don't even have
> >     hardware to test - just based on assumption that something which builds
> >     is better than broken build, even when it can have some issues in runtime.
> > d) contacting people who added the recipe which is now failing, often
> >     without reply for months even when I try it multiple times :/
> 
> I agree with all of the above.  In fact I suspect you are going above and beyond 
> what you really need to.  Kudos for that BTW.
> 
> > e) moving to "nonworking" directory to mark it as "known-to-be-broken",
> >     last resort for recipes where the fix is complicated and it's not known
> >     if someone is actually using it (because it was broken for months and
> >     nobody replied).
> >     + easy to find them, because they are still in repository (instead of
> >       git rm + revert when someone fixes it)
> >     - layer index probably doesn't find them, because "nonworking" directory
> >       level isn't in BBFILES, so maybe meta-broken or meta-nonworking would be
> >       better
> >     ? some recipes are "broken" just because their dependency is broken, what
> >       to do with such recipe, I usually just say that in commit message when
> >       I'm moving them to "nonworking" with their broken dep.
> 
> Have you considered using the blacklist system for this?
> 
> You could do something like:
> 
> conf/layer.conf:
> include ${LAYERDIR}/conf/broken.inc
> 
> conf/broken.inc:
> 
> <can we ensure the blacklist system is in the system>
> 
> BROKENMSG_layername = "The recipe is disabled due to a build failure.  If you 
> need this recipe, or have gotten it to work.  Please submit patches to <path>. 
> Otherwise this recipe will be removed in the future."
> 
> # Recipe FOO is broken as of 2014-03-14, see ...
> PNBLACKLIST[FOO] = "${BROKENMSG_layername}"
> 
> # Recipe BAR is broken as of 2013-06-13, see ...
> PNBLACKLIST[BAR] = "${BROKENMSG_layername}"
> 
> 
> Then after a given amount of time, say one year? on the broken list -- we can 
> then remove the items.
> 
> If the format of the comments is such that it can be easily parsed, then we can 
> even automate tracking of these things.
> 
> (In cases where dependencies are causing the breakage, the message cause be 
> augmented with that information as well...)
> 
> The advantage of the blacklist system is that if a user tries to use the recipe 
> they will hopefully see the blacklist message, it prevents having to git mv 
> recipes, and should be easier for people to find/fix the bad code via a simple 
> patch.  (And hopefully easier to remove old cruft!)

Yes, that's another way of doing that and I was using it on world builds
as well (but without including it in layer and layer.conf to make it
"public")

e.g.
http://logs.nslu2-linux.org/buildlogs/oe/oe-shr-core-branches/log.world.20140329_001343.log/world_mask.inc

It definitely has the advantage that you can "document" it in the
message and add a few more details in the file itself.

The disadvantage from my POV was that I never included and enabled it in
the repo, so new people didn't know about it and would still see the issues
when they tried to build something broken.

Another disadvantage was that I always felt, OK I'll mark this as broken
with PNBLACKLIST and let's forget that it ever existed (sometimes I've
uncommented include lines for this just to confirm that everything still
fails - but not as often as "regular" builds).

And last one: if I recall correctly, when I was using this it was hard
to unblacklist something in your config, so if you wanted to test newer
version or something you had to modify world_mask.inc first, which won't
be very good for people if we include it by default.

Regards,

> --Mark
> 
> > What can we do better? How to motivate more people to do CI and send fixes?
> > When we get to "green" state it will be easier to quickly spot new issues and
> > easier to fix them, because it will be clear what's causing them.
> >
> >
> >
> 
> -- 
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 205 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Quality of meta-oe metadata
  2014-04-01 17:40   ` Martin Jansa
@ 2014-04-01 17:50     ` Mark Hatle
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Hatle @ 2014-04-01 17:50 UTC (permalink / raw)
  To: Martin Jansa; +Cc: openembedded-core

On 4/1/14, 12:40 PM, Martin Jansa wrote:
> On Tue, Apr 01, 2014 at 12:12:58PM -0500, Mark Hatle wrote:
>> On 3/29/14, 8:31 PM, Martin Jansa wrote:
>>> Hi, sorry for longer e-mail, this is one of topic I would like to discuss
>>> on OEDAM (http://openembedded.org/wiki/OEDAM), but having some feedback and
>>> thoughts in advance will be very useful.
>>>
>>> As people can notice from my "State of bitbake world" e-mails or
>>> http://www.openembedded.org/wiki/Bitbake_World_Status
>>> we never had "green" builds. There are always 20+ failed tasks in those
>>> big builds and just reading the numbers isn't good indicator of quality,
>>> because sooner you break something in dependency tree, fewer recipes will
>>> be actually tested, so fewer failed tasks often means that something
>>> important is broken.
>>
>> ...
>>
>>> 3) OE releases work great and don't invalidate sstate signatures so often, so my
>>>      feeling is that most developers and projects are just using releases and
>>>      less and less people do CI. People will start complaining that something
>>>      is broken in meta-oe only when they are upgrading their project from 1.5 to
>>>      1.6 when 1.6 is released and that could be too late for fixing meta-oe
>>>      issues.
>>
>> I agree, the success of what we're doing is certainly causing us 'different'
>> problems.  :)
>>
>>> What I'm trying to do with it:
>>>
>>> a) sending those e-mails and updating wiki, so that people can easily find
>>>      if some build failure is common or something which happens only for them
>>>      (something like oestats-client.bbclass page was providing in oe-classic)
>>>      It also includes log of QA issues which are usually easy to fix and great
>>>      way for new people to learn something about OE.
>>> b) trying to refuse all patches which cause new world issue (or new QA
>>>      warn/err) - sometimes missed in logs, because it's often "hidden" by some
>>>      other issue and hard to compare 40 issues from previous build with 38
>>>      from current.
>>>      Also the issues are often triggered later by changes somewhere else...
>>> c) fixing build/qa issues in recipes I've never used or don't even have
>>>      hardware to test - just based on assumption that something which builds
>>>      is better than broken build, even when it can have some issues in runtime.
>>> d) contacting people who added the recipe which is now failing, often
>>>      without reply for months even when I try it multiple times :/
>>
>> I agree with all of the above.  In fact I suspect you are going above and beyond
>> what you really need to.  Kudos for that BTW.
>>
>>> e) moving to "nonworking" directory to mark it as "known-to-be-broken",
>>>      last resort for recipes where the fix is complicated and it's not known
>>>      if someone is actually using it (because it was broken for months and
>>>      nobody replied).
>>>      + easy to find them, because they are still in repository (instead of
>>>        git rm + revert when someone fixes it)
>>>      - layer index probably doesn't find them, because "nonworking" directory
>>>        level isn't in BBFILES, so maybe meta-broken or meta-nonworking would be
>>>        better
>>>      ? some recipes are "broken" just because their dependency is broken, what
>>>        to do with such recipe, I usually just say that in commit message when
>>>        I'm moving them to "nonworking" with their broken dep.
>>
>> Have you considered using the blacklist system for this?
>>
>> You could do something like:
>>
>> conf/layer.conf:
>> include ${LAYERDIR}/conf/broken.inc
>>
>> conf/broken.inc:
>>
>> <can we ensure the blacklist system is in the system>
>>
>> BROKENMSG_layername = "The recipe is disabled due to a build failure.  If you
>> need this recipe, or have gotten it to work.  Please submit patches to <path>.
>> Otherwise this recipe will be removed in the future."
>>
>> # Recipe FOO is broken as of 2014-03-14, see ...
>> PNBLACKLIST[FOO] = "${BROKENMSG_layername}"
>>
>> # Recipe BAR is broken as of 2013-06-13, see ...
>> PNBLACKLIST[BAR] = "${BROKENMSG_layername}"
>>
>>
>> Then after a given amount of time, say one year? on the broken list -- we can
>> then remove the items.
>>
>> If the format of the comments is such that it can be easily parsed, then we can
>> even automate tracking of these things.
>>
>> (In cases where dependencies are causing the breakage, the message cause be
>> augmented with that information as well...)
>>
>> The advantage of the blacklist system is that if a user tries to use the recipe
>> they will hopefully see the blacklist message, it prevents having to git mv
>> recipes, and should be easier for people to find/fix the bad code via a simple
>> patch.  (And hopefully easier to remove old cruft!)
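
The automated tracking suggested above could look roughly like the sketch below. It assumes the comment format from the example ("# Recipe FOO is broken as of 2014-03-14, see ...") placed directly above the matching PNBLACKLIST line, and uses the proposed one-year limit. The function names and the sample text are hypothetical and are not part of any existing OE tooling.

```python
import re
from datetime import date, timedelta

# Assumed comment format, copied from the example in the proposal:
#   # Recipe FOO is broken as of 2014-03-14, see ...
COMMENT_RE = re.compile(
    r"^#\s*Recipe\s+(?P<pn>\S+)\s+is broken as of\s+(?P<since>\d{4}-\d{2}-\d{2})"
)
BLACKLIST_RE = re.compile(r"^PNBLACKLIST\[(?P<pn>[^\]]+)\]\s*=")


def parse_broken(text):
    """Return {recipe: date} for each dated comment directly followed by
    a PNBLACKLIST assignment for the same recipe."""
    broken = {}
    pending = None  # (recipe, date) taken from the previous line, if any
    for raw in text.splitlines():
        line = raw.strip()
        m = COMMENT_RE.match(line)
        if m:
            pending = (m.group("pn"), date.fromisoformat(m.group("since")))
            continue
        m = BLACKLIST_RE.match(line)
        if m and pending and pending[0] == m.group("pn"):
            broken[pending[0]] = pending[1]
        pending = None  # any other line breaks the comment/assignment pair
    return broken


def removal_candidates(broken, today, max_age_days=365):
    """List recipes that have been marked broken for at least max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(pn for pn, since in broken.items() if since <= cutoff)


# Hypothetical contents of conf/broken.inc, shortened from the example.
SAMPLE = """\
BROKENMSG_layername = "The recipe is disabled due to a build failure."

# Recipe FOO is broken as of 2014-03-14, see ...
PNBLACKLIST[FOO] = "${BROKENMSG_layername}"

# Recipe BAR is broken as of 2013-06-13, see ...
PNBLACKLIST[BAR] = "${BROKENMSG_layername}"
"""

if __name__ == "__main__":
    entries = parse_broken(SAMPLE)
    print(removal_candidates(entries, today=date(2014, 7, 1)))
```

Requiring the dated comment to sit directly above its PNBLACKLIST line keeps the parser simple; entries without a parsable date are ignored rather than guessed at.
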
>
> Yes, that's another way of doing that and I was using it on world builds
> as well (but without including it in layer and layer.conf to make it
> "public")
>
> e.g.
> http://logs.nslu2-linux.org/buildlogs/oe/oe-shr-core-branches/log.world.20140329_001343.log/world_mask.inc
>
> It definitely has the advantage that you can "document" it in the
> message and add a few more details in the file itself.
>
> Disadvantage from my POV was that I never included and enabled it in
> repo, so new people didn't know about it and will still see the issues
> when they try to build something broken.
>
> Another disadvantage was that I always felt, OK I'll mark this as broken
> with PNBLACKLIST and let's forget that it ever existed (sometimes I've
> uncommented include lines for this just to confirm that everything still
> fails - but not as often as "regular" builds).

Ya, it definitely prevents retry without a conscious change to a file.

> And last one: if I recall correctly, when I was using this it was hard
> to unblacklist something in your config, so if you wanted to test newer
> version or something you had to modify world_mask.inc first, which won't
> be very good for people if we include it by default.

To unblacklist, you would set PNBLACKLIST[FOO] = ""

But of course, someone has to know that setting it to blank has the effect of
removing the blacklist for their work.

--Mark

> Regards,
>
>> --Mark
>>
>>> What can we do better? How to motivate more people to do CI and send fixes?
>>> When we get to "green" state it will be easier to quickly spot new issues and
>>> easier to fix them, because it will be clear what's causing them.
>>>
>>>
>>>
>>
>



Thread overview: 16+ messages
2014-03-30  1:31 Quality of meta-oe metadata Martin Jansa
2014-03-30  5:33 ` Trevor Woerner
2014-03-30  5:33   ` [OE-core] " Trevor Woerner
2014-04-01 11:21   ` Richard Purdie
2014-04-01 11:21     ` [OE-core] " Richard Purdie
2014-03-30 14:48 ` Paul Barker
2014-03-30 14:48   ` [OE-core] " Paul Barker
2014-03-30 15:14   ` Martin Jansa
2014-03-30 15:14     ` [OE-core] " Martin Jansa
2014-03-30 15:56     ` Paul Barker
2014-03-30 15:56       ` [OE-core] " Paul Barker
2014-03-30 16:09 ` Paul Barker
2014-03-30 16:09   ` [OE-core] " Paul Barker
2014-04-01 17:12 ` Mark Hatle
2014-04-01 17:40   ` Martin Jansa
2014-04-01 17:50     ` Mark Hatle
