* Intermittent failure issue summary
@ 2022-04-16 10:26 Richard Purdie
  2022-04-16 13:31 ` [OE-core] " Markus Volk
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Richard Purdie @ 2022-04-16 10:26 UTC (permalink / raw)
  To: openembedded-core

I'm guessing a lot of people don't follow the intermittent issues. I therefore
thought I'd share a summary of some of them along with some random thoughts on
them. There is a mix of different things here, each needing different skills.

Systemd daemon-reload unit restart failures:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14787
AlexK has got part way in figuring out the circumstances of this, any systemd
experts able to spot what I think is a service file dependency issue?

EFI Boot Failure:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14018
"oe-selftest - efibootpartition.GenericEFITest.test_boot_efi selftest"
Does anyone know the EFI boot process and know what logging we might add to the
system so we gain more insight when this happens?

Bitbake parsing error:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14665
"Parsing recipes...ERROR: ParseError in None: Not all recipes parsed, parser
thread killed/died? Exiting" - I just can't spot the logic bug causing this
error (and some similar variants), maybe someone else can?

sstate files not found:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14775
For this one I think we need to write a standalone replica of the tests against
an sstate mirror that sstate.bbclass runs to check if sstate objects exist. That
way we could try different load levels against the project server and see
whether it is the sstate/fetcher code (which does weird things with threads and
concurrent connections) or if it is the server side of things that has some
limit we can't spot.
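
As a rough starting point for such a standalone replica, here is a minimal
sketch. It is an assumption-laden illustration: it uses plain HTTP HEAD requests
from the Python standard library and does not reproduce the bitbake fetcher code
path that sstate.bbclass actually goes through. The names head_exists, probe and
sweep are hypothetical, and the URL list would have to be supplied by whoever
runs it.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_exists(url, timeout=10.0):
    """Return "ok", "missing" or "error:<reason>" for one HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError as exc:
        # HTTPError must be handled before its parent classes below.
        return "missing" if exc.code == 404 else "error:http-%d" % exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection failures and socket timeouts.
        return "error:%s" % type(exc).__name__


def probe(urls, workers, check=head_exists):
    """Check every URL using `workers` concurrent threads; tally the outcomes."""
    tally = {}
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for outcome in pool.map(check, urls):
            tally[outcome] = tally.get(outcome, 0) + 1
    return tally, time.monotonic() - start


def sweep(urls, levels=(1, 4, 16, 64), check=head_exists):
    """Repeat the same probe at increasing load levels (hypothetical values)."""
    return {n: probe(urls, n, check)[0] for n in levels}
```

If errors only show up at the higher load levels with this independent client,
that would point at the server side; if they never show up here, the
sstate/fetcher code becomes the more likely suspect.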

pseudo do_flush_pseudodb task error:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14654
Not sure why this sometimes happens; we likely need to spot the race in the
pseudo shutdown code.

Memory resident bitbake PR Serv issue:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14786
This is one of the blocking issues on moving to memory resident bitbake by
default.

x86 boot log serio/CD drive timeout in qemu:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14743
We've talked about disabling some of the peripherals we don't need/care about
such as psmouse and the CD drive. Anyone fancy digging into this with upstream
qemu? I suspect there are other people who'd like this too.

Bitbake Server timeout:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=14201
This one really needs a rework of bitbake's main loop with a new thread so that
the UI and server can talk even when whatever it is doing (parsing, event
handlers) is blocked. No takers?! Just thought I'd add to the list! :)
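
To illustrate the general idea only (this is not bitbake's code, and the class
and method names are hypothetical), a toy sketch where a helper thread keeps
answering the UI while the main work is blocked:

```python
import queue
import threading


class Server:
    """Toy server: a helper thread answers pings while the main loop is busy."""

    def __init__(self):
        self.requests = queue.Queue()
        self.state = "idle"  # what the main loop is doing right now
        self._stop = threading.Event()
        self._responder = threading.Thread(target=self._answer_pings, daemon=True)
        self._responder.start()

    def _answer_pings(self):
        # Runs independently of the main loop, so slow work cannot starve it.
        while not self._stop.is_set():
            try:
                reply = self.requests.get(timeout=0.05)
            except queue.Empty:
                continue
            reply.put(self.state)

    def ping(self, timeout=1.0):
        """UI side: ask what the server is doing; raises queue.Empty on timeout."""
        reply = queue.Queue(maxsize=1)
        self.requests.put(reply)
        return reply.get(timeout=timeout)

    def run_blocking(self, name, work):
        """Main loop side: stand-in for parsing or a slow event handler."""
        self.state = name
        try:
            work()
        finally:
            self.state = "idle"

    def shutdown(self):
        """Stop the responder thread and wait for it to exit."""
        self._stop.set()
        self._responder.join()
```

With this shape, a ping() issued while run_blocking() is still busy returns the
current state instead of timing out, which is the property the rework would
need.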


These are 8 of the issues and probably the most frequent/annoying or ones where
there is a clearish path forward. The full list of 57:

https://bugzilla.yoctoproject.org/buglist.cgi?quicksearch=AB-INT

(it was over 70 at one point, we've beaten it down a bit)

Cheers,

Richard






* Re: [OE-core] Intermittent failure issue summary
  2022-04-16 10:26 Intermittent failure issue summary Richard Purdie
@ 2022-04-16 13:31 ` Markus Volk
  2022-04-16 13:40   ` Richard Purdie
  2022-04-16 15:37 ` Jose Quaresma
  2022-04-19 16:50 ` Ross Burton
  2 siblings, 1 reply; 7+ messages in thread
From: Markus Volk @ 2022-04-16 13:31 UTC (permalink / raw)
  To: openembedded-core

Could the systemd issue be this?

https://github.com/systemd/systemd/pull/22552/commits/de90700f36f2126528f7ce92df0b5b5d5e277558

On 16.04.22 at 12:26, Richard Purdie wrote:
> I'm guessing a lot of people don't follow the intermittent issues. I therefore
> thought I'd share a summary of some of them along with some random thoughts on
> them. There is a mix of different things here, each needing different skills.
>
> Systemd daemon-reload unit restart failures:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14787
> AlexK has got part way in figuring out the circumstances of this, any systemd
> experts able to spot what I think is a service file dependency issue?
>
> EFI Boot Failure:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14018
> "oe-selftest - efibootpartition.GenericEFITest.test_boot_efi selftest"
> Does anyone know the EFI boot process and know what logging we might add to the
> system so we gain more insight when this happens?
>
> Bitbake parsing error:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14665
> "Parsing recipes...ERROR: ParseError in None: Not all recipes parsed, parser
> thread killed/died? Exiting" - I just can't spot the logic bug causing this
> error (and some similar variants), maybe someone else can?
>
> sstate files not found:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14775
> For this one I think we need to write a standalone replica of the tests against
> an sstate mirror that sstate.bbclass runs to check if sstate objects exist. That
> way we could try different load levels against the project server and see
> whether it is the sstate/fetcher code (which does weird things with threads and
> concurrent connections) or if it is the server side of things that has some
> limit we can't spot.
>
> pseudo do_flush_pseudodb task error:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14654
> Not sure why this sometimes happens; we likely need to spot the race in the
> pseudo shutdown code.
>
> Memory resident bitbake PR Serv issue:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14786
> This is one of the blocking issues on moving to memory resident bitbake by
> default.
>
> x86 boot log serio/CD drive timeout in qemu:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14743
> We've talked about disabling some of the peripherals we don't need/care about
> such as psmouse and the CD drive. Anyone fancy digging into this with upstream
> qemu? I suspect there are other people who'd like this too.
>
> Bitbake Server timeout:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14201
> This one really needs a rework of bitbake's main loop with a new thread so that
> the UI and server can talk even when whatever it is doing (parsing, event
> handlers) is blocked. No takers?! Just thought I'd add to the list! :)
>
>
> These are 8 of the issues and probably the most frequent/annoying or ones where
> there is a clearish path forward. The full list of 57:
>
> https://bugzilla.yoctoproject.org/buglist.cgi?quicksearch=AB-INT
>
> (it was over 70 at one point, we've beaten it down a bit)
>
> Cheers,
>
> Richard
>
>
>
>



* Re: [OE-core] Intermittent failure issue summary
  2022-04-16 13:31 ` [OE-core] " Markus Volk
@ 2022-04-16 13:40   ` Richard Purdie
  2022-04-16 15:29     ` Alexander Kanavin
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Purdie @ 2022-04-16 13:40 UTC (permalink / raw)
  To: Markus Volk, openembedded-core; +Cc: Alexander Kanavin

On Sat, 2022-04-16 at 15:31 +0200, Markus Volk wrote:
> Could the systemd issue be this?
> 
> https://github.com/systemd/systemd/pull/22552/commits/de90700f36f2126528f7ce92df0b5b5d5e277558
> 
> On 16.04.22 at 12:26, Richard Purdie wrote:
> > Systemd daemon-reload unit restart failures:
> > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14787
> > AlexK has got part way in figuring out the circumstances of this, any systemd
> > experts able to spot what I think is a service file dependency issue?
> > 

Yes, that could well be it :)

Particularly when you read:

https://github.com/systemd/systemd/issues/15316

Alex: Any thoughts?

Cheers,

Richard




* Re: [OE-core] Intermittent failure issue summary
  2022-04-16 13:40   ` Richard Purdie
@ 2022-04-16 15:29     ` Alexander Kanavin
  0 siblings, 0 replies; 7+ messages in thread
From: Alexander Kanavin @ 2022-04-16 15:29 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Markus Volk, OE-core

On Sat, 16 Apr 2022 at 15:40, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> > Could the systemd issue be this?
> >
> > https://github.com/systemd/systemd/pull/22552/commits/de90700f36f2126528f7ce92df0b5b5d5e277558
> >
> > On 16.04.22 at 12:26, Richard Purdie wrote:
> > > Systemd daemon-reload unit restart failures:
> > > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14787
> > > AlexK has got part way in figuring out the circumstances of this, any systemd
> > > experts able to spot what I think is a service file dependency issue?
> > >
>
> Yes, that could well be it :)
>
> Particularly when you read:
>
> https://github.com/systemd/systemd/issues/15316
>
> Alex: Any thoughts?

These commits have been backported to 250-stable, released in 250.4,
and we already carry that version :-(
https://github.com/systemd/systemd-stable/commit/367041af816d48d4852140f98fd0ba78ed83f9e4

Alex



* Re: [OE-core] Intermittent failure issue summary
  2022-04-16 10:26 Intermittent failure issue summary Richard Purdie
  2022-04-16 13:31 ` [OE-core] " Markus Volk
@ 2022-04-16 15:37 ` Jose Quaresma
  2022-04-19 16:50 ` Ross Burton
  2 siblings, 0 replies; 7+ messages in thread
From: Jose Quaresma @ 2022-04-16 15:37 UTC (permalink / raw)
  To: Richard Purdie; +Cc: openembedded-core


Richard Purdie <richard.purdie@linuxfoundation.org> wrote on Saturday,
16/04/2022 at 11:26:

> I'm guessing a lot of people don't follow the intermittent issues. I
> therefore
> thought I'd share a summary of some of them along with some random
> thoughts on
> them. There is a mix of different things here, each needing different
> skills.
>
> Systemd daemon-reload unit restart failures:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14787
> AlexK has got part way in figuring out the circumstances of this, any
> systemd
> experts able to spot what I think is a service file dependency issue?
>
> EFI Boot Failure:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14018
> "oe-selftest - efibootpartition.GenericEFITest.test_boot_efi selftest"
> Does anyone know the EFI boot process and know what logging we might add
> to the
> system so we gain more insight when this happens?
>
> Bitbake parsing error:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14665
> "Parsing recipes...ERROR: ParseError in None: Not all recipes parsed,
> parser
> thread killed/died? Exiting" - I just can't spot the logic bug causing this
> error (and some similar variants), maybe someone else can?
>
> sstate files not found:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14775
> For this one I think we need to write a standalone replica of the tests
> against
> an sstate mirror that sstate.bbclass runs to check if sstate objects
> exist. That
> way we could try different load levels against the project server and see
> whether it is the sstate/fetcher code (which does weird things with
> threads and
> concurrent connections) or if it is the server side of things that has some
> limit we can't spot.
>

Would it be a good idea to raise a warning and retry in such cases?

A socket timeout looks server related to me, and this timeout issue improved a
lot after the last server infrastructure migration.
Before that migration I could work around the timeout by setting
BB_NUMBER_THREADS=1, which makes one connection at a time.
That BB_NUMBER_THREADS=1 helps makes me think this could be a race condition
in oe.utils.ThreadedPool, which afaik is only used in sstate.bbclass.
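
A minimal sketch of the warn-and-retry idea, assuming the existence check can be
wrapped as a callable that raises a timeout exception. The names
check_with_retry, attempts and delay are hypothetical and are not taken from
sstate.bbclass or the fetcher.

```python
import logging
import socket
import time

logger = logging.getLogger("sstate-retry-sketch")


def check_with_retry(check, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call check(url); on a timeout, warn and try again up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return check(url)
        except (TimeoutError, socket.timeout) as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the timeout
            logger.warning("timeout checking %s (attempt %d/%d): %s; retrying",
                           url, attempt, attempts, exc)
            sleep(delay * attempt)  # linear backoff between attempts
```

The warning keeps the intermittent failure visible in the logs, so a retry that
succeeds does not hide how often the timeout occurs.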

Jose


> pseudo do_flush_pseudodb task error:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14654
> Not sure why this sometimes happens; we likely need to spot the race in the
> pseudo shutdown code.
>
> Memory resident bitbake PR Serv issue:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14786
> This is one of the blocking issues on moving to memory resident bitbake by
> default.
>
> x86 boot log serio/CD drive timeout in qemu:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14743
> We've talked about disabling some of the peripherals we don't need/care
> about
> such as psmouse and the CD drive. Anyone fancy digging into this with
> upstream
> qemu? I suspect there are other people who'd like this too.
>
> Bitbake Server timeout:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14201
> This one really needs a rework of bitbake's main loop with a new thread so
> that
> the UI and server can talk even when whatever it is doing (parsing, event
> handlers) is blocked. No takers?! Just thought I'd add to the list! :)
>
>
> These are 8 of the issues and probably the most frequent/annoying or ones
> where
> there is a clearish path forward. The full list of 57:
>
> https://bugzilla.yoctoproject.org/buglist.cgi?quicksearch=AB-INT
>
> (it was over 70 at one point, we've beaten it down a bit)
>
> Cheers,
>
> Richard
>
>
>
>

-- 
Best regards,

José Quaresma



* Re: [OE-core] Intermittent failure issue summary
  2022-04-16 10:26 Intermittent failure issue summary Richard Purdie
  2022-04-16 13:31 ` [OE-core] " Markus Volk
  2022-04-16 15:37 ` Jose Quaresma
@ 2022-04-19 16:50 ` Ross Burton
  2022-04-20 21:54   ` Richard Purdie
  2 siblings, 1 reply; 7+ messages in thread
From: Ross Burton @ 2022-04-19 16:50 UTC (permalink / raw)
  To: Richard Purdie; +Cc: openembedded-core

On Sat, 16 Apr 2022 at 11:26, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> x86 boot log serio/CD drive timeout in qemu:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=14743
> We've talked about disabling some of the peripherals we don't need/care about
> such as psmouse and the CD drive. Anyone fancy digging into this with upstream
> qemu? I suspect there are other people who'd like this too.

Patches sent for the keyboard/mouse part.  The CD drive is trickier...

Ross



* Re: [OE-core] Intermittent failure issue summary
  2022-04-19 16:50 ` Ross Burton
@ 2022-04-20 21:54   ` Richard Purdie
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Purdie @ 2022-04-20 21:54 UTC (permalink / raw)
  To: Ross Burton; +Cc: openembedded-core

On Tue, 2022-04-19 at 17:50 +0100, Ross Burton wrote:
> On Sat, 16 Apr 2022 at 11:26, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > x86 boot log serio/CD drive timeout in qemu:
> > https://bugzilla.yoctoproject.org/show_bug.cgi?id=14743
> > We've talked about disabling some of the peripherals we don't need/care about
> > such as psmouse and the CD drive. Anyone fancy digging into this with upstream
> > qemu? I suspect there are other people who'd like this too.
> 
> Patches sent for the keyboard/mouse part.  The CD drive is trickier...

Knocking those two out alone is great and much appreciated, thanks!

Cheers,

Richard


