* pseudo 1.8.1 doesn't work with docker & dumb-init
@ 2016-08-31  9:21 wenzong fan
  2016-08-31 15:11 ` Joshua Lock
  2016-08-31 15:48 ` Seebs
  0 siblings, 2 replies; 11+ messages in thread
From: wenzong fan @ 2016-08-31  9:21 UTC
  To: 'Patches and discussions about the oe-core layer',
	seebs, Richard Purdie

Hi Experts,

While trying to build Yocto in a Docker container that uses dumb-init
as its init system, I found that the build always stopped at some point and
the container was terminated as well, with the errors below:

     Child process timeout after 2 seconds.
     Child process exit status 4: lock_held

Sometimes there is no obvious error message at all.

After some `git bisect` testing, I believe the issue started with
commit:

----------------------
9df3cdf42d8c1216682f497f0b166a43ef9f4184 is the first bad commit
commit 9df3cdf42d8c1216682f497f0b166a43ef9f4184
Author: Richard Purdie <richard.purdie@linuxfoundation.org>
Date: Tue Jul 5 13:18:31 2016 +0100

     pseudo: Upgrade to 1.8.1

     * Drop patches where the changes exist upstream
     * Fetch from git as no tarball is available for 1.8.1
     * Move common code to pseudo.inc
     * Update patchset in git recipe

     (From OE-Core rev: 0c36984d4c501d12fa91cf7371511641585cc256)
-----------------------

Finally, I narrowed it down to this pseudo commit:

------------------------
commit 77ee254a6c974aad9bcab2c58c9ee9e0880c9718
Author: Peter Seebach <peter.seebach@windriver.com>
Date: Tue Mar 1 16:21:15 2016 -0600

     Server launch reworking.

     This is the big overhaul to have the server provide meaningful exit
     status to clients.

     In the process, I discovered that the server was running with
     signals blocked if launched by a client, which is not a good thing,
     and prevented this from working as intended.

     Still looking to see why more than one server spawn seems to happen.
------------------------
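For context on the mechanism the commit message describes: a process's blocked-signal mask is copied into children by fork() and is preserved across execve(), so a server spawned by a client that had signals blocked starts with those same signals blocked unless it resets its mask. The minimal Python sketch below (Unix only) demonstrates that general POSIX behaviour; it is not pseudo's actual C code, and `child_mask` is a hypothetical helper.

```python
import os
import signal


def child_mask(parent_blocks: set[int], child_resets: bool) -> set[int]:
    """Fork while the parent blocks `parent_blocks`; return the child's blocked set."""
    saved = signal.pthread_sigmask(signal.SIG_BLOCK, parent_blocks)
    try:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: starts with a copy of the parent's signal mask.
            try:
                os.close(r)
                if child_resets:
                    # What a freshly launched server can do: unblock everything.
                    signal.pthread_sigmask(signal.SIG_SETMASK, set())
                # SIG_BLOCK with an empty set changes nothing; it only returns the mask.
                mask = signal.pthread_sigmask(signal.SIG_BLOCK, set())
                os.write(w, " ".join(str(int(s)) for s in mask).encode())
            finally:
                os._exit(0)
        os.close(w)
        data = b""
        while chunk := os.read(r, 4096):  # read until the child closes the pipe
            data += chunk
        os.close(r)
        os.waitpid(pid, 0)
    finally:
        # Restore the parent's original mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, saved)
    return {int(x) for x in data.split()}
```

Without the reset, SIGTERM stays blocked in the child; with it, the child can receive SIGTERM normally.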

I also created a testcase for reproducing the issue at:

     https://github.com/WenzongFan/docker-build-yocto

For dumb-init please refer to:

     https://github.com/Yelp/dumb-init

Could anyone help fix the signal handling in pseudo?


Thanks
Wenzong



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-08-31  9:21 pseudo 1.8.1 doesn't work with docker & dumb-init wenzong fan
@ 2016-08-31 15:11 ` Joshua Lock
  2016-09-02  1:24   ` wenzong fan
  2016-08-31 15:48 ` Seebs
  1 sibling, 1 reply; 11+ messages in thread
From: Joshua Lock @ 2016-08-31 15:11 UTC
  To: wenzong fan,
	'Patches and discussions about the oe-core layer',
	seebs, Richard Purdie

On Wed, 2016-08-31 at 17:21 +0800, wenzong fan wrote:
> Hi Experts,
> 
> While I trying to build Yocto in Docker Container which using dumb-init
> as init system, I found the build always be stopped at some point and
> the container was terminated as well with below errors:
> 
>      Child process timeout after 2 seconds.
>      Child process exit status 4: lock_held
> 
> Sometimes there's not any obvious error message.
> 
> After some `git bisect` testing, I believe the issue was started since
> commit:
> 
> ----------------------
> 9df3cdf42d8c1216682f497f0b166a43ef9f4184 is the first bad commit
> commit 9df3cdf42d8c1216682f497f0b166a43ef9f4184
> Author: Richard Purdie <richard.purdie@linuxfoundation.org>
> Date: Tue Jul 5 13:18:31 2016 +0100
> 
>      pseudo: Upgrade to 1.8.1
> 
>      * Drop patches where the changes exist upstream
>      * Fetch from git as no tarball is available for 1.8.1
>      * Move common code to pseudo.inc
>      * Update patchset in git recipe
> 
>      (From OE-Core rev: 0c36984d4c501d12fa91cf7371511641585cc256)
> -----------------------
> 
> Finally I narrowed it down to pseudo commit:
> 
> ------------------------
> commit 77ee254a6c974aad9bcab2c58c9ee9e0880c9718
> Author: Peter Seebach <peter.seebach@windriver.com>
> Date: Tue Mar 1 16:21:15 2016 -0600
> 
>      Server launch reworking.
> 
>      This is the big overhaul to have the server provide meaningful exit
>      status to clients.
>
>      In the process, I discovered that the server was running with
>      signals blocked if launched by a client, which is not a good thing,
>      and prevented this from working as intended.
>
>      Still looking to see why more than one server spawn seems to happen.
> ------------------------
> 
> I also created a testcase for reproducing the issue at:
> 
>      https://github.com/WenzongFan/docker-build-yocto

Thanks for providing a detailed reproducer. I'm trying to configure a
container behind my proxy here.

> 
> For dumb-init please refer to:
> 
>      https://github.com/Yelp/dumb-init
> 
> Could anyone help to fix the signal handling in pseudo?

It may not actually be pseudo at fault here. I've only skimmed the
dumb-init README but it looks like there might be a strange interaction
between the newly fixed signal handling in pseudo and dumb-init's
signal handling.

Should dumb-init be running in single-child/non-setsid mode so that
signals are only forwarded to the direct child rather than all child
processes in the dumb-init session? Is this a scenario you've tested?
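As a rough illustration of the difference in question, here is a Python sketch (Unix only). It assumes, per my reading of the dumb-init README, that dumb-init's default setsid mode forwards signals to the child's whole process group, while single-child mode signals only the direct child. The grandchild stands in for a daemon spawned during the build (hypothetically, a pseudo server); `who_gets_signal` and the "c"/"g" tags are illustrative names, and SIGUSR1 is used only because it is harmless to handle.

```python
import os
import select
import signal
import time
from typing import NoReturn


def _member(tag: bytes, result_w: int, ready_w: int) -> NoReturn:
    # Log every SIGUSR1 this process receives, then report that the handler is ready.
    signal.signal(signal.SIGUSR1, lambda signum, frame: os.write(result_w, tag))
    os.write(ready_w, b"r")
    while True:
        signal.pause()  # wait for signals until the parent SIGKILLs the group


def who_gets_signal(forward_to_group: bool) -> set[str]:
    """Return which processes ("c" = child, "g" = grandchild) received SIGUSR1."""
    result_r, result_w = os.pipe()
    ready_r, ready_w = os.pipe()
    child = os.fork()
    if child == 0:
        os.setpgid(0, 0)  # own process group, a stand-in for dumb-init's setsid()
        if os.fork() == 0:
            _member(b"g", result_w, ready_w)  # grandchild, e.g. a daemon the build spawned
        _member(b"c", result_w, ready_w)      # direct child of our "init"
    os.setpgid(child, child)  # also set from the parent, so there is no race
    os.close(result_w)
    os.close(ready_w)
    try:
        ready = b""
        while len(ready) < 2:  # wait until both handlers are installed
            ready += os.read(ready_r, 2 - len(ready))
        if forward_to_group:
            os.killpg(child, signal.SIGUSR1)  # signal the whole process group
        else:
            os.kill(child, signal.SIGUSR1)    # signal only the direct child
        seen = b""
        deadline = time.monotonic() + 1.0
        while (left := deadline - time.monotonic()) > 0:
            if select.select([result_r], [], [], left)[0]:
                seen += os.read(result_r, 16)
    finally:
        os.killpg(child, signal.SIGKILL)  # clean up child and grandchild
        os.waitpid(child, 0)
        os.close(result_r)
        os.close(ready_r)
    return {chr(b) for b in seen}
```

With group forwarding both the child and the grandchild see the signal; with direct-child forwarding only the child does, which is why single-child mode was worth trying.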

Regards,

Joshua




* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-08-31  9:21 pseudo 1.8.1 doesn't work with docker & dumb-init wenzong fan
  2016-08-31 15:11 ` Joshua Lock
@ 2016-08-31 15:48 ` Seebs
  2016-09-02  1:33   ` wenzong fan
  1 sibling, 1 reply; 11+ messages in thread
From: Seebs @ 2016-08-31 15:48 UTC
  To: wenzong fan; +Cc: Patches and discussions about the oe-core layer

On 31 Aug 2016, at 4:21, wenzong fan wrote:

> Finally I narrowed it down to pseudo commit:

Yes, that makes sense; we expected there might be issues there, but I
didn't have a reproducer for any. Thanks! I'll see whether it reproduces
for me now. Is there any specific version of Docker I might need?

-s



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-08-31 15:11 ` Joshua Lock
@ 2016-09-02  1:24   ` wenzong fan
  0 siblings, 0 replies; 11+ messages in thread
From: wenzong fan @ 2016-09-02  1:24 UTC
  To: Joshua Lock,
	'Patches and discussions about the oe-core layer',
	seebs, Richard Purdie

On 08/31/2016 11:11 PM, Joshua Lock wrote:
> On Wed, 2016-08-31 at 17:21 +0800, wenzong fan wrote:
>> Hi Experts,
>>
>> While I trying to build Yocto in Docker Container which using dumb-init
>> as init system, I found the build always be stopped at some point and
>> the container was terminated as well with below errors:
>>
>>      Child process timeout after 2 seconds.
>>      Child process exit status 4: lock_held
>>
>> Sometimes there's not any obvious error message.
>>
>> After some `git bisect` testing, I believe the issue was started since
>> commit:
>> commit:
>>
>> ----------------------
>> 9df3cdf42d8c1216682f497f0b166a43ef9f4184 is the first bad commit
>> commit 9df3cdf42d8c1216682f497f0b166a43ef9f4184
>> Author: Richard Purdie <richard.purdie@linuxfoundation.org>
>> Date: Tue Jul 5 13:18:31 2016 +0100
>>
>>      pseudo: Upgrade to 1.8.1
>>
>>      * Drop patches where the changes exist upstream
>>      * Fetch from git as no tarball is available for 1.8.1
>>      * Move common code to pseudo.inc
>>      * Update patchset in git recipe
>>
>>      (From OE-Core rev: 0c36984d4c501d12fa91cf7371511641585cc256)
>> -----------------------
>>
>> Finally I narrowed it down to pseudo commit:
>>
>> ------------------------
>> commit 77ee254a6c974aad9bcab2c58c9ee9e0880c9718
>> Author: Peter Seebach <peter.seebach@windriver.com>
>> Date: Tue Mar 1 16:21:15 2016 -0600
>>
>>      Server launch reworking.
>>
>>      This is the big overhaul to have the server provide meaningful exit
>>      status to clients.
>>
>>      In the process, I discovered that the server was running with
>>      signals blocked if launched by a client, which is not a good thing,
>>      and prevented this from working as intended.
>>
>>      Still looking to see why more than one server spawn seems to happen.
>> ------------------------
>>
>> I also created a testcase for reproducing the issue at:
>>
>>      https://github.com/WenzongFan/docker-build-yocto
>
> Thanks for providing a detailed reproducer. I'm trying to configure a
> container behind my proxy here.
>
>>
>> For dumb-init please refer to:
>>
>>      https://github.com/Yelp/dumb-init
>>
>> Could anyone help to fix the signal handling in pseudo?
>
> It may not actually be pseudo at fault here. I've only skimmed the
> dumb-init README but it looks like there might be a strange interaction
> between the newly fixed signal handling in pseudo and dumb-init's
> signal handling.
>
> Should dumb-init be running in single-child/non-setsid mode so that
> signals are only forwarded to the direct child rather than all child
> processes in the dumb-init session? Is this a scenario you've tested?

Yes, I tried the options below, but none of them worked:

1) Run dumb-init with the -c flag (single-child/non-setsid mode, see
https://github.com/Yelp/dumb-init/issues/51)
2) Update dumb-init to the latest version, v1.1.3 (the release notes mention
fixes for race conditions)
3) Switch to tini, an alternative to dumb-init:
https://github.com/krallin/tini

Thanks
Wenzong

>
> Regards,
>
> Joshua
>
>



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-08-31 15:48 ` Seebs
@ 2016-09-02  1:33   ` wenzong fan
  2016-09-02  2:10     ` Seebs
  0 siblings, 1 reply; 11+ messages in thread
From: wenzong fan @ 2016-09-02  1:33 UTC
  To: Seebs; +Cc: Patches and discussions about the oe-core layer

On 08/31/2016 11:48 PM, Seebs wrote:
> On 31 Aug 2016, at 4:21, wenzong fan wrote:
>
>> Finally I narrowed it down to pseudo commit:
>
> Yes, that makes sense, we expect that there'd be potential issues, but I
> didn't have a reproducer for any. Thanks! I'll see whether it reproduces
> for me now. Any specific version of docker I might need?

No, I don't think it's related to any specific Docker version.

I tested it on "Docker version 1.7.1, build 786b29d" and "Docker version
1.11.2, build b9f10c9".

BTW, I also tested the Docker build without dumb-init, and the build works ...

Thanks
Wenzong

>
> -s
>



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-02  1:33   ` wenzong fan
@ 2016-09-02  2:10     ` Seebs
  2016-09-07  6:32       ` wenzong fan
  0 siblings, 1 reply; 11+ messages in thread
From: Seebs @ 2016-09-02  2:10 UTC
  To: Patches and discussions about the oe-core layer

On 1 Sep 2016, at 20:33, wenzong fan wrote:

> No, I didn't think it's related to any specific docker version.
>
> I tested it on "Docker version 1.7.1, build 786b29d" & "Docker version 
> 1.11.2, build b9f10c9".
>
> BTW, I also tested the docker build w/o dumb-init, and the build works ...

Yeah, it's definitely specific in some way to docker.

However, it doesn't appear to be 100% reproducible; I just tried a build 
with your reproducer and it completed without problems. (Unless the 
problems are more subtle, and don't prevent a build.) So this one's 
gonna be really fun to track down.

-s



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-02  2:10     ` Seebs
@ 2016-09-07  6:32       ` wenzong fan
  2016-09-07  6:40         ` Seebs
  0 siblings, 1 reply; 11+ messages in thread
From: wenzong fan @ 2016-09-07  6:32 UTC
  To: Seebs, Patches and discussions about the oe-core layer

On 09/02/2016 10:10 AM, Seebs wrote:
> On 1 Sep 2016, at 20:33, wenzong fan wrote:
>
>> No, I didn't think it's related to any specific docker version.
>>
>> I tested it on "Docker version 1.7.1, build 786b29d" & "Docker version
>> 1.11.2, build b9f10c9".
>>
>> BTW, I also tested the docker build w/o dumb-init, and the build works ...
>
> Yeah, it's definitely specific in some way to docker.
>
> However, it doesn't appear to be 100% reproducible; I just tried a build
> with your reproducer and it completed without problems. (Unless the
> problems are more subtle, and don't prevent a build.) So this one's
> gonna be really fun to track down.

Yes, I believe it's not a 100% reproducible issue. Maybe you could run it
alongside other builds in parallel and try it 3 times or more.

It happens with high probability on my work host, which is a server shared
by several people; I always hit the error within 1 to 3 builds.
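One way to script this kind of stress test is a small runner that launches several builds at once and repeats the round a few times, counting failures. This is a hypothetical Python helper, not part of the reproducer; the command you pass is up to you, and concurrent bitbake runs would normally each need their own build directory (for example via a wrapper script of your own).

```python
import subprocess
from collections.abc import Sequence


def stress(cmd: Sequence[str], parallel: int = 3, rounds: int = 3) -> list[int]:
    """Run `parallel` copies of `cmd` concurrently, `rounds` times.

    Returns the number of failed (non-zero exit status) runs in each round.
    """
    failures: list[int] = []
    for _ in range(rounds):
        # Start all copies first so they genuinely overlap in time.
        procs = [subprocess.Popen(list(cmd)) for _ in range(parallel)]
        failures.append(sum(p.wait() != 0 for p in procs))
    return failures
```

Because the problem is intermittent, a single clean round proves little; looking at the per-round failure counts over several rounds gives a better picture.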

Thanks
Wenzong

>
> -s



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-07  6:32       ` wenzong fan
@ 2016-09-07  6:40         ` Seebs
  2016-09-14 20:46           ` Bystricky, Juro
  0 siblings, 1 reply; 11+ messages in thread
From: Seebs @ 2016-09-07  6:40 UTC
  To: wenzong fan; +Cc: Patches and discussions about the oe-core layer

On 7 Sep 2016, at 1:32, wenzong fan wrote:

> Yes, I believe it's not a 100 reproducible issue. Maybe you could run 
> it with other builds in parallel and try it 3 times or more.

I can try, but that might need bigger hardware than I have to hand at 
the moment.

-s



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-07  6:40         ` Seebs
@ 2016-09-14 20:46           ` Bystricky, Juro
  2016-09-15  2:24             ` Randy MacLeod
  0 siblings, 1 reply; 11+ messages in thread
From: Bystricky, Juro @ 2016-09-14 20:46 UTC
  To: Seebs, Fan, Wenzong (Wind River)
  Cc: Patches and discussions about the oe-core layer

I am pretty sure I glimpsed the messages:
	Child process timeout after 2 seconds.
	Child process exit status 4: lock_held
on several occasions recently, just before my X server was restarted and I was kicked back to the login prompt.
I typically ran several parallel bitbake builds on Ubuntu 16.04, not in a container. The last message in the syslog (the first error message) was always:
	Fatal IO error 11 (Resource temporarily unavailable) on X server :0

Possibly not related to this problem, but nevertheless worth mentioning.

Thanks

Juro


> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org
> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
> Seebs
> Sent: Tuesday, September 6, 2016 11:40 PM
> To: Fan, Wenzong (Wind River) <wenzong.fan@windriver.com>
> Cc: Patches and discussions about the oe-core layer <openembedded-
> core@lists.openembedded.org>
> Subject: Re: [OE-core] pseudo 1.8.1 doesn't work with docker & dumb-init
> 
> On 7 Sep 2016, at 1:32, wenzong fan wrote:
> 
> > Yes, I believe it's not a 100 reproducible issue. Maybe you could run
> > it with other builds in parallel and try it 3 times or more.
> 
> I can try, but that might need bigger hardware than I have to hand at
> the moment.
> 
> -s

* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-14 20:46           ` Bystricky, Juro
@ 2016-09-15  2:24             ` Randy MacLeod
  2016-09-15 19:08               ` Randy MacLeod
  0 siblings, 1 reply; 11+ messages in thread
From: Randy MacLeod @ 2016-09-15  2:24 UTC
  To: Bystricky, Juro, Seebs, Fan, Wenzong (Wind River)
  Cc: Patches and discussions about the oe-core layer

On 2016-09-14 04:46 PM, Bystricky, Juro wrote:
> I am pretty sure I glimpsed the messages:
> 	Child process timeout after 2 seconds.
>     	Child process exit status 4: lock_held
> on several occasions recently, just before my Xserver was restarted and I was kicked back to the login prompt.
> I typically ran several parallel bitbake builds. Ubuntu 16.04, not using container. The last message in the syslog (first error message) was always:
> Fatal IO error 11 (Resource temporarily unavailable) on X server :0
>
> Possibly not related to this problem, nevertheless worth mentioning.

Yes, it may be. Thanks for reporting it.

Two weeks ago, I was building a qemuarm64 image on my laptop
(i7, 16 GB, SSD, running Ubuntu 16.04)
and I saw a similarly bizarre result from running a build:
Chrome and then the X server were both killed.
I wasn't in front of the system when this happened, so
I can't say exactly what was going on.

I did collect some of the logs from my IRC client and Chrome:


[423679.028437] konversation[23416]: segfault at 7f72d2c33ce0 ip 00007f72eca4e818 sp 00007ffc7f450ae0 error 4 in libQt5Gui.so.5.5.1[7f72ec8da000+527000]
[423679.325315] chrome[28083]: segfault at 968 ip 00007f63f7615643 sp 00007ffd26c25af0 error 4 in libX11.so.6.3.0[7f63f75ed000+135000]


and then from the X server:

Aug 29 16:11:59 laptop org.a11y.atspi.Registry[4763]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 29 16:11:59 laptop org.a11y.atspi.Registry[4763]:       after 67649 requests (67649 known processed) with 0 events remaining.
Aug 29 16:11:59 laptop gnome-session[4748]: (diodon:4925): Gdk-WARNING **: diodon: Fatal IO error 11 (Resource temporarily unavailable) on X server :0
.
...
Aug 29 16:11:59 laptop systemd[1]: Started Process Core Dump (PID 28084/UID 0).
Aug 29 16:11:59 laptop gnome-session[4748]: Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Aug 29 16:11:59 laptop kernel: [423679.325315] chrome[28083]: segfault at 968 ip 00007f63f7615643 sp 00007ffd26c25af0 error 4 in libX11.so.6.3.0[7f63f75ed000+135000]



In my case, I had added meta-oe to oe-core and was building:
    MACHINE=qemuarm64 bitbake imagemagick

I reproduced it once in X, then did NOT see it happen when
I built on the console with an X session still running,
i.e. the build of imagemagick for qemuarm64 succeeded.

It seems I've removed the build logs, but I'll see if I can reproduce
the failure overnight.

../Randy

>
> Thanks
>
> Juro
>
>
>> -----Original Message-----
>> From: openembedded-core-bounces@lists.openembedded.org
>> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
>> Seebs
>> Sent: Tuesday, September 6, 2016 11:40 PM
>> To: Fan, Wenzong (Wind River) <wenzong.fan@windriver.com>
>> Cc: Patches and discussions about the oe-core layer <openembedded-
>> core@lists.openembedded.org>
>> Subject: Re: [OE-core] pseudo 1.8.1 doesn't work with docker & dumb-init
>>
>> On 7 Sep 2016, at 1:32, wenzong fan wrote:
>>
>>> Yes, I believe it's not a 100 reproducible issue. Maybe you could run
>>> it with other builds in parallel and try it 3 times or more.
>>
>> I can try, but that might need bigger hardware than I have to hand at
>> the moment.
>>
>> -s


-- 
# Randy MacLeod. SMTS, Linux, Wind River
Direct: 613.963.1350 | 350 Terry Fox Drive, Suite 200, Ottawa, ON, 
Canada, K2K 2W5



* Re: pseudo 1.8.1 doesn't work with docker & dumb-init
  2016-09-15  2:24             ` Randy MacLeod
@ 2016-09-15 19:08               ` Randy MacLeod
  0 siblings, 0 replies; 11+ messages in thread
From: Randy MacLeod @ 2016-09-15 19:08 UTC
  To: Bystricky, Juro, Seebs, Fan, Wenzong (Wind River)
  Cc: Patches and discussions about the oe-core layer

On 2016-09-14 10:24 PM, Randy MacLeod wrote:
> I'll see if I can reproduce
> the failure overnight.

The laptop build worked without error. I may try again tonight.
