* build failures due to pigz host tool
@ 2019-07-03 15:04 Trevor Woerner
  2019-07-03 16:02 ` Mikko.Rapeli
  2019-07-03 16:10 ` Richard Purdie
  0 siblings, 2 replies; 8+ messages in thread
From: Trevor Woerner @ 2019-07-03 15:04 UTC (permalink / raw)
  To: openembedded-core

Hi,

This came up as a topic in yesterday's Engineering Sync meeting. For roughly a
year I've been seeing random build failures on my Jenkins setup due to pigz
failing; apparently the project is now seeing them on their builds, so I'll
share what I know of them.

At the time I started seeing these failures (Aug 2018) I had just upgraded my
system to openSUSE 15.0. Since nobody else was seeing them, I assumed they
were related to my setup. When I went out searching for an answer, I found
there wasn't very much out there to help me. But I did notice that there were
reports of other people seeing the issue who weren't using openSUSE and who
weren't doing anything related to OE builds using Jenkins.

The build failure looks something like this:

	| DEBUG: Executing shell function sstate_create_package
	| pigz: abort: internal threads error
	| tar: /z/jenkins-workspace/nightly/cubietruck/build/sstate-cache/8a/sstate:linux-mainline:cubietruck-oe-linux-gnueabi:4.19.46:r0:cubietruck:3:8a159ba1ffefb5fc2feeeff5b40abf8ad67658e5ff3ed3bf67d25d9c8f2805e0_package.tgz.9bA8tCje: Wrote only 6144 of 10240 bytes
	| tar: Child returned status 16
	| tar: Error is not recoverable: exiting now
	| WARNING: /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/run.sstate_create_package.19996:1 exit 1 from 'exit 1'
	| DEBUG: Python function sstate_task_postfunc finished
	| ERROR: Function failed: sstate_create_package (log file is located at /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/log.do_package.19996)
	NOTE: recipe linux-mainline-4.19.46-r0: task do_package: Failed
	ERROR: Task (/opt/oe/configs/z/jenkins-workspace/nightly/cubietruck/layers/meta-sunxi/recipes-kernel/linux/linux-mainline_4.19.46.bb:do_package) failed with exit code '1'

Here's another example:

	| DEBUG: Executing shell function sstate_create_package
	| pigz: abort: internal threads error
	| tar: /z/jenkins-workspace/nightly/odroid-xu4/build/sstate-cache/d4/sstate:sqlite3:cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi:3.28.0:r0:cortexa15t2hf-neon-vfpv4:3:d4eb5692a1756a832d72fb2003a3d431108fbc736044747d33698ad7b6881dd9_package.tgz.herLUpYQ: Wrote only 2048 of 10240 bytes
	| tar: Child returned status 16
	| tar: Error is not recoverable: exiting now
	| WARNING: /z/jenkins-workspace/nightly/odroid-xu4/build/tmp-glibc/work/cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi/sqlite3/3_3.28.0-r0/temp/run.sstate_create_package.24136:1 exit 1 from 'exit 1'
	| DEBUG: Python function sstate_task_postfunc finished
	| ERROR: Function failed: sstate_create_package (log file is located at /z/jenkins-workspace/nightly/odroid-xu4/build/tmp-glibc/work/cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi/sqlite3/3_3.28.0-r0/temp/log.do_package.24136)
	NOTE: recipe sqlite3-3_3.28.0-r0: task do_package: Failed
	ERROR: Task (/opt/oe/configs/z/jenkins-workspace/nightly/odroid-xu4/layers/openembedded-core/meta/recipes-support/sqlite/sqlite3_3.28.0.bb:do_package) failed with exit code '1'

When I first started seeing this problem, I would see it quite frequently.
Every morning, out of roughly 15 nightly builds, around 4-5 of them would have
failed in this way. Back then I would also get a lot of errors that would
report something along the lines of the following:

	fork: Resource temporarily unavailable
	Cannot spawn thread (?)

I don't have an example of that error on hand, but I used to get a lot of
those around the same time too.

My observations are:
- I've never seen any of these errors with builds that I run by hand. Oddly
  enough, these errors only ever happen to builds that are run by Jenkins. I
  have no idea if this is just a coincidence, or if there is something going
  on related to kicking off a build from a large program (Jenkins).

- Back then these failures were quite frequent. Today, of the 20-ish or so
  Jenkins builds that are kicked off every night, in a 2-week span I have only
  2 such failures. So it seems that I've been able to reduce the occurrence
  rate, but not eliminate it completely

- I haven't seen the "resource" failure in a while. I don't know if these are
  two separate issues that just happened to start at the same time, or if
  they're related in some way.

From what little information I was able to find online, here are the things I
tweaked (which may or may not have contributed to the reduction in the rate of
occurrence):

- At that time, I had been setting a "barrier=6000" on the disk I was using
  for the builds. I removed that tweak.

- I edited /etc/systemd/logind.conf and set:
	UserTasksMax=infinity

- I edited /etc/systemd/system.conf and set:
	DefaultTasksAccounting=no
	DefaultTasksMax=infinity

- I edited /etc/sysconfig/jenkins and added/set:
	JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Xmx1g"

Since this build failure is so intermittent, it's quite hard to dig into it.
As I said above, of the last roughly 280 builds my system has done in the last
2 weeks, only 2 such failures occurred.
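To put the figures quoted above side by side, here is a small arithmetic sketch in Python. The helper name `failure_rate` is made up for illustration; the inputs are the approximate counts given in this message (4-5 failures out of roughly 15 nightly builds at first, 2 failures out of roughly 280 builds recently).

```python
def failure_rate(failures, builds):
    """Return the failure rate as a percentage."""
    return 100.0 * failures / builds


# Early on: roughly 4-5 failures out of about 15 nightly builds.
early_low = failure_rate(4, 15)
early_high = failure_rate(5, 15)

# Recently: 2 failures out of roughly 280 builds over two weeks.
recent = failure_rate(2, 280)

print(f"early:  {early_low:.1f}% to {early_high:.1f}% of builds")
print(f"recent: {recent:.2f}% of builds")
```

At well under one failure per hundred builds, a fix would likely need several weeks of nightly runs before its effect could be told apart from chance.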

Overriding CONVERSION_CMD_gz in my builds to not use pigz would probably fix
the issue, at the cost of losing the parallelism of the sstate_create_package
task.

My host machine's version of pigz is: 2.3.3

Best regards,
	Trevor



* Re: build failures due to pigz host tool
  2019-07-03 15:04 build failures due to pigz host tool Trevor Woerner
@ 2019-07-03 16:02 ` Mikko.Rapeli
  2019-07-03 16:10 ` Richard Purdie
  1 sibling, 0 replies; 8+ messages in thread
From: Mikko.Rapeli @ 2019-07-03 16:02 UTC (permalink / raw)
  To: twoerner; +Cc: openembedded-core

On Wed, Jul 03, 2019 at 11:04:06AM -0400, Trevor Woerner wrote:
> Hi,
> 
> This came up as a topic in yesterday's Engineering Sync meeting. For roughly a
> year I've been seeing random build failures on my Jenkins setup due to pigz
> failing; apparently the project is now seeing them on their builds, so I'll
> share what I know of them.
> 
> At the time I started seeing these failures (Aug 2018) I had just upgraded my
> system to openSUSE 15.0. Since nobody else was seeing them, I assumed they
> were related to my setup. When I went out searching for an answer, I found
> there wasn't very much out there to help me. But I did notice that there were
> reports of other people seeing the issue who weren't using openSUSE and who
> weren't doing anything related to OE builds using Jenkins.
> 
> The build failure looks something like this:
> 
> 	| DEBUG: Executing shell function sstate_create_package
> 	| pigz: abort: internal threads error
> 	| tar: /z/jenkins-workspace/nightly/cubietruck/build/sstate-cache/8a/sstate:linux-mainline:cubietruck-oe-linux-gnueabi:4.19.46:r0:cubietruck:3:8a159ba1ffefb5fc2feeeff5b40abf8ad67658e5ff3ed3bf67d25d9c8f2805e0_package.tgz.9bA8tCje: Wrote only 6144 of 10240 bytes
> 	| tar: Child returned status 16
> 	| tar: Error is not recoverable: exiting now
> 	| WARNING: /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/run.sstate_create_package.19996:1 exit 1 from 'exit 1'
> 	| DEBUG: Python function sstate_task_postfunc finished
> 	| ERROR: Function failed: sstate_create_package (log file is located at /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/log.do_package.19996)
> 	NOTE: recipe linux-mainline-4.19.46-r0: task do_package: Failed
> 	ERROR: Task (/opt/oe/configs/z/jenkins-workspace/nightly/cubietruck/layers/meta-sunxi/recipes-kernel/linux/linux-mainline_4.19.46.bb:do_package) failed with exit code '1'
> 
> Here's another example:
> 
> 	| DEBUG: Executing shell function sstate_create_package
> 	| pigz: abort: internal threads error
> 	| tar: /z/jenkins-workspace/nightly/odroid-xu4/build/sstate-cache/d4/sstate:sqlite3:cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi:3.28.0:r0:cortexa15t2hf-neon-vfpv4:3:d4eb5692a1756a832d72fb2003a3d431108fbc736044747d33698ad7b6881dd9_package.tgz.herLUpYQ: Wrote only 2048 of 10240 bytes
> 	| tar: Child returned status 16
> 	| tar: Error is not recoverable: exiting now
> 	| WARNING: /z/jenkins-workspace/nightly/odroid-xu4/build/tmp-glibc/work/cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi/sqlite3/3_3.28.0-r0/temp/run.sstate_create_package.24136:1 exit 1 from 'exit 1'
> 	| DEBUG: Python function sstate_task_postfunc finished
> 	| ERROR: Function failed: sstate_create_package (log file is located at /z/jenkins-workspace/nightly/odroid-xu4/build/tmp-glibc/work/cortexa15t2hf-neon-vfpv4-oe-linux-gnueabi/sqlite3/3_3.28.0-r0/temp/log.do_package.24136)
> 	NOTE: recipe sqlite3-3_3.28.0-r0: task do_package: Failed
> 	ERROR: Task (/opt/oe/configs/z/jenkins-workspace/nightly/odroid-xu4/layers/openembedded-core/meta/recipes-support/sqlite/sqlite3_3.28.0.bb:do_package) failed with exit code '1'
> 
> When I first started seeing this problem, I would see it quite frequently.
> Every morning, out of roughly 15 nightly builds, around 4-5 of them would have
> failed in this way. Back then I would also get a lot of errors that would
> report something along the lines of the following:
> 
> 	fork: Resource temporarily unavailable
> 	Cannot spawn thread (?)
> 
> I don't have an example of that error on hand, but I used to get a lot of
> those around the same time too.
> 
> My observations are:
> - I've never seen any of these errors with builds that I run by hand, oddly
>   enough, these errors only ever happen to builds that are run by Jenkins. I
>   have no idea if this is just a coincidence, or if there is something going
>   on related to kicking off a build from a large program (Jenkins)

This rings a bell. Jenkins spawns builds from java and the environment where
processes are spawned differs from normal console and login shells
considerably. Can't remember any details anymore, but environment variables
and various things like thread limits may be different in Java shells
compared to login ones. If pigz spawns too many threads and it happens on
too many bitbake tasks in parallel, at some point thread creation may fail
due to limits.
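One way to compare the two environments is to print the process/thread limits from inside a Jenkins job and again from a login shell. The following is only a minimal Python sketch for a Linux host; the function names are made up for illustration. It reports the per-user RLIMIT_NPROC limit and the kernel-wide threads-max value. It does not show task limits that systemd enforces through cgroups, which are not visible through getrlimit.

```python
import resource
from pathlib import Path


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def task_limits():
    """Collect process/thread limits visible to the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    limits = {
        "RLIMIT_NPROC soft": describe_limit(soft),
        "RLIMIT_NPROC hard": describe_limit(hard),
    }
    # Kernel-wide ceiling on the number of threads (Linux only).
    threads_max = Path("/proc/sys/kernel/threads-max")
    if threads_max.exists():
        limits["kernel.threads-max"] = threads_max.read_text().strip()
    return limits


if __name__ == "__main__":
    for name, value in task_limits().items():
        print(f"{name}: {value}")
```

Running the same script as a Jenkins build step and from an interactive shell, then comparing the output, would show whether the limits really differ between the two.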

Hope this helps,

-Mikko

> - Back then these failures were quite frequent. Today, of the 20-ish or so
>   Jenkins builds that are kicked off every night, in a 2-week span I have only
>   2 such failures. So it seems that I've been able to reduce the occurrence
>   rate, but not eliminate it completely
> 
> - I haven't seen the "resource" failure in a while. I don't know if these are
>   two separate issues that just happened to start at the same time, or if
>   they're related in some way.
> 
> From what little information I was able to find online, here are the things I
> tweaked (which may or may not have contributed to the reduction in the rate of
> occurrence):
> 
> - At that time, I had been setting a "barrier=6000" on the disk I was using
>   for the builds. I removed that tweak.
> 
> - I edited /etc/systemd/logind.conf and set:
> 	UserTasksMax=infinity
> 
> - I edited /etc/systemd/system.conf and set:
> 	DefaultTasksAccounting=no
> 	DefaultTasksMax=infinity
> 
> - I edited /etc/sysconfig/jenkins and added/set:
> 	JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Xmx1g"
> 
> Since this build failure is so intermittent, it's quite hard to dig into it.
> As I said above, of the last roughly 280 builds my system has done in the last
> 2 weeks, only 2 such failures occurred.
> 
> It's possible that overriding CONVERSION_CMD_gz in my builds to not use pigz
> would probably fix the issue at the cost of losing the parallelism of the
> sstate_create_package task.
> 
> My host machine's version of pigz is: 2.3.3
> 
> Best regards,
> 	Trevor


* Re: build failures due to pigz host tool
  2019-07-03 15:04 build failures due to pigz host tool Trevor Woerner
  2019-07-03 16:02 ` Mikko.Rapeli
@ 2019-07-03 16:10 ` Richard Purdie
  2019-07-04 14:27   ` Trevor Woerner
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2019-07-03 16:10 UTC (permalink / raw)
  To: Trevor Woerner, openembedded-core

On Wed, 2019-07-03 at 11:04 -0400, Trevor Woerner wrote:
> This came up as a topic in yesterday's Engineering Sync meeting. For roughly a
> year I've been seeing random build failures on my Jenkins setup due to pigz
> failing; apparently the project is now seeing them on their builds, so I'll
> share what I know of them.
> 
> At the time I started seeing these failures (Aug 2018) I had just upgraded my
> system to openSUSE 15.0. Since nobody else was seeing them, I assumed they
> were related to my setup. When I went out searching for an answer, I found
> there wasn't very much out there to help me. But I did notice that there were
> reports of other people seeing the issue who weren't using openSUSE and who
> weren't doing anything related to OE builds using Jenkins.
> 
> The build failure looks something like this:
> 
> 	| DEBUG: Executing shell function sstate_create_package
> 	| pigz: abort: internal threads error
> 	| tar: /z/jenkins-workspace/nightly/cubietruck/build/sstate-cache/8a/sstate:linux-mainline:cubietruck-oe-linux-gnueabi:4.19.46:r0:cubietruck:3:8a159ba1ffefb5fc2feeeff5b40abf8ad67658e5ff3ed3bf67d25d9c8f2805e0_package.tgz.9bA8tCje: Wrote only 6144 of 10240 bytes
> 	| tar: Child returned status 16
> 	| tar: Error is not recoverable: exiting now
> 	| WARNING: /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/run.sstate_create_package.19996:1 exit 1 from 'exit 1'
> 	| DEBUG: Python function sstate_task_postfunc finished
> 	| ERROR: Function failed: sstate_create_package (log file is located at /z/jenkins-workspace/nightly/cubietruck/build/tmp-glibc/work/cubietruck-oe-linux-gnueabi/linux-mainline/4.19.46-r0/temp/log.do_package.19996)
> 	NOTE: recipe linux-mainline-4.19.46-r0: task do_package: Failed
> 	ERROR: Task (/opt/oe/configs/z/jenkins-workspace/nightly/cubietruck/layers/meta-sunxi/recipes-kernel/linux/linux-mainline_4.19.46.bb:do_package) failed with exit code '1'

Thanks, I can at least confirm that is the same error we're seeing on
the autobuilders, in particular on opensuse151 but I think we've seen
it on others.

The exit code of 16 is probably significant: EBUSY, or "device or
resource busy". I have a suspicion it's coming from some pthread
function.

I suspect a bug in error handling of EBUSY in yarn (a sub piece of
pigz).
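As a quick check of that mapping, the number can be looked up in the errno table. This is a minimal Python sketch; `describe_errno` is a made-up helper name, and it assumes, as suspected above, that pigz passes the pthread return value through as its exit status.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 16 is the status tar reported for its pigz child.
name, message = describe_errno(16)
print(f"{name}: {message}")
```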

> When I first started seeing this problem, I would see it quite frequently.
> Every morning, out of roughly 15 nightly builds, around 4-5 of them would have
> failed in this way. Back then I would also get a lot of errors that would
> report something along the lines of the following:
> 
> 	fork: Resource temporarily unavailable
> 	Cannot spawn thread (?)

I think this is a different bug FWIW.

> I don't have an example of that error on hand, but I used to get a lot of
> those around the same time too.
> 
> My observations are:
> - I've never seen any of these errors with builds that I run by hand, oddly
>   enough, these errors only ever happen to builds that are run by Jenkins. I
>   have no idea if this is just a coincidence, or if there is something going
>   on related to kicking off a build from a large program (Jenkins)
> 
> - Back then these failures were quite frequent. Today, of the 20-ish or so
>   Jenkins builds that are kicked off every night, in a 2-week span I have only
>   2 such failures. So it seems that I've been able to reduce the occurrence
>   rate, but not eliminate it completely

That is about the same rate we're seeing on the autobuilder; it's
occasional.

> - I haven't seen the "resource" failure in a while. I don't know if these are
>   two separate issues that just happened to start at the same time, or if
>   they're related in some way.

My guess is two issues exposed by your distro upgrade.

> From what little information I was able to find online, here are the things I
> tweaked (which may or may not have contributed to the reduction in the rate of
> occurrence):
> 
> - At that time, I had been setting a "barrier=6000" on the disk I was using
>   for the builds. I removed that tweak.
> 
> - I edited /etc/systemd/logind.conf and set:
> 	UserTasksMax=infinity
> 
> - I edited /etc/systemd/system.conf and set:
> 	DefaultTasksAccounting=no
> 	DefaultTasksMax=infinity
> 
> - I edited /etc/sysconfig/jenkins and added/set:
> 	JENKINS_JAVA_OPTIONS="-Djava.awt.headless=true -Xmx1g"

These probably fixed the fork issue but not the other.

> Since this build failure is so intermittent, it's quite hard to dig into it.
> As I said above, of the last roughly 280 builds my system has done in the last
> 2 weeks, only 2 such failures occurred.
> 
> It's possible that overriding CONVERSION_CMD_gz in my builds to not use pigz
> would probably fix the issue at the cost of losing the parallelism of the
> sstate_create_package task.
> 
> My host machine's version of pigz is: 2.3.3

We're also only seeing this with 2.3.3 as far as I can see.

The debug patch in master-next hasn't triggered it; there is also a
possibility that it's down to uninative, though. With uninative applied,
we'd use a different pthread library which may have some bug fixed...

Thanks for the data points, it does help try and narrow things down.

Is there any way you could try pigz 2.4 on your machine, see if the
problem still occurred there? If you do rebuild pigz, could you apply
the debug patch in master-next?

I think I might have to try this on our opensuse151 worker too...

Cheers,

Richard




* Re: build failures due to pigz host tool
  2019-07-03 16:10 ` Richard Purdie
@ 2019-07-04 14:27   ` Trevor Woerner
  2019-07-04 15:57     ` Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Trevor Woerner @ 2019-07-04 14:27 UTC (permalink / raw)
  To: openembedded-core

On Wed 2019-07-03 @ 05:10:14 PM, Richard Purdie wrote:
> Is there any way you could try pigz 2.4 on your machine, see if the
> problem still occurred there? If you do rebuild pigz, could you apply
> the debug patch in master-next?

I've updated the pigz in the $PATH jenkins sees to pigz-2.4.

However, my builds are always updated automatically and always build from
master; there's no way to insert ad-hoc patches into a build.

The job of my overnight builds is to keep tabs on master branches and ensure
they continue to build, not to test groups of patches in anticipation of
pushing new things into the repository. Therefore my Jenkins build flow has
been structured this specific way on purpose.

Since these failures only occur for me from builds kicked off by Jenkins, I
won't be able to easily help test patches unless I were to create a new
Jenkins workflow.



* Re: build failures due to pigz host tool
  2019-07-04 14:27   ` Trevor Woerner
@ 2019-07-04 15:57     ` Richard Purdie
  2019-07-04 22:28       ` Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2019-07-04 15:57 UTC (permalink / raw)
  To: Trevor Woerner, openembedded-core

On Thu, 2019-07-04 at 10:27 -0400, Trevor Woerner wrote:
> On Wed 2019-07-03 @ 05:10:14 PM, Richard Purdie wrote:
> > Is there any way you could try pigz 2.4 on your machine, see if the
> > problem still occurred there? If you do rebuild pigz, could you
> > apply
> > the debug patch in master-next?
> 
> I've updated the pigz in the $PATH jenkins sees to pigz-2.4.
> 
> However, my builds are always updated automatically and always build
> from master; there's no way to insert ad-hoc patches into a build.

That is fine, even just the PATH adjustment will be interesting to see
the results with, thanks!

Was that version of pigz patched? FWIW I've patched the one on our
opensuse151 worker so we'll see if that helps shed any light on things.

Cheers,

Richard




* Re: build failures due to pigz host tool
  2019-07-04 15:57     ` Richard Purdie
@ 2019-07-04 22:28       ` Richard Purdie
  2019-07-06  8:14         ` Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2019-07-04 22:28 UTC (permalink / raw)
  To: Trevor Woerner, openembedded-core

On Thu, 2019-07-04 at 16:57 +0100, Richard Purdie wrote:
> On Thu, 2019-07-04 at 10:27 -0400, Trevor Woerner wrote:
> > On Wed 2019-07-03 @ 05:10:14 PM, Richard Purdie wrote:
> > > Is there any way you could try pigz 2.4 on your machine, see if
> > > the
> > > problem still occurred there? If you do rebuild pigz, could you
> > > apply
> > > the debug patch in master-next?
> > 
> > I've updated the pigz in the $PATH jenkins sees to pigz-2.4.
> > 
> > However, my builds are always updated automatically and always
> > build
> > from master; there's no way to insert ad-hoc patches into a build.
> 
> That is fine, even just the PATH adjustment will be interesting to
> see
> the results with, thanks!
> 
> Was that version of pigz patched? FWIW I've patched the one on our
> opensuse151 worker so we'll see if that helps shed any light on
> things.

and it did:

http://errors.yoctoproject.org/Errors/Details/250609/

pigz: abort: internal threads error (16, 10)

which means from:

http://git.yoctoproject.org/cgit.cgi/poky/commit/?h=master-next&id=110ccb56b88af177c0153741a31a9d34b1f75abf

+     if ((ret = pthread_cond_destroy(&(bolt->cond))) ||
+         (ret = pthread_mutex_destroy(&(bolt->mutex))))
+-        fail(ret);
++        fail(ret, 10);
+     my_free(bolt);

so pthread_cond_destroy or pthread_mutex_destroy returned EBUSY.
Further digging required but at least we know which code path we're on.
I wonder if suse did something odd to libpthread...

Cheers,

Richard








* Re: build failures due to pigz host tool
  2019-07-04 22:28       ` Richard Purdie
@ 2019-07-06  8:14         ` Richard Purdie
  2019-07-06 17:02           ` Richard Purdie
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Purdie @ 2019-07-06  8:14 UTC (permalink / raw)
  To: Trevor Woerner, openembedded-core

On Thu, 2019-07-04 at 23:28 +0100, Richard Purdie wrote:
> On Thu, 2019-07-04 at 16:57 +0100, Richard Purdie wrote:
> > On Thu, 2019-07-04 at 10:27 -0400, Trevor Woerner wrote:
> > > On Wed 2019-07-03 @ 05:10:14 PM, Richard Purdie wrote:
> > > > Is there any way you could try pigz 2.4 on your machine, see if
> > > > the
> > > > problem still occurred there? If you do rebuild pigz, could you
> > > > apply
> > > > the debug patch in master-next?
> > > 
> > > I've updated the pigz in the $PATH jenkins sees to pigz-2.4.
> > > 
> > > However, my builds are always updated automatically and always
> > > build
> > > from master; there's no way to insert ad-hoc patches into a
> > > build.
> > 
> > That is fine, even just the PATH adjustment will be interesting to
> > see
> > the results with, thanks!
> > 
> > Was that version of pigz patched? FWIW I've patched the one on our
> > opensuse151 worker so we'll see if that helps shed any light on
> > things.
> 
> and it did:
> 
> http://errors.yoctoproject.org/Errors/Details/250609/
> 
> pigz: abort: internal threads error (16, 10)
> 
> which means from:
> 
> http://git.yoctoproject.org/cgit.cgi/poky/commit/?h=master-next&id=110ccb56b88af177c0153741a31a9d34b1f75abf
> 
> +     if ((ret = pthread_cond_destroy(&(bolt->cond))) ||
> +         (ret = pthread_mutex_destroy(&(bolt->mutex))))
> +-        fail(ret);
> ++        fail(ret, 10);
> +     my_free(bolt);
> 
> so pthread_cond_destroy or pthread_mutex_destroy returned EBUSY.
> Further digging required but at least we know which code path we're
> on. I wonder if suse did something odd to libpthread...

Another failure with more debug added:

https://autobuilder.yoctoproject.org/typhoon/#/builders/45/builds/792
https://autobuilder.yoctoproject.org/typhoon/#/builders/45/builds/792/steps/7/logs/step1b

specifically:

| DEBUG: Executing shell function sstate_create_package
| pigz: abort: internal threads error (16, 720)
| tar: /srv/autobuilder/autobuilder.yoctoproject.org/pub/sstate/24/sstate:readline:core2-64-poky-linux-musl:8.0:r0:core2-64:3:241b1909d0f06ca48bda9ec41ccee543bb82fcc164b2b74cd4efd318b3ae0a33_package_write_ipk.tgz.KDjTwErb: Wrote only 8192 of 10240 bytes
| tar: Child returned status 16
| tar: Error is not recoverable: exiting now
| WARNING: /home/pokybuild/yocto-worker/musl-qemux86-64/build/build/tmp/work/core2-64-poky-linux-musl/readline/8.0-r0/temp/run.sstate_create_package.34356:1 exit 1 from 'exit 1'

which confirms that it's aborting with errno 16 (EBUSY), and the 720 tells
me it came from pthread_mutex_destroy(&(bolt->mutex)) in free_lock(),
called from free_lock(job->calc).
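For anyone scanning build logs for this signature, here is a small Python sketch that pulls the two numbers out of the abort line. It assumes the message format produced by the patched pigz shown above, with the first number being the errno and the second a location tag added by the debug patch; the regex and the name `parse_pigz_abort` are illustrative only.

```python
import errno
import re

# Matches the abort line printed by the debug-patched pigz, e.g.
# "pigz: abort: internal threads error (16, 720)".
ABORT_RE = re.compile(r"pigz: abort: internal threads error \((\d+), (\d+)\)")


def parse_pigz_abort(line):
    """Return (errno value, errno name, location tag), or None if no match."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    tag = int(match.group(2))
    return code, errno.errorcode.get(code, "UNKNOWN"), tag


print(parse_pigz_abort("| pigz: abort: internal threads error (16, 720)"))
```

The unpatched message, which has no numbers in parentheses, does not match and yields None.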

I can't see anything obvious wrong with the way job->calc is being
handled.

I get the feeling this doesn't happen when uninative is in use, which
suggests it's something odd going on with libpthread on opensuse, but it's
hard to know that for sure.

Cheers,

Richard




* Re: build failures due to pigz host tool
  2019-07-06  8:14         ` Richard Purdie
@ 2019-07-06 17:02           ` Richard Purdie
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Purdie @ 2019-07-06 17:02 UTC (permalink / raw)
  To: Trevor Woerner, openembedded-core

On Sat, 2019-07-06 at 09:14 +0100, Richard Purdie wrote:
> I can't see anything obvious wrong with the way job->calc is being
> handled.
> 
> I get the feeling this doesn't happen when uninative is in use, which
> suggests it's something odd going on with libpthread on opensuse, but
> it's hard to know that for sure.

https://github.com/madler/pigz/issues/68

Others are seeing this outside of OE and it seems specific to suse,
probably to its pthread version/patches.

Cheers,

Richard


