* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-24 16:44 ` [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures Richard Purdie
@ 2020-02-24 17:12 ` Adrian Bunk
2020-02-24 17:14 ` André Draszik
2020-02-26 15:26 ` xz threads / memlimit behaviour (was: Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures) André Draszik
2 siblings, 0 replies; 16+ messages in thread
From: Adrian Bunk @ 2020-02-24 17:12 UTC (permalink / raw)
To: Richard Purdie; +Cc: openembedded-core
On Mon, Feb 24, 2020 at 04:44:28PM +0000, Richard Purdie wrote:
> On Mon, 2020-02-24 at 15:40 +0200, Adrian Bunk wrote:
> > On Mon, Feb 24, 2020 at 12:59:55PM +0000, André Draszik wrote:
> > > The number of threads used, and the amount of memory allowed
> > > to be used, should not affect sstate signatures, as they
> > > don't affect the result.
> >
> > Unfortunately they can affect the result.
>
> I looked into this a bit and it's complicated. The threads are used to
> compress chunks and their compression should be deterministic whether
> done serially or in parallel.
>
> I did some tests and:
>
> xz <file>
> gave equivalent output to:
> xz <file> --threads=1
>
> and
>
> xz <file> --threads=2
> xz <file> --threads=5
> xz <file> --threads=50
>
> all gave output identical to each other (but different from --threads=1).
>
> So if we force --threads >=2 we should have determinism?
This was also my guess after reading the manpage,
but no definite answer from me.
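For anyone who wants to reproduce this locally, a quick check might look like the following (a sketch only; the input size and thread counts are arbitrary choices, and it assumes xz-utils is installed):

```shell
#!/bin/sh
# Quick determinism check: compress the same input with several thread
# counts and compare checksums. The expectation from the tests quoted
# above is that threaded runs match each other, while --threads=1
# produces a differently framed (non-block-split) stream.
set -e
command -v xz >/dev/null 2>&1 || exit 0   # nothing to check without xz

f=$(mktemp)
head -c 4194304 /dev/urandom > "$f"       # ~4 MiB of input

h1=$(xz --threads=1 -c "$f" | sha256sum | cut -d' ' -f1)
h2=$(xz --threads=2 -c "$f" | sha256sum | cut -d' ' -f1)
h4=$(xz --threads=4 -c "$f" | sha256sum | cut -d' ' -f1)
rm -f "$f"

echo "t1=$h1"
echo "t2=$h2"
echo "t4=$h4"
```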
> > > Otherwise, it becomes impossible to re-use sstate from
> > > automated builders on developer's machines (as the former
> > > might execute bitbake with certain constraints different
> > > compared to developer's machines).
> > > ...
> > > -XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count()}"
> > > ...
> >
> > Threaded compression can result in slightly worse compression
> > than single-threaded compression.
> >
> > With memlimit the problem is actually the opposite way,
> > and worse than what you were trying to fix:
> >
> > When a developer hits memlimit during compression, the documented
> > behaviour of xz is to scale down the compression level.
> >
> > I assume 50% wrongly gives the same sstate signature no matter how
> > much RAM is installed on the local machine?
>
> I did some tests locally and I could see different output checksums
> depending on how much memory I gave xz.
>
> Perhaps we should specify a specific high amount like 1GB?
xz -9 needs 1.25 GB per thread.
And since xz decompression speed is roughly linear in the compressed size,
-9 is often wanted, as it gives the fastest xz decompression.
> Does anyone know more about the internals and how to have this behave
> "nicely" for our needs?
>
> FWIW we haven't seen variation on the autobuilder due to this as far as
> I know.
I assume the autobuilders have plenty of RAM per core?
For any reasonably sized machine that doesn't OOM on larger C++ projects,
the memlimit is a no-op and can be dropped.
More problematic might be developers with oldish desktops/laptops with
many cores and little RAM.
> Cheers,
>
> Richard
cu
Adrian
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-24 16:44 ` [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures Richard Purdie
2020-02-24 17:12 ` Adrian Bunk
@ 2020-02-24 17:14 ` André Draszik
2020-02-24 17:32 ` Richard Purdie
2020-02-26 15:26 ` xz threads / memlimit behaviour (was: Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures) André Draszik
2 siblings, 1 reply; 16+ messages in thread
From: André Draszik @ 2020-02-24 17:14 UTC (permalink / raw)
To: Richard Purdie, Adrian Bunk; +Cc: openembedded-core
On Mon, 2020-02-24 at 16:44 +0000, Richard Purdie wrote:
> On Mon, 2020-02-24 at 15:40 +0200, Adrian Bunk wrote:
> > On Mon, Feb 24, 2020 at 12:59:55PM +0000, André Draszik wrote:
> > > The number of threads used, and the amount of memory allowed
> > > to be used, should not affect sstate signatures, as they
> > > don't affect the result.
> >
> > Unfortunately they can affect the result.
>
> I looked into this a bit and it's complicated. The threads are used to
> compress chunks and their compression should be deterministic whether
> done serially or in parallel.
>
> I did some tests and:
>
> xz <file>
> gave equivalent output to:
> xz <file> --threads=1
>
> and
>
> xz <file> --threads=2
> xz <file> --threads=5
> xz <file> --threads=50
>
> > all gave output identical to each other (but different from --threads=1).
>
> So if we force --threads >=2 we should have determinism?
How large were your files? Given threaded operation works by operating in
'blocks', you should have a different number of blocks in your output.
They're all compressed independently, so I don't see how the result could
be identical, unless the block size is large enough for xz to not create
as many blocks as you allowed via setting the upper limit on the # of
threads.
> > > Otherwise, it becomes impossible to re-use sstate from
> > > automated builders on developer's machines (as the former
> > > might execute bitbake with certain constraints different
> > > compared to developer's machines).
> > > ...
> > > -XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count()}"
> > > ...
> >
> > Threaded compression can result in slightly worse compression
> > than single-threaded compression.
> >
> > With memlimit the problem is actually the opposite way,
> > and worse than what you were trying to fix:
> >
> > When a developer hits memlimit during compression, the documented
> > behaviour of xz is to scale down the compression level.
> >
> > I assume 50% wrongly gives the same sstate signature no matter how
> > much RAM is installed on the local machine?
>
> I did some tests locally and I could see different output checksums
> depending on how much memory I gave xz.
>
> Perhaps we should specify a specific high amount like 1GB?
If I understand the man page right, at compression level 9 (default in OE),
each thread / block will use up to 674MiB of RAM, so a limit of 1GiB
effectively reduces parallelism to 1.
> Does anyone know more about the internals and how to have this behave
> "nicely" for our needs?
What are the needs specifically?
In my case, I have some defined builds that run on their own, so they
are allowed to use all memory and a certain (lowish) number of threads.
I also have other builds where several can run in parallel, and xz would
just be killed if all of them were allowed to use all RAM, so I have to
reduce threads and memory consumption.
This of course means slower builds...
I don't want to penalise myself or anybody else by generally forcing a
low number of threads either.
> FWIW we haven't seen variation on the autobuilder due to this as far as
> I know.
BTW, pigz and pbzip should have a similar thread related problem, according
to the man pages, if you read between the lines.
Files are split into chunks, and chunks are compressed individually in the
pbzip2 case.
pigz terminates each chunk with an empty block in the final .gz file.
Cheers,
Andre'
* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-24 17:14 ` André Draszik
@ 2020-02-24 17:32 ` Richard Purdie
2020-02-24 22:00 ` Adrian Bunk
0 siblings, 1 reply; 16+ messages in thread
From: Richard Purdie @ 2020-02-24 17:32 UTC (permalink / raw)
To: André Draszik, Adrian Bunk; +Cc: openembedded-core
On Mon, 2020-02-24 at 17:14 +0000, André Draszik wrote:
> On Mon, 2020-02-24 at 16:44 +0000, Richard Purdie wrote:
> > On Mon, 2020-02-24 at 15:40 +0200, Adrian Bunk wrote:
> > > On Mon, Feb 24, 2020 at 12:59:55PM +0000, André Draszik wrote:
> > > > The number of threads used, and the amount of memory allowed
> > > > to be used, should not affect sstate signatures, as they
> > > > don't affect the result.
> > >
> > > Unfortunately they can affect the result.
> >
> > I looked into this a bit and it's complicated. The threads are used
> > to compress chunks and their compression should be deterministic
> > whether done serially or in parallel.
> >
> > I did some tests and:
> >
> > xz <file>
> > gave equivalent output to:
> > xz <file> --threads=1
> >
> > and
> >
> > xz <file> --threads=2
> > xz <file> --threads=5
> > xz <file> --threads=50
> >
> > all gave output identical to each other (but different from --threads=1).
> >
> > So if we force --threads >=2 we should have determinism?
>
> How large were your files?
105MB total which should be enough as the largest xz block size is
32MB?
> Given threaded operation works by operating in 'blocks', you should
> have a different number of blocks in your output.
> They're all compressed independently, so I don't see how the result
> could be identical, unless the block size is large enough for xz to
> not create as many blocks as you allowed via setting the upper limit
> on the # of threads.
I don't follow. The algorithm xz uses appears to be designed to give
consistent results regardless of numbers of threads. It gets split into
the same number of chunks. Each chunk is compressed independently, but
deterministically and then they're assembled in order. It doesn't
matter if the chunk is compressed in parallel or serially.
> > > > Otherwise, it becomes impossible to re-use sstate from
> > > > automated builders on developer's machines (as the former
> > > > might execute bitbake with certain constraints different
> > > > compared to developer's machines).
> > > > ...
> > > > -XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count()}"
> > > > ...
> > >
> > > Threaded compression can result in slightly worse compression
> > > than single-threaded compression.
> > >
> > > With memlimit the problem is actually the opposite way,
> > > and worse than what you were trying to fix:
> > >
> > > When a developer hits memlimit during compression, the documented
> > > behaviour of xz is to scale down the compression level.
> > >
> > > I assume 50% wrongly gives the same sstate signature no matter
> > > how
> > > much RAM is installed on the local machine?
> >
> > I did some tests locally and I could see different output checksums
> > depending on how much memory I gave xz.
> >
> > Perhaps we should specify a specific high amount like 1GB?
>
> If I understand the man page right, at compression level 9 (default
> in OE),
> each thread / block will use up to 674MiB of RAM, so a limit of 1GiB
> effectively reduces parallelism to 1.
Right, I agree that isn't feasible.
> > Does anyone know more about the internals and how to have this
> > behave "nicely" for our needs?
>
> What are the needs specifically?
The project is aiming for deterministic builds, where our output for
given input metadata is binary identical.
> In my case, I have some defined builds that run on their own, so they
> are allowed to use all memory and a certain (lowish) number of
> threads.
>
> I also have other builds where several can run in parallel, and xz
> would just be killed if all of them were allowed to use all RAM, so I
> have to reduce threads and memory consumption.
>
> This of course means slower builds...
>
> I don't want to penalise myself or anybody else by generally forcing
> a low number of threads either.
We didn't have resource issues on the autobuilder with xz but others
did report them, and that's why the memlimit was set. As far as I can
tell, it would be better for determinism to remove the memlimit and,
if necessary, throttle the thread count instead.
Maybe setting an upper limit of say 10 threads and a minimum of 2 might
give us what we need. Perhaps we just allow the number of xz threads
to be set independently?
> > FWIW we haven't seen variation on the autobuilder due to this as
> > far as
> > I know.
>
> BTW, pigz and pbzip should have a similar thread related problem,
> according to the man pages, if you read between the lines.
Agreed. We have less of an issue with these as their use is much less
frequent. xz is used by deb and ipk for packaging.
Cheers,
Richard
* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-24 17:32 ` Richard Purdie
@ 2020-02-24 22:00 ` Adrian Bunk
2020-02-25 9:16 ` André Draszik
0 siblings, 1 reply; 16+ messages in thread
From: Adrian Bunk @ 2020-02-24 22:00 UTC (permalink / raw)
To: Richard Purdie; +Cc: openembedded-core
On Mon, Feb 24, 2020 at 05:32:29PM +0000, Richard Purdie wrote:
> On Mon, 2020-02-24 at 17:14 +0000, André Draszik wrote:
>...
> > I don't want to penalise myself or anybody else by generally forcing
> > a low number of threads either.
>
> We didn't have resource issues on the autobuilder with xz but others
> did report it and its why the memlimit was set. It would be better to
> remove the memlimit for determinism really as far as I can tell and if
> necessary throttle the threads count.
>
> Maybe setting an upper limit of say 10 threads and a minimum of 2 might
> give us what we need.
On a Threadripper with 128 cores and 256 GB RAM it would not be a
problem to use all cores.
A laptop with 8 cores and 8 GB RAM is problematic.
> Perhaps we just allow the number of xz threads
> to be set independently?
dpkg manually reduces the number of threads until less than half
of the RAM is used:
https://sources.debian.org/src/dpkg/1.19.7/lib/dpkg/compress.c/#L566-L574
In a script it would be possible to use --no-adjust to achieve the same:
$ xz -9 --memlimit=50% --no-adjust -T32 /dev/null
xz: Memory usage limit is too low for the given filter setup.
xz: 39,972 MiB of memory is required. The limit is 32,051 MiB.
$
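For illustration, dpkg's approach could be sketched in plain shell arithmetic (a sketch, not dpkg's actual code; the per-thread memory figure for xz -9 is an assumption taken from the numbers discussed in this thread):

```shell
#!/bin/sh
# Sketch of dpkg's approach: start from the core count and drop threads
# until the estimated encoder memory fits in half of physical RAM.
# The 1250 MiB per-thread figure for threaded xz -9 is an assumption.
pick_xz_threads() {
    ram_mib=$1        # total physical RAM in MiB
    cores=$2          # number of CPU cores
    per_thread=1250   # approx. threaded encoder memory per thread at -9, MiB

    limit=$((ram_mib / 2))
    threads=$cores
    while [ "$threads" -gt 1 ] && [ $((threads * per_thread)) -gt "$limit" ]; do
        threads=$((threads - 1))
    done
    echo "$threads"
}

pick_xz_threads 8192 8      # 8 GiB laptop, 8 cores: prints 3
pick_xz_threads 262144 128  # 256 GiB Threadripper, 128 cores: prints 104
```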
> > > FWIW we haven't seen variation on the autobuilder due to this as
> > > far as
> > > I know.
> >
> > BTW, pigz and pbzip should have a similar thread related problem,
> > according to the man pages, if you read between the lines.
>
> Agreed. We have less of an issue with these as their use is much less
> frequent. xz is used by deb and ipk for packaging.
Good point, dpkg does parallel xz compression and I am not aware of any
reproducibility problems this causes.
> Cheers,
>
> Richard
cu
Adrian
* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-24 22:00 ` Adrian Bunk
@ 2020-02-25 9:16 ` André Draszik
2020-02-25 9:54 ` Adrian Bunk
0 siblings, 1 reply; 16+ messages in thread
From: André Draszik @ 2020-02-25 9:16 UTC (permalink / raw)
To: Adrian Bunk, Richard Purdie; +Cc: openembedded-core
On Tue, 2020-02-25 at 00:00 +0200, Adrian Bunk wrote:
> On Mon, Feb 24, 2020 at 05:32:29PM +0000, Richard Purdie wrote:
> > On Mon, 2020-02-24 at 17:14 +0000, André Draszik wrote:
> > ...
> > > I don't want to penalise myself or anybody else by generally forcing
> > > a low number of threads either.
> >
> > We didn't have resource issues on the autobuilder with xz but others
> > did report it and its why the memlimit was set. It would be better to
> > remove the memlimit for determinism really as far as I can tell and if
> > necessary throttle the threads count.
> >
> > Maybe setting an upper limit of say 10 threads and a minimum of 2 might
> > give us what we need.
>
> On a Threadripper with 128 cores and 256 GB RAM it would not be a
> problem to use all cores.
>
> A laptop with 8 cores and 8 GB RAM is problematic.
>
> > Perhaps we just allow the number of xz threads
> > to be set independently?
>
> dpkg manually reduces the number of threads until less than half
> of the RAM is used:
> https://sources.debian.org/src/dpkg/1.19.7/lib/dpkg/compress.c/#L566-L574
>
> In a script it would be possible to use --no-adjust to achieve the same:
> $ xz -9 --memlimit=50% --no-adjust -T32 /dev/null
> xz: Memory usage limit is too low for the given filter setup.
> xz: 39,972 MiB of memory is required. The limit is 32,051 MiB.
> $
>
The problem with --no-adjust is that it also prevents xz from reducing the
number of threads. It will do that in preference to changing compression
parameters, so as long as you have more than 2499MiB of memory to support at
least 2 threads (with default block size), compression will be identical.
> > > > FWIW we haven't seen variation on the autobuilder due to this as
> > > > far as
> > > > I know.
> > >
> > > BTW, pigz and pbzip should have a similar thread related problem,
> > > according to the man pages, if you read between the lines.
> >
> > Agreed. We have less of an issue with these as their use is much less
> > frequent. xz is used by deb and ipk for packaging.
It turns out they both actually don't. Their output is reproducible no
matter how many threads are used. Interesting...
>
> Good point, dpkg does parallel xz compression and I am not aware of any
> reproducibility problems this causes.
Debian's dpkg tries to be clever by simply reducing the number of threads
until compression uses at most half of physical RAM.
I guess this typically works out as at least two threads in most if not all
environments.
Cheers,
Andre'
* Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures
2020-02-25 9:16 ` André Draszik
@ 2020-02-25 9:54 ` Adrian Bunk
0 siblings, 0 replies; 16+ messages in thread
From: Adrian Bunk @ 2020-02-25 9:54 UTC (permalink / raw)
To: André Draszik; +Cc: openembedded-core
On Tue, Feb 25, 2020 at 09:16:20AM +0000, André Draszik wrote:
> On Tue, 2020-02-25 at 00:00 +0200, Adrian Bunk wrote:
>...
> > > Perhaps we just allow the number of xz threads
> > > to be set independently?
> >
> > dpkg manually reduces the number of threads until less than half
> > of the RAM is used:
> > https://sources.debian.org/src/dpkg/1.19.7/lib/dpkg/compress.c/#L566-L574
> >
> > In a script it would be possible to use --no-adjust to achieve the same:
> > $ xz -9 --memlimit=50% --no-adjust -T32 /dev/null
> > xz: Memory usage limit is too low for the given filter setup.
> > xz: 39,972 MiB of memory is required. The limit is 32,051 MiB.
> > $
>
> The problem with --no-adjust is that it also prevents xz from reducing the
> number of threads. It will do that in preference to changing compression
> parameters, so as long as you have more than 2499MiB of memory to support at
> least 2 threads (with default block size), compression will be identical.
>...
5 GB due to the 50%.
But I had something different in mind:
Similar to what dpkg is doing, you could loop once from
@oe.utils.cpu_count() down to 2 until "--memlimit=50% --no-adjust -T$(i)"
succeeds.
Then run the actual compression without --memlimit.
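That probe loop might look something like this (a sketch; it assumes xz is installed, and compressing /dev/null is enough to trigger the upfront memory check without doing real work):

```shell
#!/bin/sh
# Sketch: step down from the core count until xz accepts the thread
# count under --memlimit=50% --no-adjust. With --no-adjust, xz errors
# out immediately if the filter setup would exceed the limit, so the
# probe costs almost nothing.
command -v xz >/dev/null 2>&1 || exit 0   # nothing to probe without xz

threads=$(nproc)
while [ "$threads" -ge 2 ]; do
    if xz -9 --memlimit=50% --no-adjust -T"$threads" -c /dev/null \
        >/dev/null 2>&1; then
        break
    fi
    threads=$((threads - 1))
done

# The real compression would now run with -T"$threads" and no --memlimit.
echo "usable xz threads: $threads"
```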
I would not consider anything with less than 4 GB RAM reasonable for
building Yocto (g++ also likes to use more than 2 GB).
> Cheers,
> Andre'
cu
Adrian
* xz threads / memlimit behaviour (was: Re: [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures)
2020-02-24 16:44 ` [PATCH v2] bitbake.conf: omit XZ threads and RAM from sstate signatures Richard Purdie
2020-02-24 17:12 ` Adrian Bunk
2020-02-24 17:14 ` André Draszik
@ 2020-02-26 15:26 ` André Draszik
2 siblings, 0 replies; 16+ messages in thread
From: André Draszik @ 2020-02-26 15:26 UTC (permalink / raw)
To: Richard Purdie, Adrian Bunk; +Cc: openembedded-core
On Mon, 2020-02-24 at 16:44 +0000, Richard Purdie wrote:
> On Mon, 2020-02-24 at 15:40 +0200, Adrian Bunk wrote:
> > On Mon, Feb 24, 2020 at 12:59:55PM +0000, André Draszik wrote:
> > > The number of threads used, and the amount of memory allowed
> > > to be used, should not affect sstate signatures, as they
> > > don't affect the result.
> >
> > Unfortunately they can affect the result.
>
> I looked into this a bit and it's complicated. The threads are used to
> compress chunks and their compression should be deterministic whether
> done serially or in parallel.
>
> I did some tests and:
>
> xz <file>
> gave equivalent output to:
> xz <file> --threads=1
>
> and
>
> xz <file> --threads=2
> xz <file> --threads=5
> xz <file> --threads=50
>
> all gave output identical to each other (but different from --threads=1).
>
> So if we force --threads >=2 we should have determinism?
I did another test...
Even single threaded compression gives the same result as multi-threaded,
if single threaded is a result of xz scaling down the memory usage due
to --memlimit, ie:
xz -f -c -9 --threads=2 --memlimit=2000M --keep --verbose --verbose xztest > xztest2_down_to_1.xz
xz: Filter chain: --lzma2=dict=64MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: Using up to 2 threads.
xz: 2499 MiB of memory is required. The limit is 2000 MiB.
xz: Decompression will need 65 MiB of memory.
xz: Adjusted the number of threads from 2 to 1 to not exceed the memory usage limit of 2000 MiB
is different from
xz -f -c -9 --threads=1 --memlimit=2000M --verbose --verbose xztest > xztest1.xz
xz: Filter chain: --lzma2=dict=64MiB,lc=3,lp=0,pb=2,mode=normal,nice=64,mf=bt4,depth=0
xz: 674 MiB of memory is required. The limit is 2000 MiB.
xz: Decompression will need 65 MiB of memory.
There even is a comment in the xz sources about this. This simplifies things a bit...
> > > Otherwise, it becomes impossible to re-use sstate from
> > > automated builders on developer's machines (as the former
> > > might execute bitbake with certain constraints different
> > > compared to developer's machines).
> > > ...
> > > -XZ_DEFAULTS ?= "--memlimit=50% --threads=${@oe.utils.cpu_count()}"
> > > ...
> >
> > Threaded compression can result in slightly worse compression
> > than single-threaded compression.
> >
> > With memlimit the problem is actually the opposite way,
> > and worse than what you were trying to fix:
> >
> > When a developer hits memlimit during compression, the documented
> > behaviour of xz is to scale down the compression level.
This is not what is actually happening - it appears the documentation only
talks about single-threaded mode.
If you look at the code and look at command output, xz only ever
tries to reduce the number of threads if multi-threading was enabled. In
that case, it will *never* even attempt to change the compression level.
If it can't satisfy the memory limit by reducing the number of threads,
it just errors out.
So as long as all builders have at least 2 cores, i.e. --threads >= 2, memlimit
doesn't ever affect the outcome (except that it can make the operation fail as
a whole).
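For what it's worth, that conclusion could be expressed as a conf-file sketch (illustrative only, not a tested or agreed default; whether dropping --memlimit entirely is acceptable is exactly the open question in this thread):

```
# Illustrative sketch: force at least 2 threads so the stream is always
# block-split the same way, and rely on the thread count rather than
# --memlimit to bound memory use.
XZ_DEFAULTS ?= "--threads=${@max(2, oe.utils.cpu_count())}"
```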
Cheers,
Andre'