All of lore.kernel.org
 help / color / mirror / Atom feed
* Build failure with parallel build and opkg
@ 2018-09-11 22:49 Stefan Agner
  2018-09-26  9:34 ` Stefan Agner
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Agner @ 2018-09-11 22:49 UTC (permalink / raw)
  To: openembedded-core; +Cc: Brandon Shibley, samuel.bissig, ricardo

Hi,

We experience build errors as follows every now and then:

...
ERROR: full-container-image-0.1-r0 do_populate_sdk: Unable to install
packages. Command
'/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/recipe-sysroot-native/usr/bin/opkg
--volatile-cache -f
/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/opkg.conf
-t
/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/temp/ipktemp/
-o
/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi
 --force_postinstall --prefer-arch-to-version   install 96boards-tools
aktualizr aktualizr-host-tools aktualizr-runtime-prov base-passwd
coreutils cpufrequtils docker gptfdisk haveged hostapd htop iptables
kernel-modules ldd less lmp-device-register networkmanager
networkmanager-nmtui openssh-sftp-server os-release ostree
packagegroup-base-extended packagegroup-core-boot
packagegroup-core-full-cmdline-extended
packagegroup-core-full-cmdline-multiuser
packagegroup-core-full-cmdline-utils packagegroup-core-ssh-openssh
packagegroup-core-standalone-sdk-target pciutils python3-compression
python3-distutils python3-docker python3-docker-compose python3-json
python3-netclient python3-pkgutil python3-shell python3-unixadmin rsync
run-postinsts shadow sshfs-fuse strace sudo target-sdk-provides-dummy
tcpdump vim-tiny' returned 255:
...
Downloading
file:/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/nss_3.38-r0_armv7at2hf-neon.ipk.
Removing corrupt package file
/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi//var/cache/opkg/volatile/8e392ecd3611e24a6a49a8b22ad6e1ff_nss_3.38-r0_armv7at2hf-neon.ipk.
...
Installing pam-plugin-faildelay (1.3.0) on root
Downloading
file:/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/pam-plugin-faildelay_1.3.0-r5_armv7at2hf-neon.ipk.
Removing corrupt package file
/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi//var/cache/opkg/volatile/0df6a8bc594a581f6ca3bcfa55e860e2_pam-plugin-faildelay_1.3.0-r5_armv7at2hf-neon.ipk.
...
Collected errors:
 * opkg_install_pkg: Failed to download nss. Perhaps you need to run
'opkg update'?
 * opkg_install_pkg: Failed to download pam-plugin-faildelay. Perhaps
you need to run 'opkg update'?
.
...

We build our own OpenEmbedded core based distribution currently based on
a recent master state. But we have seen this on and off back since
rocko.

We build the image using Jenkins with multiple builders running in
parallel and sharing sstate. I think the fact that we run similar images
in parallel is the culprit: Looking closer at the failed build directory
reveals that the tmp-glibc/deploy/ipk/armv7at2hf-neon/Packages has a
different MD5Sum than the actual package. We start with two builders
simultaneously building an image, and it seems that they build the same
package around the same time. I assume that the two builders somehow
have a race between when the package get assembled and when the Package
index gets built...

We start with a clean sstate, and this typically only happens for the
very first builds, when the sstate is cold.

I guess there is some race/asynchronous operation going on around
building index/getting package from sstate/pushing package to sstate.

It seems an issue others have seen in the past too:
https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25

Any idea?

--
Stefan




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Build failure with parallel build and opkg
  2018-09-11 22:49 Build failure with parallel build and opkg Stefan Agner
@ 2018-09-26  9:34 ` Stefan Agner
  2018-10-02  8:46   ` Stefan Agner
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Agner @ 2018-09-26  9:34 UTC (permalink / raw)
  To: openembedded-core, richard.purdie; +Cc: Brandon Shibley, samuel.bissig, ricardo

Hi,

On 12.09.2018 00:49, Stefan Agner wrote:
> Hi,
> 
> We experience build errors as follows every now and then:
> 
> ...
> ERROR: full-container-image-0.1-r0 do_populate_sdk: Unable to install
> packages. Command
> '/workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/recipe-sysroot-native/usr/bin/opkg
> --volatile-cache -f
> /workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/opkg.conf
> -t
> /workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/temp/ipktemp/
> -o
> /workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi
>  --force_postinstall --prefer-arch-to-version   install 96boards-tools
> aktualizr aktualizr-host-tools aktualizr-runtime-prov base-passwd
> coreutils cpufrequtils docker gptfdisk haveged hostapd htop iptables
> kernel-modules ldd less lmp-device-register networkmanager
> networkmanager-nmtui openssh-sftp-server os-release ostree
> packagegroup-base-extended packagegroup-core-boot
> packagegroup-core-full-cmdline-extended
> packagegroup-core-full-cmdline-multiuser
> packagegroup-core-full-cmdline-utils packagegroup-core-ssh-openssh
> packagegroup-core-standalone-sdk-target pciutils python3-compression
> python3-distutils python3-docker python3-docker-compose python3-json
> python3-netclient python3-pkgutil python3-shell python3-unixadmin rsync
> run-postinsts shadow sshfs-fuse strace sudo target-sdk-provides-dummy
> tcpdump vim-tiny' returned 255:
> ...
> Downloading
> file:/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/nss_3.38-r0_armv7at2hf-neon.ipk.
> Removing corrupt package file
> /workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi//var/cache/opkg/volatile/8e392ecd3611e24a6a49a8b22ad6e1ff_nss_3.38-r0_armv7at2hf-neon.ipk.
> ...
> Installing pam-plugin-faildelay (1.3.0) on root
> Downloading
> file:/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/pam-plugin-faildelay_1.3.0-r5_armv7at2hf-neon.ipk.
> Removing corrupt package file
> /workdir/oe/tmp/work/colibri_imx7-lmp-linux-gnueabi/full-container-image/0.1-r0/sdk/image/usr/local/tordy-x86_64/sysroots/armv7at2hf-neon-lmp-linux-gnueabi//var/cache/opkg/volatile/0df6a8bc594a581f6ca3bcfa55e860e2_pam-plugin-faildelay_1.3.0-r5_armv7at2hf-neon.ipk.
> ...
> Collected errors:
>  * opkg_install_pkg: Failed to download nss. Perhaps you need to run
> 'opkg update'?
>  * opkg_install_pkg: Failed to download pam-plugin-faildelay. Perhaps
> you need to run 'opkg update'?
> .
> ...
> 
> We build our own OpenEmbedded core based distribution currently based on
> a recent master state. But we have seen this on and off back since
> rocko.
> 
> We build the image using Jenkins with multiple builders running in
> parallel and sharing sstate. I think the fact that we run similar images
> in parallel is the culprit: Looking closer at the failed build directory
> reveals that the tmp-glibc/deploy/ipk/armv7at2hf-neon/Packages has a
> different MD5Sum than the actual package. We start with two builders
> simultaneously building an image, and it seems that they build the same
> package around the same time. I assume that the two builders somehow
> have a race between when the package get assembled and when the Package
> index gets built...
> 
> We start with a clean sstate, and this typically only happens for the
> very first builds, when the sstate is cold.

We discussed the issue at Linaro Connect a bit.

To recap, we do build in two steps:

1. bitbake full-container-image
2. bitbake -c populate_sdk full-container-image

The issue always happens in the second step.

We also see that in the second step, the do_package_write_ipk_setscene
task for every recipe is executed.

The current assumption is

I tried to reproduce by building a recipe using openembedded-core master
only in two build directories with shared sstate manually:

1. build1 $ bitbake eudev
2. build2 $ bitbake -c cleansstate eudev
3. build2 $ bitbake eudev
4. build1 $ bitbake core-image-minimal

This sequence seems not to have triggered a
do_package_write_ipk_setscene for eudev.

I then tried
5. build1 $ bitbake -c populate_sdk core-image-minimal

Which did trigger a do_package_write_ipk_setscene. However, the issue
did not appear...

I even tried to rebuild and replace the file manually, and run bitbake
-c populate_sdk -f core-image-minimal, but it just seems not to appear.

Last time I have seen it was with oe-core
f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
again to see whether that helps reproducing the issue.

--
Stefan


> 
> I guess there is some race/asynchronous operation going on around
> building index/getting package from sstate/pushing package to sstate.
> 
> It seems an issue others have seen in the past too:
> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
> 
> Any idea?
> 
> --
> Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Build failure with parallel build and opkg
  2018-09-26  9:34 ` Stefan Agner
@ 2018-10-02  8:46   ` Stefan Agner
  2018-10-02 13:12     ` Stefan Agner
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Agner @ 2018-10-02  8:46 UTC (permalink / raw)
  To: openembedded-core, richard.purdie, Max Krummenacher
  Cc: Brandon Shibley, samuel.bissig, ricardo

On 26.09.2018 11:34, Stefan Agner wrote:
> Hi,
> 
> On 12.09.2018 00:49, Stefan Agner wrote:
<snip>
> We discussed the issue at Linaro Connect a bit.
> 
> To recap, we do build in two steps:
> 
> 1. bitbake full-container-image
> 2. bitbake -c populate_sdk full-container-image
> 
> The issue always happens in the second step.
> 
> We also see that in the second step, the do_package_write_ipk_setscene
> task for every recipe is executed.
> 
> The current assumption is
> 
> I tried to reproduce by building a recipe using openembedded-core master
> only in two build directories with shared sstate manually:
> 
> 1. build1 $ bitbake eudev
> 2. build2 $ bitbake -c cleansstate eudev
> 3. build2 $ bitbake eudev
> 4. build1 $ bitbake core-image-minimal
> 
> This sequence seems not to have triggered a
> do_package_write_ipk_setscene for eudev.
> 
> I then tried
> 5. build1 $ bitbake -c populate_sdk core-image-minimal
> 
> Which did trigger a do_package_write_ipk_setscene. However, the issue
> did not appear...
> 
> I even tried to rebuild and replace the file manually, and run bitbake
> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
> 
> Last time I have seen it was with oe-core
> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
> again to see whether that helps reproducing the issue.

Using the older OE version did not make a difference.


So Max and I discussed a bit further. We realized that when OE rebuilds,
the opkg package index is refreshed for a package only if the mtime (in
seconds) is different between the previous file and the file on disk
currently (see opkg-make-index). Can it be that two simultaneously
started builds create two ipk with same mtime?

So I built two core-image-minimal builds (on the same machine) at the
*very* same time. First try did not cause a collision, but already on
the second try I managed to reproduce multiple collision. This output
shows all the packages with the same unix mtime stamp according to the
Packages.stamps file (generated by opkg-make-index):

$ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
kages.stamps)                                                                                                                               
1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk                                                                                   
1538425082 libreadline-staticdev_7.0-r0_i586.ipk                                                                                            
1538425162 bash-bashbug_4.4.18-r0_i586.ipk                                                                                                  
1538425162 bash-completion_2.8-r0_i586.ipk                                                                                                  
1538425162 bash-completion-dbg_2.8-r0_i586.ipk                                                                                              
1538425162 bash-completion-dev_2.8-r0_i586.ipk                                                                                              
1538425162 bash-loadable_4.4.18-r0_i586.ipk                                                                                                 
1538425163 bash_4.4.18-r0_i586.ipk                                                                                                          
1538425163 bash-completion-extra_2.8-r0_i586.ipk
1538425164 bash-doc_4.4.18-r0_i586.ipk
1538425167 bash-dbg_4.4.18-r0_i586.ipk
1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
1538425233 util-linux-locale-id_2.32-r0_i586.ipk
1538425233 util-linux-locale-it_2.32-r0_i586.ipk
1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
1538425234 util-linux-dev_2.32-r0_i586.ipk
1538425234 util-linux-findfs_2.32-r0_i586.ipk
1538425234 util-linux-ionice_2.32-r0_i586.ipk
1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
1538425236 util-linux-blkid_2.32-r0_i586.ipk
1538425236 util-linux-fsck_2.32-r0_i586.ipk
1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
1538425236 util-linux-fstrim_2.32-r0_i586.ipk
1538425236 util-linux-lscpu_2.32-r0_i586.ipk
1538425236 util-linux-mkfs_2.32-r0_i586.ipk
1538425236 util-linux-mount_2.32-r0_i586.ipk
1538425236 util-linux-partx_2.32-r0_i586.ipk
1538425236 util-linux-readprofile_2.32-r0_i586.ipk
1538425236 util-linux-umount_2.32-r0_i586.ipk
1538425237 libblkid1_2.32-r0_i586.ipk
1538425237 libfdisk1_2.32-r0_i586.ipk
1538425237 libmount1_2.32-r0_i586.ipk
1538425238 util-linux_2.32-r0_i586.ipk
1538425240 util-linux-staticdev_2.32-r0_i586.ipk

Indeed, the two ipk files have the same mtime (at least, on second level):

$ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2627769381  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:37.464716076 +0200
Modify: 2018-10-01 22:20:37.930653796 +0200
Change: 2018-10-01 22:20:50.998205339 +0200
 Birth: -
  File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136692          Blocks: 272        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2608876057  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:47.145119532 +0200
Modify: 2018-10-01 22:20:37.358629653 +0200
Change: 2018-10-01 22:20:50.727193901 +0200
 Birth: -

But different md5sums!

$ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk

In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.

On our production system, the SDK builds called later always fetch the packages
from sstate again (do_package_write_ipk_setscene). For some reason that did not
happen in my synthetic test. However, I simulated the fetch from sstate by
removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:

  build2 $ bitbake -c clean util-linux

And rebuild the image:

  build2 $ bitbake -f -c populate_sdk core-image-minimal

Which indeed showed the issues we are seeing in production:
...
Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
...
Collected errors:                                                                                                                           
 * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
...
ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk                                                         
ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'


It really seems that using mtime only in opkg-make-index is not safe enough
in our setup. We probably should use the inode (or using the inode in addition
to the time stamp).

Is it guaranteed that the inode changes when ipk are fetched from sstate?

--
Stefan


> 
>>
>> I guess there is some race/asynchronous operation going on around
>> building index/getting package from sstate/pushing package to sstate.
>>
>> It seems an issue others have seen in the past too:
>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
>>
>> Any idea?
>>
>> --
>> Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Build failure with parallel build and opkg
  2018-10-02  8:46   ` Stefan Agner
@ 2018-10-02 13:12     ` Stefan Agner
  2018-10-02 19:03       ` Khem Raj
  2018-10-11  9:32       ` Stefan Agner
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Agner @ 2018-10-02 13:12 UTC (permalink / raw)
  To: openembedded-core, richard.purdie, Max Krummenacher
  Cc: Brandon Shibley, samuel.bissig, ricardo

On 02.10.2018 10:46, Stefan Agner wrote:
> On 26.09.2018 11:34, Stefan Agner wrote:
>> Hi,
>>
>> On 12.09.2018 00:49, Stefan Agner wrote:
> <snip>
>> We discussed the issue at Linaro Connect a bit.
>>
>> To recap, we do build in two steps:
>>
>> 1. bitbake full-container-image
>> 2. bitbake -c populate_sdk full-container-image
>>
>> The issue always happens in the second step.
>>
>> We also see that in the second step, the do_package_write_ipk_setscene
>> task for every recipe is executed.
>>
>> The current assumption is
>>
>> I tried to reproduce by building a recipe using openembedded-core master
>> only in two build directories with shared sstate manually:
>>
>> 1. build1 $ bitbake eudev
>> 2. build2 $ bitbake -c cleansstate eudev
>> 3. build2 $ bitbake eudev
>> 4. build1 $ bitbake core-image-minimal
>>
>> This sequence seems not to have triggered a
>> do_package_write_ipk_setscene for eudev.
>>
>> I then tried
>> 5. build1 $ bitbake -c populate_sdk core-image-minimal
>>
>> Which did trigger a do_package_write_ipk_setscene. However, the issue
>> did not appear...
>>
>> I even tried to rebuild and replace the file manually, and run bitbake
>> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
>>
>> Last time I have seen it was with oe-core
>> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
>> again to see whether that helps reproducing the issue.
> 
> Using the older OE version did not make a difference.
> 
> 
> So Max and I discussed a bit further. We realized that when OE rebuilds,
> the opkg package index is refreshed for a package only if the mtime (in
> seconds) is different between the previous file and the file on disk
> currently (see opkg-make-index). Can it be that two simultaneously
> started builds create two ipk with same mtime?
> 
> So I built two core-image-minimal builds (on the same machine) at the
> *very* same time. First try did not cause a collision, but already on
> the second try I managed to reproduce multiple collision. This output
> shows all the packages with the same unix mtime stamp according to the
> Packages.stamps file (generated by opkg-make-index):
> 
> $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
> kages.stamps)                                                                                                                               
> 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk                                                                                   
> 1538425082 libreadline-staticdev_7.0-r0_i586.ipk                                                                                            
> 1538425162 bash-bashbug_4.4.18-r0_i586.ipk                                                                                                  
> 1538425162 bash-completion_2.8-r0_i586.ipk                                                                                                  
> 1538425162 bash-completion-dbg_2.8-r0_i586.ipk                                                                                              
> 1538425162 bash-completion-dev_2.8-r0_i586.ipk                                                                                              
> 1538425162 bash-loadable_4.4.18-r0_i586.ipk                                                                                                 
> 1538425163 bash_4.4.18-r0_i586.ipk                                                                                                          
> 1538425163 bash-completion-extra_2.8-r0_i586.ipk
> 1538425164 bash-doc_4.4.18-r0_i586.ipk
> 1538425167 bash-dbg_4.4.18-r0_i586.ipk
> 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-id_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-it_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
> 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
> 1538425234 util-linux-dev_2.32-r0_i586.ipk
> 1538425234 util-linux-findfs_2.32-r0_i586.ipk
> 1538425234 util-linux-ionice_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
> 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
> 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
> 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
> 1538425236 util-linux-blkid_2.32-r0_i586.ipk
> 1538425236 util-linux-fsck_2.32-r0_i586.ipk
> 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
> 1538425236 util-linux-fstrim_2.32-r0_i586.ipk
> 1538425236 util-linux-lscpu_2.32-r0_i586.ipk
> 1538425236 util-linux-mkfs_2.32-r0_i586.ipk
> 1538425236 util-linux-mount_2.32-r0_i586.ipk
> 1538425236 util-linux-partx_2.32-r0_i586.ipk
> 1538425236 util-linux-readprofile_2.32-r0_i586.ipk
> 1538425236 util-linux-umount_2.32-r0_i586.ipk
> 1538425237 libblkid1_2.32-r0_i586.ipk
> 1538425237 libfdisk1_2.32-r0_i586.ipk
> 1538425237 libmount1_2.32-r0_i586.ipk
> 1538425238 util-linux_2.32-r0_i586.ipk
> 1538425240 util-linux-staticdev_2.32-r0_i586.ipk
> 
> Indeed, the two ipk files have the same mtime (at least, on second level):
> 
> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2627769381  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:37.464716076 +0200
> Modify: 2018-10-01 22:20:37.930653796 +0200
> Change: 2018-10-01 22:20:50.998205339 +0200
>  Birth: -
>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136692          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2608876057  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:47.145119532 +0200
> Modify: 2018-10-01 22:20:37.358629653 +0200
> Change: 2018-10-01 22:20:50.727193901 +0200
>  Birth: -
> 
> But different md5sums!
> 
> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> 91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> 
> In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.
> 
> On our production system, the SDK builds called later always fetch the packages
> from sstate again (do_package_write_ipk_setscene). For some reason that did not
> happen in my synthetic test. However, I simulated the fetch from sstate by
> removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:
> 
>   build2 $ bitbake -c clean util-linux
> 
> And rebuild the image:
> 
>   build2 $ bitbake -f -c populate_sdk core-image-minimal
> 

Note that after this command, the packages in the feed looks like this:

$ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file                                                                   
Device: fd00h/64768d    Inode: 2627769381  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)                                                                   
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:37.464716076 +0200
Modify: 2018-10-01 22:20:37.930653796 +0200
Change: 2018-10-01 22:20:50.998205339 +0200
 Birth: -
  File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file                                                                   
Device: fd00h/64768d    Inode: 2315301423  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)                                                                   
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-02 09:27:07.750888458 +0200
Modify: 2018-10-01 22:20:37.000000000 +0200
Change: 2018-10-02 09:26:47.308004397 +0200
 Birth: -

Note that the package in build2 the mtime has no nanosecond accuracy. Presumably
the tarball from sstate does not preserve sub second timestamps.

$ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk

Will send out a patch which uses inodes instead of mtime in opkg-make-index.

--
Stefan


> Which indeed showed the issues we are seeing in production:
> ...
> Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
> Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
> ...
> Collected errors:                                                                                                                           
>  * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
> ...
> ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk                                                         
> ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
> ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'
> 
> 
> It really seems that using mtime only in opkg-make-index is not safe enough
> in our setup. We probably should use the inode (or using the inode in addition
> to the time stamp).
> 
> Is it guaranteed that the inode changes when ipk are fetched from sstate?
> 
> --
> Stefan
> 
> 
>>
>>>
>>> I guess there is some race/asynchronous operation going on around
>>> building index/getting package from sstate/pushing package to sstate.
>>>
>>> It seems an issue others have seen in the past too:
>>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
>>>
>>> Any idea?
>>>
>>> --
>>> Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Build failure with parallel build and opkg
  2018-10-02 13:12     ` Stefan Agner
@ 2018-10-02 19:03       ` Khem Raj
  2018-10-11  9:32       ` Stefan Agner
  1 sibling, 0 replies; 6+ messages in thread
From: Khem Raj @ 2018-10-02 19:03 UTC (permalink / raw)
  To: Stefan Agner
  Cc: Patches and discussions about the oe-core layer, Brandon Shibley,
	samuel.bissig, ricardo

On Tue, Oct 2, 2018 at 6:13 AM Stefan Agner <stefan@agner.ch> wrote:
>
> On 02.10.2018 10:46, Stefan Agner wrote:
> > On 26.09.2018 11:34, Stefan Agner wrote:
> >> Hi,
> >>
> >> On 12.09.2018 00:49, Stefan Agner wrote:
> > <snip>
> >> We discussed the issue at Linaro Connect a bit.
> >>
> >> To recap, we do build in two steps:
> >>
> >> 1. bitbake full-container-image
> >> 2. bitbake -c populate_sdk full-container-image
> >>
> >> The issue always happens in the second step.
> >>
> >> We also see that in the second step, the do_package_write_ipk_setscene
> >> task for every recipe is executed.
> >>
> >> The current assumption is
> >>
> >> I tried to reproduce by building a recipe using openembedded-core master
> >> only in two build directories with shared sstate manually:
> >>
> >> 1. build1 $ bitbake eudev
> >> 2. build2 $ bitbake -c cleansstate eudev
> >> 3. build2 $ bitbake eudev
> >> 4. build1 $ bitbake core-image-minimal
> >>
> >> This sequence seems not to have triggered a
> >> do_package_write_ipk_setscene for eudev.
> >>
> >> I then tried
> >> 5. build1 $ bitbake -c populate_sdk core-image-minimal
> >>
> >> Which did trigger a do_package_write_ipk_setscene. However, the issue
> >> did not appear...
> >>
> >> I even tried to rebuild and replace the file manually, and run bitbake
> >> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
> >>
> >> Last time I have seen it was with oe-core
> >> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
> >> again to see whether that helps reproducing the issue.
> >
> > Using the older OE version did not make a difference.
> >
> >
> > So Max and I discussed a bit further. We realized that when OE rebuilds,
> > the opkg package index is refreshed for a package only if the mtime (in
> > seconds) is different between the previous file and the file on disk
> > currently (see opkg-make-index). Can it be that two simultaneously
> > started builds create two ipk with same mtime?
> >
> > So I built two core-image-minimal builds (on the same machine) at the
> > *very* same time. First try did not cause a collision, but already on
> > the second try I managed to reproduce multiple collision. This output
> > shows all the packages with the same unix mtime stamp according to the
> > Packages.stamps file (generated by opkg-make-index):
> >
> > $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
> > kages.stamps)
> > 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk
> > 1538425082 libreadline-staticdev_7.0-r0_i586.ipk
> > 1538425162 bash-bashbug_4.4.18-r0_i586.ipk
> > 1538425162 bash-completion_2.8-r0_i586.ipk
> > 1538425162 bash-completion-dbg_2.8-r0_i586.ipk
> > 1538425162 bash-completion-dev_2.8-r0_i586.ipk
> > 1538425162 bash-loadable_4.4.18-r0_i586.ipk
> > 1538425163 bash_4.4.18-r0_i586.ipk
> > 1538425163 bash-completion-extra_2.8-r0_i586.ipk
> > 1538425164 bash-doc_4.4.18-r0_i586.ipk
> > 1538425167 bash-dbg_4.4.18-r0_i586.ipk
> > 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-id_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-it_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
> > 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
> > 1538425234 util-linux-dev_2.32-r0_i586.ipk
> > 1538425234 util-linux-findfs_2.32-r0_i586.ipk
> > 1538425234 util-linux-ionice_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
> > 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
> > 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
> > 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
> > 1538425236 util-linux-blkid_2.32-r0_i586.ipk
> > 1538425236 util-linux-fsck_2.32-r0_i586.ipk
> > 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
> > 1538425236 util-linux-fstrim_2.32-r0_i586.ipk
> > 1538425236 util-linux-lscpu_2.32-r0_i586.ipk
> > 1538425236 util-linux-mkfs_2.32-r0_i586.ipk
> > 1538425236 util-linux-mount_2.32-r0_i586.ipk
> > 1538425236 util-linux-partx_2.32-r0_i586.ipk
> > 1538425236 util-linux-readprofile_2.32-r0_i586.ipk
> > 1538425236 util-linux-umount_2.32-r0_i586.ipk
> > 1538425237 libblkid1_2.32-r0_i586.ipk
> > 1538425237 libfdisk1_2.32-r0_i586.ipk
> > 1538425237 libmount1_2.32-r0_i586.ipk
> > 1538425238 util-linux_2.32-r0_i586.ipk
> > 1538425240 util-linux-staticdev_2.32-r0_i586.ipk
> >
> > Indeed, the two ipk files have the same mtime (at least, on second level):
> >
> > $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> > Device: fd00h/64768d    Inode: 2627769381  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> > Context: unconfined_u:object_r:home_root_t:s0
> > Access: 2018-10-01 22:28:37.464716076 +0200
> > Modify: 2018-10-01 22:20:37.930653796 +0200
> > Change: 2018-10-01 22:20:50.998205339 +0200
> >  Birth: -
> >   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >   Size: 136692          Blocks: 272        IO Block: 4096   regular file
> > Device: fd00h/64768d    Inode: 2608876057  Links: 2
> > Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> > Context: unconfined_u:object_r:home_root_t:s0
> > Access: 2018-10-01 22:28:47.145119532 +0200
> > Modify: 2018-10-01 22:20:37.358629653 +0200
> > Change: 2018-10-01 22:20:50.727193901 +0200
> >  Birth: -
> >
> > But different md5sums!
> >
> > $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> > bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> > 91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> >
> > In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.
> >
> > On our production system, the SDK builds called later always fetch the packages
> > from sstate again (do_package_write_ipk_setscene). For some reason that did not
> > happen in my synthetic test. However, I simulated the fetch from sstate by
> > removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:
> >
> >   build2 $ bitbake -c clean util-linux
> >
> > And rebuild the image:
> >
> >   build2 $ bitbake -f -c populate_sdk core-image-minimal
> >
>
> Note that after this command, the packages in the feed looks like this:
>
> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2627769381  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:37.464716076 +0200
> Modify: 2018-10-01 22:20:37.930653796 +0200
> Change: 2018-10-01 22:20:50.998205339 +0200
>  Birth: -
>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
> Device: fd00h/64768d    Inode: 2315301423  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-02 09:27:07.750888458 +0200
> Modify: 2018-10-01 22:20:37.000000000 +0200
> Change: 2018-10-02 09:26:47.308004397 +0200
>  Birth: -
>
> Note that the package in build2 the mtime has no nanosecond accuracy. Presumably
> the tarball from sstate does not preserve sub second timestamps.
>
> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>
> Will send out a patch which uses inodes instead of mtime in opkg-make-index.
>

Thats a good find Stefan, How about using ctime instead of mtime.

> --
> Stefan
>
>
> > Which indeed showed the issues we are seeing in production:
> > ...
> > Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
> > Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
> > ...
> > Collected errors:
> >  * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
> > ...
> > ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk
> > ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
> > ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'
> >
> >
> > It really seems that using mtime only in opkg-make-index is not safe enough
> > in our setup. We probably should use the inode (or using the inode in addition
> > to the time stamp).
> >
> > Is it guaranteed that the inode changes when ipk are fetched from sstate?
> >
> > --
> > Stefan
> >
> >
> >>
> >>>
> >>> I guess there is some race/asynchronous operation going on around
> >>> building index/getting package from sstate/pushing package to sstate.
> >>>
> >>> It seems an issue others have seen in the past too:
> >>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
> >>>
> >>> Any idea?
> >>>
> >>> --
> >>> Stefan
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Build failure with parallel build and opkg
  2018-10-02 13:12     ` Stefan Agner
  2018-10-02 19:03       ` Khem Raj
@ 2018-10-11  9:32       ` Stefan Agner
  1 sibling, 0 replies; 6+ messages in thread
From: Stefan Agner @ 2018-10-11  9:32 UTC (permalink / raw)
  To: openembedded-core, richard.purdie, Max Krummenacher
  Cc: Brandon Shibley, samuel.bissig, ricardo

On 02.10.2018 15:12, Stefan Agner wrote:
> On 02.10.2018 10:46, Stefan Agner wrote:
>> On 26.09.2018 11:34, Stefan Agner wrote:
>>> Hi,
>>>
>>> On 12.09.2018 00:49, Stefan Agner wrote:
>> <snip>
>>> We discussed the issue at Linaro Connect a bit.
>>>
>>> To recap, we do build in two steps:
>>>
>>> 1. bitbake full-container-image
>>> 2. bitbake -c populate_sdk full-container-image
>>>
>>> The issue always happens in the second step.
>>>
>>> We also see that in the second step, the do_package_write_ipk_setscene
>>> task for every recipe is executed.
>>>
>>> The current assumption is
>>>
>>> I tried to reproduce by building a recipe using openembedded-core master
>>> only in two build directories with shared sstate manually:
>>>
>>> 1. build1 $ bitbake eudev
>>> 2. build2 $ bitbake -c cleansstate eudev
>>> 3. build2 $ bitbake eudev
>>> 4. build1 $ bitbake core-image-minimal
>>>
>>> This sequence seems not to have triggered a
>>> do_package_write_ipk_setscene for eudev.
>>>
>>> I then tried
>>> 5. build1 $ bitbake -c populate_sdk core-image-minimal
>>>
>>> Which did trigger a do_package_write_ipk_setscene. However, the issue
>>> did not appear...
>>>
>>> I even tried to rebuild and replace the file manually, and run bitbake
>>> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
>>>
>>> Last time I have seen it was with oe-core
>>> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
>>> again to see whether that helps reproducing the issue.
>>
>> Using the older OE version did not make a difference.
>>
>>
>> So Max and I discussed a bit further. We realized that when OE rebuilds,
>> the opkg package index is refreshed for a package only if the mtime (in
>> seconds) is different between the previous file and the file on disk
>> currently (see opkg-make-index). Can it be that two simultaneously
>> started builds create two ipk with same mtime?
>>
>> So I built two core-image-minimal builds (on the same machine) at the
>> *very* same time. First try did not cause a collision, but already on
>> the second try I managed to reproduce multiple collision. This output
>> shows all the packages with the same unix mtime stamp according to the
>> Packages.stamps file (generated by opkg-make-index):
>>
>> $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
>> kages.stamps)
>> 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk
>> 1538425082 libreadline-staticdev_7.0-r0_i586.ipk
>> 1538425162 bash-bashbug_4.4.18-r0_i586.ipk
>> 1538425162 bash-completion_2.8-r0_i586.ipk
>> 1538425162 bash-completion-dbg_2.8-r0_i586.ipk
>> 1538425162 bash-completion-dev_2.8-r0_i586.ipk
>> 1538425162 bash-loadable_4.4.18-r0_i586.ipk
>> 1538425163 bash_4.4.18-r0_i586.ipk
>> 1538425163 bash-completion-extra_2.8-r0_i586.ipk
>> 1538425164 bash-doc_4.4.18-r0_i586.ipk
>> 1538425167 bash-dbg_4.4.18-r0_i586.ipk
>> 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
>> 1538425233 util-linux-locale-id_2.32-r0_i586.ipk
>> 1538425233 util-linux-locale-it_2.32-r0_i586.ipk
>> 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
>> 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
>> 1538425234 util-linux-dev_2.32-r0_i586.ipk
>> 1538425234 util-linux-findfs_2.32-r0_i586.ipk
>> 1538425234 util-linux-ionice_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
>> 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
>> 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
>> 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
>> 1538425236 util-linux-blkid_2.32-r0_i586.ipk
>> 1538425236 util-linux-fsck_2.32-r0_i586.ipk
>> 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
>> 1538425236 util-linux-fstrim_2.32-r0_i586.ipk
>> 1538425236 util-linux-lscpu_2.32-r0_i586.ipk
>> 1538425236 util-linux-mkfs_2.32-r0_i586.ipk
>> 1538425236 util-linux-mount_2.32-r0_i586.ipk
>> 1538425236 util-linux-partx_2.32-r0_i586.ipk
>> 1538425236 util-linux-readprofile_2.32-r0_i586.ipk
>> 1538425236 util-linux-umount_2.32-r0_i586.ipk
>> 1538425237 libblkid1_2.32-r0_i586.ipk
>> 1538425237 libfdisk1_2.32-r0_i586.ipk
>> 1538425237 libmount1_2.32-r0_i586.ipk
>> 1538425238 util-linux_2.32-r0_i586.ipk
>> 1538425240 util-linux-staticdev_2.32-r0_i586.ipk
>>
>> Indeed, the two ipk files have the same mtime (at least, on second level):
>>
>> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>>   Size: 136612          Blocks: 272        IO Block: 4096   regular file
>> Device: fd00h/64768d    Inode: 2627769381  Links: 2
>> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
>> Context: unconfined_u:object_r:home_root_t:s0
>> Access: 2018-10-01 22:28:37.464716076 +0200
>> Modify: 2018-10-01 22:20:37.930653796 +0200
>> Change: 2018-10-01 22:20:50.998205339 +0200
>>  Birth: -
>>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>>   Size: 136692          Blocks: 272        IO Block: 4096   regular file
>> Device: fd00h/64768d    Inode: 2608876057  Links: 2
>> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
>> Context: unconfined_u:object_r:home_root_t:s0
>> Access: 2018-10-01 22:28:47.145119532 +0200
>> Modify: 2018-10-01 22:20:37.358629653 +0200
>> Change: 2018-10-01 22:20:50.727193901 +0200
>>  Birth: -
>>
>> But different md5sums!
>>
>> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>> bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>> 91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>>
>> In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.
>>
>> On our production system, the SDK builds called later always fetch the packages
>> from sstate again (do_package_write_ipk_setscene). For some reason that did not
>> happen in my synthetic test. However, I simulated the fetch from sstate by
>> removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:
>>
>>   build2 $ bitbake -c clean util-linux
>>
>> And rebuild the image:
>>
>>   build2 $ bitbake -f -c populate_sdk core-image-minimal
>>
> 
> Note that after this command, the packages in the feed looks like this:
> 
> $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular
> file
> Device: fd00h/64768d    Inode: 2627769381  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/    
> ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-01 22:28:37.464716076 +0200
> Modify: 2018-10-01 22:20:37.930653796 +0200
> Change: 2018-10-01 22:20:50.998205339 +0200
>  Birth: -
>   File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
>   Size: 136612          Blocks: 272        IO Block: 4096   regular
> file
> Device: fd00h/64768d    Inode: 2315301423  Links: 2
> Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/    
> ags)
> Context: unconfined_u:object_r:home_root_t:s0
> Access: 2018-10-02 09:27:07.750888458 +0200
> Modify: 2018-10-01 22:20:37.000000000 +0200
> Change: 2018-10-02 09:26:47.308004397 +0200
>  Birth: -
> 
> Note that the package in build2 the mtime has no nanosecond accuracy. Presumably
> the tarball from sstate does not preserve sub second timestamps.
> 
> $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657 
> ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> bb8ffd4443e84fede943ccc3292dc657 
> ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
> 
> Will send out a patch which uses inodes instead of mtime in opkg-make-index.
> 
> --
> Stefan
> 
> 
>> Which indeed showed the issues we are seeing in production:
>> ...
>> Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
>> Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
>> ...
>> Collected errors:
>>  * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
>> ...
>> ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk
>> ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
>> ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'
>>
>>
>> It really seems that using mtime only in opkg-make-index is not safe enough
>> in our setup. We probably should use the inode (or using the inode in addition
>> to the time stamp).
>>
>> Is it guaranteed that the inode changes when ipk are fetched from sstate?

Unfortunately, testing with inodes instead of mtime as Packages.stamps
turned out to be equally unreliable. It seems that when the package is
fetched from sstate, inode is preserved:

In a particular case:
$ stat
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/libbz2-1_1.0.6-r5_armv7at2hf-neon.ipk
               
  File:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/libbz2-1_1.0.6-r5_armv7at2hf-neon.ipk
                                                  
  Size: 26144           Blocks: 56         IO Block: 4096   regular file
                                                                  
Device: 801h/2049d      Inode: 151929072   Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1003/   yocto)   Gid: ( 1003/   yocto)
                                                                  
Access: 2018-10-10 10:14:37.022012933 +0000
Modify: 2018-10-10 08:43:51.000000000 +0000
Change: 2018-10-10 10:12:54.026508198 +0000
 Birth: -

(modification time ns is 0 -> file has been fetched from sstate).

I recorded Inode on every opkg-make-index invocation, and it seems that
inode was always 151929072.

The log.do_package_write_ipk_setscene from that package shows:

DEBUG: Executing python function do_package_write_ipk_setscene
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/bzip2_1.0.6-r5_armv7at2hf-neon.ipk
                                   
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/bzip2-doc_1.0.6-r5_armv7at2hf-neon.ipk
                               
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/bzip2-staticdev_1.0.6-r5_armv7at2hf-neon.ipk
                         
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/bzip2-dbg_1.0.6-r5_armv7at2hf-neon.ipk
                               
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/bzip2-dev_1.0.6-r5_armv7at2hf-neon.ipk
                               
DEBUG: Removing manifest:
/workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/libbz2-1_1.0.6-r5_armv7at2hf-neon.ipk
                                
DEBUG: Removing manifest: /workdir/oe/tmp/deploy/ipk/armv7at2hf-neon/
DEBUG: SITE files ['endian-little', 'bit-32', 'arm-common', 'arm-32',
'common-linux', 'common-glibc', 'arm-linux', 'arm-linux-gnueabi',
'common']
DEBUG: Executing shell function sstate_unpack_package
deploy-ipks/
deploy-ipks/armv7at2hf-neon/
deploy-ipks/armv7at2hf-neon/bzip2_1.0.6-r5_armv7at2hf-neon.ipk
deploy-ipks/armv7at2hf-neon/bzip2-doc_1.0.6-r5_armv7at2hf-neon.ipk
deploy-ipks/armv7at2hf-neon/bzip2-staticdev_1.0.6-r5_armv7at2hf-neon.ipk
                                                                  
deploy-ipks/armv7at2hf-neon/bzip2-dbg_1.0.6-r5_armv7at2hf-neon.ipk
deploy-ipks/armv7at2hf-neon/bzip2-dev_1.0.6-r5_armv7at2hf-neon.ipk
deploy-ipks/armv7at2hf-neon/libbz2-1_1.0.6-r5_armv7at2hf-neon.ipk
DEBUG: Shell function sstate_unpack_package finished
DEBUG: Executing python function sstate_hardcode_path_unpack
DEBUG: Python function sstate_hardcode_path_unpack finished
DEBUG: Executing python function buildhistory_emit_outputsigs
DEBUG: Python function buildhistory_emit_outputsigs finished
DEBUG: Staging files from
/workdir/oe/tmp/work/armv7at2hf-neon-lmp-linux-gnueabi/bzip2/1.0.6-r5/deploy-ipks
to /workdir/oe/tmp/deploy/ipk  
DEBUG: Executing python function buildhistory_emit_pkghistory
DEBUG: Python function buildhistory_emit_pkghistory finished
DEBUG: Python function do_package_write_ipk_setscene finished

So I guess the "Staging files" step preserves the target files inode
number...

It seems that ctime is modified though, so our best chance is probably
ctime. I will run some tests suing ctime.

--
Stefan

>>
>> --
>> Stefan
>>
>>
>>>
>>>>
>>>> I guess there is some race/asynchronous operation going on around
>>>> building index/getting package from sstate/pushing package to sstate.
>>>>
>>>> It seems an issue others have seen in the past too:
>>>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
>>>>
>>>> Any idea?
>>>>
>>>> --
>>>> Stefan


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-10-11  9:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-11 22:49 Build failure with parallel build and opkg Stefan Agner
2018-09-26  9:34 ` Stefan Agner
2018-10-02  8:46   ` Stefan Agner
2018-10-02 13:12     ` Stefan Agner
2018-10-02 19:03       ` Khem Raj
2018-10-11  9:32       ` Stefan Agner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.