All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Agner <stefan@agner.ch>
To: openembedded-core@lists.openembedded.org,
	richard.purdie@linuxfoundation.org,
	Max Krummenacher <max.oss.09@gmail.com>
Cc: Brandon Shibley <brandon.shibley@toradex.com>,
	samuel.bissig@toradex.com, ricardo@foundries.io
Subject: Re: Build failure with parallel build and opkg
Date: Tue, 02 Oct 2018 10:46:02 +0200	[thread overview]
Message-ID: <ad574663589f7559ca0f32f159de471c@agner.ch> (raw)
In-Reply-To: <abbffc6e7fa60468423b2cf1ec39f311@agner.ch>

On 26.09.2018 11:34, Stefan Agner wrote:
> Hi,
> 
> On 12.09.2018 00:49, Stefan Agner wrote:
<snip>
> We discussed the issue at Linaro Connect a bit.
> 
> To recap, we do build in two steps:
> 
> 1. bitbake full-container-image
> 2. bitbake -c populate_sdk full-container-image
> 
> The issue always happens in the second step.
> 
> We also see that in the second step, the do_package_write_ipk_setscene
> task for every recipe is executed.
> 
> The current assumption is
> 
> I tried to reproduce by building a recipe using openembedded-core master
> only in two build directories with shared sstate manually:
> 
> 1. build1 $ bitbake eudev
> 2. build2 $ bitbake -c cleansstate eudev
> 3. build2 $ bitbake eudev
> 4. build1 $ bitbake core-image-minimal
> 
> This sequence seems not to have triggered a
> do_package_write_ipk_setscene for eudev.
> 
> I then tried
> 5. build1 $ bitbake -c populate_sdk core-image-minimal
> 
> Which did trigger a do_package_write_ipk_setscene. However, the issue
> did not appear...
> 
> I even tried to rebuild and replace the file manually, and run bitbake
> -c populate_sdk -f core-image-minimal, but it just seems not to appear.
> 
> Last time I have seen it was with oe-core
> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version
> again to see whether that helps reproducing the issue.

Using the older OE version did not make a difference.


So Max and I discussed a bit further. We realized that when OE rebuilds,
the opkg package index is refreshed for a package only if the mtime (in
seconds) is different between the previous file and the file on disk
currently (see opkg-make-index). Can it be that two simultaneously
started builds create two ipk with same mtime?

So I built two core-image-minimal builds (on the same machine) at the
*very* same time. First try did not cause a collision, but already on
the second try I managed to reproduce multiple collision. This output
shows all the packages with the same unix mtime stamp according to the
Packages.stamps file (generated by opkg-make-index):

$ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps)
kages.stamps)                                                                                                                               
1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk                                                                                   
1538425082 libreadline-staticdev_7.0-r0_i586.ipk                                                                                            
1538425162 bash-bashbug_4.4.18-r0_i586.ipk                                                                                                  
1538425162 bash-completion_2.8-r0_i586.ipk                                                                                                  
1538425162 bash-completion-dbg_2.8-r0_i586.ipk                                                                                              
1538425162 bash-completion-dev_2.8-r0_i586.ipk                                                                                              
1538425162 bash-loadable_4.4.18-r0_i586.ipk                                                                                                 
1538425163 bash_4.4.18-r0_i586.ipk                                                                                                          
1538425163 bash-completion-extra_2.8-r0_i586.ipk
1538425164 bash-doc_4.4.18-r0_i586.ipk
1538425167 bash-dbg_4.4.18-r0_i586.ipk
1538425233 util-linux-locale-hr_2.32-r0_i586.ipk
1538425233 util-linux-locale-id_2.32-r0_i586.ipk
1538425233 util-linux-locale-it_2.32-r0_i586.ipk
1538425233 util-linux-locale-sl_2.32-r0_i586.ipk
1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk
1538425234 util-linux-dev_2.32-r0_i586.ipk
1538425234 util-linux-findfs_2.32-r0_i586.ipk
1538425234 util-linux-ionice_2.32-r0_i586.ipk
1538425234 util-linux-locale-ca_2.32-r0_i586.ipk
1538425234 util-linux-locale-pl_2.32-r0_i586.ipk
1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk
1538425234 util-linux-locale-ru_2.32-r0_i586.ipk
1538425234 util-linux-locale-sv_2.32-r0_i586.ipk
1538425234 util-linux-locale-tr_2.32-r0_i586.ipk
1538425234 util-linux-locale-vi_2.32-r0_i586.ipk
1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk
1538425234 util-linux-mountpoint_2.32-r0_i586.ipk
1538425235 util-linux-locale-fr_2.32-r0_i586.ipk
1538425236 util-linux-blkid_2.32-r0_i586.ipk
1538425236 util-linux-fsck_2.32-r0_i586.ipk
1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk
1538425236 util-linux-fstrim_2.32-r0_i586.ipk
1538425236 util-linux-lscpu_2.32-r0_i586.ipk
1538425236 util-linux-mkfs_2.32-r0_i586.ipk
1538425236 util-linux-mount_2.32-r0_i586.ipk
1538425236 util-linux-partx_2.32-r0_i586.ipk
1538425236 util-linux-readprofile_2.32-r0_i586.ipk
1538425236 util-linux-umount_2.32-r0_i586.ipk
1538425237 libblkid1_2.32-r0_i586.ipk
1538425237 libfdisk1_2.32-r0_i586.ipk
1538425237 libmount1_2.32-r0_i586.ipk
1538425238 util-linux_2.32-r0_i586.ipk
1538425240 util-linux-staticdev_2.32-r0_i586.ipk

Indeed, the two ipk files have the same mtime (at least, on second level):

$ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136612          Blocks: 272        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2627769381  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:37.464716076 +0200
Modify: 2018-10-01 22:20:37.930653796 +0200
Change: 2018-10-01 22:20:50.998205339 +0200
 Birth: -
  File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
  Size: 136692          Blocks: 272        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 2608876057  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 1002/     ags)   Gid: ( 1002/     ags)
Context: unconfined_u:object_r:home_root_t:s0
Access: 2018-10-01 22:28:47.145119532 +0200
Modify: 2018-10-01 22:20:37.358629653 +0200
Change: 2018-10-01 22:20:50.727193901 +0200
 Birth: -

But different md5sums!

$ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
bb8ffd4443e84fede943ccc3292dc657  ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk
91608ae2b109e3f4af0379c634fad9c0  ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk

In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate.

On our production system, the SDK builds called later always fetch the packages
from sstate again (do_package_write_ipk_setscene). For some reason that did not
happen in my synthetic test. However, I simulated the fetch from sstate by
removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2:

  build2 $ bitbake -c clean util-linux

And rebuild the image:

  build2 $ bitbake -f -c populate_sdk core-image-minimal

Which indeed showed the issues we are seeing in production:
...
Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk.
Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk.
...
Collected errors:                                                                                                                           
 * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'?
...
ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk                                                         
ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541
ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1'


It really seems that using mtime only in opkg-make-index is not safe enough
in our setup. We probably should use the inode (or using the inode in addition
to the time stamp).

Is it guaranteed that the inode changes when ipk are fetched from sstate?

--
Stefan


> 
>>
>> I guess there is some race/asynchronous operation going on around
>> building index/getting package from sstate/pushing package to sstate.
>>
>> It seems an issue others have seen in the past too:
>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25
>>
>> Any idea?
>>
>> --
>> Stefan


  reply	other threads:[~2018-10-02  8:46 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-11 22:49 Build failure with parallel build and opkg Stefan Agner
2018-09-26  9:34 ` Stefan Agner
2018-10-02  8:46   ` Stefan Agner [this message]
2018-10-02 13:12     ` Stefan Agner
2018-10-02 19:03       ` Khem Raj
2018-10-11  9:32       ` Stefan Agner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad574663589f7559ca0f32f159de471c@agner.ch \
    --to=stefan@agner.ch \
    --cc=brandon.shibley@toradex.com \
    --cc=max.oss.09@gmail.com \
    --cc=openembedded-core@lists.openembedded.org \
    --cc=ricardo@foundries.io \
    --cc=richard.purdie@linuxfoundation.org \
    --cc=samuel.bissig@toradex.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.