From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.kmu-office.ch (mail.kmu-office.ch [178.209.48.109]) by mail.openembedded.org (Postfix) with ESMTP id AB4FE6E5E9 for ; Tue, 2 Oct 2018 13:12:55 +0000 (UTC) Received: from webmail.kmu-office.ch (unknown [IPv6:2a02:418:6a02::a3]) by mail.kmu-office.ch (Postfix) with ESMTPSA id 840BB5C005C; Tue, 2 Oct 2018 15:12:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=agner.ch; s=dkim; t=1538485975; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=spfgM0v8Rc90eONuUH8Wiz0NvWM/CFEE6ve+S3KJx9s=; b=nofLOkCuB8t6y0y1m21GXbTPYaiErh7fEx3zthX12+oFB3gQymuGOUE7D+UE5ika1R5+WN MMwt0aUoenTsMrdyCy8wKLUVaoMUMRGM9ySf/Huciiz55rDClRyuZz1Nxgk7yPzBahG2Hj a1xAE69nC5E4FhzYV4j3qSC1VAGtcHs= MIME-Version: 1.0 Date: Tue, 02 Oct 2018 15:12:55 +0200 From: Stefan Agner To: openembedded-core@lists.openembedded.org, richard.purdie@linuxfoundation.org, Max Krummenacher In-Reply-To: References: Message-ID: X-Sender: stefan@agner.ch User-Agent: Roundcube Webmail/1.3.4 Cc: Brandon Shibley , samuel.bissig@toradex.com, ricardo@foundries.io Subject: Re: Build failure with parallel build and opkg X-BeenThere: openembedded-core@lists.openembedded.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: Patches and discussions about the oe-core layer List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2018 13:12:56 -0000 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit On 02.10.2018 10:46, Stefan Agner wrote: > On 26.09.2018 11:34, Stefan Agner wrote: >> Hi, >> >> On 12.09.2018 00:49, Stefan Agner wrote: > >> We discussed the issue at Linaro Connect a bit. >> >> To recap, we do build in two steps: >> >> 1. bitbake full-container-image >> 2. bitbake -c populate_sdk full-container-image >> >> The issue always happens in the second step. >> >> We also see that in the second step, the do_package_write_ipk_setscene >> task for every recipe is executed. >> >> The current assumption is >> >> I tried to reproduce by building a recipe using openembedded-core master >> only in two build directories with shared sstate manually: >> >> 1. build1 $ bitbake eudev >> 2. build2 $ bitbake -c cleansstate eudev >> 3. build2 $ bitbake eudev >> 4. build1 $ bitbake core-image-minimal >> >> This sequence seems not to have triggered a >> do_package_write_ipk_setscene for eudev. >> >> I then tried >> 5. build1 $ bitbake -c populate_sdk core-image-minimal >> >> Which did trigger a do_package_write_ipk_setscene. However, the issue >> did not appear... >> >> I even tried to rebuild and replace the file manually, and run bitbake >> -c populate_sdk -f core-image-minimal, but it just seems not to appear. >> >> Last time I have seen it was with oe-core >> f6634581fa0a81c4d68dc9179a755ad7b9d99357, I will revert to this version >> again to see whether that helps reproducing the issue. > > Using the older OE version did not make a difference. > > > So Max and I discussed a bit further. We realized that when OE rebuilds, > the opkg package index is refreshed for a package only if the mtime (in > seconds) is different between the previous file and the file on disk > currently (see opkg-make-index). Can it be that two simultaneously > started builds create two ipk with same mtime? > > So I built two core-image-minimal builds (on the same machine) at the > *very* same time. First try did not cause a collision, but already on > the second try I managed to reproduce multiple collision. This output > shows all the packages with the same unix mtime stamp according to the > Packages.stamps file (generated by opkg-make-index): > > $ comm -1 -2 <(sort build1/tmp-glibc/deploy/ipk/i586/Packages.stamps) <(sort build2/tmp-glibc/deploy/ipk/i586/Packages.stamps) > kages.stamps) > 1538424250 glibc-binary-localedata-de-it_2.27-r0_i586.ipk > 1538425082 libreadline-staticdev_7.0-r0_i586.ipk > 1538425162 bash-bashbug_4.4.18-r0_i586.ipk > 1538425162 bash-completion_2.8-r0_i586.ipk > 1538425162 bash-completion-dbg_2.8-r0_i586.ipk > 1538425162 bash-completion-dev_2.8-r0_i586.ipk > 1538425162 bash-loadable_4.4.18-r0_i586.ipk > 1538425163 bash_4.4.18-r0_i586.ipk > 1538425163 bash-completion-extra_2.8-r0_i586.ipk > 1538425164 bash-doc_4.4.18-r0_i586.ipk > 1538425167 bash-dbg_4.4.18-r0_i586.ipk > 1538425233 util-linux-locale-hr_2.32-r0_i586.ipk > 1538425233 util-linux-locale-id_2.32-r0_i586.ipk > 1538425233 util-linux-locale-it_2.32-r0_i586.ipk > 1538425233 util-linux-locale-sl_2.32-r0_i586.ipk > 1538425233 util-linux-locale-zh-tw_2.32-r0_i586.ipk > 1538425234 util-linux-dev_2.32-r0_i586.ipk > 1538425234 util-linux-findfs_2.32-r0_i586.ipk > 1538425234 util-linux-ionice_2.32-r0_i586.ipk > 1538425234 util-linux-locale-ca_2.32-r0_i586.ipk > 1538425234 util-linux-locale-pl_2.32-r0_i586.ipk > 1538425234 util-linux-locale-pt-br_2.32-r0_i586.ipk > 1538425234 util-linux-locale-ru_2.32-r0_i586.ipk > 1538425234 util-linux-locale-sv_2.32-r0_i586.ipk > 1538425234 util-linux-locale-tr_2.32-r0_i586.ipk > 1538425234 util-linux-locale-vi_2.32-r0_i586.ipk > 1538425234 util-linux-locale-zh-cn_2.32-r0_i586.ipk > 1538425234 util-linux-mountpoint_2.32-r0_i586.ipk > 1538425235 util-linux-locale-fr_2.32-r0_i586.ipk > 1538425236 util-linux-blkid_2.32-r0_i586.ipk > 1538425236 util-linux-fsck_2.32-r0_i586.ipk > 1538425236 util-linux-fsck.cramfs_2.32-r0_i586.ipk > 1538425236 util-linux-fstrim_2.32-r0_i586.ipk > 1538425236 util-linux-lscpu_2.32-r0_i586.ipk > 1538425236 util-linux-mkfs_2.32-r0_i586.ipk > 1538425236 util-linux-mount_2.32-r0_i586.ipk > 1538425236 util-linux-partx_2.32-r0_i586.ipk > 1538425236 util-linux-readprofile_2.32-r0_i586.ipk > 1538425236 util-linux-umount_2.32-r0_i586.ipk > 1538425237 libblkid1_2.32-r0_i586.ipk > 1538425237 libfdisk1_2.32-r0_i586.ipk > 1538425237 libmount1_2.32-r0_i586.ipk > 1538425238 util-linux_2.32-r0_i586.ipk > 1538425240 util-linux-staticdev_2.32-r0_i586.ipk > > Indeed, the two ipk files have the same mtime (at least, on second level): > > $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > Size: 136612 Blocks: 272 IO Block: 4096 regular file > Device: fd00h/64768d Inode: 2627769381 Links: 2 > Access: (0644/-rw-r--r--) Uid: ( 1002/ ags) Gid: ( 1002/ ags) > Context: unconfined_u:object_r:home_root_t:s0 > Access: 2018-10-01 22:28:37.464716076 +0200 > Modify: 2018-10-01 22:20:37.930653796 +0200 > Change: 2018-10-01 22:20:50.998205339 +0200 > Birth: - > File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > Size: 136692 Blocks: 272 IO Block: 4096 regular file > Device: fd00h/64768d Inode: 2608876057 Links: 2 > Access: (0644/-rw-r--r--) Uid: ( 1002/ ags) Gid: ( 1002/ ags) > Context: unconfined_u:object_r:home_root_t:s0 > Access: 2018-10-01 22:28:47.145119532 +0200 > Modify: 2018-10-01 22:20:37.358629653 +0200 > Change: 2018-10-01 22:20:50.727193901 +0200 > Birth: - > > But different md5sums! > > $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > bb8ffd4443e84fede943ccc3292dc657 ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > 91608ae2b109e3f4af0379c634fad9c0 ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk > > In that case, libblkid1_2.32-r0_i586.ipk of build1 managed to make it into the sstate. > > On our production system, the SDK builds called later always fetch the packages > from sstate again (do_package_write_ipk_setscene). For some reason that did not > happen in my synthetic test. However, I simulated the fetch from sstate by > removing libblkid1_2.32-r0_i586.ipk (part of util-linux) from build2: > > build2 $ bitbake -c clean util-linux > > And rebuild the image: > > build2 $ bitbake -f -c populate_sdk core-image-minimal > Note that after this command, the packages in the feed looks like this: $ stat ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk File: ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk Size: 136612 Blocks: 272 IO Block: 4096 regular file Device: fd00h/64768d Inode: 2627769381 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 1002/ ags) Gid: ( 1002/ ags) Context: unconfined_u:object_r:home_root_t:s0 Access: 2018-10-01 22:28:37.464716076 +0200 Modify: 2018-10-01 22:20:37.930653796 +0200 Change: 2018-10-01 22:20:50.998205339 +0200 Birth: - File: ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk Size: 136612 Blocks: 272 IO Block: 4096 regular file Device: fd00h/64768d Inode: 2315301423 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 1002/ ags) Gid: ( 1002/ ags) Context: unconfined_u:object_r:home_root_t:s0 Access: 2018-10-02 09:27:07.750888458 +0200 Modify: 2018-10-01 22:20:37.000000000 +0200 Change: 2018-10-02 09:26:47.308004397 +0200 Birth: - Note that the package in build2 the mtime has no nanosecond accuracy. Presumably the tarball from sstate does not preserve sub second timestamps. $ md5sum ./build*/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk bb8ffd4443e84fede943ccc3292dc657 ./build1/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk bb8ffd4443e84fede943ccc3292dc657 ./build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk Will send out a patch which uses inodes instead of mtime in opkg-make-index. -- Stefan > Which indeed showed the issues we are seeing in production: > ... > Downloading file:/home/ags/oe-core/build2/tmp-glibc/deploy/ipk/i586/libblkid1_2.32-r0_i586.ipk. > Removing corrupt package file /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/sdk/image/usr/local/oecore-x86_64/sysroots/i586-oe-linux//var/cache/opkg/volatile/edad620216e689cf8778354f4ac2ff7e_libblkid1_2.32-r0_i586.ipk. > ... > Collected errors: > * opkg_install_pkg: Failed to download libblkid1. Perhaps you need to run 'opkg update'? > ... > ERROR: core-image-minimal-1.0-r0 do_populate_sdk: Function failed: do_populate_sdk > ERROR: Logfile of failure stored in: /home/ags/oe-core/build2/tmp-glibc/work/qemux86-oe-linux/core-image-minimal/1.0-r0/temp/log.do_populate_sdk.49541 > ERROR: Task (/home/ags/oe-core/meta/recipes-core/images/core-image-minimal.bb:do_populate_sdk) failed with exit code '1' > > > It really seems that using mtime only in opkg-make-index is not safe enough > in our setup. We probably should use the inode (or using the inode in addition > to the time stamp). > > Is it guaranteed that the inode changes when ipk are fetched from sstate? > > -- > Stefan > > >> >>> >>> I guess there is some race/asynchronous operation going on around >>> building index/getting package from sstate/pushing package to sstate. >>> >>> It seems an issue others have seen in the past too: >>> https://www.yoctoproject.org/irc/%23yocto.2018-07-05.log.html#t2018-07-05T10:07:25 >>> >>> Any idea? >>> >>> -- >>> Stefan