* [PATCH 0/2] Avoid build failures due to setscene errors
@ 2017-08-29 20:00 Peter Kjellerstedt
  2017-08-29 20:00 ` [PATCH 1/2] bitbake: fetch2: Allow Fetch.download() to warn instead of error Peter Kjellerstedt
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-29 20:00 UTC (permalink / raw)
  To: openembedded-core

Occasionally, we see errors on our autobuilders where a setscene task
fails to retrieve a file from our global sstate cache. It typically
looks something like this:

WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch URL
file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
MIRRORS if available
ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
Unable to find file
file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere. The
paths that were searched were:
    /home/pkj/.openembedded/sstate-cache
ERROR: zip-3.0-r2 do_populate_sysroot_setscene: No suitable staging
package found
WARNING: Setscene task
(meta/recipes-extended/zip/zip_3.0.bb:do_populate_sysroot_setscene)
failed with exit code '1' - real task will be run instead

As the last warning indicates, the build proceeds, the real task runs,
and the build eventually completes. However, because of the two errors
above, bitbake returns an error code, which causes the autobuilder to
treat the build as failed and throw away everything it built.

Since this is pointless, wastes build resources, and causes grief for
the developers, the two patches in this change set turn the errors from
setscene tasks into warnings.
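
The effect described above can be modelled roughly as follows. This is
only a simplified sketch and not bitbake's actual implementation: the
ErrorCounter handler, the simulated_exit_code() function and the logger
name are hypothetical, and the only assumption carried over from the
text is that any message logged at error level makes the final exit
code non-zero, while warnings do not.

```python
import logging


class ErrorCounter(logging.Handler):
    """Hypothetical handler that counts records at ERROR level or above."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.count = 0

    def emit(self, record):
        self.count += 1


def simulated_exit_code(only_warn):
    """Model a build whose setscene task fails but whose real task succeeds."""
    logger = logging.getLogger("setscene-sketch")
    logger.propagate = False
    logger.setLevel(logging.DEBUG)
    counter = ErrorCounter()
    logger.addHandler(counter)
    try:
        # The setscene failure is reported at either warning or error level.
        report = logger.warning if only_warn else logger.error
        report("No suitable staging package found")
        logger.warning("Setscene task failed - real task will be run instead")
        # The real task is assumed to succeed here, so the only thing that
        # can still fail the build is an error having been logged.
        return 1 if counter.count else 0
    finally:
        logger.removeHandler(counter)
```

With only_warn=False the function returns 1 even though nothing failed
to build; with only_warn=True it returns 0.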

//Peter

The following changes since commit bc2e0b2e9b95707d96c840dade12b00e1450ecc3:

  libsdl: Move PACKAGECONFIG options from meta-mingw (2017-08-29 12:23:10 +0100)

are available in the git repository at:

  git://git.yoctoproject.org/poky-contrib pkj/setscene-errors
  http://git.yoctoproject.org/cgit.cgi/poky-contrib/log/?h=pkj/setscene-errors

Peter Kjellerstedt (2):
  bitbake: fetch2: Allow Fetch.download() to warn instead of error
  sstate.bbclass: Do not cause build failures due to setscene errors

 bitbake/lib/bb/fetch2/__init__.py | 20 +++++++++++++++-----
 meta/classes/sstate.bbclass       |  5 +++--
 2 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.12.0




* [PATCH 1/2] bitbake: fetch2: Allow Fetch.download() to warn instead of error
  2017-08-29 20:00 [PATCH 0/2] Avoid build failures due to setscene errors Peter Kjellerstedt
@ 2017-08-29 20:00 ` Peter Kjellerstedt
  2017-08-29 20:00 ` [PATCH 2/2] sstate.bbclass: Do not cause build failures due to setscene errors Peter Kjellerstedt
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-29 20:00 UTC (permalink / raw)
  To: openembedded-core

In some situations it is acceptable for Fetch.download() to fail to
fetch a file without causing bitbake to fail. When only_warn=True is
passed to Fetch.download(), it calls logger.warning() instead of
logger.error() and thus does not cause build failures.

Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
---
 bitbake/lib/bb/fetch2/__init__.py | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/bitbake/lib/bb/fetch2/__init__.py b/bitbake/lib/bb/fetch2/__init__.py
index 3eb0e4d211..58f65ada84 100644
--- a/bitbake/lib/bb/fetch2/__init__.py
+++ b/bitbake/lib/bb/fetch2/__init__.py
@@ -1608,9 +1608,10 @@ class Fetch(object):
 
         return local
 
-    def download(self, urls=None):
+    def download(self, urls=None, only_warn=False):
         """
-        Fetch all urls
+        Fetch all urls. In case only_warn is True, a failure to fetch a url
+        will only result in a warning message, rather than an error message.
         """
         if not urls:
             urls = self.urls
@@ -1688,19 +1689,28 @@ class Fetch(object):
 
                 if not localpath or ((not os.path.exists(localpath)) and localpath.find("*") == -1):
                     if firsterr:
-                        logger.error(str(firsterr))
+                        if only_warn:
+                            logger.warning(str(firsterr))
+                        else:
+                            logger.error(str(firsterr))
                     raise FetchError("Unable to fetch URL from any source.", u)
 
                 update_stamp(ud, self.d)
 
             except IOError as e:
                 if e.errno in [os.errno.ESTALE]:
-                    logger.error("Stale Error Observed %s." % u)
+                    if only_warn:
+                        logger.warning("Stale Error Observed %s." % u)
+                    else:
+                        logger.error("Stale Error Observed %s." % u)
                     raise ChecksumError("Stale Error Detected")
 
             except BBFetchException as e:
                 if isinstance(e, ChecksumError):
-                    logger.error("Checksum failure fetching %s" % u)
+                    if only_warn:
+                        logger.warning("Checksum failure fetching %s" % u)
+                    else:
+                        logger.error("Checksum failure fetching %s" % u)
                 raise
 
             finally:
-- 
2.12.0




* [PATCH 2/2] sstate.bbclass: Do not cause build failures due to setscene errors
  2017-08-29 20:00 [PATCH 0/2] Avoid build failures due to setscene errors Peter Kjellerstedt
  2017-08-29 20:00 ` [PATCH 1/2] bitbake: fetch2: Allow Fetch.download() to warn instead of error Peter Kjellerstedt
@ 2017-08-29 20:00 ` Peter Kjellerstedt
  2017-08-29 20:04 ` ✗ patchtest: failure for Avoid " Patchwork
  2017-08-29 20:38 ` [PATCH 0/2] " Andre McCurdy
  3 siblings, 0 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-29 20:00 UTC (permalink / raw)
  To: openembedded-core

If a setscene task fails, the real task will be run instead. However,
in case the failed setscene task happened to log any errors, this will
still cause bitbake to return with an error code, even though
everything actually built ok. To avoid this, modify setscene to only
warn about errors.

Signed-off-by: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
---
 meta/classes/sstate.bbclass | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/meta/classes/sstate.bbclass b/meta/classes/sstate.bbclass
index 6af0d388bc..7d76ac141b 100644
--- a/meta/classes/sstate.bbclass
+++ b/meta/classes/sstate.bbclass
@@ -671,7 +671,7 @@ def pstaging_fetch(sstatefetch, sstatepkg, d):
         localdata.setVar('SRC_URI', srcuri)
         try:
             fetcher = bb.fetch2.Fetch([srcuri], localdata, cache=False)
-            fetcher.download()
+            fetcher.download(only_warn=True)
 
         except bb.fetch2.BBFetchException:
             break
@@ -680,7 +680,8 @@ def sstate_setscene(d):
     shared_state = sstate_state_fromvars(d)
     accelerate = sstate_installpkg(shared_state, d)
     if not accelerate:
-        bb.fatal("No suitable staging package found")
+        bb.warn("No suitable staging package found")
+        sys.exit(1)
 
 python sstate_task_prefunc () {
     shared_state = sstate_state_fromvars(d)
-- 
2.12.0




* ✗ patchtest: failure for Avoid build failures due to setscene errors
  2017-08-29 20:00 [PATCH 0/2] Avoid build failures due to setscene errors Peter Kjellerstedt
  2017-08-29 20:00 ` [PATCH 1/2] bitbake: fetch2: Allow Fetch.download() to warn instead of error Peter Kjellerstedt
  2017-08-29 20:00 ` [PATCH 2/2] sstate.bbclass: Do not cause build failures due to setscene errors Peter Kjellerstedt
@ 2017-08-29 20:04 ` Patchwork
  2017-08-29 20:25   ` Peter Kjellerstedt
  2017-08-29 20:38 ` [PATCH 0/2] " Andre McCurdy
  3 siblings, 1 reply; 18+ messages in thread
From: Patchwork @ 2017-08-29 20:04 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: openembedded-core

== Series Details ==

Series: Avoid build failures due to setscene errors
Revision: 1
URL   : https://patchwork.openembedded.org/series/8575/
State : failure

== Summary ==


Thank you for submitting this patch series to OpenEmbedded Core. This is
an automated response. Several tests have been executed on the proposed
series by patchtest resulting in the following failures:



* Issue             Series does not apply on top of target branch [test_series_merge_on_head] 
  Suggested fix    Rebase your series on top of targeted branch
  Targeted branch  master (currently at 2454019844)



If you believe any of these test results are incorrect, please reply to the
mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
Otherwise we would appreciate you correcting the issues and submitting a new
version of the patchset if applicable. Please ensure you add/increment the
version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
[PATCH v3] -> ...).

---
Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe




* Re: ✗ patchtest: failure for Avoid build failures due to setscene errors
  2017-08-29 20:04 ` ✗ patchtest: failure for Avoid " Patchwork
@ 2017-08-29 20:25   ` Peter Kjellerstedt
  2017-08-29 22:35     ` Philip Balister
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-29 20:25 UTC (permalink / raw)
  To: openembedded-core

> -----Original Message-----
> From: Patchwork [mailto:patchwork@patchwork.openembedded.org]
> Sent: den 29 augusti 2017 22:05
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: openembedded-core@lists.openembedded.org
> Subject: ✗ patchtest: failure for Avoid build failures due to setscene
> errors
> 
> == Series Details ==
> 
> Series: Avoid build failures due to setscene errors
> Revision: 1
> URL   : https://patchwork.openembedded.org/series/8575/
> State : failure
> 
> == Summary ==
> 
> 
> Thank you for submitting this patch series to OpenEmbedded Core. This is
> an automated response. Several tests have been executed on the proposed
> series by patchtest resulting in the following failures:
> 
> 
> 
> * Issue             Series does not apply on top of target branch [test_series_merge_on_head]
>   Suggested fix    Rebase your series on top of targeted branch
>   Targeted branch  master (currently at 2454019844)

Argh, why can't this handle combined bitbake and OE-Core changes, i.e.,
changes for Poky? Oh well, separate patches coming up...

> If you believe any of these test results are incorrect, please reply to the
> mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
> Otherwise we would appreciate you correcting the issues and submitting a new
> version of the patchset if applicable. Please ensure you add/increment the
> version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
> [PATCH v3] -> ...).
> 
> ---
> Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
> Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe

//Peter



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 20:00 [PATCH 0/2] Avoid build failures due to setscene errors Peter Kjellerstedt
                   ` (2 preceding siblings ...)
  2017-08-29 20:04 ` ✗ patchtest: failure for Avoid " Patchwork
@ 2017-08-29 20:38 ` Andre McCurdy
  2017-08-29 20:59   ` Peter Kjellerstedt
  3 siblings, 1 reply; 18+ messages in thread
From: Andre McCurdy @ 2017-08-29 20:38 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: OE Core mailing list

On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
<peter.kjellerstedt@axis.com> wrote:
> Occasionally, we see errors on our autobuilders where a setscene task
> fails to retrieve a file from our global sstate cache. It typically
> looks something like this:
>
> WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch URL
> file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> MIRRORS if available
> ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> Unable to find file
> file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere. The
> paths that were searched were:
>     /home/pkj/.openembedded/sstate-cache

To trigger this, do you have SSTATE_MIRRORS pointing to
"/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointed
somewhere else? Or are they both pointing to the same local directory?
Or something else?

> ERROR: zip-3.0-r2 do_populate_sysroot_setscene: No suitable staging
> package found
> WARNING: Setscene task
> (meta/recipes-extended/zip/zip_3.0.bb:do_populate_sysroot_setscene)
> failed with exit code '1' - real task will be run instead
>



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 20:38 ` [PATCH 0/2] " Andre McCurdy
@ 2017-08-29 20:59   ` Peter Kjellerstedt
  2017-08-29 21:49     ` Richard Purdie
  2017-08-29 22:03     ` Andre McCurdy
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-29 20:59 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE Core mailing list

> -----Original Message-----
> From: Andre McCurdy [mailto:armccurdy@gmail.com]
> Sent: den 29 augusti 2017 22:38
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
> 
> On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
> <peter.kjellerstedt@axis.com> wrote:
> > Occasionally, we see errors on our autobuilders where a setscene task
> > fails to retrieve a file from our global sstate cache. It typically
> > looks something like this:
> >
> > WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch URL
> > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> 64:3:\
> > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> > MIRRORS if available
> > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> > Unable to find file
> > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> 64:3:\
> > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere. The
> > paths that were searched were:
> >     /home/pkj/.openembedded/sstate-cache
> 
> To trigger this, do you have SSTATE_MIRRORS pointing to
> "/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointed
> somewhere else? Or are they both pointing to the same local directory?
> Or something else?

No, the directory above is actually what is in SSTATE_DIR. 
SSTATE_MIRRORS is set to:

SSTATE_MIRRORS ?= "\
file://.* file:///n/oe/sstate-cache/PATH;downloadfilename=PATH"

where /n/oe is an NFS mount where we share a global sstate cache. 

The only way I have figured out to manually simulate the problem is
by modifying the code in sstate_checkhashes() in sstate.bbclass and
commenting out the call to fetcher.checkstatus(). Then, as long as
there are no sstate files for the task in either the global
or the local sstate cache, I will get the above.

I do not know what triggers it on the autobuilder though. My guess is 
that somehow the sstate tgz file disappears between the call to 
sstate_checkhashes() and when bitbake actually tries to download the 
file. 

We do have a daily job that cleans up the global sstate cache and 
removes files that have not been accessed in the last ten days, but 
it seems unlikely that it should remove a file that just happens to 
be required again, and do it at exactly the time when that task is 
building.
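
The suspected window between the check and the download can be
illustrated with a small standalone sketch. It does not use any bitbake
code: check_then_fetch() and the file name are hypothetical, and the
"between" callback simply stands in for whatever might remove the file
(a cleanup job, an NFS hiccup) after the existence check has passed.

```python
import os
import tempfile


def check_then_fetch(path, between=None):
    """Check that path exists, then read it later, as two separate steps."""
    if not os.path.exists(path):
        return "missing at check time"
    if between is not None:
        between()  # anything that happens in the window between the steps
    try:
        with open(path, "rb") as f:
            f.read()
    except FileNotFoundError:
        return "vanished after check"
    return "fetched"


with tempfile.TemporaryDirectory() as cache:
    artifact = os.path.join(cache, "example_populate_sysroot.tgz")
    with open(artifact, "wb") as f:
        f.write(b"payload")
    # No interference: the check and the read agree.
    undisturbed = check_then_fetch(artifact)
    # The file is removed after the check has already succeeded.
    raced = check_then_fetch(artifact, between=lambda: os.remove(artifact))
```

Because the two steps are not atomic, the second call passes the check
and still fails to read the file.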

> > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: No suitable staging
> > package found
> > WARNING: Setscene task
> > (meta/recipes-extended/zip/zip_3.0.bb:do_populate_sysroot_setscene)
> > failed with exit code '1' - real task will be run instead

//Peter



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 20:59   ` Peter Kjellerstedt
@ 2017-08-29 21:49     ` Richard Purdie
  2017-08-30  6:44       ` Peter Kjellerstedt
  2017-08-29 22:03     ` Andre McCurdy
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Purdie @ 2017-08-29 21:49 UTC (permalink / raw)
  To: Peter Kjellerstedt, Andre McCurdy; +Cc: OE Core mailing list

On Tue, 2017-08-29 at 20:59 +0000, Peter Kjellerstedt wrote:
> > -----Original Message-----
> > From: Andre McCurdy [mailto:armccurdy@gmail.com]
> > Sent: den 29 augusti 2017 22:38
> > To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to
> > setscene
> > errors
> > 
> > On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
> > <peter.kjellerstedt@axis.com> wrote:
> > > 
> > > Occasionally, we see errors on our autobuilders where a setscene
> > > task
> > > fails to retrieve a file from our global sstate cache. It
> > > typically
> > > looks something like this:
> > > 
> > > WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch
> > > URL
> > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > 64:3:\
> > > 
> > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> > > MIRRORS if available
> > > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> > > Unable to find file
> > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > 64:3:\
> > > 
> > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere.
> > > The
> > > paths that were searched were:
> > >     /home/pkj/.openembedded/sstate-cache
> > To trigger this, do you have SSTATE_MIRRORS pointing to
> > "/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointed
> > somewhere else? Or are they both pointing to the same local
> > directory?
> > Or something else?
> No, the directory above is actually what is in SSTATE_DIR. 
> SSTATE_MIRRORS is set to:
> 
> SSTATE_MIRRORS ?= "\
> file://.* file:///n/oe/sstate-cache/PATH;downloadfilename=PATH"
> 
> where /n/oe is an NFS mount where we share a global sstate cache. 
> 
> The only way I have figured out to manually simulate the problem is 
> by modifying the code in sstate_checkhashes() in sstate.bbclass and 
> commenting out the call to fetcher.checkstatus(). Then as long as 
> there actually is no sstate files for the task in either the global 
> or the local sstate cache, I will get the above. 
> 
> I do not know what triggers it on the autobuilder though. My guess
> is 
> that somehow the sstate tgz file disappears between the call to 
> sstate_checkhashes() and when bitbake actually tries to download the 
> file. 
> 
> We do have a daily job that cleans up the global sstate cache and 
> removes files that have not been accessed in the last ten days, but 
> it seems unlikely that it should remove a file that just happens to 
> be required again, and do it at exactly the time when that task is 
> building.

I have left this code as an error deliberately as this kind of thing
should not happen and if it does, there is really something wrong which
you need to figure out. It means that at one point bitbake thinks the
sstate is present and valid, then later it isn't.

I'm not convinced patching out the errors is the right solution here...

Cheers,

Richard



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 20:59   ` Peter Kjellerstedt
  2017-08-29 21:49     ` Richard Purdie
@ 2017-08-29 22:03     ` Andre McCurdy
  2017-08-30  9:55       ` Peter Kjellerstedt
  1 sibling, 1 reply; 18+ messages in thread
From: Andre McCurdy @ 2017-08-29 22:03 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: OE Core mailing list

On Tue, Aug 29, 2017 at 1:59 PM, Peter Kjellerstedt
<peter.kjellerstedt@axis.com> wrote:
> We do have a daily job that cleans up the global sstate cache and
> removes files that have not been accessed in the last ten days, but
> it seems unlikely that it should remove a file that just happens to
> be required again, and do it at exactly the time when that task is
> building.

I guess you've already confirmed that accessing the sstate files over
NFS does actually modify atime on the server (and that the filesystem
on the server really does have atime support enabled, e.g. mounted
with strictatime rather than relatime etc)?

If access time isn't being determined reliably and sstate files are
being removed 10 days after being created then that might make the
race a little more likely to trigger.
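
To make the concern concrete, here is a minimal sketch of an
access-time-based cleanup. It is not the actual cleanup job discussed
in this thread (which is not shown); stale_files() and the file names
are made up for illustration. It only shows that such a job depends
entirely on st_atime, so a file whose atime is not updated on read
looks stale no matter how recently it was used.

```python
import os
import tempfile
import time

TEN_DAYS = 10 * 24 * 60 * 60


def stale_files(root, max_age=TEN_DAYS, now=None):
    """Return files under root whose recorded access time is older than max_age."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > max_age:
                stale.append(path)
    return sorted(stale)


with tempfile.TemporaryDirectory() as cache:
    recent = os.path.join(cache, "recent.tgz")
    old = os.path.join(cache, "old.tgz")
    for p in (recent, old):
        with open(p, "wb") as f:
            f.write(b"x")
    now = time.time()
    # Pretend old.tgz was last accessed eleven days ago; mtime stays recent.
    os.utime(old, (now - 11 * 24 * 60 * 60, now))
    result = [os.path.basename(p) for p in stale_files(cache, now=now)]
```

Only old.tgz is selected for removal, purely on the basis of its
recorded access time.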



* Re: ✗ patchtest: failure for Avoid build failures due to setscene errors
  2017-08-29 20:25   ` Peter Kjellerstedt
@ 2017-08-29 22:35     ` Philip Balister
  2017-08-30  7:41       ` Peter Kjellerstedt
  0 siblings, 1 reply; 18+ messages in thread
From: Philip Balister @ 2017-08-29 22:35 UTC (permalink / raw)
  To: Peter Kjellerstedt, openembedded-core

On 08/29/2017 04:25 PM, Peter Kjellerstedt wrote:
>> -----Original Message-----
>> From: Patchwork [mailto:patchwork@patchwork.openembedded.org]
>> Sent: den 29 augusti 2017 22:05
>> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
>> Cc: openembedded-core@lists.openembedded.org
>> Subject: ✗ patchtest: failure for Avoid build failures due to setscene
>> errors
>>
>> == Series Details ==
>>
>> Series: Avoid build failures due to setscene errors
>> Revision: 1
>> URL   : https://patchwork.openembedded.org/series/8575/
>> State : failure
>>
>> == Summary ==
>>
>>
>> Thank you for submitting this patch series to OpenEmbedded Core. This is
>> an automated response. Several tests have been executed on the proposed
>> series by patchtest resulting in the following failures:
>>
>>
>>
>> * Issue             Series does not apply on top of target branch [test_series_merge_on_head]
>>   Suggested fix    Rebase your series on top of targeted branch
>>   Targeted branch  master (currently at 2454019844)
> 
> Argh, why can't this handle combined bitbake and OE-Core changes, i.e., 
> changes for Poky. Oh well, separate patches coming up...

Because poky isn't the upstream project.

Philip

> 
>> If you believe any of these test results are incorrect, please reply to the
>> mailing list (openembedded-core@lists.openembedded.org) raising your concerns.
>> Otherwise we would appreciate you correcting the issues and submitting a new
>> version of the patchset if applicable. Please ensure you add/increment the
>> version number when sending the new version (i.e. [PATCH] -> [PATCH v2] ->
>> [PATCH v3] -> ...).
>>
>> ---
>> Test framework: http://git.yoctoproject.org/cgit/cgit.cgi/patchtest
>> Test suite:     http://git.yoctoproject.org/cgit/cgit.cgi/patchtest-oe
> 
> //Peter
> 



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 21:49     ` Richard Purdie
@ 2017-08-30  6:44       ` Peter Kjellerstedt
  2017-08-30  7:54         ` Martin Jansa
  2017-08-30  8:02         ` Richard Purdie
  0 siblings, 2 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-30  6:44 UTC (permalink / raw)
  To: Richard Purdie, Andre McCurdy; +Cc: OE Core mailing list

> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org
> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
> Richard Purdie
> Sent: den 29 augusti 2017 23:50
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Andre McCurdy
> <armccurdy@gmail.com>
> Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
> 
> On Tue, 2017-08-29 at 20:59 +0000, Peter Kjellerstedt wrote:
> > > -----Original Message-----
> > > From: Andre McCurdy [mailto:armccurdy@gmail.com]
> > > Sent: den 29 augusti 2017 22:38
> > > To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > > Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> > > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to
> > > setscene
> > > errors
> > >
> > > On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
> > > <peter.kjellerstedt@axis.com> wrote:
> > > >
> > > > Occasionally, we see errors on our autobuilders where a setscene
> > > > task
> > > > fails to retrieve a file from our global sstate cache. It
> > > > typically
> > > > looks something like this:
> > > >
> > > > WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch
> > > > URL
> > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > > 64:3:\
> > > >
> > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> > > > MIRRORS if available
> > > > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> > > > Unable to find file
> > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > > 64:3:\
> > > >
> > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere.
> > > > The
> > > > paths that were searched were:
> > > >     /home/pkj/.openembedded/sstate-cache
> > > To trigger this, do you have SSTATE_MIRRORS pointing to
> > > "/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointed
> > > somewhere else? Or are they both pointing to the same local
> > > directory?
> > > Or something else?
> > No, the directory above is actually what is in SSTATE_DIR.
> > SSTATE_MIRRORS is set to:
> >
> > SSTATE_MIRRORS ?= "\
> > file://.* file:///n/oe/sstate-cache/PATH;downloadfilename=PATH"
> >
> > where /n/oe is an NFS mount where we share a global sstate cache.
> >
> > The only way I have figured out to manually simulate the problem is
> > by modifying the code in sstate_checkhashes() in sstate.bbclass and
> > commenting out the call to fetcher.checkstatus(). Then as long as
> > there actually is no sstate files for the task in either the global
> > or the local sstate cache, I will get the above.
> >
> > I do not know what triggers it on the autobuilder though. My guess
> > is
> > that somehow the sstate tgz file disappears between the call to
> > sstate_checkhashes() and when bitbake actually tries to download the
> > file.
> >
> > We do have a daily job that cleans up the global sstate cache and
> > removes files that have not been accessed in the last ten days, but
> > it seems unlikely that it should remove a file that just happens to
> > be required again, and do it at exactly the time when that task is
> > building.
> 
> I have left this code as an error deliberately as this kind of thing
> should not happen and if it does, there is really something wrong which
> you need to figure out. It means that at one point bitbake thinks the
> sstate is present and valid, then later it isn't.

True, but since checking whether an sstate file exists and retrieving
it are not a single atomic operation, there are always problems that
can occur. Some may be fixable, some may not. However, using a build
failure to detect these kinds of problems is a bit harsh on the
developers, who see their builds complete only to get an error for
something that is not their fault. We have better ways to detect these
kinds of problems, e.g., through log monitoring, without having to
cause unnecessary grief amongst the developers.

> I'm not convinced patching out the errors is the right solution here...

How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS
variable? That way it can default to "0", but we can set it to "1" to
prioritize the production builds.
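
As a rough sketch of what such a switch could look like (assuming a
plain dict in place of the bitbake datastore; IGNORE_SETSCENE_ERRORS is
only the variable proposed here and does not exist in bitbake or
OE-Core, and both function names are hypothetical):

```python
import logging

logger = logging.getLogger("setscene-sketch")


def setscene_log_level(settings):
    """Pick the log level for setscene failures.

    settings is a plain dict standing in for the datastore. The proposed
    variable defaults to "0", which keeps today's behaviour (errors).
    """
    if settings.get("IGNORE_SETSCENE_ERRORS", "0") == "1":
        return logging.WARNING
    return logging.ERROR


def report_setscene_failure(settings, message):
    """Log a setscene failure at the configured level and return that level."""
    level = setscene_log_level(settings)
    logger.log(level, message)
    return level
```

With the default, failures are still logged as errors; only an explicit
"1" downgrades them to warnings.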

> Cheers,
> 
> Richard

//Peter



* Re: ✗ patchtest: failure for Avoid build failures due to setscene errors
  2017-08-29 22:35     ` Philip Balister
@ 2017-08-30  7:41       ` Peter Kjellerstedt
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-30  7:41 UTC (permalink / raw)
  To: Philip Balister, openembedded-core

> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org
> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
> Philip Balister
> Sent: den 30 augusti 2017 00:36
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; openembedded-
> core@lists.openembedded.org
> Subject: Re: [OE-core] ✗ patchtest: failure for Avoid build failures
> due to setscene errors
> 
> On 08/29/2017 04:25 PM, Peter Kjellerstedt wrote:
> >> -----Original Message-----
> >> From: Patchwork [mailto:patchwork@patchwork.openembedded.org]
> >> Sent: den 29 augusti 2017 22:05
> >> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> >> Cc: openembedded-core@lists.openembedded.org
> >> Subject: ✗ patchtest: failure for Avoid build failures due to
> setscene
> >> errors
> >>
> >> == Series Details ==
> >>
> >> Series: Avoid build failures due to setscene errors
> >> Revision: 1
> >> URL   : https://patchwork.openembedded.org/series/8575/
> >> State : failure
> >>
> >> == Summary ==
> >>
> >>
> >> Thank you for submitting this patch series to OpenEmbedded Core.
> This is
> >> an automated response. Several tests have been executed on the
> proposed
> >> series by patchtest resulting in the following failures:
> >>
> >>
> >>
> >> * Issue             Series does not apply on top of target branch
> [test_series_merge_on_head]
> >>   Suggested fix    Rebase your series on top of targeted branch
> >>   Targeted branch  master (currently at 2454019844)

Actually, would it be possible to get a better error message that
indicates that one has mixed in patches for other projects that are
part of Poky? When working with Poky as the basis, differentiating
between, e.g., bitbake and OE-Core is not something that comes
naturally. I actually had to think twice before I realized that one
of my patches was for bitbake (and only just stopped myself in time
from sending an irritated mail about why patchtest wasn't accepting
my changes).

> > Argh, why can't this handle combined bitbake and OE-Core changes,
> > i.e., changes for Poky. Oh well, separate patches coming up...
> 
> Because poky isn't the upstream project.
> 
> Philip

Well, I know that. However, I doubt we are the only ones who use Poky 
as the base for our distribution. Thus Poky is our de facto upstream. 
So when working on a change that affects BitBake, OE-Core and 
maybe even the OE documentation (none of which is uncommon), having 
to split the changes into multiple stacks and keep track of them 
together across multiple projects is not very encouraging, especially 
when Poky is distributed as a unit.

I am not asking for this to change, but it would have been nice to be 
able to treat Poky as an upstream and to deliver changes that span the 
individual projects as one set of changes against Poky.

//Peter



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-30  6:44       ` Peter Kjellerstedt
@ 2017-08-30  7:54         ` Martin Jansa
  2019-11-29 16:48           ` Martin Jansa
  2017-08-30  8:02         ` Richard Purdie
  1 sibling, 1 reply; 18+ messages in thread
From: Martin Jansa @ 2017-08-30  7:54 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: OE Core mailing list


I agree with this patchset and it would be OK with IGNORE_SETSCENE_ERRORS
conditional as well.

We also see these errors sometimes. Some are anticipated, when we clean
the shared sstate-cache on the NFS server; others are unexpected, when NFS
or the network goes down for a minute, and for some builds that happens
between sstate_checkhashes() and using the sstate.

We normally stop all jenkins builds until the cleanup is complete (a
jenkins job does the cleanup: it puts jenkins into stop mode, waits for
all current jobs to finish, which can take hours, then performs the
cleanup and cancels the stop mode). But we cannot stop hundreds of
developers using the same sstate-cache in local builds, especially as we
cannot really know when exactly jenkins will be free for the job to
perform the cleanup. Luckily it doesn't hurt so badly in local builds,
because the developers are more likely to ignore the error as long as the
image was created. In jenkins builds, however, when bitbake returns an
error we cannot easily distinguish the case of "RP is intentionally
warning us that something went wrong with sstate, but everything was built
correctly in the end" from "something failed in the build and we weren't
able to recover from that, maybe even the image wasn't created". So we
don't trigger the follow-up actions like announcing new official builds,
parsing release notes or automated testing.

Yes, we could add more logic to these CI jobs to grep the logs, decide
whether this error was the only one that caused bitbake to return an error
code, and ignore the returned error in that case. But a simple variable is
easier to maintain (even at the cost of forking bitbake and oe-core) and
will work for local builds as well.
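
For illustration only, the kind of log check described above could be
sketched in Python roughly as follows. It assumes that every error in the
bitbake console log is a single line of the form "ERROR: <recipe> <task>:
<message>", as in the excerpt at the top of this thread; the function names
are made up for this sketch and are not part of bitbake. Any ERROR line that
does not fit that pattern is treated as a real failure, so the check errs on
the side of failing the build.

```python
import re

# Assumed log format: "ERROR: <recipe> <task>: <message>" on one line.
ERROR_RE = re.compile(r"^ERROR: (?P<recipe>\S+) (?P<task>do_\w+): (?P<msg>.*)$")


def only_setscene_errors(log_text):
    """Return True if the log contains at least one ERROR line and every
    ERROR line was reported by a *_setscene task."""
    seen = False
    for line in log_text.splitlines():
        if not line.startswith("ERROR:"):
            continue
        match = ERROR_RE.match(line)
        # Unrecognised ERROR lines count as real failures (conservative).
        if match is None or not match.group("task").endswith("_setscene"):
            return False
        seen = True
    return seen


def effective_exit_code(bitbake_rc, log_text):
    """Map bitbake's exit code to the one the CI job should report."""
    if bitbake_rc != 0 and only_setscene_errors(log_text):
        return 0
    return bitbake_rc
```

A CI job would call effective_exit_code() with bitbake's exit status and the
captured console output, and use the result to decide whether to trigger the
follow-up actions.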

Regards,

On Wed, Aug 30, 2017 at 8:44 AM, Peter Kjellerstedt <
peter.kjellerstedt@axis.com> wrote:

> > -----Original Message-----
> > From: openembedded-core-bounces@lists.openembedded.org
> > [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of
> > Richard Purdie
> > Sent: 29 August 2017 23:50
> > To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Andre McCurdy
> > <armccurdy@gmail.com>
> > Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> > errors
> >
> > On Tue, 2017-08-29 at 20:59 +0000, Peter Kjellerstedt wrote:
> > > > -----Original Message-----
> > > > From: Andre McCurdy [mailto:armccurdy@gmail.com]
> > > > Sent: 29 August 2017 22:38
> > > > To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> > > > Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> > > > Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to
> > > > setscene
> > > > errors
> > > >
> > > > On Tue, Aug 29, 2017 at 1:00 PM, Peter Kjellerstedt
> > > > <peter.kjellerstedt@axis.com> wrote:
> > > > >
> > > > > Occasionally, we see errors on our autobuilders where a setscene
> > > > > task
> > > > > fails to retrieve a file from our global sstate cache. It
> > > > > typically
> > > > > looks something like this:
> > > > >
> > > > > WARNING: zip-3.0-r2 do_populate_sysroot_setscene: Failed to fetch
> > > > > URL
> > > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > > > 64:3:\
> > > > >
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz, attempting
> > > > > MIRRORS if available
> > > > > ERROR: zip-3.0-r2 do_populate_sysroot_setscene: Fetcher failure:
> > > > > Unable to find file
> > > > > file://66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-64:3:\
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz;\
> > > > > downloadfilename=66/sstate:zip:core2-64-poky-linux:3.0:r2:core2-
> > > > 64:3:\
> > > > >
> > > > > 66832b8c4e7babe0eac9d9579d1e2b6a_populate_sysroot.tgz anywhere.
> > > > > The
> > > > > paths that were searched were:
> > > > >     /home/pkj/.openembedded/sstate-cache
> > > > To trigger this, do you have SSTATE_MIRRORS pointing to
> > > > "/home/pkj/.openembedded/sstate-cache" and SSTATE_DIR pointed
> > > > somewhere else? Or are they both pointing to the same local
> > > > directory?
> > > > Or something else?
> > > No, the directory above is actually what is in SSTATE_DIR.
> > > SSTATE_MIRRORS is set to:
> > >
> > > SSTATE_MIRRORS ?= "\
> > > file://.* file:///n/oe/sstate-cache/PATH;downloadfilename=PATH"
> > >
> > > where /n/oe is an NFS mount where we share a global sstate cache.
> > >
> > > The only way I have figured out to manually simulate the problem is
> > > by modifying the code in sstate_checkhashes() in sstate.bbclass and
> > > commenting out the call to fetcher.checkstatus(). Then as long as
> > > there actually is no sstate files for the task in either the global
> > > or the local sstate cache, I will get the above.
> > >
> > > I do not know what triggers it on the autobuilder though. My guess
> > > is
> > > that somehow the sstate tgz file disappears between the call to
> > > sstate_checkhashes() and when bitbake actually tries to download the
> > > file.
> > >
> > > We do have a daily job that cleans up the global sstate cache and
> > > removes files that have not been accessed in the last ten days, but
> > > it seems unlikely that it should remove a file that just happens to
> > > be required again, and do it at exactly the time when that task is
> > > building.
> >
> > I have left this code as an error deliberately as this kind of thing
> > should not happen and if it does, there is really something wrong which
> > you need to figure out. It means that at one point bitbake thinks the
> > sstate is present and valid, then later it isn't.
>
> True, but since the operations of checking if an sstate file exists and
> retrieving it is not an atomic operation, there are always problems that
> can occur. Some may be fixable, some may not. However, using a build
> failure to detect these kind of problems is a bit harsh on the developers
> who only sees their builds complete only to get an error for something
> that is not their fault. We have better ways to detect these kinds of
> problems, e.g., through log monitoring, without having to cause
> unnecessary grief amongst the developers.
>
> > I'm not convinced patching out the errors is the right solution here...
>
> How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS?
> That way it can default to "0", but we can set it to "1" to prioritize
> the production builds.
>
> > Cheers,
> >
> > Richard
>
> //Peter
>



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-30  6:44       ` Peter Kjellerstedt
  2017-08-30  7:54         ` Martin Jansa
@ 2017-08-30  8:02         ` Richard Purdie
  2017-08-30  9:52           ` Peter Kjellerstedt
  1 sibling, 1 reply; 18+ messages in thread
From: Richard Purdie @ 2017-08-30  8:02 UTC (permalink / raw)
  To: Peter Kjellerstedt, Andre McCurdy; +Cc: OE Core mailing list

On Wed, 2017-08-30 at 06:44 +0000, Peter Kjellerstedt wrote:
> > I have left this code as an error deliberately as this kind of
> > thing should not happen and if it does, there is really something
> > wrong which you need to figure out. It means that at one point
> > bitbake thinks the sstate is present and valid, then later it
> > isn't.
>
> True, but since the operations of checking if an sstate file exists
> and retrieving it is not an atomic operation, there are always
> problems that can occur. Some may be fixable, some may not. However,
> using a build failure to detect these kind of problems is a bit harsh
> on the developers who only sees their builds complete only to get an
> error for something that is not their fault. We have better ways to
> detect these kinds of problems, e.g., through log monitoring, without
> having to cause unnecessary grief amongst the developers.

Files are randomly disappearing from your sstate source. So far you've
been lucky and these are not causing corruption, but they could.

Please figure out and fix your sstate infrastructure, not hack the code
to avoid the errors.

I do appreciate it's painful, we did once see this issue on the
autobuilder. There was a real error in the sstate cleanup scripts and
we fixed that but it took some work to find it.

Also, with changes like this you can end up in a state where sstate can
completely stop working and the only way you'd tell is by increased
build time.

> > I'm not convinced patching out the errors is the right solution
> > here...
> How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS? 
> That way it can default to "0", but we can set it to "1" to
> prioritize the production builds.

I'm still not convinced, sorry.

[The reason being complexity. I don't like having multiple ways of
doing things if we can help it, particularly when one of them is a
workaround for a problem elsewhere. One of the codepaths in a case like
this is unlikely to get well tested.]

Cheers,

Richard



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-30  8:02         ` Richard Purdie
@ 2017-08-30  9:52           ` Peter Kjellerstedt
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-30  9:52 UTC (permalink / raw)
  To: Richard Purdie, Andre McCurdy; +Cc: OE Core mailing list

> -----Original Message-----
> From: Richard Purdie [mailto:richard.purdie@linuxfoundation.org]
> Sent: 30 August 2017 10:03
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>; Andre McCurdy
> <armccurdy@gmail.com>
> Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
> 
> On Wed, 2017-08-30 at 06:44 +0000, Peter Kjellerstedt wrote:
> > > I have left this code as an error deliberately as this kind of
> > > thing should not happen and if it does, there is really something
> > > wrong which you need to figure out. It means that at one point
> > > bitbake thinks the sstate is present and valid, then later it
> > > isn't.
> >
> > True, but since the operations of checking if an sstate file exists
> > and retrieving it is not an atomic operation, there are always
> > problems that can occur. Some may be fixable, some may not. However,
> > using a build failure to detect these kind of problems is a bit harsh
> > on the developers who only sees their builds complete only to get an
> > error for something that is not their fault. We have better ways to
> > detect these kinds of problems, e.g., through log monitoring, without
> > having to cause unnecessary grief amongst the developers.
> 
> Files are randomly disappearing from your sstate source. So far you've
> been lucky and these are not causing corruption, but they could.

Somehow I fail to see how missing sstate cache files can cause 
corruption. If they are missing, the real task is run and all is well.

Also, I do not actually know if the files disappear permanently or 
temporarily, because at the time when I look at the global sstate cache 
the files are there, newly created because the build continued and let 
the real task run. My guess though is that the files only temporarily 
disappeared due to some network glitch, but currently I cannot verify it.

Regardless of whether my proposed changes are accepted or not, if you 
want to keep the default behavior that a failed setscene task will 
eventually cause the build to fail, then we should change it to fail 
immediately instead. Continuing the build when you know it will fail 
makes no sense at all.
 
> Please figure out and fix your sstate infrastructure, not hack the code
> to avoid the errors.

As Martin Jansa mentioned in another response, the problem may be due 
to NFS or general network disturbances. And I see no way to protect 
ourselves from them. And apparently we are not alone in seeing these 
kinds of transient errors.

> I do appreciate its painful, we did once see this issue on the
> autobuilder. There was a real error in the sstate cleanup scripts and
> we fixed that but it took some work to find it.

Are your sstate cache clean up scripts available somewhere? Because 
obviously it is not trivial to get it right, and since keeping the 
sstate cache clean is something that I expect many like to do, having 
a common script for this seems like a good thing.

Otherwise I can contribute our script. If nothing else it would 
probably be good to have it reviewed by someone who is an expert on 
the sstate cache. It currently features:

* configurable retention period (default is 10 days)
* removes related .tgz and .tgz.siginfo files as one
* can remove stale symbolic links (typically wanted for a local sstate 
  cache which has links into a global sstate cache which have seen the 
  actual files being cleaned away)
* dry run mode
* quiet mode (only prints a summary stating how much was cleaned up and 
  the current size of the sstate cache; very nice for running it as a 
  cronjob) 
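
To make the feature list above concrete, here is a minimal Python sketch
of such a cleanup. It is not the script referred to in this mail; the
function name clean_sstate() and its parameters are assumptions made for
illustration. It covers the retention period, removing a .tgz together
with its .tgz.siginfo, stale symbolic links and a dry run mode, and it
leaves out the quiet mode summary.

```python
import os
import time


def clean_sstate(cache_dir, retention_days=10, dry_run=False, now=None):
    """Remove sstate archives not accessed within retention_days, together
    with their .siginfo files, plus dangling symbolic links.

    Returns the sorted list of paths removed (or, with dry_run=True, the
    paths that would have been removed).
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    doomed = set()
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                # A link whose target no longer exists is stale.
                if not os.path.exists(path):
                    doomed.add(path)
                continue
            if not name.endswith(".tgz"):
                continue  # .siginfo files are handled with their archive
            if os.stat(path).st_atime >= cutoff:
                continue  # accessed recently enough, keep it
            doomed.add(path)
            siginfo = path + ".siginfo"
            if os.path.lexists(siginfo):
                doomed.add(siginfo)
    if not dry_run:
        for path in doomed:
            os.remove(path)
    return sorted(doomed)
```

Note that this relies on the access time being recorded reliably by the
filesystem that holds the cache.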

> Also, with changes like this you can end up in a state where sstate can
> completely stop working and the only way you'd tell is by increased
> build time.

As I mentioned, we have monitoring of our builds in place and would 
definitely notice if the global sstate cache is not used as expected.

> > > I'm not convinced patching out the errors is the right solution
> > > here...
> > 
> > How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS?
> > That way it can default to "0", but we can set it to "1" to
> > prioritize the production builds.
> 
> I'm still not convinced, sorry.
> 
> [The reason being complexity. I don't like having multiple ways of
> doing things if we can help it, particularly when one of them is a
> workaround for a problem elsewhere. One of the codepaths in a case like
> this is unlikely to get well tested.]

Well, as long as the conditional path is clearly marked as "only 
enable this if you know what you are doing", I do not see a problem 
with that path receiving less or no testing by you. It should get 
enough testing by those of us who rely on it.

The problem for me in this kind of situation is that we do not want 
to make changes to anything inside the Poky repository (which would 
effectively fork it), because down that route lies madness. So instead 
we rely on making all adaptations in our own layers. Making changes to 
recipes is easy as we can use .bbappends in our layers. Making changes 
to classes or configuration files works by copying them to our layers 
and changing them there, even though I personally hate it because it 
causes extra maintenance for me since I often need to build with a 
newer version of Poky than our layers are currently adapted for in 
preparation for updating to the next Poky release. However, changes 
to anything inside bitbake are nearly impossible. The same goes for 
changes to anything in meta/lib/oe. Thus we rely on being able to find 
a way to get these kinds of changes integrated upstream.

> Cheers,
> 
> Richard

And in case any of the above sounds as if I am trying to force a 
feature down your throat that you do not like, then I beg for 
forgiveness. We really do appreciate your expertise and dedication 
to the OE community, and I hope we can work this into something that 
you can accept and that we can use.

//Peter



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-29 22:03     ` Andre McCurdy
@ 2017-08-30  9:55       ` Peter Kjellerstedt
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Kjellerstedt @ 2017-08-30  9:55 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE Core mailing list

> -----Original Message-----
> From: Andre McCurdy [mailto:armccurdy@gmail.com]
> Sent: 30 August 2017 00:04
> To: Peter Kjellerstedt <peter.kjellerstedt@axis.com>
> Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
> 
> On Tue, Aug 29, 2017 at 1:59 PM, Peter Kjellerstedt
> <peter.kjellerstedt@axis.com> wrote:
> > We do have a daily job that cleans up the global sstate cache and
> > removes files that have not been accessed in the last ten days, but
> > it seems unlikely that it should remove a file that just happens to
> > be required again, and do it at exactly the time when that task is
> > building.
> 
> I guess you've already confirmed that accessing the sstate files over
> NFS does actually modify atime on the server (and that the filesystem
> on the server really does have atime support enabled, e.g. mounted
> with strictatime rather than relatime etc)?

Well, it is mounted with relatime. However, only updating the access 
time once a day should be ok since we are only concerned with files 
that have not been accessed in the last ten days.
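
As a rough model of that reasoning: with relatime, Linux updates the
access time on a read only if the stored atime is not newer than the
mtime or ctime, or if it is at least 24 hours old. The Python sketch
below simulates this rule for a file that is read regularly; the helper
names are made up for illustration, and NFS client-side caching is not
modelled.

```python
DAY = 24 * 60 * 60


def relatime_updates_atime(atime, mtime, ctime, now):
    """Model of the relatime rule: does a read at `now` update atime?"""
    return atime <= mtime or atime <= ctime or now - atime >= DAY


def simulate_reads(read_times, created):
    """Return the atime recorded after a series of reads (in seconds)."""
    atime = mtime = ctime = created
    for t in sorted(read_times):
        if relatime_updates_atime(atime, mtime, ctime, t):
            atime = t
    return atime


# A file created at t=0 and read every six hours for almost ten days.
reads = [hour * 3600 for hour in range(6, 240, 6)]
recorded = simulate_reads(reads, created=0)
lag = max(reads) - recorded  # how far atime trails the last real access
```

In this model the recorded atime trails the last real access by less than
a day, which is small compared to a ten day retention period.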

> If access time isn't being determined reliably and sstate files are
> being removed 10 days after being created then that might make the
> race a little more likely to trigger.

The thing is that the cleaning script runs at 3 am (and takes about 
15 minutes to complete), but we have seen the build problem at times 
when no cleaning is taking place. I am currently leaning more towards 
network glitches as the source of the problem, but that is hard to 
verify.

//Peter



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2017-08-30  7:54         ` Martin Jansa
@ 2019-11-29 16:48           ` Martin Jansa
  2020-01-09 12:26             ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Jansa @ 2019-11-29 16:48 UTC (permalink / raw)
  To: Peter Kjellerstedt; +Cc: OE Core mailing list


On Wed, Aug 30, 2017 at 9:54 AM Martin Jansa <martin.jansa@gmail.com> wrote:

> I agree with this patchset and it would be OK with IGNORE_SETSCENE_ERRORS
> conditional as well.
>
> We're also sometimes seeing these errors, sometime anticipated when
> cleaning shared sstate-cache on NFS server sometimes unexpected when NFS or
> network goes down for a minute and for some builds it happens between
> sstate_checkhashes()  and using the sstate.
>
> We normally stop all jenkins builds, until the cleanup is complete (there
> is jenkins job doing the cleanup, so it puts jenkins into stop mode, waits
> for all current jobs to finish which can take hours, then performs the
> cleanup and cancels the stop mode), but we cannot stop hundreds of
> developers using the same sstate-cache in local builds (especially when we
> cannot really know when exactly the job will have free jenkins to perform
> the cleanup) - luckily in local builds it doesn't hurt so bad, because the
> developers are more likely to ignore the error as long as the image was
> created, but in jenkins builds when bitbake returns error we cannot easily
> distinguish this case of "RP is intentionally warning us that something
> went wrong with sstate, but everything was built correctly in the end" and
> "something failed in the build and we weren't able to recover from that,
> maybe even the image wasn't created" - so we don't trigger the follow up
> actions like announcing new official builds or parsing release notes or
> automated testing.
>
> Yes we could add more logic to these CI jobs, to grep the logs to decide
> if this error was the only one which caused the bitbake to return error
> code and ignore the returned error in such case, but simple variable is
> easier to maintain (even for the cost of forking bitbake and oe-core) and
> will work for local builds as well.
>

I have been using these 2 changes in my fork of oe-core and bitbake since
they were sent to the list. Today, after getting a bunch of errors like
this from a build which unfortunately wasn't using my forks, and a few
questions from fellow developers about why these errors aren't ignored,
I finally found time to improve our CI jobs to deal with this and ignore
the bitbake return code if it's reporting failure only because of these
setscene fetcher failures.

If someone needs a similar workaround for this bitbake behavior, here is
what I did:
https://github.com/webOS-ports/jenkins-jobs/pull/12
Yes, it's ugly, but it seems to work and is a bit better than forking
oe-core and bitbake just because of this issue.

Regards,



* Re: [PATCH 0/2] Avoid build failures due to setscene errors
  2019-11-29 16:48           ` Martin Jansa
@ 2020-01-09 12:26             ` Ricardo Ribalda Delgado
  0 siblings, 0 replies; 18+ messages in thread
From: Ricardo Ribalda Delgado @ 2020-01-09 12:26 UTC (permalink / raw)
  To: Martin Jansa; +Cc: Peter Kjellerstedt, OE Core mailing list

Hi

I am also hitting this wall. Any reason why the original patches could
not be merged?

On Fri, Nov 29, 2019 at 5:49 PM Martin Jansa <martin.jansa@gmail.com> wrote:
>
> On Wed, Aug 30, 2017 at 9:54 AM Martin Jansa <martin.jansa@gmail.com> wrote:
>>
>> I agree with this patchset and it would be OK with IGNORE_SETSCENE_ERRORS conditional as well.
>>
>> We're also sometimes seeing these errors, sometime anticipated when cleaning shared sstate-cache on NFS server sometimes unexpected when NFS or network goes down for a minute and for some builds it happens between sstate_checkhashes()  and using the sstate.
>>
>> We normally stop all jenkins builds, until the cleanup is complete (there is jenkins job doing the cleanup, so it puts jenkins into stop mode, waits for all current jobs to finish which can take hours, then performs the cleanup and cancels the stop mode), but we cannot stop hundreds of developers using the same sstate-cache in local builds (especially when we cannot really know when exactly the job will have free jenkins to perform the cleanup) - luckily in local builds it doesn't hurt so bad, because the developers are more likely to ignore the error as long as the image was created, but in jenkins builds when bitbake returns error we cannot easily distinguish this case of "RP is intentionally warning us that something went wrong with sstate, but everything was built correctly in the end" and "something failed in the build and we weren't able to recover from that, maybe even the image wasn't created" - so we don't trigger the follow up actions like announcing new official builds or parsing release notes or automated testing.
>>
>> Yes we could add more logic to these CI jobs, to grep the logs to decide if this error was the only one which caused the bitbake to return error code and ignore the returned error in such case, but simple variable is easier to maintain (even for the cost of forking bitbake and oe-core) and will work for local builds as well.
>
>
>  I was using these 2 changes in my fork of oe-core and bitbake since they were sent to the list, but today after getting a bunch of errors like this from build which unfortunately wasn't using my forks and few questions about why these errors aren't ignored from fellow developers I've finally found time to improve our CI jobs to deal with this and ignore the bitbake return code if it's reporting failure only because of these setscene fetcher failures.
>
> If someone needs similar work around for bitbake behavior, here is what I did:
> https://github.com/webOS-ports/jenkins-jobs/pull/12
> yes, it's ugly, but it seems to work and is a bit better than forking oe-core and bitbake just because of this issue.
>
> Regards,



-- 
Ricardo Ribalda



end of thread, other threads:[~2020-01-09 12:26 UTC | newest]

Thread overview: 18+ messages
2017-08-29 20:00 [PATCH 0/2] Avoid build failures due to setscene errors Peter Kjellerstedt
2017-08-29 20:00 ` [PATCH 1/2] bitbake: fetch2: Allow Fetch.download() to warn instead of error Peter Kjellerstedt
2017-08-29 20:00 ` [PATCH 2/2] sstate.bbclass: Do not cause build failures due to setscene errors Peter Kjellerstedt
2017-08-29 20:04 ` ✗ patchtest: failure for Avoid " Patchwork
2017-08-29 20:25   ` Peter Kjellerstedt
2017-08-29 22:35     ` Philip Balister
2017-08-30  7:41       ` Peter Kjellerstedt
2017-08-29 20:38 ` [PATCH 0/2] " Andre McCurdy
2017-08-29 20:59   ` Peter Kjellerstedt
2017-08-29 21:49     ` Richard Purdie
2017-08-30  6:44       ` Peter Kjellerstedt
2017-08-30  7:54         ` Martin Jansa
2019-11-29 16:48           ` Martin Jansa
2020-01-09 12:26             ` Ricardo Ribalda Delgado
2017-08-30  8:02         ` Richard Purdie
2017-08-30  9:52           ` Peter Kjellerstedt
2017-08-29 22:03     ` Andre McCurdy
2017-08-30  9:55       ` Peter Kjellerstedt
