bitbake-devel.lists.openembedded.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bitbake: Add task timeout support
@ 2023-05-25 10:21 Michal Sieron
  2023-05-25 10:44 ` [bitbake-devel] " Mikko Rapeli
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Sieron @ 2023-05-25 10:21 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Michal Sieron, Tomasz Dziendzielski, Mateusz Marciniec

By setting `BB_TASK_TIMEOUT` you can control timeout of for tasks.
Using flags `BB_TASK_TIMEOUT[do_mytask]` you can override timeout
settings for specific tasks.

This is especially useful when some server doesn't work properly and
`do_fetch` task takes too long. We may want to kill it early instead
of waiting for a fetch that won't happen.

Signed-off-by: Michal Sieron <michalwsieron@gmail.com>
Signed-off-by: Tomasz Dziendzielski <tomasz.dziendzielski@gmail.com>
Signed-off-by: Mateusz Marciniec <mateuszmar2@gmail.com>
---
 .../bitbake-user-manual-ref-variables.rst     |  7 +++
 bitbake/lib/bb/build.py                       | 43 ++++++++++++++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/bitbake/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst b/bitbake/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
index 01d4f8d14a..eaaa307960 100644
--- a/bitbake/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
+++ b/bitbake/doc/bitbake-user-manual/bitbake-user-manual-ref-variables.rst
@@ -686,6 +686,13 @@ overview of their function and contents.
       in images is given a higher priority as compared to build tasks to
       ensure that images do not suffer timeouts on loaded systems.
 
+   :term:`BB_TASK_TIMEOUT`
+      Specifies time after which still running tasks will be terminated.
+
+      Time is provided as an integer number of seconds.
+      You can set timeout for specific task using ``BB_TASK_TIMEOUT[do_mytask]``.
+      To remove timeout use an empty value ``BB_TASK_TIMEOUT = ""``.
+
    :term:`BB_TASKHASH`
       Within an executing task, this variable holds the hash of the task as
       returned by the currently enabled signature generator.
diff --git a/bitbake/lib/bb/build.py b/bitbake/lib/bb/build.py
index 44d08f5c55..2057dd9415 100644
--- a/bitbake/lib/bb/build.py
+++ b/bitbake/lib/bb/build.py
@@ -17,6 +17,7 @@ import sys
 import logging
 import glob
 import itertools
+import multiprocessing
 import time
 import re
 import stat
@@ -594,7 +595,7 @@ def _task_data(fn, task, d):
     bb.data.expandKeys(localdata)
     return localdata
 
-def _exec_task(fn, task, d, quieterr):
+def __exec_task(fn, task, d, quieterr):
     """Execute a BB 'task'
 
     Execution of a task involves a bit more setup than executing a function,
@@ -761,6 +762,46 @@ def _exec_task(fn, task, d, quieterr):
 
     return 0
 
+def get_task_timeout(d, task):
+    # task specific timeout
+    timeout = d.getVarFlag("BB_TASK_TIMEOUT", task)
+
+    if timeout is None:
+        # any task timeout
+        timeout = d.getVar("BB_TASK_TIMEOUT")
+
+    if timeout is None or timeout == "":
+        return None
+    else:
+        try:
+            return int(timeout)
+        except ValueError as e:
+            raise ValueError("invalid `BB_TASK_TIMEOUT` value '%s'" % timeout)
+
+def _exec_task(fn, task, d, quieterr):
+    """Intermediate function between exec_task and __exec_task, to support task timeouts."""
+
+    def targetFunc(func):
+        def inner(*args, **kwargs):
+            sys.exit(func(*args, **kwargs))
+
+        return inner
+
+    try:
+        timeout = get_task_timeout(d, task)
+    except ValueError as e:
+        logger.error("%s: %s" % (d.getVar("PF"), e.args[0]))
+        return 1
+
+    task_proc = multiprocessing.Process(target=targetFunc(__exec_task), args=(fn, task, d, quieterr))
+    task_proc.start()
+    task_proc.join(timeout)
+    if task_proc.exitcode is None:
+        logger.error("timeout exceeded (%s s)" % (str(timeout)))
+        return 1
+    else:
+        return task_proc.exitcode
+
 def exec_task(fn, task, d, profile = False):
     try:
         quieterr = False
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 10:21 [PATCH] bitbake: Add task timeout support Michal Sieron
@ 2023-05-25 10:44 ` Mikko Rapeli
  2023-05-25 10:59   ` Alexander Kanavin
  0 siblings, 1 reply; 10+ messages in thread
From: Mikko Rapeli @ 2023-05-25 10:44 UTC (permalink / raw)
  To: Michal Sieron; +Cc: bitbake-devel, Tomasz Dziendzielski, Mateusz Marciniec

Hi,

On Thu, May 25, 2023 at 12:21:05PM +0200, Michal Sieron wrote:
> By setting `BB_TASK_TIMEOUT` you can control timeout of for tasks.
> Using flags `BB_TASK_TIMEOUT[do_mytask]` you can override timeout
> settings for specific tasks.
> 
> This is especially useful when some server doesn't work properly and
> `do_fetch` task takes too long. We may want to kill it early instead
> of waiting for a fetch that won't happen.

Is the problem related some tools and network protocols? Which?

An alternative would be to configure the tool (git, svn, curl etc) specific
settings, or even the default socket timeouts on the running system.

I've seen this kind of problems too, but I'm not sure this task timeout is the
correct way to solve it. Time is too relative :)

Cheers,

-Mikko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 10:44 ` [bitbake-devel] " Mikko Rapeli
@ 2023-05-25 10:59   ` Alexander Kanavin
  2023-05-25 11:44     ` Tomasz Dziendzielski
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2023-05-25 10:59 UTC (permalink / raw)
  To: Mikko Rapeli
  Cc: Michal Sieron, bitbake-devel, Tomasz Dziendzielski, Mateusz Marciniec

I tend to agree, if the problem is that some server is too slow, or
unresponsive, this does not solve the problem. If you want to forcibly
terminate stuff (which is still not a good idea - you should look into
why something 'doesn't work properly', not work around it), you can
terminate the whole bitbake process instead.

Alex

On Thu, 25 May 2023 at 12:45, Mikko Rapeli <mikko.rapeli@linaro.org> wrote:
>
> Hi,
>
> On Thu, May 25, 2023 at 12:21:05PM +0200, Michal Sieron wrote:
> > By setting `BB_TASK_TIMEOUT` you can control timeout of for tasks.
> > Using flags `BB_TASK_TIMEOUT[do_mytask]` you can override timeout
> > settings for specific tasks.
> >
> > This is especially useful when some server doesn't work properly and
> > `do_fetch` task takes too long. We may want to kill it early instead
> > of waiting for a fetch that won't happen.
>
> Is the problem related some tools and network protocols? Which?
>
> An alternative would be to configure the tool (git, svn, curl etc) specific
> settings, or even the default socket timeouts on the running system.
>
> I've seen this kind of problems too, but I'm not sure this task timeout is the
> correct way to solve it. Time is too relative :)
>
> Cheers,
>
> -Mikko
>
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
> View/Reply Online (#14807): https://lists.openembedded.org/g/bitbake-devel/message/14807
> Mute This Topic: https://lists.openembedded.org/mt/99127010/1686489
> Group Owner: bitbake-devel+owner@lists.openembedded.org
> Unsubscribe: https://lists.openembedded.org/g/bitbake-devel/unsub [alex.kanavin@gmail.com]
> -=-=-=-=-=-=-=-=-=-=-=-
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 10:59   ` Alexander Kanavin
@ 2023-05-25 11:44     ` Tomasz Dziendzielski
  2023-05-25 12:00       ` Alexander Kanavin
  0 siblings, 1 reply; 10+ messages in thread
From: Tomasz Dziendzielski @ 2023-05-25 11:44 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: Mikko Rapeli, Michal Sieron, bitbake-devel, Mateusz Marciniec

[-- Attachment #1: Type: text/plain, Size: 792 bytes --]

Hello,
let me give some more context on the patch. We use self hosted gerrit and
artifactory (with git-lfs content and tarballs with source code) instances
which tend to be slow sometimes due to some overloads and other reasons. We
have a 3h timeout for the whole build. Now let's assume do_fetch tasks take
already 60 hours instead of 10 minutes. With the timeout solution proposed
by Michał we can detect it quite fast and stop the build job and unblock
the jenkins nodes for other builds or just retry the build in case the
fetch on the second run would take less time. Now without that and with
timeout on the whole bitbake process we're still holding the jenkins node
for another 2-3 hours even though we know this build has no future.

Best regards,
Tomasz Dziendzielski

[-- Attachment #2: Type: text/html, Size: 856 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 11:44     ` Tomasz Dziendzielski
@ 2023-05-25 12:00       ` Alexander Kanavin
  2023-05-25 12:13         ` Tomasz Dziendzielski
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2023-05-25 12:00 UTC (permalink / raw)
  To: Tomasz Dziendzielski
  Cc: Mikko Rapeli, Michal Sieron, bitbake-devel, Mateusz Marciniec

Yocto has a download cache mechanism, so you should not have to fetch
more than once. Are you throwing away that cache between builds?

Again, the actual problem is slow gerrit/artifactory instances, not
stuck fetch tasks. The idea of 'retrying' the fetch so that it maybe
works better on second try is especially off-putting.

Alex

On Thu, 25 May 2023 at 13:41, Tomasz Dziendzielski
<tomasz.dziendzielski@gmail.com> wrote:
>
> Hello,
> let me give some more context on the patch. We use self hosted gerrit and artifactory (with git-lfs content and tarballs with source code) instances which tend to be slow sometimes due to some overloads and other reasons. We have a 3h timeout for the whole build. Now let's assume do_fetch tasks take already 60 hours instead of 10 minutes. With the timeout solution proposed by Michał we can detect it quite fast and stop the build job and unblock the jenkins nodes for other builds or just retry the build in case the fetch on the second run would take less time. Now without that and with timeout on the whole bitbake process we're still holding the jenkins node for another 2-3 hours even though we know this build has no future.
>
> Best regards,
> Tomasz Dziendzielski


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 12:00       ` Alexander Kanavin
@ 2023-05-25 12:13         ` Tomasz Dziendzielski
  2023-05-25 12:21           ` Alexander Kanavin
  2023-05-25 12:57           ` Mikko Rapeli
  0 siblings, 2 replies; 10+ messages in thread
From: Tomasz Dziendzielski @ 2023-05-25 12:13 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: Mikko Rapeli, Michal Sieron, bitbake-devel, Mateusz Marciniec

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

>Yocto has a download cache mechanism, so you should not have to fetch
>more than once. Are you throwing away that cache between builds?

Do you mean the DL_DIR or something else as cache mechanism? We do keep
DL_DIR between builds, but we're talking about cases where there was a
change in the repository.

>the actual problem is slow gerrit/artifactory instances, not
>stuck fetch tasks
Totally agree, but we're not able to fix the slow gerrit/artifactory in a
short term.

>The idea of 'retrying' the fetch
Well, it's  not like I'm saying this as a solution, usually we just
disable jobs until the slowness is gone. But isn't that what we already do
with wget in bitbake?
"wget -t 2 -T 30" from bitbake/lib/bb/fetch2/wget.py, where -t is tries,
and -T is timeout
Problem is that we get the slowness for git repositories and git does not
support such a thing.

Best regards,
Tomasz Dziendzielski

[-- Attachment #2: Type: text/html, Size: 1063 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 12:13         ` Tomasz Dziendzielski
@ 2023-05-25 12:21           ` Alexander Kanavin
  2023-05-25 12:35             ` Tomasz Dziendzielski
  2023-05-25 12:57           ` Mikko Rapeli
  1 sibling, 1 reply; 10+ messages in thread
From: Alexander Kanavin @ 2023-05-25 12:21 UTC (permalink / raw)
  To: Tomasz Dziendzielski
  Cc: Mikko Rapeli, Michal Sieron, bitbake-devel, Mateusz Marciniec

On Thu, 25 May 2023 at 14:09, Tomasz Dziendzielski
<tomasz.dziendzielski@gmail.com> wrote:
> Well, it's  not like I'm saying this as a solution, usually we just disable jobs until the slowness is gone. But isn't that what we already do with wget in bitbake?
> "wget -t 2 -T 30" from bitbake/lib/bb/fetch2/wget.py, where -t is tries, and -T is timeout
> Problem is that we get the slowness for git repositories and git does not support such a thing.

But then you can simply wrap git with a timeout command.
https://man7.org/linux/man-pages/man1/timeout.1.html

No?

Alex


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 12:21           ` Alexander Kanavin
@ 2023-05-25 12:35             ` Tomasz Dziendzielski
  0 siblings, 0 replies; 10+ messages in thread
From: Tomasz Dziendzielski @ 2023-05-25 12:35 UTC (permalink / raw)
  To: Alexander Kanavin
  Cc: Mikko Rapeli, Michal Sieron, bitbake-devel, Mateusz Marciniec

[-- Attachment #1: Type: text/plain, Size: 329 bytes --]

>But then you can simply wrap git with a timeout command.
>https://man7.org/linux/man-pages/man1/timeout.1.html
Well, this is a good point. I don't know how we could not come up with that
idea :D We can just simply use the FETCHCMD_git variable with timeout.
Please disregard this patch then.

Best regards,
Tomasz Dziendzielski

[-- Attachment #2: Type: text/html, Size: 537 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 12:13         ` Tomasz Dziendzielski
  2023-05-25 12:21           ` Alexander Kanavin
@ 2023-05-25 12:57           ` Mikko Rapeli
  2023-06-05 11:46             ` Michal Sieron
  1 sibling, 1 reply; 10+ messages in thread
From: Mikko Rapeli @ 2023-05-25 12:57 UTC (permalink / raw)
  To: Tomasz Dziendzielski
  Cc: Alexander Kanavin, Michal Sieron, bitbake-devel, Mateusz Marciniec

Hi,

On Thu, May 25, 2023 at 02:13:10PM +0200, Tomasz Dziendzielski wrote:
> >Yocto has a download cache mechanism, so you should not have to fetch
> >more than once. Are you throwing away that cache between builds?
> 
> Do you mean the DL_DIR or something else as cache mechanism? We do keep
> DL_DIR between builds, but we're talking about cases where there was a
> change in the repository.
> 
> >the actual problem is slow gerrit/artifactory instances, not
> >stuck fetch tasks
> Totally agree, but we're not able to fix the slow gerrit/artifactory in a
> short term.
> 
> >The idea of 'retrying' the fetch
> Well, it's  not like I'm saying this as a solution, usually we just
> disable jobs until the slowness is gone. But isn't that what we already do
> with wget in bitbake?
> "wget -t 2 -T 30" from bitbake/lib/bb/fetch2/wget.py, where -t is tries,
> and -T is timeout
> Problem is that we get the slowness for git repositories and git does not
> support such a thing.

git uses curl, can't we shorten the timeouts or even add retries there? Maybe even
specific ones for the lfs support?

I've also seen a lot of problematic artifactory and gerrit instances, in addition
to maven, npm etc...

Cheers,

-Mikko


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [bitbake-devel] [PATCH] bitbake: Add task timeout support
  2023-05-25 12:57           ` Mikko Rapeli
@ 2023-06-05 11:46             ` Michal Sieron
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Sieron @ 2023-06-05 11:46 UTC (permalink / raw)
  To: Mikko Rapeli
  Cc: Tomasz Dziendzielski, Alexander Kanavin, bitbake-devel,
	Mateusz Marciniec

Hi,

On Thu, 25 May 2023 15:57:12 +0300
Mikko Rapeli <mikko.rapeli@linaro.org> wrote:

> git uses curl, can't we shorten the timeouts or even add retries
> there? Maybe even specific ones for the lfs support?
> 
> I've also seen a lot of problematic artifactory and gerrit instances,
> in addition to maven, npm etc...

I checked this and couldn't find any environment variables that would
control timeout for the whole operation.

The only solution I found (which doesn't require patching/recompiling)
was using previously mentioned timeout command like so:

```
FETCHCMD_git = "git \
    -c core.fsyncobjectfiles=0 \
    -c filter.lfs.clean='/usr/bin/timeout 300 git-lfs clean -- %f' \
    -c filter.lfs.smudge='/usr/bin/timeout 300 git-lfs smudge -- %f' \
    -c filter.lfs.process='/usr/bin/timeout 300 git-lfs filter-process' \
"
```

You can of course put those settings in your gitconfig file instead.

However, this method is not so nice, as when the fetch task ends, and
prints error message from git, we get "fatal: the remote end hung up
unexpectedly", which doesn't really explain what went wrong.

Michal


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-06-05 11:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-25 10:21 [PATCH] bitbake: Add task timeout support Michal Sieron
2023-05-25 10:44 ` [bitbake-devel] " Mikko Rapeli
2023-05-25 10:59   ` Alexander Kanavin
2023-05-25 11:44     ` Tomasz Dziendzielski
2023-05-25 12:00       ` Alexander Kanavin
2023-05-25 12:13         ` Tomasz Dziendzielski
2023-05-25 12:21           ` Alexander Kanavin
2023-05-25 12:35             ` Tomasz Dziendzielski
2023-05-25 12:57           ` Mikko Rapeli
2023-06-05 11:46             ` Michal Sieron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).