* [PATCH] process.py: Increase bitbake timeout and add logs
@ 2021-03-02 21:51 Chaitanya Vadrevu
  2021-03-02 22:44 ` [bitbake-devel] " Richard Purdie
  0 siblings, 1 reply; 4+ messages in thread
From: Chaitanya Vadrevu @ 2021-03-02 21:51 UTC (permalink / raw)
  To: bitbake-devel; +Cc: Chaitanya Vadrevu

We have started seeing "Unable to connect to bitbake server ..." errors on
our build farm consistently with 60s timeout. Increasing the timeout to
300s and logging every 10s.

Signed-off-by: Chaitanya Vadrevu <chaitanya.vadrevu@ni.com>
---
 lib/bb/server/process.py | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
index b27b4aef..d5571c36 100644
--- a/lib/bb/server/process.py
+++ b/lib/bb/server/process.py
@@ -390,12 +390,19 @@ class ServerCommunicator():
         self.connection = connection
         self.recv = recv
 
+        self.reply_wait = 10
+        self.max_reply_wait = 300
+
     def runCommand(self, command):
         self.connection.send(command)
-        if not self.recv.poll(30):
-            logger.info("No reply from server in 30s")
-            if not self.recv.poll(30):
-                raise ProcessTimeout("Timeout while waiting for a reply from the bitbake server (60s)")
+        total_reply_wait = self.reply_wait
+
+        while not self.recv.poll(self.reply_wait):
+            logger.info("No reply from server in %ds" % total_reply_wait)
+            total_reply_wait += self.reply_wait
+            if total_reply_wait > self.max_reply_wait:
+                raise ProcessTimeout("Timeout while waiting for a reply from the bitbake server (%ds)" % (total_reply_wait - self.reply_wait))
+
         ret, exc = self.recv.get()
         # Should probably turn all exceptions in exc back into exceptions?
         # For now, at least handle BBHandledException
-- 
2.17.1
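
For readers following along outside the bitbake tree, the loop in the patch above can be sketched as a standalone function. This is only an illustration under stated assumptions: it polls a plain multiprocessing connection rather than bitbake's own reader object, and wait_for_reply and ReplyTimeout are hypothetical names standing in for runCommand and ProcessTimeout.

```python
import logging
import multiprocessing

logger = logging.getLogger("poll_sketch")


class ReplyTimeout(Exception):
    """Hypothetical stand-in for bitbake's ProcessTimeout."""


def wait_for_reply(recv, reply_wait=10, max_reply_wait=300):
    """Poll `recv` in slices of `reply_wait` seconds.

    Logs once after every slice that passes without data, and raises
    ReplyTimeout once `max_reply_wait` seconds have gone by in total.
    `recv` is assumed to be a multiprocessing Connection (not the
    reader class bitbake uses internally).
    """
    waited = 0
    while not recv.poll(reply_wait):
        waited += reply_wait
        logger.info("No reply from server in %gs", waited)
        if waited >= max_reply_wait:
            raise ReplyTimeout(
                "Timeout while waiting for a reply (%gs)" % waited)
    return recv.recv()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    reader, writer = multiprocessing.Pipe(duplex=False)
    # The reply is already queued here, so the first poll succeeds.
    writer.send(("ok", None))
    print(wait_for_reply(reader, reply_wait=1, max_reply_wait=5))
```

With the defaults this logs every 10s and gives up after 300s, matching the values in the patch. Keeping the slice length and the overall limit as separate parameters means the limit can be tuned without changing how often an interactive user sees a progress message.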



* Re: [bitbake-devel] [PATCH] process.py: Increase bitbake timeout and add logs
  2021-03-02 21:51 [PATCH] process.py: Increase bitbake timeout and add logs Chaitanya Vadrevu
@ 2021-03-02 22:44 ` Richard Purdie
  2021-03-02 23:24   ` Chaitanya Vadrevu
       [not found]   ` <1668AA16CDA346BC.30035@lists.openembedded.org>
  0 siblings, 2 replies; 4+ messages in thread
From: Richard Purdie @ 2021-03-02 22:44 UTC (permalink / raw)
  To: Chaitanya Vadrevu, bitbake-devel

On Tue, 2021-03-02 at 15:51 -0600, Chaitanya Vadrevu wrote:
> We have started seeing "Unable to connect to bitbake server ..." errors on
> our build farm consistently with 60s timeout. Increasing the timeout to
> 300s and logging every 10s.
> 
> Signed-off-by: Chaitanya Vadrevu <chaitanya.vadrevu@ni.com>
> ---
>  lib/bb/server/process.py | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)

Taking a step back, is it reasonable for bitbake to "disappear" 
for more than a minute? I've not wanted to increase this value
too much as for an interactive user it's a pretty poor situation to
stall for delays this long.

We're also seeing these on the project autobuilder occasionally,
they seem load related. Have you any monitoring which says what your
build farm is doing when these timeouts happen? Did increasing it to
300s work?

I have a suspicion it's IO load related and that the issue is probably
around syncing files at bitbake exit.

Cheers,

Richard



* Re: [bitbake-devel] [PATCH] process.py: Increase bitbake timeout and add logs
  2021-03-02 22:44 ` [bitbake-devel] " Richard Purdie
@ 2021-03-02 23:24   ` Chaitanya Vadrevu
       [not found]   ` <1668AA16CDA346BC.30035@lists.openembedded.org>
  1 sibling, 0 replies; 4+ messages in thread
From: Chaitanya Vadrevu @ 2021-03-02 23:24 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel


Hi Richard,

We’re pretty sure it's load related.
We started seeing these errors when our build machines were swamped
with a bunch of jobs after we turned them back on following the
Texas power outage.

The only info I could glean from logs was that it always seemed to happen
after starting the do_rootfs task of our image.
We unfortunately don’t have any more insight into build farm state
when it happened.

Increasing to 300s worked and we stopped seeing the issue right away.
Unfortunately I haven’t been able to find a lower timeout value since the
load on the build farm eased up this week and now I’m only seeing waits of at most 20s.

For interactive users, are there any cases, other than load-related ones, where they
usually see this issue?
The periodic logs every 10s should help keep them informed and they always have
the opportunity to kill the build.

Thanks,
Chaitanya


* Re: [bitbake-devel] [PATCH] process.py: Increase bitbake timeout and add logs
       [not found]   ` <1668AA16CDA346BC.30035@lists.openembedded.org>
@ 2021-03-15 17:05     ` Chaitanya Vadrevu
  0 siblings, 0 replies; 4+ messages in thread
From: Chaitanya Vadrevu @ 2021-03-15 17:05 UTC (permalink / raw)
  To: Richard Purdie, bitbake-devel


Is there any interest in taking this patch? Can I make any changes to it to get it accepted?

Thanks,
Chaitanya

