Is there any interest in taking this patch? Can I make any changes to it to get it accepted?

 

Thanks,
Chaitanya

 

From: bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org> on behalf of Chaitanya Vadrevu <chaitanya.vadrevu@ni.com>
Date: Tuesday, March 2, 2021 at 5:24 PM
To: Richard Purdie <richard.purdie@linuxfoundation.org>, bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org>
Subject: [EXTERNAL] Re: [bitbake-devel] [PATCH] process.py: Increase bitbake timeout and add logs

Hi Richard,

 

We’re pretty sure it’s load related.
We started seeing these errors when our build machines were swamped
with a bunch of jobs after we turned them back on following the
Texas power outage.

 

The only info I could glean from the logs was that it always seemed to happen
after starting the do_rootfs task of our image.
We unfortunately don’t have any more insight into the build farm state
when it happened.

 

Increasing to 300s worked and we stopped seeing the issue right away.
Unfortunately, I haven’t been able to find a lower timeout value that works,
since the load on the build farm eased up this week and I’m now seeing a wait
of at most 20s.

 

For interactive users, are there any cases other than load-related ones where
they usually see this issue?
The periodic log messages every 10s should help keep them informed, and they
always have the opportunity to kill the build.
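
To illustrate the behaviour described above (a 300s overall timeout with a log message every 10s), here is a minimal sketch. It is not the code from the patch to lib/bb/server/process.py; the function name wait_for_server, the is_ready callback, and all parameter names are hypothetical and chosen only for this example.

```python
import logging
import time

logger = logging.getLogger("example")


def wait_for_server(is_ready, timeout=300.0, log_interval=10.0,
                    poll_interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. A progress message is
    logged roughly every `log_interval` seconds so the user can see that
    the client is still waiting and can decide to interrupt it.
    (Hypothetical helper, not part of bitbake.)
    """
    start = clock()
    next_log = start + log_interval
    while True:
        if is_ready():
            return True
        now = clock()
        if now - start >= timeout:
            logger.error("Timed out after %ds waiting for server", timeout)
            return False
        if now >= next_log:
            logger.info("Still waiting for server (%ds elapsed)...", now - start)
            next_log += log_interval
        sleep(poll_interval)
```

The clock and sleep arguments exist only so the loop can be exercised without really waiting; a caller would normally leave them at their defaults and pass a function that attempts the connection.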

 

Thanks,
Chaitanya

 

From: Richard Purdie <richard.purdie@linuxfoundation.org>
Date: Tuesday, March 2, 2021 at 4:44 PM
To: Chaitanya Vadrevu <chaitanya.vadrevu@ni.com>, bitbake-devel@lists.openembedded.org <bitbake-devel@lists.openembedded.org>
Subject: [EXTERNAL] Re: [bitbake-devel] [PATCH] process.py: Increase bitbake timeout and add logs

On Tue, 2021-03-02 at 15:51 -0600, Chaitanya Vadrevu wrote:
> We have started seeing "Unable to connect to bitbake server ..." errors on
> our build farm consistently with 60s timeout. Increasing the timeout to
> 300s and logging every 10s.
>
> Signed-off-by: Chaitanya Vadrevu <chaitanya.vadrevu@ni.com>
> ---
>  lib/bb/server/process.py | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)

Taking a step back, is it reasonable for bitbake to "disappear"
for more than a minute? I've not wanted to increase this value
too much, as for an interactive user it's a pretty poor situation to
stall for delays this long.

We're also seeing these on the project autobuilder occasionally,
they seem load related. Have you any monitoring which says what your
build farm is doing when these timeouts happen? Did increasing it to
300s work?

I have a suspicion it's IO load related, and that the issue is
probably around syncing files at bitbake exit.

Cheers,

Richard