All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] server/process: Fix a rare lockfile race
@ 2020-07-12 10:55 Richard Purdie
  2020-07-12 12:05 ` [bitbake-devel] " Jacob Kroon
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Purdie @ 2020-07-12 10:55 UTC (permalink / raw)
  To: bitbake-devel

We're seeing rare occasional races on the autobuilder as if two server
processes have the lockfile at the same time. We need to be extremely
careful this does not happen.

I think there is a potential race in this shutdown code since we delete
the lockfile, then call unlockfile() which also tries to delete it.

This means we may remove a lock file now held by another process if we're
unlucky. Since unlockfile removes the lockfile when it can, just rely on
that and remove any possible race window.

An example cooker-deamonlog:

--- Starting bitbake server pid 2266 at 2020-07-11 06:17:18.210777 ---
Started bitbake server pid 2266
Entering server connection loop
Accepting [<socket.socket fd=20, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>] ([])
Processing Client
Connecting Client
Running command ['setFeatures', [2]]
Running command ['updateConfig', XXX]
Running command ['getVariable', 'BBINCLUDELOGS']
Running command ['getVariable', 'BBINCLUDELOGS_LINES']
Running command ['getSetVariable', 'BB_CONSOLELOG']
Running command ['getSetVariable', 'BB_LOGCONFIG']
Running command ['getUIHandlerNum']
Running command ['setEventMask', XXXX]
Running command ['getVariable', 'BB_DEFAULT_TASK']
Running command ['setConfig', 'cmd', 'build']
Running command ['getVariable', 'BBTARGETS']
Running command ['parseFiles']
--- Starting bitbake server pid 8252 at 2020-07-11 06:17:28.584514 ---
Started bitbake server pid 8252
--- Starting bitbake server pid 13278 at 2020-07-11 06:17:31.330635 ---
Started bitbake server pid 13278
Running command ['dataStoreConnectorCmd', 0, 'getVar', ('BBMULTICONFIG',), {}]
Running command ['getRecipes', '']
Running command ['clientComplete']
Processing Client
Disconnecting Client
No timeout, exiting.
Exiting

where it looks like there are two server processes running which should not be.
In that build there was a process left sitting in memory with its bitbake.sock file
missing but holding the lock (not sure why it wouldn't timeout/exit).

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
---
 lib/bb/server/process.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
index 83385baf60..b3a7f8b419 100644
--- a/lib/bb/server/process.py
+++ b/lib/bb/server/process.py
@@ -243,7 +243,6 @@ class ProcessServer(multiprocessing.Process):
                 lock = bb.utils.lockfile(lockfile, shared=False, retry=False, block=True)
                 if lock:
                     # We hold the lock so we can remove the file (hide stale pid data)
-                    bb.utils.remove(lockfile)
                     bb.utils.unlockfile(lock)
                     return
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [bitbake-devel] [PATCH] server/process: Fix a rare lockfile race
  2020-07-12 10:55 [PATCH] server/process: Fix a rare lockfile race Richard Purdie
@ 2020-07-12 12:05 ` Jacob Kroon
  2020-07-12 13:14   ` Richard Purdie
  0 siblings, 1 reply; 3+ messages in thread
From: Jacob Kroon @ 2020-07-12 12:05 UTC (permalink / raw)
  To: bitbake-devel

On 7/12/20 12:55 PM, Richard Purdie wrote:
> We're seeing rare occasional races on the autobuilder as if two server
> processes have the lockfile at the same time. We need to be extremely
> careful this does not happen.
> 
> I think there is a potential race in this shutdown code since we delete
> the lockfile, then call unlockfile() which also tries to delete it.
> 
> This means we may remove a lock file now held by another process if we're
> unlucky. Since unlockfile removes the lockfile when it can, just rely on
> that and remove any possible race window.
> 
> An example cooker-deamonlog:
> 
> --- Starting bitbake server pid 2266 at 2020-07-11 06:17:18.210777 ---
> Started bitbake server pid 2266
> Entering server connection loop
> Accepting [<socket.socket fd=20, family=AddressFamily.AF_UNIX, type=SocketKind.SOCK_STREAM, proto=0, laddr=bitbake.sock>] ([])
> Processing Client
> Connecting Client
> Running command ['setFeatures', [2]]
> Running command ['updateConfig', XXX]
> Running command ['getVariable', 'BBINCLUDELOGS']
> Running command ['getVariable', 'BBINCLUDELOGS_LINES']
> Running command ['getSetVariable', 'BB_CONSOLELOG']
> Running command ['getSetVariable', 'BB_LOGCONFIG']
> Running command ['getUIHandlerNum']
> Running command ['setEventMask', XXXX]
> Running command ['getVariable', 'BB_DEFAULT_TASK']
> Running command ['setConfig', 'cmd', 'build']
> Running command ['getVariable', 'BBTARGETS']
> Running command ['parseFiles']
> --- Starting bitbake server pid 8252 at 2020-07-11 06:17:28.584514 ---
> Started bitbake server pid 8252
> --- Starting bitbake server pid 13278 at 2020-07-11 06:17:31.330635 ---
> Started bitbake server pid 13278
> Running command ['dataStoreConnectorCmd', 0, 'getVar', ('BBMULTICONFIG',), {}]
> Running command ['getRecipes', '']
> Running command ['clientComplete']
> Processing Client
> Disconnecting Client
> No timeout, exiting.
> Exiting
> 
> where it looks like there are two server processes running which should not be.
> In that build there was a process left sitting in memory with its bitbake.sock file
> missing but holding the lock (not sure why it wouldn't timeout/exit).
> 
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> ---
>   lib/bb/server/process.py | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
> index 83385baf60..b3a7f8b419 100644
> --- a/lib/bb/server/process.py
> +++ b/lib/bb/server/process.py
> @@ -243,7 +243,6 @@ class ProcessServer(multiprocessing.Process):
>                   lock = bb.utils.lockfile(lockfile, shared=False, retry=False, block=True)
>                   if lock:
>                       # We hold the lock so we can remove the file (hide stale pid data)
> -                    bb.utils.remove(lockfile)
>                       bb.utils.unlockfile(lock)
>                       return
>   

I'm no export on the bb lockfiles, but if this is correct we should 
update the comment aswell.

Jacob

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [bitbake-devel] [PATCH] server/process: Fix a rare lockfile race
  2020-07-12 12:05 ` [bitbake-devel] " Jacob Kroon
@ 2020-07-12 13:14   ` Richard Purdie
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Purdie @ 2020-07-12 13:14 UTC (permalink / raw)
  To: Jacob Kroon, bitbake-devel

On Sun, 2020-07-12 at 14:05 +0200, Jacob Kroon wrote:
> On 7/12/20 12:55 PM, Richard Purdie wrote:
> > diff --git a/lib/bb/server/process.py b/lib/bb/server/process.py
> > index 83385baf60..b3a7f8b419 100644
> > --- a/lib/bb/server/process.py
> > +++ b/lib/bb/server/process.py
> > @@ -243,7 +243,6 @@ class ProcessServer(multiprocessing.Process):
> >                   lock = bb.utils.lockfile(lockfile, shared=False,
> > retry=False, block=True)
> >                   if lock:
> >                       # We hold the lock so we can remove the file
> > (hide stale pid data)
> > -                    bb.utils.remove(lockfile)
> >                       bb.utils.unlockfile(lock)
> >                       return
> >   
> 
> I'm no export on the bb lockfiles, but if this is correct we should 
> update the comment aswell.

I left the comment as its still right in that unlockfile does remove
the stale pid data potentially but I guess it could be tweaked to
better match the code...

Cheers,

Richard




^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-07-12 13:14 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-12 10:55 [PATCH] server/process: Fix a rare lockfile race Richard Purdie
2020-07-12 12:05 ` [bitbake-devel] " Jacob Kroon
2020-07-12 13:14   ` Richard Purdie

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.