All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync()
@ 2022-03-08  2:42 NeilBrown
  2022-03-12 20:34 ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2022-03-08  2:42 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: Linux NFS Mailing List


xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
Both xprt_unlock_connect() and xprt_release() call
 ->release_xprt()
which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
which calls mod_timer().

This may result in mod_timer() being called *after* del_timer_sync().
When this happens, the timer may fire long after the xprt has been freed,
and run_timer_softirq() will probably crash.

The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
always called under ->transport_lock.  So if we take ->transport_lock to
call del_timer_sync(), we can be sure that mod_timer() will run first
(if it runs at all).

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/xprt.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a02de2bddb28..5388263f8fc8 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -2112,7 +2112,14 @@ static void xprt_destroy(struct rpc_xprt *xprt)
 	 */
 	wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE);
 
+	/*
+	 * xprt_schedule_autodisconnect() can run after XPRT_LOCKED
+	 * is cleared.  We use ->transport_lock to ensure the mod_timer()
+	 * can only run *before* del_time_sync(), never after.
+	 */
+	spin_lock(&xprt->transport_lock);
 	del_timer_sync(&xprt->timer);
+	spin_unlock(&xprt->transport_lock);
 
 	/*
 	 * Destroy sockets etc from the system workqueue so they can
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync()
  2022-03-08  2:42 [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync() NeilBrown
@ 2022-03-12 20:34 ` Trond Myklebust
  2022-03-13 22:03   ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2022-03-12 20:34 UTC (permalink / raw)
  To: anna, neilb; +Cc: linux-nfs

On Tue, 2022-03-08 at 13:42 +1100, NeilBrown wrote:
> 
> xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
> Both xprt_unlock_connect() and xprt_release() call
>  ->release_xprt()
> which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
> which calls mod_timer().
> 
> This may result in mod_timer() being called *after* del_timer_sync().
> When this happens, the timer may fire long after the xprt has been
> freed,
> and run_timer_softirq() will probably crash.
> 
> The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
> always called under ->transport_lock.  So if we take ->transport_lock
> to
> call del_timer_sync(), we can be sure that mod_timer() will run first
> (if it runs at all).

xprt_destroy() never releases XPRT_LOCKED, so how could the above race
happen?

> 
> Cc: stable@vger.kernel.org
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  net/sunrpc/xprt.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index a02de2bddb28..5388263f8fc8 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -2112,7 +2112,14 @@ static void xprt_destroy(struct rpc_xprt
> *xprt)
>          */
>         wait_on_bit_lock(&xprt->state, XPRT_LOCKED,
> TASK_UNINTERRUPTIBLE);
>  
> +       /*
> +        * xprt_schedule_autodisconnect() can run after XPRT_LOCKED
> +        * is cleared.  We use ->transport_lock to ensure the
> mod_timer()
> +        * can only run *before* del_time_sync(), never after.
> +        */
> +       spin_lock(&xprt->transport_lock);
>         del_timer_sync(&xprt->timer);
> +       spin_unlock(&xprt->transport_lock);
>  
>         /*
>          * Destroy sockets etc from the system workqueue so they can

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync()
  2022-03-12 20:34 ` Trond Myklebust
@ 2022-03-13 22:03   ` NeilBrown
  2022-03-13 22:58     ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2022-03-13 22:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna, linux-nfs

On Sun, 13 Mar 2022, Trond Myklebust wrote:
> On Tue, 2022-03-08 at 13:42 +1100, NeilBrown wrote:
> > 
> > xprt_destory() claims XPRT_LOCKED and then calls del_timer_sync().
> > Both xprt_unlock_connect() and xprt_release() call
> >  ->release_xprt()
> > which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
> > which calls mod_timer().
> > 
> > This may result in mod_timer() being called *after* del_timer_sync().
> > When this happens, the timer may fire long after the xprt has been
> > freed,
> > and run_timer_softirq() will probably crash.
> > 
> > The pairing of ->release_xprt() and xprt_schedule_autodisconnect() is
> > always called under ->transport_lock.  So if we take ->transport_lock
> > to
> > call del_timer_sync(), we can be sure that mod_timer() will run first
> > (if it runs at all).
> 
> xprt_destroy() never releases XPRT_LOCKED, so how could the above race
> happen?

the race would happen as xprt_destroy() is taking the lock.
One core is in xprt_destroy(), the other in xprt_unlock_connect()

   Core 1                           Core 2
                                    spin_lock(transport_lock)
                                    xprt->ops->release_xprt()
                                         aka xprt_release_xprt()
                                         This clears XPRT_LOCKED
  wait_on_bit_lock(XPRT_LOCKED)
       This doesn't need to wait
  del_timer_sync(->timer)
                                    xprt_schedule_autodisconnect()
                                      calls mod_timer(->timer)
                                    spin_unlock(transport_lock)
                                    wake_up_bit(XPRT_LOCKED)

The mod_timer() call being after del_timer_sync() is the problem.

If the wait_on_bit_lock() was just a little earlier and had to wait, it
wouldn't be woken until it was safe, so it is a small race window to
hit.

An alternate fix might be to move the xprt_schedule_autodisconnect()
call before xprt->ops->release_xprt(), but as a spinlock was already
used to group these operations, I thought it simpler to use the spinlock
to remove the race.

The only evidence I have that it is possible at all is two crash dumps
with run_timer_softirq() tripping over a corrupt timer.
The timer function is xprt_init_autodisconnect()
The ->next pointer is LIST_POISON2, so detach_timer() must have been
called on the timer, but it was still found in the list of pending
timers.

It looks like the xprt was freed just after above race, so timer was
still active, then allocated again so timer was reinitialised, then
freed again, so del_timer_sync() as called and ->next became
LIST_POISON2. At the time of the crash, the memory was unallocated.

I cannot be 100% certain this race is happening, but I cannot find any
other path by which it is even vaguely possible.

Thanks,
NeilBrown

> 
> > 
> > Cc: stable@vger.kernel.org
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  net/sunrpc/xprt.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > index a02de2bddb28..5388263f8fc8 100644
> > --- a/net/sunrpc/xprt.c
> > +++ b/net/sunrpc/xprt.c
> > @@ -2112,7 +2112,14 @@ static void xprt_destroy(struct rpc_xprt
> > *xprt)
> >          */
> >         wait_on_bit_lock(&xprt->state, XPRT_LOCKED,
> > TASK_UNINTERRUPTIBLE);
> >  
> > +       /*
> > +        * xprt_schedule_autodisconnect() can run after XPRT_LOCKED
> > +        * is cleared.  We use ->transport_lock to ensure the
> > mod_timer()
> > +        * can only run *before* del_time_sync(), never after.
> > +        */
> > +       spin_lock(&xprt->transport_lock);
> >         del_timer_sync(&xprt->timer);
> > +       spin_unlock(&xprt->transport_lock);
> >  
> >         /*
> >          * Destroy sockets etc from the system workqueue so they can
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync()
  2022-03-13 22:03   ` NeilBrown
@ 2022-03-13 22:58     ` Trond Myklebust
  2022-03-13 23:37       ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2022-03-13 22:58 UTC (permalink / raw)
  To: neilb; +Cc: anna, linux-nfs

On Mon, 2022-03-14 at 09:03 +1100, NeilBrown wrote:
> On Sun, 13 Mar 2022, Trond Myklebust wrote:
> > On Tue, 2022-03-08 at 13:42 +1100, NeilBrown wrote:
> > > 
> > > xprt_destory() claims XPRT_LOCKED and then calls
> > > del_timer_sync().
> > > Both xprt_unlock_connect() and xprt_release() call
> > >  ->release_xprt()
> > > which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
> > > which calls mod_timer().
> > > 
> > > This may result in mod_timer() being called *after*
> > > del_timer_sync().
> > > When this happens, the timer may fire long after the xprt has
> > > been
> > > freed,
> > > and run_timer_softirq() will probably crash.
> > > 
> > > The pairing of ->release_xprt() and
> > > xprt_schedule_autodisconnect() is
> > > always called under ->transport_lock.  So if we take -
> > > >transport_lock
> > > to
> > > call del_timer_sync(), we can be sure that mod_timer() will run
> > > first
> > > (if it runs at all).
> > 
> > xprt_destroy() never releases XPRT_LOCKED, so how could the above
> > race
> > happen?
> 
> the race would happen as xprt_destroy() is taking the lock.
> One core is in xprt_destroy(), the other in xprt_unlock_connect()
> 
>    Core 1                           Core 2
>                                     spin_lock(transport_lock)
>                                     xprt->ops->release_xprt()
>                                          aka xprt_release_xprt()
>                                          This clears XPRT_LOCKED
>   wait_on_bit_lock(XPRT_LOCKED)
>        This doesn't need to wait
>   del_timer_sync(->timer)
>                                     xprt_schedule_autodisconnect()
>                                       calls mod_timer(->timer)
>                                     spin_unlock(transport_lock)
>                                     wake_up_bit(XPRT_LOCKED)
> 
> The mod_timer() call being after del_timer_sync() is the problem.
> 
> If the wait_on_bit_lock() was just a little earlier and had to wait,
> it
> wouldn't be woken until it was safe, so it is a small race window to
> hit.
> 
> An alternate fix might be to move the xprt_schedule_autodisconnect()
> call before xprt->ops->release_xprt(), but as a spinlock was already
> used to group these operations, I thought it simpler to use the
> spinlock
> to remove the race.
> 
> The only evidence I have that it is possible at all is two crash
> dumps
> with run_timer_softirq() tripping over a corrupt timer.
> The timer function is xprt_init_autodisconnect()
> The ->next pointer is LIST_POISON2, so detach_timer() must have been
> called on the timer, but it was still found in the list of pending
> timers.
> 
> It looks like the xprt was freed just after above race, so timer was
> still active, then allocated again so timer was reinitialised, then
> freed again, so del_timer_sync() as called and ->next became
> LIST_POISON2. At the time of the crash, the memory was unallocated.
> 
> I cannot be 100% certain this race is happening, but I cannot find
> any
> other path by which it is even vaguely possible.

The above looks plausible, and I think this patch is the correct one. 

The one additional change I suggest is that we also move the
wake_up_bit() in xprt_connect_unlock() inside the spinlock, so that we
avoid a potential use-after-free.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync()
  2022-03-13 22:58     ` Trond Myklebust
@ 2022-03-13 23:37       ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2022-03-13 23:37 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna, linux-nfs

On Mon, 14 Mar 2022, Trond Myklebust wrote:
> On Mon, 2022-03-14 at 09:03 +1100, NeilBrown wrote:
> > On Sun, 13 Mar 2022, Trond Myklebust wrote:
> > > On Tue, 2022-03-08 at 13:42 +1100, NeilBrown wrote:
> > > > 
> > > > xprt_destory() claims XPRT_LOCKED and then calls
> > > > del_timer_sync().
> > > > Both xprt_unlock_connect() and xprt_release() call
> > > >  ->release_xprt()
> > > > which drops XPRT_LOCKED and *then* xprt_schedule_autodisconnect()
> > > > which calls mod_timer().
> > > > 
> > > > This may result in mod_timer() being called *after*
> > > > del_timer_sync().
> > > > When this happens, the timer may fire long after the xprt has
> > > > been
> > > > freed,
> > > > and run_timer_softirq() will probably crash.
> > > > 
> > > > The pairing of ->release_xprt() and
> > > > xprt_schedule_autodisconnect() is
> > > > always called under ->transport_lock.  So if we take -
> > > > >transport_lock
> > > > to
> > > > call del_timer_sync(), we can be sure that mod_timer() will run
> > > > first
> > > > (if it runs at all).
> > > 
> > > xprt_destroy() never releases XPRT_LOCKED, so how could the above
> > > race
> > > happen?
> > 
> > the race would happen as xprt_destroy() is taking the lock.
> > One core is in xprt_destroy(), the other in xprt_unlock_connect()
> > 
> >    Core 1                           Core 2
> >                                     spin_lock(transport_lock)
> >                                     xprt->ops->release_xprt()
> >                                          aka xprt_release_xprt()
> >                                          This clears XPRT_LOCKED
> >   wait_on_bit_lock(XPRT_LOCKED)
> >        This doesn't need to wait
> >   del_timer_sync(->timer)
> >                                     xprt_schedule_autodisconnect()
> >                                       calls mod_timer(->timer)
> >                                     spin_unlock(transport_lock)
> >                                     wake_up_bit(XPRT_LOCKED)
> > 
> > The mod_timer() call being after del_timer_sync() is the problem.
> > 
> > If the wait_on_bit_lock() was just a little earlier and had to wait,
> > it
> > wouldn't be woken until it was safe, so it is a small race window to
> > hit.
> > 
> > An alternate fix might be to move the xprt_schedule_autodisconnect()
> > call before xprt->ops->release_xprt(), but as a spinlock was already
> > used to group these operations, I thought it simpler to use the
> > spinlock
> > to remove the race.
> > 
> > The only evidence I have that it is possible at all is two crash
> > dumps
> > with run_timer_softirq() tripping over a corrupt timer.
> > The timer function is xprt_init_autodisconnect()
> > The ->next pointer is LIST_POISON2, so detach_timer() must have been
> > called on the timer, but it was still found in the list of pending
> > timers.
> > 
> > It looks like the xprt was freed just after above race, so timer was
> > still active, then allocated again so timer was reinitialised, then
> > freed again, so del_timer_sync() as called and ->next became
> > LIST_POISON2. At the time of the crash, the memory was unallocated.
> > 
> > I cannot be 100% certain this race is happening, but I cannot find
> > any
> > other path by which it is even vaguely possible.
> 
> The above looks plausible, and I think this patch is the correct one. 

Thanks.

> The one additional change I suggest is that we also move the
> wake_up_bit() in xprt_connect_unlock() inside the spinlock, so that we
> avoid a potential use-after-free.

I had a look and I don't think this is needed.
wake_up_bit() doesn't dereference the pointer.  It hashes the address
together with the bit to choose a wake_queue to wake up.
The wake_bit_function() which might be called as part of that wakeup
will test_bit() the given bit at the given address, but only after
the address has been found to match an address that something is waiting
on.  So that something must hold a reference to stop the memory being
freed.

However, as I was looking, I was reminded of the use of XPRT_LOCKED in
sysfs.c which isn't safe.
Two sysfs functions use wait_on_bit_lock() to wait for XPRT_LOCKED.
However most places that clear XPRT_LOCKED don't generate a wakeup,
so these can wait indefinitely.  This can easily be reproduced by
writing "online" to "xprt_state" in a tight loop while there is
concurrent NFS traffic. 

If we wanted to add a wake_up_bit() whenever XPRT_LOCKED is cleared,
that would be remove that race and it would trivially put wake_up_bit()
inside the spinlock anyway.

I'm not sure all those wakeups are really warranted, but it isn't clear
to me what the sysfs functions should be waiting for, so I'm not
certain.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-03-13 23:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-08  2:42 [PATCH] SUNRPC: avoid race between mod_timer() and del_timer_sync() NeilBrown
2022-03-12 20:34 ` Trond Myklebust
2022-03-13 22:03   ` NeilBrown
2022-03-13 22:58     ` Trond Myklebust
2022-03-13 23:37       ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.