linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* ibmvnic: Race condition in remove callback
@ 2021-01-17 10:12 Uwe Kleine-König
  2021-01-19 18:14 ` Dany Madden
  0 siblings, 1 reply; 3+ messages in thread
From: Uwe Kleine-König @ 2021-01-17 10:12 UTC (permalink / raw)
  To: Dany Madden, Lijun Pan, Sukadev Bhattiprolu, Michael Ellerman,
	Juliet Kim
  Cc: Greg Kroah-Hartman, Paul Mackerras, kernel, netdev,
	Jakub Kicinski, linuxppc-dev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1952 bytes --]

Hello,

while working on some cleanup I stumbled over a problem in the ibmvnic's
remove callback. Since commit

        7d7195a026ba ("ibmvnic: Do not process device remove during device reset")

there is the following code in the remove callback:

        static int ibmvnic_remove(struct vio_dev *dev)
        {
                ...
                spin_lock_irqsave(&adapter->state_lock, flags);
                if (test_bit(0, &adapter->resetting)) {
                        spin_unlock_irqrestore(&adapter->state_lock, flags);
                        return -EBUSY;
                }

                adapter->state = VNIC_REMOVING;
                spin_unlock_irqrestore(&adapter->state_lock, flags);

                flush_work(&adapter->ibmvnic_reset);
                flush_delayed_work(&adapter->ibmvnic_delayed_reset);
                ...
        }

Unfortunately returning -EBUSY doesn't work as intended. That's because
the return value of this function is ignored[1] and the device is
considered unbound by the device core (shortly) after ibmvnic_remove()
returns.

While looking into fixing that I noticed a worse problem:

If ibmvnic_reset() (e.g. called by the tx_timeout callback) calls
schedule_work(&adapter->ibmvnic_reset); just after the work queue is
flushed above the problem that 7d7195a026ba intends to fix will trigger
resulting in a use-after-free.

Also ibmvnic_reset() checks for adapter->state without holding the lock
which might be racy, too.

Best regards
Uwe

[1] vio_bus_remove (in arch/powerpc/platforms/pseries/vio.c) records the
    return value and passes it on. But the driver core doesn't care for
    the return value (see __device_release_driver() in drivers/base/dd.c
    calling dev->bus->remove()).

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ibmvnic: Race condition in remove callback
  2021-01-17 10:12 ibmvnic: Race condition in remove callback Uwe Kleine-König
@ 2021-01-19 18:14 ` Dany Madden
  2021-01-19 19:18   ` Uwe Kleine-König
  0 siblings, 1 reply; 3+ messages in thread
From: Dany Madden @ 2021-01-19 18:14 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Greg Kroah-Hartman, Juliet Kim, Jakub Kicinski, netdev,
	Lijun Pan, kernel, Paul Mackerras, Sukadev Bhattiprolu,
	linuxppc-dev, David S. Miller

On 2021-01-17 02:12, Uwe Kleine-König wrote:
> Hello,
> 
> while working on some cleanup I stumbled over a problem in the 
> ibmvnic's
> remove callback. Since commit
> 
>         7d7195a026ba ("ibmvnic: Do not process device remove during
> device reset")
> 
> there is the following code in the remove callback:
> 
>         static int ibmvnic_remove(struct vio_dev *dev)
>         {
>                 ...
>                 spin_lock_irqsave(&adapter->state_lock, flags);
>                 if (test_bit(0, &adapter->resetting)) {
>                         spin_unlock_irqrestore(&adapter->state_lock, 
> flags);
>                         return -EBUSY;
>                 }
> 
>                 adapter->state = VNIC_REMOVING;
>                 spin_unlock_irqrestore(&adapter->state_lock, flags);
> 
>                 flush_work(&adapter->ibmvnic_reset);
>                 flush_delayed_work(&adapter->ibmvnic_delayed_reset);
>                 ...
>         }
> 
> Unfortunately returning -EBUSY doesn't work as intended. That's because
> the return value of this function is ignored[1] and the device is
> considered unbound by the device core (shortly) after ibmvnic_remove()
> returns.

Oh! I was not aware of this. In our code review, a question on whether 
or not device reset should have a higher precedence over device remove 
was raised before. So, now it is clear that this driver has to take care 
of remove over reset.

> 
> While looking into fixing that I noticed a worse problem:
> 
> If ibmvnic_reset() (e.g. called by the tx_timeout callback) calls
> schedule_work(&adapter->ibmvnic_reset); just after the work queue is
> flushed above the problem that 7d7195a026ba intends to fix will trigger
> resulting in a use-after-free.

It was proposed that when coming into ibmvnic_remove() we lock down the 
workqueue to prevent future access, flush, cleanup, then unregister the 
device. Your thought on this?

> 
> Also ibmvnic_reset() checks for adapter->state without holding the lock
> which might be racy, too.
> 
Suka started addressing consistent locking with this patch series:
https://lists.openwall.net/netdev/2021/01/08/89

He is reworking this.

> Best regards
> Uwe

Thank you for taking the time to review this driver, Uwe. This is very 
helpful for us.

Best Regards,
Dany

> 
> [1] vio_bus_remove (in arch/powerpc/platforms/pseries/vio.c) records 
> the
>     return value and passes it on. But the driver core doesn't care for
>     the return value (see __device_release_driver() in 
> drivers/base/dd.c
>     calling dev->bus->remove()).

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: ibmvnic: Race condition in remove callback
  2021-01-19 18:14 ` Dany Madden
@ 2021-01-19 19:18   ` Uwe Kleine-König
  0 siblings, 0 replies; 3+ messages in thread
From: Uwe Kleine-König @ 2021-01-19 19:18 UTC (permalink / raw)
  To: Dany Madden
  Cc: Juliet Kim, Greg Kroah-Hartman, Lijun Pan, kernel, netdev,
	Jakub Kicinski, Sukadev Bhattiprolu, linuxppc-dev,
	David S. Miller, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 3517 bytes --]

Hello Dany,

On Tue, Jan 19, 2021 at 10:14:25AM -0800, Dany Madden wrote:
> On 2021-01-17 02:12, Uwe Kleine-König wrote:
> > while working on some cleanup I stumbled over a problem in the ibmvnic's
> > remove callback. Since commit
> > 
> >         7d7195a026ba ("ibmvnic: Do not process device remove during
> > device reset")
> > 
> > there is the following code in the remove callback:
> > 
> >         static int ibmvnic_remove(struct vio_dev *dev)
> >         {
> >                 ...
> >                 spin_lock_irqsave(&adapter->state_lock, flags);
> >                 if (test_bit(0, &adapter->resetting)) {
> >                         spin_unlock_irqrestore(&adapter->state_lock,
> > flags);
> >                         return -EBUSY;
> >                 }
> > 
> >                 adapter->state = VNIC_REMOVING;
> >                 spin_unlock_irqrestore(&adapter->state_lock, flags);
> > 
> >                 flush_work(&adapter->ibmvnic_reset);
> >                 flush_delayed_work(&adapter->ibmvnic_delayed_reset);
> >                 ...
> >         }
> > 
> > Unfortunately returning -EBUSY doesn't work as intended. That's because
> > the return value of this function is ignored[1] and the device is
> > considered unbound by the device core (shortly) after ibmvnic_remove()
> > returns.
> 
> Oh! I was not aware of this.

There are not very many people who are aware. That's why I'm working on
making all these remove callbacks return void. (And that's how I stubled
over this driver's expectations when returning -EBUSY.)

> In our code review, a question on whether or not device reset should
> have a higher precedence over device remove was raised before. So, now
> it is clear that this driver has to take care of remove over reset.

Yes, either you have to sleep in .remove until the reset is completed,
or you cancel the request.

> > While looking into fixing that I noticed a worse problem:
> > 
> > If ibmvnic_reset() (e.g. called by the tx_timeout callback) calls
> > schedule_work(&adapter->ibmvnic_reset); just after the work queue is
> > flushed above the problem that 7d7195a026ba intends to fix will trigger
> > resulting in a use-after-free.
> 
> It was proposed that when coming into ibmvnic_remove() we lock down the
> workqueue to prevent future access, flush, cleanup, then unregister the
> device. Your thought on this?

This is already done a bit, as ibmvnic_reset() checks for adapter->state
== VNIC_REMOVING. The problem is just that this check is racy.

> > Also ibmvnic_reset() checks for adapter->state without holding the lock
> > which might be racy, too.
>
> Suka started addressing consistent locking with this patch series:
> https://lists.openwall.net/netdev/2021/01/08/89
> 
> He is reworking this.

Please understand that I won't look into this. This is not in my area of
expertise and interest and I'd like to consider this problem already
done for me with my report. I'm glad you're acting on it. Depending on
how quick this is fixed I plan to submit a patch that changes

	return -EBUSY;

to

	return 0;

to prepare changing the prototype of the remove callback to return void.

> Thank you for taking the time to review this driver, Uwe. This is very
> helpful for us.

You're welcome.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-01-19 19:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-17 10:12 ibmvnic: Race condition in remove callback Uwe Kleine-König
2021-01-19 18:14 ` Dany Madden
2021-01-19 19:18   ` Uwe Kleine-König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).