linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] driver core: flush async calls before testing driver removal
@ 2016-12-10  0:15 Vladimir Zapolskiy
  2016-12-10  1:59 ` Dmitry Torokhov
  2016-12-10  7:32 ` Greg Kroah-Hartman
  0 siblings, 2 replies; 10+ messages in thread
From: Vladimir Zapolskiy @ 2016-12-10  0:15 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rob Herring; +Cc: linux-kernel

If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
positives are reported for ATA controller drivers, because ATA port
probes are done asynchronously, and the same problem may also touch
other asynchronously probed drivers.

To reduce the rate of false reports on boot call async_synchronize_full()
before attempting to remove a driver, the same is done in delete_module()
syscall for all possible drivers and in __device_release_driver() function
for asynchronously probed drivers.

Fixes: bea5b158ff0d ("driver core: add test of driver remove calls during probe")
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
---
Some time ago the issue was discussed on the linux-ide mailing list, see

  https://www.spinics.net/lists/linux-ide/msg53481.html

 drivers/base/dd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index d76cd97..a4feecf 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -384,6 +384,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (test_remove) {
 		test_remove = false;
 
+		async_synchronize_full();
+
 		if (dev->bus->remove)
 			dev->bus->remove(dev);
 		else if (drv->remove)
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10  0:15 [PATCH] driver core: flush async calls before testing driver removal Vladimir Zapolskiy
@ 2016-12-10  1:59 ` Dmitry Torokhov
  2016-12-10 12:57   ` Vladimir Zapolskiy
  2016-12-10  7:32 ` Greg Kroah-Hartman
  1 sibling, 1 reply; 10+ messages in thread
From: Dmitry Torokhov @ 2016-12-10  1:59 UTC (permalink / raw)
  To: Vladimir Zapolskiy; +Cc: Greg Kroah-Hartman, Rob Herring, lkml

On Fri, Dec 9, 2016 at 4:15 PM, Vladimir Zapolskiy <vz@mleia.com> wrote:
> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> positives are reported for ATA controller drivers, because ATA port
> probes are done asynchronously, and the same problem may also touch
> other asynchronously probed drivers.
>
> To reduce the rate of false reports on boot call async_synchronize_full()
> before attempting to remove a driver, the same is done in delete_module()
> syscall for all possible drivers and in __device_release_driver() function
> for asynchronously probed drivers.

I'd say CONFIG_DEBUG_TEST_DRIVER_REMOVE did what it was supposed to do
and uncovered a big in ATA drivers. Since driver core did not
asynchronously scheduled those actions it should not wait for their
completion either, but either ATA core or drivers should wait for
probing to complete before allowing remove() methods to run.

>
> Fixes: bea5b158ff0d ("driver core: add test of driver remove calls during probe")
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
> ---
> Some time ago the issue was discussed on the linux-ide mailing list, see
>
>   https://www.spinics.net/lists/linux-ide/msg53481.html
>
>  drivers/base/dd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index d76cd97..a4feecf 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -384,6 +384,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>         if (test_remove) {
>                 test_remove = false;
>
> +               async_synchronize_full();
> +
>                 if (dev->bus->remove)
>                         dev->bus->remove(dev);
>                 else if (drv->remove)
> --
> 2.10.2
>

Thanks.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10  0:15 [PATCH] driver core: flush async calls before testing driver removal Vladimir Zapolskiy
  2016-12-10  1:59 ` Dmitry Torokhov
@ 2016-12-10  7:32 ` Greg Kroah-Hartman
  2016-12-10 12:38   ` Vladimir Zapolskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2016-12-10  7:32 UTC (permalink / raw)
  To: Vladimir Zapolskiy; +Cc: Rob Herring, linux-kernel

On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> positives are reported for ATA controller drivers, because ATA port
> probes are done asynchronously, and the same problem may also touch
> other asynchronously probed drivers.
> 
> To reduce the rate of false reports on boot call async_synchronize_full()
> before attempting to remove a driver, the same is done in delete_module()
> syscall for all possible drivers and in __device_release_driver() function
> for asynchronously probed drivers.

__device_release_driver() already calls this function, why call it
again?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10  7:32 ` Greg Kroah-Hartman
@ 2016-12-10 12:38   ` Vladimir Zapolskiy
  2016-12-10 13:04     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Zapolskiy @ 2016-12-10 12:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: Rob Herring, linux-kernel

Hello Greg,

On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
> On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>> positives are reported for ATA controller drivers, because ATA port
>> probes are done asynchronously, and the same problem may also touch
>> other asynchronously probed drivers.
>>
>> To reduce the rate of false reports on boot call async_synchronize_full()
>> before attempting to remove a driver, the same is done in delete_module()
>> syscall for all possible drivers and in __device_release_driver() function
>> for asynchronously probed drivers.
> 
> __device_release_driver() already calls this function, why call it
> again?
> 

__device_release_driver() is not called on test removal of drivers, if
CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.

This opens a possibility to races like one I've discovered:

  https://www.spinics.net/lists/linux-ide/msg53473.html

Next, async_synchronize_full() from __device_release_driver() is not
called in case of removal of ATA controller drivers, because
driver_allows_async_probing(drv) return value is false.

--
With best wishes,
Vladimir

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10  1:59 ` Dmitry Torokhov
@ 2016-12-10 12:57   ` Vladimir Zapolskiy
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Zapolskiy @ 2016-12-10 12:57 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Greg Kroah-Hartman, Rob Herring, lkml

Hello Dmitry,

On 12/10/2016 03:59 AM, Dmitry Torokhov wrote:
> On Fri, Dec 9, 2016 at 4:15 PM, Vladimir Zapolskiy <vz@mleia.com> wrote:
>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>> positives are reported for ATA controller drivers, because ATA port
>> probes are done asynchronously, and the same problem may also touch
>> other asynchronously probed drivers.
>>
>> To reduce the rate of false reports on boot call async_synchronize_full()
>> before attempting to remove a driver, the same is done in delete_module()
>> syscall for all possible drivers and in __device_release_driver() function
>> for asynchronously probed drivers.
> 
> I'd say CONFIG_DEBUG_TEST_DRIVER_REMOVE did what it was supposed to do
> and uncovered a big in ATA drivers. Since driver core did not
> asynchronously scheduled those actions it should not wait for their
> completion either, but either ATA core or drivers should wait for
> probing to complete before allowing remove() methods to run.
> 

can you please share the idea why?

I haven't managed to find any problems with ATA subsystem and drivers,
my fuzz testing by inserting delays to postpone scheduled execution
of async_port_probe() don't show any problems also.

So I believe the problem is with the test itself.

--
With best wishes,
Vladimir

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10 12:38   ` Vladimir Zapolskiy
@ 2016-12-10 13:04     ` Greg Kroah-Hartman
  2016-12-11  1:44       ` Vladimir Zapolskiy
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2016-12-10 13:04 UTC (permalink / raw)
  To: Vladimir Zapolskiy; +Cc: Rob Herring, linux-kernel

On Sat, Dec 10, 2016 at 02:38:41PM +0200, Vladimir Zapolskiy wrote:
> Hello Greg,
> 
> On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
> > On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
> >> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
> >> positives are reported for ATA controller drivers, because ATA port
> >> probes are done asynchronously, and the same problem may also touch
> >> other asynchronously probed drivers.
> >>
> >> To reduce the rate of false reports on boot call async_synchronize_full()
> >> before attempting to remove a driver, the same is done in delete_module()
> >> syscall for all possible drivers and in __device_release_driver() function
> >> for asynchronously probed drivers.
> > 
> > __device_release_driver() already calls this function, why call it
> > again?
> > 
> 
> __device_release_driver() is not called on test removal of drivers, if
> CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.
> 
> This opens a possibility to races like one I've discovered:
> 
>   https://www.spinics.net/lists/linux-ide/msg53473.html
> 
> Next, async_synchronize_full() from __device_release_driver() is not
> called in case of removal of ATA controller drivers, because
> driver_allows_async_probing(drv) return value is false.

Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
from userspace as well?  I don't think this is a
CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
the problem corner cases as you are finding out :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-10 13:04     ` Greg Kroah-Hartman
@ 2016-12-11  1:44       ` Vladimir Zapolskiy
  2016-12-12 17:50         ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Zapolskiy @ 2016-12-11  1:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Tejun Heo; +Cc: Rob Herring, linux-kernel

Hello Greg,

I'm adding Tejun to the list of addressees.

On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
> On Sat, Dec 10, 2016 at 02:38:41PM +0200, Vladimir Zapolskiy wrote:
>> Hello Greg,
>>
>> On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
>>> On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
>>>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>>>> positives are reported for ATA controller drivers, because ATA port
>>>> probes are done asynchronously, and the same problem may also touch
>>>> other asynchronously probed drivers.
>>>>
>>>> To reduce the rate of false reports on boot call async_synchronize_full()
>>>> before attempting to remove a driver, the same is done in delete_module()
>>>> syscall for all possible drivers and in __device_release_driver() function
>>>> for asynchronously probed drivers.
>>>
>>> __device_release_driver() already calls this function, why call it
>>> again?
>>>
>>
>> __device_release_driver() is not called on test removal of drivers, if
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.
>>
>> This opens a possibility to races like one I've discovered:
>>
>>   https://www.spinics.net/lists/linux-ide/msg53473.html
>>
>> Next, async_synchronize_full() from __device_release_driver() is not
>> called in case of removal of ATA controller drivers, because
>> driver_allows_async_probing(drv) return value is false.
> 
> Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
> from userspace as well?  I don't think this is a
> CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
> the problem corner cases as you are finding out :)
> 

and you are right, I managed to reproduce exactly the same race as before
running the unmodified kernel built from Torvald's branch head:

root@imx6q-sabresd:~# zcat /proc/config.gz | grep TEST_DRIVER_REMOVE
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set

root@imx6q-sabresd:~# uname -a
Linux imx6q-sabresd 4.9.0-rc8-00134-g0451698 #23 SMP Sun Dec 11 03:37:30 EET 2016 armv7l GNU/Linux

root@imx6q-sabresd:~# echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind ; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/bind; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind
ahci-imx 2200000.sata: fsl,transmit-level-mV not specified, using 00000024
ahci-imx 2200000.sata: fsl,transmit-boost-mdB not specified, using 00000480
ahci-imx 2200000.sata: fsl,transmit-atten-16ths not specified, using 00002000
ahci-imx 2200000.sata: fsl,receive-eq-mdB not specified, using 05000000
ahci-imx 2200000.sata: SSS flag set, parallel bus scan disabled
ahci-imx 2200000.sata: AHCI 0001.0300 32 slots 1 ports 3 Gbps 0x1 impl platform mode
ahci-imx 2200000.sata: flags: ncq sntf stag pm led clo only pmp pio slum part ccc apst
scsi host0: ahci-imx
ata3: SATA max UDMA/133 mmio [mem 0x02200000-0x02203fff] port 0x100 irq 72
ata3: SATA link down (SStatus 0 SControl 300)
ahci-imx 2200000.sata: no device found, disabling link.
ahci-imx 2200000.sata: pass .hotplug=1 to enable hotplug
------------[ cut here ]------------
WARNING: CPU: 2 PID: 375 at drivers/ata/libata-core.c:6482 ata_port_detach+0x130/0x140
Modules linked in: ahci_imx dw_hdmi_ahb_audio evbug
CPU: 2 PID: 375 Comm: sh Not tainted 4.9.0-rc8-00134-g0451698 #23
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<>] (dump_backtrace) from [<>] (show_stack+0x20/0x24)
[<>] (show_stack) from [<>] (dump_stack+0xb4/0xe8)
[<>] (dump_stack) from [<>] (__warn+0xe4/0x110)
[<>] (__warn) from [<>] (warn_slowpath_null+0x30/0x38)
[<>] (warn_slowpath_null) from [<>] (ata_port_detach+0x130/0x140)
[<>] (ata_port_detach) from [<>] (ata_platform_remove_one+0x34/0x4c)
[<>] (ata_platform_remove_one) from [<>] (platform_drv_remove+0x34/0x4c)
[<>] (platform_drv_remove) from [<>] (__device_release_driver+0x98/0x134)
[<>] (__device_release_driver) from [<>] (device_release_driver+0x30/0x3c)
[<>] (device_release_driver) from [<>] (unbind_store+0x88/0x108)
[<>] (unbind_store) from [<>] (drv_attr_store+0x30/0x3c)
[<>] (drv_attr_store) from [<>] (sysfs_kf_write+0x58/0x5c)
[<>] (sysfs_kf_write) from [<>] (kernfs_fop_write+0x108/0x21c)
[<>] (kernfs_fop_write) from [<>] (__vfs_write+0x3c/0x128)
[<>] (__vfs_write) from [<>] (vfs_write+0xb0/0x178)
[<>] (vfs_write) from [<>] (SyS_write+0x4c/0xa0)
[<>] (SyS_write) from [<>] (ret_fast_syscall+0x0/0x1c)
---[ end trace 457071853738c95e ]---

Tejun, will you look at the problem again?

--
With best wishes,
Vladimir

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-11  1:44       ` Vladimir Zapolskiy
@ 2016-12-12 17:50         ` Tejun Heo
  2016-12-12 18:33           ` Rob Herring
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2016-12-12 17:50 UTC (permalink / raw)
  To: Vladimir Zapolskiy; +Cc: Greg Kroah-Hartman, Rob Herring, linux-kernel

Hello,

On Sun, Dec 11, 2016 at 03:44:36AM +0200, Vladimir Zapolskiy wrote:
> On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
> > Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
> > from userspace as well?  I don't think this is a
> > CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
> > the problem corner cases as you are finding out :)
> > 
> 
> and you are right, I managed to reproduce exactly the same race as before
> running the unmodified kernel built from Torvald's branch head:

Ah, you're right, so this means we need to add flush to all async
probing drivers.  Will do so for libata shortly.

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-12 17:50         ` Tejun Heo
@ 2016-12-12 18:33           ` Rob Herring
  2016-12-12 18:46             ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Herring @ 2016-12-12 18:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Vladimir Zapolskiy, Greg Kroah-Hartman, linux-kernel

On Mon, Dec 12, 2016 at 11:50 AM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Sun, Dec 11, 2016 at 03:44:36AM +0200, Vladimir Zapolskiy wrote:
>> On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
>> > Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
>> > from userspace as well?  I don't think this is a
>> > CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
>> > the problem corner cases as you are finding out :)
>> >
>>
>> and you are right, I managed to reproduce exactly the same race as before
>> running the unmodified kernel built from Torvald's branch head:
>
> Ah, you're right, so this means we need to add flush to all async
> probing drivers.  Will do so for libata shortly.

Maybe I'm confused, but don't you need this for all drivers? You need
sync the async SCSI scanning to the driver remove regardless of async
probe. The driver core synchronization is only for synchronizing the
remove with probe AIUI.

Rob

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] driver core: flush async calls before testing driver removal
  2016-12-12 18:33           ` Rob Herring
@ 2016-12-12 18:46             ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2016-12-12 18:46 UTC (permalink / raw)
  To: Rob Herring; +Cc: Vladimir Zapolskiy, Greg Kroah-Hartman, linux-kernel

Hello,

On Mon, Dec 12, 2016 at 12:33:36PM -0600, Rob Herring wrote:
> Maybe I'm confused, but don't you need this for all drivers? You need
> sync the async SCSI scanning to the driver remove regardless of async
> probe. The driver core synchronization is only for synchronizing the
> remove with probe AIUI.

Heh, I'm not quite following what you mean.  Can you please elaborate?
Also, on the second thought, it probably would be better to flush
async calls before unbind.  It's fragile to require indivdiual drivers
to do that and I can't think of benefits of doing so.  It's not like
the unbind / unload paths are hot in any way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-12-12 18:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-10  0:15 [PATCH] driver core: flush async calls before testing driver removal Vladimir Zapolskiy
2016-12-10  1:59 ` Dmitry Torokhov
2016-12-10 12:57   ` Vladimir Zapolskiy
2016-12-10  7:32 ` Greg Kroah-Hartman
2016-12-10 12:38   ` Vladimir Zapolskiy
2016-12-10 13:04     ` Greg Kroah-Hartman
2016-12-11  1:44       ` Vladimir Zapolskiy
2016-12-12 17:50         ` Tejun Heo
2016-12-12 18:33           ` Rob Herring
2016-12-12 18:46             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).