* [PATCH] async: fix __lowest_in_progress()
@ 2009-01-12 10:16 Arjan van de Ven
0 siblings, 0 replies; 4+ messages in thread
From: Arjan van de Ven @ 2009-01-12 10:16 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel
>From 24bafdeb8a42e2e43a1928ca72159e6ff90cfe85 Mon Sep 17 00:00:00 2001
From: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun, 11 Jan 2009 15:35:01 +0000
Subject: [PATCH] async: fix __lowest_in_progress()
At 37000 feet somewhere near Greenland I woke up from a half-sleep with the
realisation that __lowest_in_progress() is buggy. After landing I checked
and there were indeed 2 problems with it; this patch fixes both:
* The order of the list checks was wrong
* The locking was not correct.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
---
kernel/async.c | 21 ++++++++++++++++-----
1 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/kernel/async.c b/kernel/async.c
index f286e9f..608b32b 100644
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -90,12 +90,12 @@ extern int initcall_debug;
static async_cookie_t __lowest_in_progress(struct list_head *running)
{
struct async_entry *entry;
- if (!list_empty(&async_pending)) {
- entry = list_first_entry(&async_pending,
+ if (!list_empty(running)) {
+ entry = list_first_entry(running,
struct async_entry, list);
return entry->cookie;
- } else if (!list_empty(running)) {
- entry = list_first_entry(running,
+ } else if (!list_empty(&async_pending)) {
+ entry = list_first_entry(&async_pending,
struct async_entry, list);
return entry->cookie;
} else {
@@ -104,6 +104,17 @@ static async_cookie_t __lowest_in_progress(struct list_head *running)
}
}
+
+static async_cookie_t lowest_in_progress(struct list_head *running)
+{
+ unsigned long flags;
+ async_cookie_t ret;
+
+ spin_lock_irqsave(&async_lock, flags);
+ ret = __lowest_in_progress(running);
+ spin_unlock_irqrestore(&async_lock, flags);
+ return ret;
+}
/*
* pick the first pending entry and run it
*/
@@ -229,7 +240,7 @@ void async_synchronize_cookie_special(async_cookie_t cookie, struct list_head *r
starttime = ktime_get();
}
- wait_event(async_done, __lowest_in_progress(running) >= cookie);
+ wait_event(async_done, lowest_in_progress(running) >= cookie);
if (initcall_debug && system_state == SYSTEM_BOOTING) {
endtime = ktime_get();
--
1.6.0.6
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
@ 2013-01-13 12:09 Alex Riesen
2013-01-13 16:56 ` Alan Stern
0 siblings, 1 reply; 4+ messages in thread
From: Alex Riesen @ 2013-01-13 12:09 UTC (permalink / raw)
To: Alan Stern, Jens Axboe; +Cc: linux-usb, Linux Kernel Mailing List
On Sat, Jan 12, 2013 at 11:52 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 12 Jan 2013, Alex Riesen wrote:
>> Now, who would be interested to handle this kind of misconfiguration ...
>
> So the whole thing was a false alarm?
Yes, almost. What about khubd hanging when machine is shutdown?
> Maybe you should report to the block-layer maintainers that it's
> possible to mess up the system by building an elevator as a module.
> That sounds like the sort of thing they'd be interested to hear.
Hi Jens,
may I point you at this problem report:
http://thread.gmane.org/gmane.linux.kernel/1420814
It is surely a misconfiguration on my part (the used io scheduler
configured as a module), but the behavior is somewhat problematic
anyway: at least in this case USB storage is essentially locked up.
Regards,
Alex
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-13 12:09 USB device cannot be reconnected and khubd "blocked for more than 120 seconds" Alex Riesen
@ 2013-01-13 16:56 ` Alan Stern
2013-01-13 17:42 ` Alex Riesen
0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2013-01-13 16:56 UTC (permalink / raw)
To: Alex Riesen; +Cc: Jens Axboe, linux-usb, Linux Kernel Mailing List
On Sun, 13 Jan 2013, Alex Riesen wrote:
> On Sat, Jan 12, 2013 at 11:52 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Sat, 12 Jan 2013, Alex Riesen wrote:
> >> Now, who would be interested to handle this kind of misconfiguration ...
> >
> > So the whole thing was a false alarm?
>
> Yes, almost. What about khubd hanging when machine is shutdown?
What about it? I have trouble understanding all the descriptions you
have provided so far, because you talk about several different things
and change your mind a lot. Can you provide a single, simple scenario
that illustrates this problem?
Alan Stern
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-13 16:56 ` Alan Stern
@ 2013-01-13 17:42 ` Alex Riesen
2013-01-14 3:47 ` Ming Lei
0 siblings, 1 reply; 4+ messages in thread
From: Alex Riesen @ 2013-01-13 17:42 UTC (permalink / raw)
To: Alan Stern; +Cc: Jens Axboe, linux-usb, Linux Kernel Mailing List
On Sun, Jan 13, 2013 at 5:56 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sun, 13 Jan 2013, Alex Riesen wrote:
>>
>> Yes, almost. What about khubd hanging when machine is shutdown?
>
> What about it? I have trouble understanding all the descriptions you
> have provided so far, because you talk about several different things
> and change your mind a lot. Can you provide a single, simple scenario
> that illustrates this problem?
1. Compile a kernel with deadline elevator as module
2. Boot into it, make sure the elevator is selected
(I used "elevator=deadline" in the kernel command line)
3. Insert a FAT formatted mass storage device in an USB2 port
Observe "io scheduler deadline registered"
4. Pull the stick out, wait a moment, and either shutdown or just
and press alt-sysrq-W:
[ 158.170585] usb 1-1.2: USB disconnect, device number 3
[ 158.170590] usb 1-1.2: unregistering device
[ 158.170595] usb 1-1.2: unregistering interface 1-1.2:1.0
[ 166.959398] SysRq : Show Blocked State
[ 166.959410] task PC stack pid father
[ 166.959432] khubd D ffff880213a68000 0 513 2 0x00000000
[ 166.959440] ffff880213affa18 0000000000000046 ffff88020000006b 0000000000000
[ 166.959448] ffff880213a68000 ffff880213afffd8 ffff880213afffd8 0000000000013
[ 166.959454] ffffffff81a14400 ffff880213a68000 ffff880213aff9a8 0000000000000
[ 166.959461] Call Trace:
[ 166.959475] [<ffffffff8104d763>] ? flush_work+0x6d/0x1fe
[ 166.959485] [<ffffffff8133defb>] ? scsi_remove_host+0x24/0x10e
[ 166.959490] [<ffffffff8104d6fb>] ? flush_work+0x5/0x1fe
[ 166.959499] [<ffffffff815e1dd6>] schedule+0x65/0x67
[ 166.959506] [<ffffffff815e201e>] schedule_preempt_disabled+0x18/0x24
[ 166.959513] [<ffffffff815e07e4>] mutex_lock_nested+0x181/0x2c1
[ 166.959518] [<ffffffff8133defb>] ? scsi_remove_host+0x24/0x10e
[ 166.959524] [<ffffffff8133defb>] scsi_remove_host+0x24/0x10e
[ 166.959531] [<ffffffff813910d5>] usb_stor_disconnect+0x77/0xbc
[ 166.959539] [<ffffffff81377ca3>] usb_unbind_interface+0x6c/0x14d
[ 166.959548] [<ffffffff813266fc>] __device_release_driver+0x88/0xdb
[ 166.959554] [<ffffffff81326774>] device_release_driver+0x25/0x32
[ 166.959561] [<ffffffff8132616f>] bus_remove_device+0xf5/0x10a
[ 166.959567] [<ffffffff8132413f>] device_del+0x12e/0x189
[ 166.959574] [<ffffffff81375d3a>] usb_disable_device+0xb1/0x20e
[ 166.959582] [<ffffffff8136ed8b>] usb_disconnect+0xab/0x113
[ 166.959589] [<ffffffff81370218>] hub_port_connect_change+0x1b0/0x879
[ 166.959597] [<ffffffff81370e3a>] hub_events+0x559/0x69d
[ 166.959604] [<ffffffff81370fb6>] hub_thread+0x38/0x19b
[ 166.959612] [<ffffffff81052587>] ? wake_up_bit+0x2a/0x2a
[ 166.959618] [<ffffffff81370f7e>] ? hub_events+0x69d/0x69d
[ 166.959625] [<ffffffff81051f2a>] kthread+0xd5/0xdd
[ 166.959632] [<ffffffff8105d5f6>] ? finish_task_switch+0x3f/0xf7
[ 166.959641] [<ffffffff81051e55>] ? __init_kthread_worker+0x5a/0x5a
[ 166.959648] [<ffffffff815e965c>] ret_from_fork+0x7c/0xb0
[ 166.959655] [<ffffffff81051e55>] ? __init_kthread_worker+0x5a/0x5a
This trace if from alt-sysrq-W. I can attach an image from the shutdown case,
the traces from that case are hard to save: the main storage is usually already
stopped. I believe it was the same, though.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-13 17:42 ` Alex Riesen
@ 2013-01-14 3:47 ` Ming Lei
2013-01-14 7:15 ` Ming Lei
0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2013-01-14 3:47 UTC (permalink / raw)
To: Alex Riesen; +Cc: Alan Stern, Jens Axboe, linux-usb, Linux Kernel Mailing List
On Mon, Jan 14, 2013 at 1:42 AM, Alex Riesen <raa.lkml@gmail.com> wrote:
>
> 1. Compile a kernel with deadline elevator as module
> 2. Boot into it, make sure the elevator is selected
> (I used "elevator=deadline" in the kernel command line)
> 3. Insert a FAT formatted mass storage device in an USB2 port
> Observe "io scheduler deadline registered"
> 4. Pull the stick out, wait a moment, and either shutdown or just
> and press alt-sysrq-W:
I can reproduce the problem too on one ehci-only system(Pandaboard)
with deadline elevator module case, and no such problem in the
built-in case, and still on 3.8-rc3.
Follows the dmesg log:
[ 85.665679] usb 1-1.2.2: new high-speed USB device number 5 using ehci-omap
[ 85.784423] usb 1-1.2.2: default language 0x0409
[ 85.790008] usb 1-1.2.2: udev 5, busnum 1, minor = 4
[ 85.790039] usb 1-1.2.2: New USB device found, idVendor=0951, idProduct=1624
[ 85.790039] usb 1-1.2.2: New USB device strings: Mfr=1, Product=2,
SerialNumber=3
[ 85.790069] usb 1-1.2.2: Product: DataTraveler G2
[ 85.790069] usb 1-1.2.2: Manufacturer: Kingston
[ 85.790100] usb 1-1.2.2: SerialNumber: 0019E06C5346EA41D0000071
[ 85.790100] device: '1-1.2.2': device_add
[ 85.790344] bus: 'usb': add device 1-1.2.2
[ 85.790405] PM: Adding info for usb:1-1.2.2
[ 85.790740] bus: 'usb': driver_probe_device: matched device 1-1.2.2
with driver usb
[ 85.790771] bus: 'usb': really_probe: probing driver usb with device 1-1.2.2
[ 85.790802] usb 1-1.2.2: usb_probe_device
[ 85.790832] usb 1-1.2.2: configuration #1 chosen from 1 choice
[ 85.791076] usb 1-1.2.2: adding 1-1.2.2:1.0 (config #1, interface 0)
[ 85.791076] device: '1-1.2.2:1.0': device_add
[ 85.791137] bus: 'usb': add device 1-1.2.2:1.0
[ 85.791168] PM: Adding info for usb:1-1.2.2:1.0
[ 85.791442] device: 'ep_81': device_add
[ 85.791564] PM: Adding info for No Bus:ep_81
[ 85.791564] device: 'ep_02': device_add
[ 85.791687] PM: Adding info for No Bus:ep_02
[ 85.791687] driver: '1-1.2.2': driver_bound: bound to device 'usb'
[ 85.791717] bus: 'usb': really_probe: bound device 1-1.2.2 to driver usb
[ 85.791748] PM: Moving platform:musb-hdrc.0.auto to end of list
[ 85.791748] device: 'ep_00': device_add
[ 85.791778] platform musb-hdrc.0.auto: Retrying from deferred list
[ 85.791839] PM: Adding info for No Bus:ep_00
[ 85.791839] bus: 'platform': driver_probe_device: matched device
musb-hdrc.0.auto with driver musb-hdrc
[ 85.791839] bus: 'platform': really_probe: probing driver musb-hdrc
with device musb-hdrc.0.auto
[ 85.791870] hub 1-1.2:1.0: state 7 ports 4 chg 0000 evt 0004
[ 85.791900] unable to find transceiver of type USB2 PHY
[ 85.797454] HS USB OTG: no transceiver configured
[ 85.802703] musb-hdrc musb-hdrc.0.auto: musb_init_controller failed
with status -517
[ 85.811157] platform musb-hdrc.0.auto: Driver musb-hdrc requests
probe deferral
[ 85.811187] platform musb-hdrc.0.auto: Added to deferred list
[ 85.811218] PM: Moving platform:twl6030_usb to end of list
[ 85.811218] platform twl6030_usb: Retrying from deferred list
[ 85.811279] bus: 'platform': driver_probe_device: matched device
twl6030_usb with driver twl6030_usb
[ 85.811279] bus: 'platform': really_probe: probing driver
twl6030_usb with device twl6030_usb
[ 85.811309] twl6030_usb twl6030_usb: phy not ready, deferring probe
[ 85.811462] platform twl6030_usb: Driver twl6030_usb requests probe deferral
[ 85.811462] platform twl6030_usb: Added to deferred list
[ 85.883331] Initializing USB Mass Storage driver...
[ 85.883361] bus: 'usb': add driver usb-storage
[ 85.883453] bus: 'usb': driver_probe_device: matched device
1-1.2.2:1.0 with driver usb-storage
[ 85.883483] bus: 'usb': really_probe: probing driver usb-storage
with device 1-1.2.2:1.0
[ 85.883514] usb-storage 1-1.2.2:1.0: usb_probe_interface
[ 85.883544] usb-storage 1-1.2.2:1.0: usb_probe_interface - got id
[ 85.884094] scsi0 : usb-storage 1-1.2.2:1.0
[ 85.884155] device: 'host0': device_add
[ 85.884185] bus: 'scsi': add device host0
[ 85.884246] PM: Adding info for scsi:host0
[ 85.884552] device: 'host0': device_add
[ 85.884674] PM: Adding info for No Bus:host0
[ 85.884948] driver: '1-1.2.2:1.0': driver_bound: bound to device
'usb-storage'
[ 85.884979] bus: 'usb': really_probe: bound device 1-1.2.2:1.0 to
driver usb-storage
[ 85.884979] PM: Moving platform:musb-hdrc.0.auto to end of list
[ 85.885009] platform musb-hdrc.0.auto: Retrying from deferred list
[ 85.885070] bus: 'platform': driver_probe_device: matched device
musb-hdrc.0.auto with driver musb-hdrc
[ 85.885070] bus: 'platform': really_probe: probing driver musb-hdrc
with device musb-hdrc.0.auto
[ 85.885131] unable to find transceiver of type USB2 PHY
[ 85.886230] usbcore: registered new interface driver usb-storage
[ 85.886230] USB Mass Storage support registered.
[ 85.890655] HS USB OTG: no transceiver configured
[ 85.895660] musb-hdrc musb-hdrc.0.auto: musb_init_controller failed
with status -517
[ 85.903839] platform musb-hdrc.0.auto: Driver musb-hdrc requests
probe deferral
[ 85.903869] platform musb-hdrc.0.auto: Added to deferred list
[ 85.903869] PM: Moving platform:twl6030_usb to end of list
[ 85.903900] platform twl6030_usb: Retrying from deferred list
[ 85.903930] bus: 'platform': driver_probe_device: matched device
twl6030_usb with driver twl6030_usb
[ 85.903961] bus: 'platform': really_probe: probing driver
twl6030_usb with device twl6030_usb
[ 85.903991] twl6030_usb twl6030_usb: phy not ready, deferring probe
[ 85.904022] platform twl6030_usb: Driver twl6030_usb requests probe deferral
[ 85.904052] platform twl6030_usb: Added to deferred list
[ 86.901367] io scheduler deadline registered (default)
[ 181.168487] INFO: task modprobe:2462 blocked for more than 90 seconds.
[ 181.175323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 181.183624] modprobe D c04f1920 0 2462 2461 0x00000000
[ 181.183685] [<c04f1920>] (__schedule+0x5fc/0x6d4) from [<c005eba4>]
(async_synchronize_cookie_domain+0xdc/0x
168)
[ 181.183715] [<c005eba4>]
(async_synchronize_cookie_domain+0xdc/0x168) from [<c005ed04>]
(async_synchronize_f
ull+0x3c/0x60)
[ 181.183776] [<c005ed04>] (async_synchronize_full+0x3c/0x60) from
[<c0085610>] (load_module+0x1aac/0x1cdc)
[ 181.183807] [<c0085610>] (load_module+0x1aac/0x1cdc) from
[<c0085944>] (sys_init_module+0x104/0x110)
[ 181.183837] [<c0085944>] (sys_init_module+0x104/0x110) from
[<c000dfe0>] (ret_fast_syscall+0x0/0x48)
[ 271.175506] INFO: task modprobe:2462 blocked for more than 90 seconds.
[ 271.182373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 271.190826] modprobe D c04f1920 0 2462 2461 0x00000000
[ 271.190887] [<c04f1920>] (__schedule+0x5fc/0x6d4) from [<c005eba4>]
(async_synchronize_cookie_domain+0xdc/0x
168)
[ 271.190917] [<c005eba4>]
(async_synchronize_cookie_domain+0xdc/0x168) from [<c005ed04>]
(async_synchronize_f
ull+0x3c/0x60)
[ 271.190948] [<c005ed04>] (async_synchronize_full+0x3c/0x60) from
[<c0085610>] (load_module+0x1aac/0x1cdc)
[ 271.190948] [<c0085610>] (load_module+0x1aac/0x1cdc) from
[<c0085944>] (sys_init_module+0x104/0x110)
[ 271.190979] [<c0085944>] (sys_init_module+0x104/0x110) from
[<c000dfe0>] (ret_fast_syscall+0x0/0x48)
Thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-14 3:47 ` Ming Lei
@ 2013-01-14 7:15 ` Ming Lei
2013-01-14 17:30 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2013-01-14 7:15 UTC (permalink / raw)
To: Alex Riesen, Linus Torvalds
Cc: Alan Stern, Jens Axboe, linux-usb, Linux Kernel Mailing List
On Mon, Jan 14, 2013 at 11:47 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Mon, Jan 14, 2013 at 1:42 AM, Alex Riesen <raa.lkml@gmail.com> wrote:
> [ 86.901367] io scheduler deadline registered (default)
> [ 181.168487] INFO: task modprobe:2462 blocked for more than 90 seconds.
> [ 181.175323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 181.183624] modprobe D c04f1920 0 2462 2461 0x00000000
> [ 181.183685] [<c04f1920>] (__schedule+0x5fc/0x6d4) from [<c005eba4>]
> (async_synchronize_cookie_domain+0xdc/0x168)
> [ 181.183715] [<c005eba4>] (async_synchronize_cookie_domain+0xdc/0x168) from [<c005ed04>] (async_synchronize_full+0x3c/0x60)
> [ 181.183776] [<c005ed04>] (async_synchronize_full+0x3c/0x60) from [<c0085610>] (load_module+0x1aac/0x1cdc)
> [ 181.183807] [<c0085610>] (load_module+0x1aac/0x1cdc) from [<c0085944>] (sys_init_module+0x104/0x110)
> [ 181.183837] [<c0085944>] (sys_init_module+0x104/0x110) from
> [<c000dfe0>] (ret_fast_syscall+0x0/0x48)
The deadlock problem is caused by calling request_module() inside
async function of do_scan_async(), and it was introduced by Linus's
below commit:
commit d6de2c80e9d758d2e36c21699117db6178c0f517
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri Apr 10 12:17:41 2009 -0700
async: Fix module loading async-work regression
IMO, maybe the commit isn't a proper fix, considered the
below fact:
- it isn't good to allow async function to be marked as __init
- any user mode shouldn't expect that the device is ready just
after completing of 'insmod', and drivers should make
the device ready for user mode just after its async probing or
other kind of async initialization(done in work or kthread)
completes.
- from view of driver, introducing async_synchronize_full() after
do_one_initcall() inside do_init_module() is like a sync probe
for drivers built as module, and cause this kind of deadlock easily.
So could we revert the commit and fix the previous problems just
case by case? or other better fix?
Thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-14 7:15 ` Ming Lei
@ 2013-01-14 17:30 ` Linus Torvalds
2013-01-15 1:53 ` Ming Lei
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2013-01-14 17:30 UTC (permalink / raw)
To: Ming Lei
Cc: Alex Riesen, Alan Stern, Jens Axboe, USB list, Linux Kernel Mailing List
On Sun, Jan 13, 2013 at 11:15 PM, Ming Lei <ming.lei@canonical.com> wrote:
>
> The deadlock problem is caused by calling request_module() inside
> async function of do_scan_async(), and it was introduced by Linus's
> below commit:
>
> commit d6de2c80e9d758d2e36c21699117db6178c0f517
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri Apr 10 12:17:41 2009 -0700
>
> async: Fix module loading async-work regression
>
> IMO, maybe the commit isn't a proper fix, considered the
> below fact:
>
> - it isn't good to allow async function to be marked as __init
Immaterial. For modules, __init is a non-issue. For non-modules, the
synchronization elsewhere is sufficient.
> - any user mode shouldn't expect that the device is ready just
> after completing of 'insmod'
Bullshit. That expectation is just a fact. People insmod a device
driver, and mount the device immediately in scripts.
We do not say "user mode shouldn't". Seriously. EVER. User mode
*does*, and we deal with it. Learn it now, and stop ever saying that
again.
This is really starting to annoy me. Kernel developers who say "user
mode should be fixes to not do that" should go somewhere else. The
whole and *only* point of a kernel is to hide these kinds of issues
from user mode, and make things "just work" in user mode. User mode
should not ever worry about "oh, doing X can trigger a module load, so
now the device might not be available immediately, so I should delay
and re-try until it is".
That's just f*cking crazy talk.
We have a very simple rule in the kernel: we don't break user space. EVER.
Learn that rule. I don't ever want to hear "any user mode shouldn't
expect" again. User mode *does* expect. End of discussion.
> - from view of driver, introducing async_synchronize_full() after
> do_one_initcall() inside do_init_module() is like a sync probe
> for drivers built as module, and cause this kind of deadlock easily.
>
> So could we revert the commit and fix the previous problems just
> case by case? or other better fix?
There's no way in hell we take a "fix things one by one" approach.
It's not going to work. And your suggestion seems to not do async
discovery of devices in general, which is a *much* worse fix than
anything else. It's just crazy.
But there are other approaches we might take. We might move the call to
async_synchronize_full();
to other places. For example, maybe we're better off doing it at
block/char device open instead?
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-14 17:30 ` Linus Torvalds
@ 2013-01-15 1:53 ` Ming Lei
2013-01-15 6:23 ` Ming Lei
0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2013-01-15 1:53 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alex Riesen, Alan Stern, Jens Axboe, USB list, Linux Kernel Mailing List
On Tue, Jan 15, 2013 at 1:30 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Jan 13, 2013 at 11:15 PM, Ming Lei <ming.lei@canonical.com> wrote:
>>
>> The deadlock problem is caused by calling request_module() inside
>> async function of do_scan_async(), and it was introduced by Linus's
>> below commit:
>>
>> commit d6de2c80e9d758d2e36c21699117db6178c0f517
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Fri Apr 10 12:17:41 2009 -0700
>>
>> async: Fix module loading async-work regression
>>
>> IMO, maybe the commit isn't a proper fix, considered the
>> below fact:
>>
>> - it isn't good to allow async function to be marked as __init
>
> Immaterial. For modules, __init is a non-issue. For non-modules, the
> synchronization elsewhere is sufficient.
Looks 5d38258ec026921a7b266f4047ebeaa75db358e5(ACPI battery:
fix async boot oops) addresses the issue of __init for modules.
>
>> - any user mode shouldn't expect that the device is ready just
>> after completing of 'insmod'
>
> Bullshit. That expectation is just a fact. People insmod a device
> driver, and mount the device immediately in scripts.
I mean we can let the device node populated in probe() first,
but let open() wait for completion of the async probe(). Maybe my
expression is not accurate, here the 'device isn't ready' just means
that the async probe() isn't completed, and doesn't mean the device
node doesn't come.
>
> We do not say "user mode shouldn't". Seriously. EVER. User mode
> *does*, and we deal with it. Learn it now, and stop ever saying that
> again.
>
> This is really starting to annoy me. Kernel developers who say "user
> mode should be fixes to not do that" should go somewhere else. The
> whole and *only* point of a kernel is to hide these kinds of issues
> from user mode, and make things "just work" in user mode. User mode
> should not ever worry about "oh, doing X can trigger a module load, so
> now the device might not be available immediately, so I should delay
> and re-try until it is".
>
> That's just f*cking crazy talk.
>
> We have a very simple rule in the kernel: we don't break user space. EVER.
No, I don't mean we should break user space, see above.
>
> Learn that rule. I don't ever want to hear "any user mode shouldn't
> expect" again. User mode *does* expect. End of discussion.
>
>> - from view of driver, introducing async_synchronize_full() after
>> do_one_initcall() inside do_init_module() is like a sync probe
>> for drivers built as module, and cause this kind of deadlock easily.
>>
>> So could we revert the commit and fix the previous problems just
>> case by case? or other better fix?
>
> There's no way in hell we take a "fix things one by one" approach.
> It's not going to work. And your suggestion seems to not do async
> discovery of devices in general, which is a *much* worse fix than
> anything else. It's just crazy.
I will try to figure out one patch to address the scsi block async probe
issue first, and see if it can fix the problem by moving add_disk()
into sd_probe()
and calling async_synchronize_full_domain(&scsi_sd_probe_domain)
in the entry of sd_open().
>
> But there are other approaches we might take. We might move the call to
>
> async_synchronize_full();
>
> to other places. For example, maybe we're better off doing it at
> block/char device open instead?
Looks it is similar with the above idea, but we have to remove the
async_synchronize_full() in do_init_module() together.
Thanks,
--
Ming Lei
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-15 1:53 ` Ming Lei
@ 2013-01-15 6:23 ` Ming Lei
2013-01-15 17:36 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2013-01-15 6:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alex Riesen, Alan Stern, Jens Axboe, USB list, Linux Kernel Mailing List
On Tue, Jan 15, 2013 at 9:53 AM, Ming Lei <ming.lei@canonical.com> wrote:
>
> I will try to figure out one patch to address the scsi block async probe
> issue first, and see if it can fix the problem by moving add_disk()
> into sd_probe()
> and calling async_synchronize_full_domain(&scsi_sd_probe_domain)
> in the entry of sd_open().
Looks it isn't doable because the block partition device can only be created
inside the async things.
But I have another idea to address the problem, and let module code call
async_synchronize_full() only if the module requires that explicitly, so how
about the below draft patch?
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7992635..c5106a0 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3143,6 +3143,8 @@ static int __init init_sd(void)
if (err)
goto err_out_driver;
+ mod_init_async_wait(THIS_MODULE);
+
return 0;
err_out_driver:
diff --git a/include/linux/module.h b/include/linux/module.h
index 7760c6d..09bd4c5 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -300,6 +300,12 @@ struct module
unsigned int taints; /* same bits as kernel:tainted */
+ /*
+ * set if the module wants to call async_synchronize_full
+ * after its init() is complted.
+ */
+ unsigned int init_async_wait:1;
+
#ifdef CONFIG_GENERIC_BUG
/* Support for BUG */
unsigned num_bugs;
@@ -656,4 +662,16 @@ static inline void module_bug_finalize(const Elf_Ehdr *hdr,
static inline void module_bug_cleanup(struct module *mod) {}
#endif /* CONFIG_GENERIC_BUG */
+/*
+ * If one module wants to complete its all async code after
+ * its init() executed, the module can call this function in
+ * the entry of its init(), but the module's async function
+ * can't call request_module, otherwise deadlock will be caused.
+ */
+static inline void mod_init_async_wait(struct module *mod)
+{
+ if (mod)
+ mod->init_async_wait = 1;
+}
+
#endif /* _LINUX_MODULE_H */
diff --git a/kernel/module.c b/kernel/module.c
index 250092c..dc5d011 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3058,8 +3058,9 @@ static int do_init_module(struct module *mod)
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_LIVE, mod);
- /* We need to finish all async code before the module init sequence is done */
- async_synchronize_full();
+ /* Only complete all async code if the module requires that */
+ if (mod->init_async_wait)
+ async_synchronize_full();
mutex_lock(&module_mutex);
/* Drop initial reference. */
Thanks,
--
Ming Lei
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-15 6:23 ` Ming Lei
@ 2013-01-15 17:36 ` Linus Torvalds
2013-01-15 18:32 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2013-01-15 17:36 UTC (permalink / raw)
To: Ming Lei, Tejun Heo
Cc: Alex Riesen, Alan Stern, Jens Axboe, USB list, Linux Kernel Mailing List
[ Added Tejun to the discussion, since he's the async go-to-guy ]
On Mon, Jan 14, 2013 at 10:23 PM, Ming Lei <ming.lei@canonical.com> wrote:
>
> But I have another idea to address the problem, and let module code call
> async_synchronize_full() only if the module requires that explicitly, so how
> about the below draft patch?
No way.
This kind of "let's just let drivers tell us when they used async
helpers" is basically *asking* for buggy code. In fact, just to prove
how bad it is, YOU SCREWED IT UP YOURSELF.
Because it's not just sd.c that uses async_schedule(), and would need
the async synchronize. It's floppy.c, it's generic scsi scanning (so
scsi tapes etc), and it's libata-core.c.
This kind of "let's randomly encourage people to write subtly buggy
code that has magical timing dependencies, so that the developer won't
likely even see it because he has fast disks etc" code is totally
unacceptable. And this code was *designed* to be that kind of buggy.
No, if we set a flag like this, then it needs to be set
*automatically*, so that a module cannot screw this up by mistake.
It could be as simple as having a per-thread flag that gets set by the
__async_schedule() function, and gets cleared by fork. Then the module
code could do something like
/* before calling the module ->init function */
current->used_async = 0;
...
if (current->used_async)
async_synchronize_full();
or whatever.
Tejun, comments? You can see the whole thread on lkml, but the basic
problem is that the module loading doing the unconditional
async_synchronize_full() has caused problems, because we have
- load module A
- module A does per-controller async discovery of its devices (eg
scsi or ata probing)
- in the async thread, it initializes somethign that needs another
module B (in this case the default IO scheduler module)
- modprobe for B loads the IO scheduler module successfully
at the end of the module load, it does
async_synchronize_full() to make sure load_module won't return before
the module is ready
*DEADLOCK*, because the async_synchronize_full() thing
actually waits for not the module B async code (it didn't have any),
but for the module *A* async code, which is waiting for module B to
finish.
Now, I'll happily argue that we shouldn't have this kind of "load
modules from random context" behavior in the kernel, and I think the
block layer is to blame for doing the IO scheduler load at an insane
time. So "don't do that then" would be the best solution. Sadly, we
don't even have a good way to notice that we're doing it, so "hacky
workaround that at least doesn't require driver authors to care" is
likely the second-best workaround.
But the "hacky workaround" absolutely needs to be *automatic*. Because
the "driver writers need to get this subtle untestable thing right" is
*not* acceptable. That's the patch that Ming Lei did, and I refuse to
have that kind of fragile crap in the kernel.
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
2013-01-15 17:36 ` Linus Torvalds
@ 2013-01-15 18:32 ` Tejun Heo
2013-01-16 17:19 ` [PATCH] async: fix __lowest_in_progress() Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2013-01-15 18:32 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ming Lei, Alex Riesen, Alan Stern, Jens Axboe, USB list,
Linux Kernel Mailing List
Hello, Linus.
On Tue, Jan 15, 2013 at 09:36:57AM -0800, Linus Torvalds wrote:
> Tejun, comments? You can see the whole thread on lkml, but the basic
> problem is that the module loading doing the unconditional
> async_synchronize_full() has caused problems, because we have
>
> - load module A
> - module A does per-controller async discovery of its devices (eg
> scsi or ata probing)
> - in the async thread, it initializes somethign that needs another
> module B (in this case the default IO scheduler module)
> - modprobe for B loads the IO scheduler module successfully
> at the end of the module load, it does
> async_synchronize_full() to make sure load_module won't return before
> the module is ready
> *DEADLOCK*, because the async_synchronize_full() thing
> actually waits for not the module B async code (it didn't have any),
> but for the module *A* async code, which is waiting for module B to
> finish.
I think the root problem here, apart from request_module() from block
- which is a bit nasty but making that part completely async would too
be quite nasty albeit in a different way - is that
async_synchronize_full() is way too indescriminate. It's something
only suitable for things like the end of system init.
I'm wondering whether what we need is a rudimentray nesting like the
following.
finished_loading()
{
blah blah;
cookie = async_current_cookie();
do init calls;
async_synchronize_upto(cookie);
blah blah;
}
The nesting here would be an approximation as the dependency recorded
here is chronological. I *suspect* this should be safe unless the
module is doing something weird. Need to think more about it. One
way or the other, I think what we need is some form of scoping for
flushing async ops.
BTW, the current synchronization is broken - cookie isn't transferred
to running->domain in queueing order but __lowest_in_progress()
assumes that. I think I broke that while converting it to workqueue.
Anyways, working on it.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] async: fix __lowest_in_progress()
2013-01-15 18:32 ` Tejun Heo
@ 2013-01-16 17:19 ` Tejun Heo
2013-01-17 18:16 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2013-01-16 17:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ming Lei, Alex Riesen, Alan Stern, Jens Axboe, USB list,
Linux Kernel Mailing List, Arjan van de Ven
083b804c4d3e1e3d0eace56bdbc0f674946d2847 ("async: use workqueue for
worker pool") made it possible that async jobs are moved from pending
to running out-of-order. While pending async jobs will be queued and
dispatched for execution in the same order, nothing guarantees they'll
enter "1) move self to the running queue" of async_run_entry_fn() in
the same order.
This broke __lowest_in_progress(). running->domain may not be
properly sorted and is not guaranteed to contain lower cookies than
pending list when not empty. Fix it by ensuring sort-inserting to the
running list and always looking at both pending and running when
trying to determine the lowest cookie.
Over time, the async synchronization implementation became quite
messy. We better restructure it such that each async_entry is linked
to two lists - one global and one per domain - and not move it when
execution starts. There's no reason to distinguish pending and
running. They behave the same for synchronization purposes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: stable@vger.kernel.org
---
And here's the fix for the breakage I mentioned earlier. It wouldn't
happen often in the wild and the effect of it happening wouldn't be
critical for modern distros but it's still kinda surprising nobody
noticed this.
We definitely need to rewrite async synchronization. It was already
messy and this makes it worse and there's no reason to be messy here.
Thanks.
kernel/async.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
--- a/kernel/async.c
+++ b/kernel/async.c
@@ -86,18 +86,27 @@ static atomic_t entry_count;
*/
static async_cookie_t __lowest_in_progress(struct async_domain *running)
{
+ async_cookie_t first_running = next_cookie; /* infinity value */
+ async_cookie_t first_pending = next_cookie; /* ditto */
struct async_entry *entry;
+ /*
+ * Both running and pending lists are sorted but not disjoint.
+ * Take the first cookies from both and return the min.
+ */
if (!list_empty(&running->domain)) {
entry = list_first_entry(&running->domain, typeof(*entry), list);
- return entry->cookie;
+ first_running = entry->cookie;
}
- list_for_each_entry(entry, &async_pending, list)
- if (entry->running == running)
- return entry->cookie;
+ list_for_each_entry(entry, &async_pending, list) {
+ if (entry->running == running) {
+ first_pending = entry->cookie;
+ break;
+ }
+ }
- return next_cookie; /* "infinity" value */
+ return min(first_running, first_pending);
}
static async_cookie_t lowest_in_progress(struct async_domain *running)
@@ -118,13 +127,17 @@ static void async_run_entry_fn(struct wo
{
struct async_entry *entry =
container_of(work, struct async_entry, work);
+ struct async_entry *pos;
unsigned long flags;
ktime_t uninitialized_var(calltime), delta, rettime;
struct async_domain *running = entry->running;
- /* 1) move self to the running queue */
+ /* 1) move self to the running queue, make sure it stays sorted */
spin_lock_irqsave(&async_lock, flags);
- list_move_tail(&entry->list, &running->domain);
+ list_for_each_entry_reverse(pos, &running->domain, list)
+ if (entry->cookie < pos->cookie)
+ break;
+ list_move_tail(&entry->list, &pos->list);
spin_unlock_irqrestore(&async_lock, flags);
/* 2) run (and print duration) */
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] async: fix __lowest_in_progress()
2013-01-16 17:19 ` [PATCH] async: fix __lowest_in_progress() Tejun Heo
@ 2013-01-17 18:16 ` Linus Torvalds
2013-01-17 18:50 ` Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2013-01-17 18:16 UTC (permalink / raw)
To: Tejun Heo
Cc: Ming Lei, Alex Riesen, Alan Stern, Jens Axboe, USB list,
Linux Kernel Mailing List, Arjan van de Ven
Tejun, mind explaining this one a bit more to me?
If ordering matters, and we have a running queue and a pending queue,
how could the pending queue *ever* be lower than the running one?
That implies that something was taken off the pending queue and put on
the running queue out of order, right?
And that in turn implies that there isn't much of a "lowest" ordering
at all, so how could anybody even care about what lowest is? It seems
to be a meaningless measure.
So with that in mind, I don't see what semantics the first part of the
patch fixes. Can you explain more?
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] async: fix __lowest_in_progress()
2013-01-17 18:16 ` Linus Torvalds
@ 2013-01-17 18:50 ` Tejun Heo
0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2013-01-17 18:50 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ming Lei, Alex Riesen, Alan Stern, Jens Axboe, USB list,
Linux Kernel Mailing List, Arjan van de Ven
Hello,
On Thu, Jan 17, 2013 at 10:16:50AM -0800, Linus Torvalds wrote:
> Tejun, mind explaining this one a bit more to me?
>
> If ordering matters, and we have a running queue and a pending queue,
> how could the pending queue *ever* be lower than the running one?
So, before being converted to workqueue, async spooled up its own
workers and each worker would lock and move the first pending item to
the executing list and everything was in order.
The conversion to workqueue was done by adding work_struct to each
async_entry and async just schedules the work item. The queueing and
dispatching of such work items are still in order but now each worker
thread is associated with a specific async_entry and move that
specific async_entry to the executing list. So, depending on which
worker reaches that point earlier, which is completely
non-deterministic, we may end up moving an async_entry with larger
cookie before one with smaller one.
> That implies that something was taken off the pending queue and put on
> the running queue out of order, right?
>
> And that in turn implies that there isn't much of a "lowest" ordering
> at all, so how could anybody even care about what lowest is? It seems
> to be a meaningless measure.
The execution is still lowest first as workqueue would dispatch
workers in queued order. It just is that they can reach the
synchronization point at their own differing paces.
> So with that in mind, I don't see what semantics the first part of the
> patch fixes. Can you explain more?
The problem with the code is that it's keeping a global pending list
and domain-specific running lists. I don't know why it developed like
this but even before workqueue conversion the code was weird.
* It has sorted per-domain running list, so looking at the first item
is easy.
* It has sorted global pennding list, and looking for first item in a
domain involves scanning it.
Global syncing ends up scanning all per-domain running lists and
domain syncing ends up scanning global pending list, when all we need
is each async item to be queued on two lists - global and per-domain
in-flight lists - and stay there until done.
The posted patch is minimal fix while keeping the basic operation the
same so that it doesn't disturb -stable too much. I'll prep a patch
to redo synchronization for 3.9.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-01-17 18:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-12 10:16 [PATCH] async: fix __lowest_in_progress() Arjan van de Ven
2013-01-13 12:09 USB device cannot be reconnected and khubd "blocked for more than 120 seconds" Alex Riesen
2013-01-13 16:56 ` Alan Stern
2013-01-13 17:42 ` Alex Riesen
2013-01-14 3:47 ` Ming Lei
2013-01-14 7:15 ` Ming Lei
2013-01-14 17:30 ` Linus Torvalds
2013-01-15 1:53 ` Ming Lei
2013-01-15 6:23 ` Ming Lei
2013-01-15 17:36 ` Linus Torvalds
2013-01-15 18:32 ` Tejun Heo
2013-01-16 17:19 ` [PATCH] async: fix __lowest_in_progress() Tejun Heo
2013-01-17 18:16 ` Linus Torvalds
2013-01-17 18:50 ` Tejun Heo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.