From: Linus Torvalds <torvalds@linux-foundation.org>
To: Ming Lei <ming.lei@canonical.com>, Tejun Heo <tj@kernel.org>
Cc: Alex Riesen <raa.lkml@gmail.com>,
	Alan Stern <stern@rowland.harvard.edu>,
	Jens Axboe <axboe@kernel.dk>,
	USB list <linux-usb@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
Date: Tue, 15 Jan 2013 09:36:57 -0800
Message-ID: <CA+55aFyOyTS1ZCm6UWqGLteoPuFXKu0WSYB-aTeRYvJUYPqe5A@mail.gmail.com>
In-Reply-To: <CACVXFVMFyhM+EsBytw0MKVW4Vp4aOtc_5sfX0v-643m7fwnc3Q@mail.gmail.com>

[ Added Tejun to the discussion, since he's the async go-to-guy ]

On Mon, Jan 14, 2013 at 10:23 PM, Ming Lei <ming.lei@canonical.com> wrote:
>
> But I have another idea to address the problem, and let module code call
> async_synchronize_full() only if the module requires that explicitly, so how
> about the below draft patch?

No way.

This kind of "let's just let drivers tell us when they used async
helpers" is basically *asking* for buggy code. In fact, just to prove
how bad it is, YOU SCREWED IT UP YOURSELF.

Because it's not just sd.c that uses async_schedule(), and would need
the async synchronize. It's floppy.c, it's generic scsi scanning (so
scsi tapes etc), and it's libata-core.c.

This kind of "let's randomly encourage people to write subtly buggy
code that has magical timing dependencies, so that the developer won't
likely even see it because he has fast disks etc" code is totally
unacceptable. And this code was *designed* to be that kind of buggy.

No, if we set a flag like this, then it needs to be set
*automatically*, so that a module cannot screw this up by mistake.

It could be as simple as having a per-thread flag that gets set by the
__async_schedule() function, and gets cleared by fork. Then the module
code could do something like

   /* before calling the module ->init function */
   current->used_async = 0;
   ...
   if (current->used_async)
      async_synchronize_full();

or whatever.
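
[ Editor's sketch, not part of the original mail: a minimal user-space
model of the per-thread flag idea above. Every name here (struct task,
the model_* functions, sync_calls) is hypothetical and only stands in
for the kernel's task_struct, __async_schedule(), the module init path
and async_synchronize_full(). Nothing is actually queued or waited for;
the point is only that the flag is set by the scheduling helper itself,
so the code being initialized cannot forget to set it. ]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	bool used_async;	/* set whenever this task schedules async work */
};

static struct task model_task;
static struct task *current_task = &model_task;	/* models "current" */
static int sync_calls;		/* how often the synchronize stub ran */

typedef int (*initcall_fn)(void);

/* Models __async_schedule(): the helper sets the flag, not the driver. */
static void model_async_schedule(void)
{
	current_task->used_async = true;
}

/* Models async_synchronize_full(); here it only counts invocations. */
static void model_async_synchronize_full(void)
{
	sync_calls++;
}

/* Models the module loader calling a module's ->init function. */
static int model_do_init_module(initcall_fn init)
{
	int ret;

	assert(init != NULL);
	current_task->used_async = false;	/* before calling ->init */
	ret = init();
	if (current_task->used_async)		/* synchronize only if needed */
		model_async_synchronize_full();
	return ret;
}

/* Two example init functions: one schedules async work, one does not. */
static int init_with_async(void)
{
	model_async_schedule();
	return 0;
}

static int init_without_async(void)
{
	return 0;
}
```

[ In this model, loading a module through init_without_async() never
reaches the synchronize stub, while init_with_async() reaches it exactly
once, without either init function having to declare anything. ]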

Tejun, comments? You can see the whole thread on lkml, but the basic
problem is that the module loading doing the unconditional
async_synchronize_full() has caused problems, because we have

 - load module A
   - module A does per-controller async discovery of its devices (eg
     scsi or ata probing)
   - in the async thread, it initializes something that needs another
     module B (in this case the default IO scheduler module)
      - modprobe for B loads the IO scheduler module successfully
      - at the end of the module load, it does
        async_synchronize_full() to make sure load_module won't return
        before the module is ready
      - *DEADLOCK*, because the async_synchronize_full() thing
        actually waits not for the module B async code (it didn't have
        any), but for the module *A* async code, which is waiting for
        module B to finish.
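
[ Editor's sketch, not part of the original mail: the circular wait
above, reduced to a single counter in user space. All names are
hypothetical. pending_async stands in for the global set of outstanding
async jobs, and instead of really blocking, the model reports whether a
global synchronize could return at that moment. ]

```c
#include <assert.h>
#include <stdbool.h>

static int pending_async;	/* models the count of outstanding async jobs */

static void model_async_begin(void)
{
	pending_async++;
}

static void model_async_end(void)
{
	assert(pending_async > 0);
	pending_async--;
}

/* True if a global "wait for all async work" could return right now. */
static bool model_sync_full_would_return(void)
{
	return pending_async == 0;
}

/*
 * Models loading module B with an unconditional synchronize at the end.
 * B schedules no async work itself. Returns true if the load completes,
 * false if it would block forever.
 */
static bool model_load_module_b(void)
{
	return model_sync_full_would_return();
}

/*
 * Models module A's async probe, which needs module B while A's own
 * async job is still outstanding.
 */
static bool model_module_a_async_probe(void)
{
	bool ok;

	model_async_begin();		/* A's job is now outstanding */
	ok = model_load_module_b();	/* synchronous load from inside the job */
	if (ok)
		model_async_end();	/* reached only if B's load returned */
	return ok;
}
```

[ With no async work outstanding, model_load_module_b() completes. Called
from inside A's probe it cannot, because the only outstanding job is the
caller itself, and that job stays outstanding for good. ]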

Now, I'll happily argue that we shouldn't have this kind of "load
modules from random context" behavior in the kernel, and I think the
block layer is to blame for doing the IO scheduler load at an insane
time. So "don't do that then" would be the best solution. Sadly, we
don't even have a good way to notice that we're doing it, so "hacky
workaround that at least doesn't require driver authors to care" is
likely the second-best workaround.

But the "hacky workaround" absolutely needs to be *automatic*. Because
the "driver writers need to get this subtle untestable thing right" is
*not* acceptable. That's the patch that Ming Lei did, and I refuse to
have that kind of fragile crap in the kernel.

                          Linus
