From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757177Ab3AOSTL (ORCPT ); Tue, 15 Jan 2013 13:19:11 -0500
Received: from mail-vc0-f174.google.com ([209.85.220.174]:45081 "EHLO
	mail-vc0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751759Ab3AOSTJ (ORCPT ); Tue, 15 Jan 2013 13:19:09 -0500
MIME-Version: 1.0
In-Reply-To:
References:
From: Linus Torvalds
Date: Tue, 15 Jan 2013 10:18:45 -0800
X-Google-Sender-Auth: LwFjJ-F-RDh6VkBXxl7QtSDVW9Y
Message-ID:
Subject: Re: USB device cannot be reconnected and khubd "blocked for more than 120 seconds"
To: Ming Lei, Tejun Heo
Cc: Alex Riesen, Alan Stern, Jens Axboe, USB list, Linux Kernel Mailing List
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 15, 2013 at 9:36 AM, Linus Torvalds wrote:
>
> This kind of "let's randomly encourage people to write subtly buggy
> code that has magical timing dependencies, so that the developer won't
> likely even see it because he has fast disks etc" code is totally
> unacceptable. And this code was *designed* to be that kind of buggy.

Btw, we could *possibly* do this the other way around. Wait for all
async work by default, but then have a really hacky way to turn that
off for modules that explicitly don't want it, because they know they
can be loaded in async context, and they don't do any async work
themselves.

Then we could make the IO schedulers set that flag ("I know I'm loaded
from async space, and I know I'm not myself doing any async init")

Quite frankly, I'd still much rather prefer the automated approach -
or even better, just avoiding the "load modules in async context"
entirely. But at least the "I can put a huge comment about why I don't
want to be waited on" would be much more acceptable than the "I need
to explicitly tell the world that it needs to wait on me".
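
[Illustration, not part of the original proposal.] The "wait by
default, opt out explicitly" model described above could look roughly
like the userspace sketch below. Every name in it (MOD_NO_ASYNC_WAIT,
struct module_desc, load_module(), wait_for_async_work()) is
hypothetical and exists only for this example; none of it is an
existing kernel interface, and the real wait would be an async barrier
rather than a counter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical opt-out flag: "I know I may be loaded from async
 * context, and I do no async init myself, so do not wait on me."
 * A module that sets this is expected to carry a comment saying why.
 */
#define MOD_NO_ASYNC_WAIT 0x1u

struct module_desc {
    const char *name;
    unsigned int flags;   /* 0 is the robust default: the loader waits */
    int (*init)(void);
};

/* Stand-in for a real async barrier; it only counts how often we waited. */
static int wait_count;

static void wait_for_async_work(void)
{
    wait_count++;
}

/* Run the module's init, then wait unless the module explicitly opted out. */
static int load_module(const struct module_desc *mod)
{
    int ret;

    assert(mod != NULL && mod->init != NULL);

    ret = mod->init();
    if (ret != 0)
        return ret;

    if (!(mod->flags & MOD_NO_ASYNC_WAIT))
        wait_for_async_work();

    return 0;
}

/* Trivial init used by the example modules. */
static int dummy_init(void)
{
    return 0;
}
```

A module with flags left at 0 gets the wait without doing anything,
while an IO scheduler would have to set MOD_NO_ASYNC_WAIT by hand to
skip it. That is the point of the design: the fragile case is the one
that takes extra work and can be found with grep.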

So Ming Lei's patch was "easily subtly buggy by mistake" (showing that
by the fact that it was indeed buggy), while the opposite model where
you have to explicitly ask people not to wait for you could still be
very buggy, but at least now it needs to explicitly do extra work in
order to be buggy.

So if an interface is fragile, it should aim to be fragile in the
right way - making the fragility explicit, so that people can grep for
it, and people can add comments to the particular code that marks it
fragile. The default behavior should be the robust one.

And it would be lovely to add a warning to the "people loaded a module
from async context" case, so that we'd *see* this. Tejun, is there a
good way for code to see "I'm running in async context"? Then we could
do something like

    WARN_ON_ONCE(wait && system_state == SYSTEM_RUNNING && in_async_thread());

in kernel/kmod.c (__request_module()). That should at least warn about
this whole issue happening.

                Linus
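
[Illustration, not part of the original mail.] The behaviour of the
proposed check can be mimicked in userspace as below. This is only a
sketch under stated assumptions: in_async_thread() is the helper the
mail asks about, so here it is a hypothetical stub backed by a plain
flag; system_state is reduced to a two-value stand-in; and
warn_once_request_module() is an invented name that imitates the way
WARN_ON_ONCE() reports only the first hit while still returning the
condition on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's system_state. */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

static enum system_states system_state = SYSTEM_BOOTING;

/* Hypothetical marker: set while the current thread runs async work. */
static bool current_is_async;

/* Number of warnings actually printed; the "once" logic caps it at one. */
static int warnings_emitted;

/* Hypothetical helper, as asked for in the mail; here it reads a flag. */
static bool in_async_thread(void)
{
    return current_is_async;
}

/*
 * Userspace imitation of the proposed check in __request_module():
 * print a warning on the first hit only, but return the condition
 * every time so callers can still act on it.
 */
static bool warn_once_request_module(bool wait)
{
    static bool warned;
    bool cond = wait && system_state == SYSTEM_RUNNING && in_async_thread();

    if (cond && !warned) {
        warned = true;
        warnings_emitted++;
        assert(warnings_emitted <= 1);
        fprintf(stderr, "WARNING: module requested from async context\n");
    }
    return cond;
}
```

With system_state set to SYSTEM_RUNNING and the async marker set,
repeated calls with wait == true keep returning true but print a
single warning; clearing the marker makes the check return false.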