From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753822AbaEOGFV (ORCPT );
        Thu, 15 May 2014 02:05:21 -0400
Received: from mail-qg0-f41.google.com ([209.85.192.41]:36888 "EHLO
        mail-qg0-f41.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1751996AbaEOGFU (ORCPT );
        Thu, 15 May 2014 02:05:20 -0400
Date: Thu, 15 May 2014 02:05:15 -0400
From: Tejun Heo
To: Mike Galbraith
Cc: Vojtech Pavlik, Jiri Slaby, Jiri Kosina,
        linux-kernel@vger.kernel.org, jirislaby@gmail.com,
        Michael Matz, Steven Rostedt, Frederic Weisbecker,
        Ingo Molnar, Greg Kroah-Hartman, "Theodore Ts'o",
        Dipankar Sarma, "Paul E. McKenney"
Subject: Re: [RFC 09/16] kgr: mark task_safe in some kthreads
Message-ID: <20140515060515.GB5539@mtj.dyndns.org>
References: <537384B9.5090907@suse.cz>
        <20140514151501.GA24142@suse.cz>
        <20140514163238.GA15690@htj.dyndns.org>
        <1400126037.5175.55.camel@marge.simpson.net>
        <20140515040608.GA3825@htj.dyndns.org>
        <1400129178.5175.82.camel@marge.simpson.net>
        <20140515045011.GB3825@htj.dyndns.org>
        <1400130262.5175.93.camel@marge.simpson.net>
        <20140515050915.GA5539@mtj.dyndns.org>
        <1400131949.5175.114.camel@marge.simpson.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1400131949.5175.114.camel@marge.simpson.net>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Mike.

On Thu, May 15, 2014 at 07:32:29AM +0200, Mike Galbraith wrote:
> On Thu, 2014-05-15 at 01:09 -0400, Tejun Heo wrote:
> > > Soft/hard irq threads and anything having to do with IO mostly, which
> > > including workqueues.  I had to give the user a rather fugly global
> > > prioritization option to let users more or less safely do the evil deeds
> > > they want to and WILL do whether I agree with their motivation to do so
> > > or not.
> > > I tell all users that realtime is real dangerous, but if they
> > > want to do that, it's their box, so by definition perfectly fine.
> >
> > Frederic is working on global settings for workqueues, so that'll
> > resolve some of those issues at least.
>
> Yeah, wrt what runs where for unbound workqueues, but not priority.

Shouldn't be too difficult to extend it to cover priorities if
necessary once the infrastructure is in place.

> > > > If there are good enough reasons for specific ones, sure, but I don't
> > > > think "we can't change any of the kthreads because someone might be
> > > > diddling with it" is something we can sustain in the long term.
> > >
> > > I think the opposite.  Taking any control the user has is pure evil.
> >
> > I'm not sure good/evil is the right frame to think about it.  Is
> > pooling worker threads evil in nature then?
>
> When there may be realtime consumers, yes to some extent, because it
> inserts allocations he can't control directly into his world, but that's
> the least of his worries.  The instant userspace depends upon any kernel
> proxy the user has no control over, he instantly has a priority
> inversion he can do nothing about.  This is exactly what happened that
> prompted me to do fugly global hack.  User turned pet database piggies
> loose as realtime tasks for his own reasons, misguided or not, they
> depend upon worker threads and kjournald et al who he can control, but
> kworker threads respawn as normal tasks which can and will end up under
> high priority userspace tasks.  Worst case is box becomes dead, also
> killing pet, best case is pet collapses to the floor in a quivering
> heap.  Neither makes Joe User particularly happy.

I'm not sure how much weight I can put on the specific use case.  Even
with the direct control that the user thought to have previously, the
use case was rife with possibilities of breakage from any number of
reasons.
For example, there are driver paths which bounce to async
execution on IO exceptions (these don't have to be hard errors), and
setups like the above would easily lock out exception handling.  And
how is such a setup going to work when filesystems have to use a
dynamic pool of workers, as btrfs does?

The identified problem in the above case is allowing the kernel to
make reasonable forward progress even when RT processes don't concede
CPU cycles.  If that is a use case that needs to be supported, we
better engineer an appropriate solution for that.  Such a solution
doesn't necessarily have to be advanced either.  Maybe all that's
necessary is marking the async mechanisms involved in the IO path as
such (we already need to mark all workqueues involved in the memory
reclaim path anyway) and providing a mechanism to make all of them RT
when directed.  It might be simple, but it would still be a conscious
engineering decision.

I think the point I'm trying to make is that it isn't possible to
continue improving and maintaining the kernel with blanket
restrictions on internal details.  If certain things shouldn't be
done, we better find out the specific reasons; otherwise, it's
impossible to weigh the pros and cons of different options and make a
reasonable choice, or to find ways to accommodate those restrictions
while still achieving the original goals.

Anyways, we're getting slightly off-topic and it seems like we'll have
to agree to disagree.

Thanks.

-- 
tejun