From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S261692AbUKTHUw (ORCPT ); Sat, 20 Nov 2004 02:20:52 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262926AbUKTHUw (ORCPT ); Sat, 20 Nov 2004 02:20:52 -0500 Received: from holomorphy.com ([207.189.100.168]:35971 "EHLO holomorphy.com") by vger.kernel.org with ESMTP id S261692AbUKTHUe (ORCPT ); Sat, 20 Nov 2004 02:20:34 -0500 Date: Fri, 19 Nov 2004 23:15:14 -0800 From: William Lee Irwin III To: Nick Piggin Cc: Linus Torvalds , Christoph Lameter , akpm@osdl.org, Benjamin Herrenschmidt , Hugh Dickins , linux-mm@kvack.org, linux-ia64@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: page fault scalability patch V11 [0/7]: overview Message-ID: <20041120071514.GO2714@holomorphy.com> References: <20041120020306.GA2714@holomorphy.com> <419EBBE0.4010303@yahoo.com.au> <20041120035510.GH2714@holomorphy.com> <419EC205.5030604@yahoo.com.au> <20041120042340.GJ2714@holomorphy.com> <419EC829.4040704@yahoo.com.au> <20041120053802.GL2714@holomorphy.com> <419EDB21.3070707@yahoo.com.au> <20041120062341.GM2714@holomorphy.com> <419EE911.20205@yahoo.com.au> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <419EE911.20205@yahoo.com.au> Organization: The Domain of Holomorphy User-Agent: Mutt/1.5.6+20040722i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org William Lee Irwin III wrote: >> touch_nmi_watchdog() is only "protection" against local interrupt >> disablement triggering the NMI oopser because alert_counter[] >> increments are not atomic. Yet even supposing they were made so, the On Sat, Nov 20, 2004 at 05:49:53PM +1100, Nick Piggin wrote: > That would be a bug in touch_nmi_watchdog then, because you're > racy against your own NMI too. > So I'm actually not very very wrong at all. I'm technically wrong > because touch_nmi_watchdog has a theoretical 'bug'. In practice, > multiple races with the non atomic increments to the same counter, > and in an unbroken sequence would be about as likely as hardware > failure. > Anyway, this touch nmi thing is going off topic, sorry list. No, it's on-topic. (1) The issue is not theoretical. e.g. sysrq t does trigger NMI oopses, merely not every time, and not on every system. It is not associated with hardware failure. It is, however, tolerable because sysrq's require privilege to trigger and are primarly used when the box is dying anyway. (2) NMI's don't nest. There is no possibility of NMI's racing against themselves while the data is per-cpu. William Lee Irwin III wrote: >> net effect of "covering up" this gross deficiency is making the >> user-observable problems it causes undiagnosable, as noted before. On Sat, Nov 20, 2004 at 05:49:53PM +1100, Nick Piggin wrote: > Well the loops that are in there now aren't covered up, and they > don't seem to be causing problems. Ergo there is no problem (we're > being _practical_ here, right?) They are causing problems. They never stopped causing problems. None of the above attempts to reduce rwlock starvation has been successful in reducing it to untriggerable-in-the-field levels, and empirical demonstrations of starvation recurring after those available at the time of testing were put into place did in fact happen. Reduction of frequency and making starvation more difficult to trigger are all that they've achieved thus far. William Lee Irwin III wrote: >> Kevin Marin was the first to report this issue to lkml. I had seen >> instances of it in internal corporate bugreports and it was one of >> the motivators for the work I did on pidhashing (one of the causes >> of the timeouts was worst cases in pid allocation). Manfred Spraul >> and myself wrote patches attempting to reduce read-side hold time >> in /proc/ algorithms, Ingo Molnar wrote patches to hierarchically >> subdivide the /proc/ iterations, and Dipankar Sarma and Maneesh >> Soni wrote patches to carry out the long iterations in /proc/ locklessly. >> The last several of these affecting /proc/ have not gained acceptance, >> though the work has not been halted in any sense, as this problem >> recurs quite regularly. A considerable amount of sustained effort has >> gone toward mitigating and resolving rwlock starvation. On Sat, Nov 20, 2004 at 05:49:53PM +1100, Nick Piggin wrote: > That's very nice. But there is no problem _now_, is there? There is and has always been. All of the above merely mitigate the issue, with the possible exception of the tasklist RCU patch, for which I know of no testing results. Also note that almost none of the work on /proc/ has been merged. William Lee Irwin III wrote: >> Aggravating the rwlock starvation destabilizes, not pessimizes, >> and performance is secondary to stability. On Sat, Nov 20, 2004 at 05:49:53PM +1100, Nick Piggin wrote: > Well luckily we're not going to be aggravating the rwlock stavation. > If you found a problem with, and fixed do_task_stat: ?time, ???_flt, > et al, then you would apply the same solution to per thread rss to > fix it in the same way. You are aggravating the rwlock starvation by introducing gratuitous full tasklist iterations. There is no solution to do_task_stat() because it was recently introduced. There will be one as part of a port of the usual mitigation patches when the perennial problem is reported against a sufficiently recent kernel version, as usual. The already- demonstrated problematic iterations have not been removed. -- wli