From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751963AbeAWQNh convert rfc822-to-8bit (ORCPT );
	Tue, 23 Jan 2018 11:13:37 -0500
Received: from mail.kernel.org ([198.145.29.99]:51064 "EHLO mail.kernel.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751840AbeAWQNf (ORCPT ); Tue, 23 Jan 2018 11:13:35 -0500
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83D952178B
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=goodmis.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=rostedt@goodmis.org
Date: Tue, 23 Jan 2018 11:13:30 -0500
From: Steven Rostedt
To: Tejun Heo
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
	akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
	Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
	Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
	Mathieu Desnoyers, Tetsuo Handa, rostedt@rostedt.homelinux.com,
	Byungchul Park, Pavel Machek, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Message-ID: <20180123111330.4356ec8d@gandalf.local.home>
In-Reply-To: <20180123154347.GE1771050@devbig577.frc2.facebook.com>
References: <20180117121251.7283a56e@gandalf.local.home>
	<20180117134201.0a9cbbbf@gandalf.local.home>
	<20180119132052.02b89626@gandalf.local.home>
	<20180120071402.GB8371@jagdpanzerIV>
	<20180120104931.1942483e@gandalf.local.home>
	<20180121141521.GA429@tigerII.localdomain>
	<20180123064023.GA492@jagdpanzerIV>
	<20180123095652.5e14da85@gandalf.local.home>
	<20180123152130.GB429@tigerII.localdomain>
	<20180123104121.2ef96d81@gandalf.local.home>
	<20180123154347.GE1771050@devbig577.frc2.facebook.com>
X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 23 Jan 2018 07:43:47 -0800
Tejun Heo wrote:

> So, at least in the case that we were seeing, it isn't that black and
> white.  printk keeps causing printks but only because printk buffer
> flushing is preventing the printk'ing context from making forward
> progress.  The key problem there is that a flushing context may get
> pinned flushing indefinitely and using a separate context does solve
> the problem.
>

Does it?

From what I understand, there's an issue with one of the printk
consoles, due to memory pressure or whatnot. Then a printk happens
within a printk recursively. It gets put into the safe buffer and an
irq work is queued to print it.

The issue you are describing is that when the printk enables
interrupts, the irq work triggers and loads the log buffer with the
safe buffer, and then the printk sees the new data added and continues
to print, and hence never leaves this printk.

Your solution is to delay the flushing of the safe buffer to another
thread (work queue), which I also have issues with, because you break
the "get printks out ASAP" mantra. Then the work queue comes in and
flushes the printks. And since the printks cause printks, we continue
to spam the machine, but hey, we are making forward progress.

Again, this is treating the symptom and not solving the problem.

I really hate delaying printks to another thread, unless we can
guarantee that that thread is ready to go immediately (basically
spinning on a run queue waiting to print). Because if the system is
having issues (which is the main reason for printks to happen), there's
no guarantee that a work queue or another thread will ever schedule,
and the safe printk buffer never gets out to the consoles.

I would much rather have throttling when recursive printks are
detected. Make it 100 lines to print if you want, but then throttle.
Because once you have 100 lines or so, you will know that printks are
causing printks, and you don't give a crap about the repeated process.
Allow one flushing of the printk safe buffers, and then if it happens
again, throttle it.

Both methods can lose important data. I believe the throttling of
recursive printks, after 100 prints or whatever, will be the least
likely to lose important data, because printks caused by printks will
just keep repeating the same data, and we don't care about repeats. But
delaying the flushing could very well lose important data that caused a
lockup.

-- Steve
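To make the throttling proposal above concrete, here is a rough
user-space model of it. This is a sketch under stated assumptions, not
kernel code and not taken from any patch in this thread: the function
name flush_with_throttle, the console_write callback, and the exact
point where the budget is checked are all hypothetical. It only shows
why a cap on printks generated during flushing lets the flusher finish
while keeping the first N recursive lines.

```python
from collections import deque


def flush_with_throttle(pending, console_write, limit=100):
    """Drain `pending` through `console_write`, capping recursive output.

    `console_write(line)` is a hypothetical stand-in for writing one line
    to a console; it returns the list of new messages that the write
    itself generated (empty for a healthy console).
    `limit` is the number of recursive lines accepted before throttling
    (100 mirrors the figure used in the mail; it is not a kernel constant).
    """
    queue = deque(pending)
    printed = []
    accepted = 0   # recursive lines let through so far
    dropped = 0    # recursive lines discarded by the throttle

    while queue:
        line = queue.popleft()
        printed.append(line)
        for extra in console_write(line):
            if accepted < limit:
                accepted += 1
                queue.append(extra)
            else:
                # Budget used up: the repeats carry no new information.
                dropped += 1

    return printed, dropped


# A console where every write triggers one more message. Without the
# cap this loop would never end; with it, the flusher gets out.
printed, dropped = flush_with_throttle(["oom warning"],
                                       lambda line: ["console error"])
print(len(printed), dropped)  # 101 lines printed, 1 dropped
```

With a healthy console (a callback that returns an empty list) nothing
is dropped and the pending lines come out unchanged, so the cap only
matters once printks start causing printks.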