From: Alexander Popov
Reply-To: alex.popov@linux.com
Date: Sat, 2 Oct 2021 14:41:34 +0300
Subject: Re: [PATCH] Introduce the pkill_on_warn boot parameter
To: Linus Torvalds, Petr Mladek
Cc: "Paul E. McKenney", Jonathan Corbet, Andrew Morton, Thomas Gleixner, Peter Zijlstra, Joerg Roedel, Maciej Rozycki, Muchun Song, Viresh Kumar, Robin Murphy, Randy Dunlap, Lu Baolu, Kees Cook, Luis Chamberlain, Wei Liu, John Ogness, Andy Shevchenko, Alexey Kardashevskiy, Christophe Leroy, Jann Horn, Greg Kroah-Hartman, Mark Rutland, Andy Lutomirski, Dave Hansen, Steven Rostedt, Will Deacon, David S Miller, Borislav Petkov, Kernel Hardening, linux-hardening@vger.kernel.org, "open list:DOCUMENTATION", Linux Kernel Mailing List, notify@kernel.org
References: <20210929185823.499268-1-alex.popov@linux.com> <20210929194924.GA880162@paulmck-ThinkPad-P17-Gen-1>

On 01.10.2021 22:59, Linus Torvalds wrote:
> On Thu, Sep 30, 2021 at 2:15 AM Petr Mladek wrote:
>>
>> Honestly, I am not sure if panic_on_warn() or the new pkill_on_warn()
>> work as expected. I wonder who uses it in practice and what is
>> the experience.
>
> Afaik, there are only two valid uses for panic-on-warn:
>
> (a) test boxes (particularly VM's) that are literally running things
> like syzbot and want to report any kernel warnings
>
> (b) the "interchangeable production machinery" fail-fast kind of situation
>
> So in that (a) case, it's literally that you consider a warning to be
> a failure case, and just want to stop.
> Very useful as a way to get
> notified by syzbot that "oh, that assert can actually trigger".
>
> And the (b) case is more of a "we have 150 million machines, we expect
> about a thousand of them to fail for any random reason any day
> _anyway_ - perhaps simply due to hardware failure, and we'd rather
> take a machine down quickly and then perhaps look at why only much
> later when we have some pattern to the failures".
>
> You shouldn't expect panic-on-warn to ever be the case for any actual
> production machine that _matters_. If it is, that production
> maintainer only has themselves to blame if they set that flag.
>
> But yes, the expectation is that warnings are for "this can't happen,
> but if it does, it's not necessarily fatal, I want to know about it so
> that I can think about it".
>
> So it might be a case that you don't handle, but that isn't
> necessarily _wrong_ to not handle. You are ok returning an error like
> -ENOSYS for that case, for example, but at the same time you are "If
> somebody uses this, we should perhaps react to it".
>
> In many cases, a "pr_warn()" is much better. But if you are unsure
> just _how_ the situation can happen, and want a call trace and
> information about what process did it, and it really is a "this
> shouldn't ever happen" situation, a WARN_ON() or a WARN_ON_ONCE() is
> certainly not wrong.
>
> So think of WARN_ON() as basically an assert, but an assert with the
> intention to be able to continue so that the assert can actually be
> reported. BUG_ON() and friends easily result in a machine that is
> dead. That's unacceptable.
>
> And think of "panic-on-warn" as people who can deal with their own
> problems. It's fundamentally not your issue. They took that choice,
> it's their problem, and the security arguments are pure BS - because
> WARN_ON() just shouldn't be something you can trigger anyway.

Thanks, Linus.

And what do you think about the proposed pkill_on_warn?
Let me quote the rationale behind it.
Currently, the Linux kernel provides two types of reaction to kernel
warnings:
 1. Do nothing (by default),
 2. Call panic() if panic_on_warn is set. That's a very strong reaction,
    so panic_on_warn is usually disabled on production systems.

From a safety point of view, the Linux kernel lacks a middle ground for
handling kernel warnings:
 - The kernel should stop the activity that provokes a warning,
 - But the kernel should avoid complete denial of service.

From a security point of view, kernel warning messages provide a lot of
useful information for attackers. Many GNU/Linux distributions allow
unprivileged users to read the kernel log (for various reasons), so
attackers use kernel warning infoleaks in vulnerability exploits.
See the examples:
  https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
  https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
  https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Let's introduce the pkill_on_warn parameter.
If this parameter is set, the kernel kills all threads in a process
that provoked a kernel warning. This behavior is reasonable from the
safety point of view described above. It is also useful for kernel
security hardening because the system kills an exploit process that
hits a kernel warning.

Linus, what do you see as the proper way to handle WARN_ON() in kthreads
if pkill_on_warn is enabled?

Thanks!

Best regards,
Alexander
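To make the three reactions discussed above concrete (do nothing, panic_on_warn, and the proposed pkill_on_warn), here is a minimal userspace model in C. It is only a sketch. The names sim_warn_on(), react_to_warning(), sim_handle_request() and the WARN_REACT_* values are hypothetical and do not exist in the kernel. "Killing the process" and "panicking" are modelled by clearing flags rather than by signals or a real panic(). Like WARN_ON() as described in the thread, sim_warn_on() reports the condition and returns it, so the caller can continue and return an error such as -ENOSYS. The model deliberately does not decide what pkill_on_warn should do when the warning fires in a kthread, which is the open question above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reaction policies; a userspace model, not kernel code. */
enum warn_reaction {
    WARN_REACT_NONE,  /* default: report the warning and continue */
    WARN_REACT_PANIC, /* panic_on_warn: stop the whole machine */
    WARN_REACT_PKILL, /* proposed pkill_on_warn: kill the offending process */
};

struct sim_process {
    int pid;
    bool alive; /* false models "all threads of this process were killed" */
};

struct sim_machine {
    bool running; /* false models a panicked machine */
};

/* Apply the configured reaction after a warning fired in `proc`. */
static void react_to_warning(enum warn_reaction r, struct sim_machine *m,
                             struct sim_process *proc)
{
    switch (r) {
    case WARN_REACT_NONE:
        break;               /* the warning is only reported */
    case WARN_REACT_PANIC:
        m->running = false;  /* complete denial of service */
        break;
    case WARN_REACT_PKILL:
        proc->alive = false; /* only the offending process goes away */
        break;
    }
}

/* Userspace analogue of WARN_ON(): report, react, and return the
 * condition so the caller can decide how to continue. */
static bool sim_warn_on(bool cond, enum warn_reaction r,
                        struct sim_machine *m, struct sim_process *proc)
{
    if (cond) {
        fprintf(stderr, "WARNING: unexpected condition in pid %d\n",
                proc->pid);
        react_to_warning(r, m, proc);
    }
    return cond;
}

/* Hypothetical request handler: an unknown opcode "can't happen", so it
 * triggers the warning, but the caller still gets an error code. */
static int sim_handle_request(int opcode, enum warn_reaction r,
                              struct sim_machine *m, struct sim_process *proc)
{
    if (sim_warn_on(opcode != 0, r, m, proc))
        return -ENOSYS;
    return 0;
}
```

With WARN_REACT_NONE an unexpected request simply fails with -ENOSYS. WARN_REACT_PKILL also takes down the simulated process, while WARN_REACT_PANIC stops the whole simulated machine, which is the denial of service the rationale wants to avoid.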