From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752642AbcJEFoc (ORCPT <rfc822;w@1wt.eu>);
        Wed, 5 Oct 2016 01:44:32 -0400
Received: from wtarreau.pck.nerim.net ([62.212.114.60]:33577 "EHLO 1wt.eu"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751347AbcJEFob (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 5 Oct 2016 01:44:31 -0400
Date: Wed, 5 Oct 2016 07:44:07 +0200
From: Willy Tarreau <w@1wt.eu>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>,
        Johannes Weiner <hannes@cmpxchg.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Antonio SJ Musumeci <trapexit@spawn.link>,
        Miklos Szeredi <miklos@szeredi.hu>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        stable <stable@vger.kernel.org>
Subject: Re: BUG_ON() in workingset_node_shadows_dec() triggers
Message-ID: <20161005054407.GC7297@1wt.eu>
References: <CA+55aFwyNTLuZgOWMTRuabWobF27ygskuxvFd-P0n-3UNT=0Og@mail.gmail.com>
 <CAP=VYLrEqciiV-DiqR35bV9bDE47v6Ww-N4JnohvYaLWXc40UA@mail.gmail.com>
 <CA+55aFycvN=3DvsnRNpZbQ8z3893EK-nJA+V=Fx8o8yaviW7VA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFycvN=3DvsnRNpZbQ8z3893EK-nJA+V=Fx8o8yaviW7VA@mail.gmail.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 04, 2016 at 08:29:00PM -0700, Linus Torvalds wrote:
> So what I think we should think about is:
> 
>  - extending the checkpatch warning to VM_BUG_ON too, to discourage new users.
> 
>  - look at making BUG_ON() simply be less lethal. Remove the
> unreachable(), reorganize how the string is stored, and make it act
> more like WARN_ON_ONCE() instead (it's the "rewind_stack_do_exit()"
> that ends up causing us to try to kill things, we *could* just try to
> stop doing that).
> 
>  - Instead of adding a BUG_ON_AND_HALT(), we could perhaps add a new
> FATAL_ERROR() thing that acts like the current BUG_ON, and *not* call
> it something similar (we don't want people doing mindless
> conversions!). And that's the one that would do the whole
> rewind_stack_do_exit() to kill the process.

I think instead we should completely remove any simple way to halt the
system and document how to do it. I've already seen some userland code
stuffed with thousands of assert() calls, and their developers are
proud of this because their code looks clean and shows that they
care about all errors. But the cost of their stupidity doesn't seem to
affect them. Maybe they'll start to think about it the day they're
riding in a self-driving car and realize that it had better recover
from a failing flasher and not just crash in the middle of the highway.

Thus since their motives are just to easily write nice-looking code, I'd
simply force them to explicitly write their condition and the associated
printk() and panic() calls. It will become much more of a hassle and will
make their code less elegant, so they should be much less tempted.
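
As a rough sketch of what that explicit form could look like (userland C;
the printk() and panic() macros below are stand-ins so that it builds
outside the kernel, and struct node and both helpers are made up for
illustration, not taken from the workingset code):

```c
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-ins (assumption): in-kernel code would call the real
 * printk() and panic() instead of these macros. */
#define printk(...) fprintf(stderr, __VA_ARGS__)
#define panic(...)  do { fprintf(stderr, __VA_ARGS__); abort(); } while (0)

/* Hypothetical structure, for illustration only. */
struct node {
	unsigned int shadows;
};

/* Recoverable variant: report the inconsistency and let the caller
 * decide what to do. Returns 0 on success, -1 on underflow. */
int node_shadows_dec(struct node *n)
{
	if (n->shadows == 0) {
		printk("node %p: shadow count underflow\n", (void *)n);
		return -1;
	}
	n->shadows--;
	return 0;
}

/* Fatal variant: halting is still possible, but the condition, the
 * message and the panic() call all have to be written out by hand. */
void node_shadows_dec_or_die(struct node *n)
{
	if (n->shadows == 0) {
		printk("node %p: shadow count underflow\n", (void *)n);
		panic("cannot continue with a corrupted shadow count\n");
	}
	n->shadows--;
}
```

The fatal variant takes four lines where a BUG_ON() took one, which is
the point: nobody types that without thinking about whether halting is
really the only option.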

So I think that we'd rather run a huge sed all over the code to replace
BUG/BUG_ON with their WARN/WARN_ON equivalents. We'll very likely notice
a lot of new gcc warnings from code that was supposed never to be
reachable, which will tell us a lot about how limited the error checking
is in those parts of the code.
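
To illustrate the kind of code such a conversion would expose, here is a
small userland sketch. The WARN_ON() below is a simplified stand-in for
the kernel macro, and enum mode and mode_to_weight() are hypothetical.
With BUG() after the switch the function needs no final return, because
the compiler knows BUG() does not return. Once it becomes a WARN, gcc's
-Wreturn-type ("control reaches end of non-void function") fires until
somebody writes a real error path:

```c
#include <stdio.h>

/* Simplified stand-in for the kernel's WARN_ON() (assumption): report
 * the condition and keep running, returning the condition's value. */
static int warn_on(int cond, const char *expr, const char *file, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical example type. */
enum mode { MODE_A, MODE_B };

int mode_to_weight(enum mode m)
{
	switch (m) {
	case MODE_A:
		return 1;
	case MODE_B:
		return 2;
	}
	/* This used to be a bare BUG(). After the conversion the function
	 * can fall through to here, so it needs an explicit error return
	 * that the callers then have to handle. */
	WARN_ON(1);
	return -1;
}
```

Every place where such a return has to be added is a place where the
callers never had to think about failure before.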

Willy