From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=MCTS=DN=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DCFB5C4363D
	for <linux-arm-kernel@archiver.kernel.org>; Tue,  6 Oct 2020 11:04:09 +0000 (UTC)
Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 5A50C2080A
	for <linux-arm-kernel@archiver.kernel.org>; Tue,  6 Oct 2020 11:04:09 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="zbNVpoQb";
	dkim=fail reason="signature verification failed" (2048-bit key) header.d=marvell.com header.i=@marvell.com header.b="KjfJBrJ9";
	dkim=fail reason="signature verification failed" (1024-bit key) header.d=marvell.onmicrosoft.com header.i=@marvell.onmicrosoft.com header.b="bIZNaILJ"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5A50C2080A
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=marvell.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding:
	Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive:
	List-Unsubscribe:List-Id:MIME-Version:Content-ID:In-Reply-To:References:
	Message-ID:Date:Subject:To:From:Reply-To:Content-Description:Resent-Date:
	Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner;
	 bh=s0kcwwZQYcptgEB6rSN+SJ5q2tvn+pbr4avz4UmTBCY=; b=zbNVpoQb+y9xYAmUSiT/PpUVX
	hcdXD3QfPw1tV0tsBg164mRrIZ8Zxzo3hLuHehsOkD+R8Su6DT9WymLUkXfc+A411c/oUhS7CCzRo
	qqMxxknJokT0gHhN2+GIwGvG9BntHR4Ll27BeceyYphH5C3fyE9pqDUbJu2815gdRTZ3/pfHeLnUr
	VH96EMV4NIBpYN3Zw0ssg0HhOCx03pPo9Py1RGr1p/bzkGIQODYlwmQKjYktAwJ5AM1hw8tsAt15i
	9QFRxxtnJqQjusoilGI6D6TkffsR7tIV2iq0UKh4Ealm5BUB9LM7sS9MAT/tpeMb9lWoC36293wKZ
	PJIX/QjCw==;
Received: from localhost ([::1] helo=merlin.infradead.org)
	by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux))
	id 1kPkk3-0001Pc-EX; Tue, 06 Oct 2020 11:02:47 +0000
Received: from mx0b-0016f401.pphosted.com ([67.231.156.173])
 by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux))
 id 1kPkjz-0001OS-N5
 for linux-arm-kernel@lists.infradead.org; Tue, 06 Oct 2020 11:02:44 +0000
Received: from pps.filterd (m0045851.ppops.net [127.0.0.1])
 by mx0b-0016f401.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 096B1XOW002048; Tue, 6 Oct 2020 04:01:53 -0700
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com;
 h=from : to : cc :
 subject : date : message-id : references : in-reply-to : content-type :
 content-id : content-transfer-encoding : mime-version; s=pfpt0220;
 bh=18rNlfc6TOqqbE/RitnlBslpMX6lfMlUhaFaSYg83yI=;
 b=KjfJBrJ9zW4eqZOTDr3wB8flTafmV4+etvo/QjkvU0C9pCbRyeK0VZEtZJx7GbbLbDpd
 kN4gK9Z41YlcfIsowXcDzUNTWxNoD6EpFrYRcX3tBLUlqFo0w9LPl+7AShTt3Wo2wsk1
 tRbSerasSPWEz2Z2bqZ0GTI2B88W98tNIcOMGsALf/3dWO/YTqsO8bAqJ9UqC/AlFMkC
 wrRTnQ1IbXs+25wJseOAk2q8MhD0cdHbW1uryJrvgbwCLXDKK/y0A0QQdOjHh1k04pYv
 CnPnJZfCO5YF9wpaVK3cKjXQGMs6jgUC9xOeH1pLgtQw2OMzDo7lTb9yXpYJcBZ1QxoX 3w== 
Received: from sc-exch01.marvell.com ([199.233.58.181])
 by mx0b-0016f401.pphosted.com with ESMTP id 33xrtng2b2-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT);
 Tue, 06 Oct 2020 04:01:53 -0700
Received: from SC-EXCH01.marvell.com (10.93.176.81) by SC-EXCH01.marvell.com
 (10.93.176.81) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Tue, 6 Oct
 2020 04:01:51 -0700
Received: from NAM12-DM6-obe.outbound.protection.outlook.com (104.47.59.169)
 by SC-EXCH01.marvell.com (10.93.176.81) with Microsoft SMTP Server (TLS) id
 15.0.1497.2 via Frontend Transport; Tue, 6 Oct 2020 04:01:50 -0700
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=dUrEQrpslJv71hniuSwi4Nl3rRe6Cg+DFo6+IhqhmF0gLokmZL4gFej9nc0VbV1Y3yuMTZVVv7Me3tZyPn/PFEZcCCew3ofJZalLsqsTWLXeeHAo9t7YwRoeK32IdrIDa3Uazu56G9SZIMIcZWNWNFPRFeyFH5+RIsENWQPo1uzyBrNyMRKZFZLqyMbsk8XtmAlaY4i697ijenkAX/jKfqYOd/uuDcGPMHD9C0Xn93NpC1e3rZaEDuw5CmYqD58muw0YltvQbx4NOzAQ1NVgJp+BqRLQitDa5501RvqLMxX8Rf/CI9E9P4y5yuimCVHM3gHhIlzqLMvrq4fE9mNSiQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=18rNlfc6TOqqbE/RitnlBslpMX6lfMlUhaFaSYg83yI=;
 b=kd0YFQUy+psErHPOv3fsiSEbyIUXZzEcO6L7ZRYveN6JUUEeg8XHdnKR2s0DOacqBs0epBsc0kpoNy1teIPrmhNsGwy6hz77Wp/EnmDAYUxkGAA3BJ/yyrdXtbxFCppmoHdew5LkR68j9d00GMGclCrfuNRknTe9cndby9Ga3JzU69qVmWmdHXz/1Hny8XlbWjR0prWCHtUPWXr9eGTEjExmk7rCX3VzhntN1Xl/2gig19l5hVvZzIGsErbQS9/fpdyUsWZ9KDVqA49N7MH2+LrjLUMfmNBcV5Hh+QTFS73UKpgxkgX+iYA8Ro44n9H4Xz8JTiBDNjmxozZnNSxbpg==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=marvell.com; dmarc=pass action=none header.from=marvell.com;
 dkim=pass header.d=marvell.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=marvell.onmicrosoft.com; s=selector1-marvell-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=18rNlfc6TOqqbE/RitnlBslpMX6lfMlUhaFaSYg83yI=;
 b=bIZNaILJGyhgwU6YVyhUU/OupjzPYHhJ630HakXEVwaNVK8ecwC8M1x3o7RizLPTTHtgl3N4y4pyxpN+swB3c8oLAfNq0utl7/ZJQHRN2+ZyrAVrfcEosEdzdHfiifrwzIvlz5AKFvcFcu5RJteCpN32iI16yA+yFy+wKRaJDOA=
Received: from MW2PR18MB2267.namprd18.prod.outlook.com (2603:10b6:907:3::11)
 by MWHPR1801MB1967.namprd18.prod.outlook.com (2603:10b6:301:63::27) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3433.39; Tue, 6 Oct
 2020 11:01:50 +0000
Received: from MW2PR18MB2267.namprd18.prod.outlook.com
 ([fe80::54a1:710f:41a5:c76f]) by MW2PR18MB2267.namprd18.prod.outlook.com
 ([fe80::54a1:710f:41a5:c76f%5]) with mapi id 15.20.3433.045; Tue, 6 Oct 2020
 11:01:49 +0000
From: Alex Belits <abelits@marvell.com>
To: "frederic@kernel.org" <frederic@kernel.org>
Subject: Re: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard
 isolation from kernel
Thread-Topic: [EXT] Re: [PATCH v4 03/13] task_isolation: userspace hard
 isolation from kernel
Thread-Index: AQHWYDdZDqoeqvfF50CzG3l43t/UoamDNPAAgATEYwCAAI5YAIACWBIA
Date: Tue, 6 Oct 2020 11:01:49 +0000
Message-ID: <ed60dc426cfd8a2fe5e389d3a7f36bafa6a8439e.camel@marvell.com>
References: <04be044c1bcd76b7438b7563edc35383417f12c8.camel@marvell.com>
 <b18546567a2ed61073ae86f2d9945257ab285dfa.camel@marvell.com>
 <20201001135640.GA1748@lothringen>
 <7e54b3c5e0d4c91eb64f2dd1583dd687bc34757e.camel@marvell.com>
 <20201004231404.GA66364@lothringen>
In-Reply-To: <20201004231404.GA66364@lothringen>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: kernel.org; dkim=none (message not signed)
 header.d=none;kernel.org; dmarc=none action=none header.from=marvell.com;
x-originating-ip: [173.228.7.197]
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: def4d9d0-f22a-4d81-0cb4-08d869e7399f
x-ms-traffictypediagnostic: MWHPR1801MB1967:
x-ms-exchange-transport-forked: True
x-microsoft-antispam-prvs: <MWHPR1801MB19671EF0429937410119E6B4BC0D0@MWHPR1801MB1967.namprd18.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:9508;
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: bPM/Bf/p8KFGcveS0nMMUsRW7XGBztlljmUa5LSFmBqlClPMhVTRG3vk33ra7KEjZoB5WALKK/+STLr71oB8JO8HiYb2NZZzZRPg7Ju+VfbsDHArIoNLhJOIfCG2wJssC1pf6GbKyxYjQO4M/iXhjH/ypvJTrYoyzE2yMosXZ+qvqR/4r7gTN3KxZZ/miID1ytCPK92iHp/4sJMPMwKiH9xi2/Pob8kfBVBgH9HnLQlbhjRDHwcvf17FiYYrc+meP3TW6BouwFpidO0B9EvvqEbnGd69Nc/J1xOFMgsnUSVQRWJfJZzCEBCBJwq7W3SfLBHA5+cy4i8i5Cem63tB9A==
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:MW2PR18MB2267.namprd18.prod.outlook.com; PTR:; CAT:NONE;
 SFS:(4636009)(396003)(136003)(376002)(346002)(366004)(39860400002)(186003)(83380400001)(5660300002)(6916009)(86362001)(36756003)(2906002)(478600001)(26005)(6512007)(54906003)(8936002)(2616005)(66556008)(66446008)(76116006)(7416002)(6506007)(6486002)(4326008)(8676002)(316002)(71200400001)(66476007)(64756008)(91956017)(66946007);
 DIR:OUT; SFP:1101; 
x-ms-exchange-antispam-messagedata: /yMTPfvlhuKlH1MafzKs48wJfuo/Fmzih/0WS9o3pe4wsWjnPitz6OExAnyOFzWYQ/T4kyzbRplTBjkg1gz9vh+bTamuNbilINTZGfRoLJQuV1PcHUkdJWFBFiVpci5AQHwD3ET8s5YFEganUveKrO/5rFuqpi/I8vH6+COjo7recvIZPPTT8B0xsW343lEinhPWwds+mKBanHWDkdQBKvXd+QsUxmHbaC1MLMbiDsKZfZxkZcJpkICE7nLaI8b900EK5VzRBg77b6oBRYFJDjPRf7IWKA9oTqGuOESR4vMALOS9rgAMsL652XDo+IxAaGrZ2ujrxXivIawAaEeerT80hKIATz1Goe2cxvd6xwIfAPm5Iw68iEEePItCXh+RfquwdWTWRKI1hMfptzMO7ZlJayKswkSGEDY18BgL6EbvtpXQPxuK5vE3D1aXurDmKO+kvRD3U+l+FbtxdKaYqpkcLoszUEWC0vlL/0ZqJt+J18JQ/2AreCO1EitE3tjKMbt1tpNKLYNTByP03nxXprnn2o2I4fNEATrIlMGTRCA73Kt9H1zashhyfQ2kZcPvGdCC1qsdl6ENONxCCqqvjBzjIwBiYJWx2qMN2ImMyhJ/Ta0MqlNJKKCy6P54b8/4wdmNCIRIzJu33T8OaJBtuA==
Content-ID: <2007B98DE2762943933523B1C0F749CB@namprd18.prod.outlook.com>
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: MW2PR18MB2267.namprd18.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: def4d9d0-f22a-4d81-0cb4-08d869e7399f
X-MS-Exchange-CrossTenant-originalarrivaltime: 06 Oct 2020 11:01:49.5803 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 70e1fb47-1155-421d-87fc-2e58f638b6e0
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: xw40xc5pJ2NXbSzKSIBY+UVd6YpSkV/en8+1IYMnKle1o6npK+cNH9ltAKZhgkAVh4u2zFCpweiaRJ02YfTK5Q==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: MWHPR1801MB1967
X-OriginatorOrg: marvell.com
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687
 definitions=2020-10-06_03:2020-10-06,
 2020-10-06 signatures=0
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20201006_070243_863005_ACA5BC24 
X-CRM114-Status: GOOD (  32.47  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
 "peterz@infradead.org" <peterz@infradead.org>,
 "linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
 "rostedt@goodmis.org" <rostedt@goodmis.org>,
 "davem@davemloft.net" <davem@davemloft.net>,
 "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
 "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
 Prasun Kapoor <pkapoor@marvell.com>, "tglx@linutronix.de" <tglx@linutronix.de>,
 "will@kernel.org" <will@kernel.org>, "mingo@kernel.org" <mingo@kernel.org>,
 "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Mon, 2020-10-05 at 01:14 +0200, Frederic Weisbecker wrote:
> > > Speaking of which, I agree with Thomas that it's unnecessary. It's
> > > too much code and complexity. We can use the existing trace events
> > > and perform the analysis from userspace to find the source of the
> > > disturbance.
> > 
> > The idea behind this is that isolation breaking events are supposed to
> > be known to the applications while applications run normally, and they
> > should not require any analysis or human intervention to be handled.
> 
> Sure but you can use trace events for that. Just trace interrupts,
> workqueues, timers, syscalls, exceptions and scheduler events and you
> get all the local disturbance. You might want to tune a few filters but
> that's pretty much it.

And keep all tracing enabled all the time, just to be able to figure
out that a disturbance happened at all?

Or do you mean that we can use the kernel entry mechanism to reliably
determine that an isolation-breaking event happened (so the
isolation-breaking procedure can be triggered as early as possible), yet
avoid trying to determine why exactly it happened, and use tracing if we
want to know?

The original patch did the opposite: it triggered the isolation-breaking
procedure only once it was known specifically what kind of event
happened -- a hardware interrupt, IPI, syscall, page fault, or any
other kind of exception, possibly something architecture-specific.
This, of course, always had a potential problem with coverage -- if
handling of some event is missing, isolation breaking is not handled at
all for it, and there is no obvious way of finding out whether we
covered everything. This also made the patch large and somewhat ugly.

When I added a mechanism for low-level isolation breaking handling
on kernel entry, it also partially solved the problem with
completeness. Only partially, because I have not yet added handling of
an "unknown cause" before returning to userspace; however, that would
be a logical thing to do. Then if we entered the kernel from isolation,
did something, and are returning to userspace still not knowing what
kind of isolation-breaking event happened, we can still trigger
isolation breaking.

Did I get it right, and you mean that we can remove all specific
handling of isolation breaking causes, except for syscall that exits
isolation, and report isolation breaking instead of normally returning
to userspace? Then isolation breaking will be handled reliably without
knowing the cause, and we can leave determining the cause to the
tracing mechanism (if enabled)?

This does make sense. However, to me it looks somewhat strange, because
I consider isolation breaking to be a kind of runtime error that
userspace software is supposed to get some basic information about --
like signals distinguishing between, say, SIGSEGV and SIGPIPE, or
write() being able to set errno to ENOSPC or EIO. Userspace then
receives basic information about the cause of the exception or error,
and can do some meaningful reporting, or decide whether the error
should be fatal for the application or handled differently, based on
its internal logic. To get those distinctions, the application does not
have to be aware of anything internal to the kernel.
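
To illustrate the analogy, here is a minimal sketch of how a userspace
program already tells write() failures apart by errno alone. The names
io_severity, classify_write_error() and write_and_classify() are made up
for this example; only write(), errno, ENOSPC and EIO are real
interfaces, and the severity assigned to each value is my assumption
about a typical application.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical severity classes, invented for this example. */
enum io_severity {
	IO_OK,		/* no error */
	IO_RECOVERABLE,	/* application-level cleanup may help */
	IO_FATAL,	/* something is seriously wrong */
	IO_OTHER	/* not classified by this sketch */
};

/* Map an errno value from a failed write() to a coarse severity. */
enum io_severity classify_write_error(int err)
{
	switch (err) {
	case 0:
		return IO_OK;
	case ENOSPC:
		return IO_RECOVERABLE;
	case EIO:
		return IO_FATAL;
	default:
		return IO_OTHER;
	}
}

/*
 * Write a buffer and report the severity of the outcome.
 * Short writes are treated as success to keep the sketch brief.
 */
enum io_severity write_and_classify(int fd, const void *buf, size_t len)
{
	assert(buf != NULL);
	if (write(fd, buf, len) < 0)
		return classify_write_error(errno);
	return IO_OK;
}
```

The caller decides what to do with each class without knowing anything
about the filesystem or block layer underneath.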

Similarly, distinguishing between, say, a page fault, a device interrupt
and a timer may be important for logic implemented in userspace, and
I think it would be nice to allow userspace to get this information
immediately and without being aware of any additional details of the
kernel implementation. The current patch doesn't do this yet; however,
the intention is to implement reliable isolation breaking by checking on
userspace re-entry, plus make reporting of causes, if any were found,
visible to userspace in some convenient way.

The part that determines the cause can be implemented separately from
the isolation breaking mechanism. Then we can have isolation breaking on
kernel entry (or potentially some other condition on kernel entry that
requires logging the cause) enable reporting; the reporting mechanism,
if it exists, will then fill in the blanks, and once either the cause is
known or it's time to return to userspace, notification will be done
with whatever information is available. For in-depth analysis, if
necessary for debugging the kernel, we can have tracing check whether we
are in this "suspicious kernel entry" mode, and log things that
otherwise would not be logged.
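
To make the ordering concrete, here is a small userspace model of that
flow (not kernel code). Every name in it (struct isol_state,
isol_kernel_entry(), isol_report_cause(), isol_return_to_user() and the
ISOL_CAUSE_* values) is hypothetical and does not come from the patch
set. It only shows the sequence: kernel entry arms reporting, a handler
may supply a cause, and return to userspace notifies with whatever is
known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cause codes; UNKNOWN is used when no handler reported. */
enum isol_cause {
	ISOL_CAUSE_UNKNOWN = 0,
	ISOL_CAUSE_IRQ,
	ISOL_CAUSE_IPI,
	ISOL_CAUSE_SYSCALL,
	ISOL_CAUSE_PAGE_FAULT,
};

struct isol_state {
	bool isolated;		/* task is running in isolated mode */
	bool entry_pending;	/* entered kernel from isolation, not notified yet */
	int notifications;	/* number of notifications delivered */
	enum isol_cause last;	/* cause attached to the last notification */
};

/* Deliver exactly one notification and leave isolation. */
static void isol_notify(struct isol_state *s, enum isol_cause cause)
{
	s->last = cause;
	s->notifications++;
	s->entry_pending = false;
	s->isolated = false;
}

/* Kernel entry: arm reporting only if we came from isolated userspace. */
void isol_kernel_entry(struct isol_state *s)
{
	assert(s != NULL);
	if (s->isolated)
		s->entry_pending = true;
}

/* Optional step: a handler that knows the cause fills in the blank. */
void isol_report_cause(struct isol_state *s, enum isol_cause cause)
{
	assert(s != NULL);
	if (s->entry_pending)
		isol_notify(s, cause);
}

/* Return to userspace: if nobody reported a cause, notify anyway. */
void isol_return_to_user(struct isol_state *s)
{
	assert(s != NULL);
	if (s->entry_pending)
		isol_notify(s, ISOL_CAUSE_UNKNOWN);
}
```

In this model the cause-reporting step can be removed entirely and
isolation breaking still happens once per kernel entry, which is the
separation described above.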

> As for the source of the disturbances, if you really need that
> information, you can trace the workqueue and timer queue events and just
> filter those that target your isolated CPUs.

For the purpose of a human debugging the kernel or an application, more
information is (usually) better, so the only concern here is that the
user is now responsible for the completeness of what is being traced.
However, from the application's point of view, or for logging in a
production environment, it's usually more important to get the general
type of events, so it's possible to, say, confirm that nothing "really
bad" happened, or to trigger an emergency response if it did. Say, if
the only causes of isolation breaking were an IPI within a few moments
of application startup, or a signal from somewhere else when the
application was restarted, there is no cause for concern. However, if
hardware interrupts arrive at random points in time, something is
clearly wrong. And if page faults happen, most likely the application
forgot to page in and lock its address space.
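
A rough sketch of the kind of application-side policy I have in mind
follows. The cause codes, struct brk_event and brk_policy() are
hypothetical (nothing in the current patch reports causes this way), the
one-second startup window is an arbitrary number picked for the example,
and treating a late IPI or an unknown cause as alarming is my own
conservative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cause codes an application might receive. */
enum brk_cause {
	BRK_IPI,
	BRK_SIGNAL,
	BRK_HW_IRQ,
	BRK_PAGE_FAULT,
	BRK_UNKNOWN,
};

/* What the application decides to do about the event. */
enum brk_action {
	BRK_IGNORE,		/* expected, no cause for concern */
	BRK_FIX_MEMORY_LOCKING,	/* address space was not paged in and locked */
	BRK_ALARM,		/* something is clearly wrong */
};

struct brk_event {
	enum brk_cause cause;
	uint64_t ms_since_start;	/* time since application startup */
	bool during_restart;		/* event arrived while being restarted */
};

/* Arbitrary example value: how long after startup an IPI is tolerated. */
#define BRK_STARTUP_WINDOW_MS 1000

enum brk_action brk_policy(const struct brk_event *ev)
{
	assert(ev != NULL);
	switch (ev->cause) {
	case BRK_IPI:
		return ev->ms_since_start <= BRK_STARTUP_WINDOW_MS ?
			BRK_IGNORE : BRK_ALARM;
	case BRK_SIGNAL:
		return ev->during_restart ? BRK_IGNORE : BRK_ALARM;
	case BRK_PAGE_FAULT:
		return BRK_FIX_MEMORY_LOCKING;
	case BRK_HW_IRQ:
	case BRK_UNKNOWN:
	default:
		return BRK_ALARM;
	}
}
```

For the page fault case the usual remedy is for the application to call
mlockall(MCL_CURRENT | MCL_FUTURE) and touch its memory before entering
isolation.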

Again, in my opinion this is not unlike reporting ENOSPC vs. EIO while
doing file I/O -- the former (usually) indicates a common problem that
may require application-level cleanup, the latter (also usually) means
that something is seriously wrong.

> > A process may exit isolation because some leftover delayed work, for
> > example, a timer or a workqueue, is still present on a CPU, or because
> > a page fault or some other exception, normally handled silently, is
> > caused by the task. It is also possible to direct an interrupt to a CPU
> > that is running an isolated task -- currently it's perfectly valid to
> > set interrupt smp affinity to a CPU running isolated task, and then
> > interrupt will cause breaking isolation. While it's probably not the
> > best way of handling interrupts, I would rather not prohibit this
> > explicitly.
> 
> Sure, but you can trace all these events with the existing tracing
> interface we have.

Right. However, it would require someone to intentionally trace
all those events, all for the purpose of obtaining a type of runtime
error. As an embedded systems developer who has had to look for signs of
unusual bugs on a large number of customers' systems, and to
distinguish them from reports of hardware malfunctions, I would prefer
something clearly identifiable in the logs (of the kernel, application,
or anything else) when no one is specifically investigating any problem.
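
For reference, this is roughly what "intentionally tracing all those
events" amounts to on a deployed system: something has to switch on
each event group in tracefs ahead of time. The sketch below assumes
tracefs is mounted at a root passed in by the caller (commonly
/sys/kernel/tracing), and the list of groups is my guess at a minimal
set covering the sources named above, not a complete one. The helper
names trace_enable_path() and trace_enable_all() are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Event groups matching the disturbance sources named in this thread.
 * Each is a directory under events/ in tracefs. Assumed, not exhaustive.
 */
static const char *const isol_trace_groups[] = {
	"irq", "workqueue", "timer", "sched", "raw_syscalls",
};

#define NR_GROUPS (sizeof(isol_trace_groups) / sizeof(isol_trace_groups[0]))

/* Build "<root>/events/<group>/enable"; 0 on success, -1 on bad input. */
int trace_enable_path(char *buf, size_t len, const char *root,
		      const char *group)
{
	int n;

	assert(buf != NULL && root != NULL && group != NULL);
	if (strlen(group) == 0)
		return -1;
	n = snprintf(buf, len, "%s/events/%s/enable", root, group);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to every group's enable file; return how many succeeded. */
int trace_enable_all(const char *root)
{
	char path[256];
	size_t i;
	int enabled = 0;

	for (i = 0; i < NR_GROUPS; i++) {
		FILE *f;
		int ok;

		if (trace_enable_path(path, sizeof(path), root,
				      isol_trace_groups[i]) < 0)
			continue;
		f = fopen(path, "w");
		if (f == NULL)
			continue;	/* not mounted, no such group, or no permission */
		ok = fputs("1", f) >= 0;
		if (fclose(f) == 0 && ok)
			enabled++;
	}
	return enabled;
}
```

A return value smaller than the number of groups means part of the
picture is silently missing, which is the completeness concern I
mentioned earlier.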

When anything suspicious happens, the system is often physically
unreachable, and the problem may or may not happen again, so the first
report from a running system may be the only thing available. When
everything is going well, the same systems have hardware failures more
often than they report valid software bugs (or, ideally, all reports
are from hardware failures), so it's much better to know that if
software does something wrong, it will be possible to identify the
problem from the first report, rather than guess.

Sometimes equipment gets firmware updates many years after production,
when there are reports of all kinds of failures due to mechanical or
thermal damage, faulty parts, bad repair work, deteriorating flash,
etc. Among those there might be something that indicates new bugs made
by a new generation of developers (occasionally literally),
regressions, etc. In those situations getting useful information from
the error message in the first report can make a difference between
quickly identifying the problem and going on a wild goose chase.

-- 
Alex