From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <CAG_fn=Xs-OqpVCW5KyQLYKXNmQ4aH-KDjY0BrWpqMfPKcu-dug@mail.gmail.com>
 <20220421121018.60860-1-huangshaobo6@huawei.com>
In-Reply-To: <20220421121018.60860-1-huangshaobo6@huawei.com>
From: Alexander Potapenko <glider@google.com>
Date: Thu, 21 Apr 2022 15:06:10 +0200
Message-ID: <CAG_fn=UxSwgO8D2dCkM3vWPwcz0-rjvFdwr37cxYUt4awT3crA@mail.gmail.com>
Subject: Re: [PATCH] kfence: check kfence canary in panic and reboot
To: Shaobo Huang <huangshaobo6@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, chenzefeng2@huawei.com, 
	Dmitriy Vyukov <dvyukov@google.com>, Marco Elver <elver@google.com>, 
	kasan-dev <kasan-dev@googlegroups.com>, LKML <linux-kernel@vger.kernel.org>, 
	Linux Memory Management List <linux-mm@kvack.org>, nixiaoming@huawei.com, wangbing6@huawei.com, 
	wangfangpeng1@huawei.com, young.liuyang@huawei.com, zengweilin@huawei.com, 
	zhongjubin@huawei.com
Content-Type: multipart/alternative; boundary="00000000000069dfc805dd29c821"
List-ID: <linux-mm.kvack.org>

--00000000000069dfc805dd29c821
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, Apr 21, 2022 at 2:10 PM Shaobo Huang <huangshaobo6@huawei.com>
wrote:

> > > From: huangshaobo <huangshaobo6@huawei.com>
> > >
> > > when writing out of bounds to the red zone, it can only be detected
> > > at kfree. However, there were many scenarios before kfree that caused
> > > this out-of-bounds write to not be detected. Therefore, it is
> > > necessary to provide a method for actively detecting out-of-bounds
> > > writing to the red zone, so that users can actively detect, and can
> > > be detected in the system reboot or panic.
> > >
> > >
> > After having analyzed a couple of KFENCE memory corruption reports in
> > the wild, I have doubts that this approach will be helpful.
> >
> > Note that KFENCE knows nothing about the memory access that performs
> > the actual corruption.
> >
> > It's rather easy to investigate corruptions of short-living objects,
> > e.g. those that are allocated and freed within the same function. In
> > that case, one can examine the region of the code between these two
> > events and try to understand what exactly caused the corruption.
> >
> > But for long-living objects checked at panic/reboot we'll effectively
> > have only the allocation stack and will have to check all the places
> > where the corrupted object was potentially used.
> > Most of the time, such reports won't be actionable.
>
> The detection mechanism of kfence is probabilistic. It is not easy to
> find a bug.
> It is a pity to catch a bug without reporting it. and the cost of panic
> detection is not large, so panic detection is still valuable.
>
>
I am also a big fan of showing as much information as possible to help the
developers debug a memory corruption.
But I am still struggling to understand how the proposed patch helps.
Assume we have some generic allocation of an skbuff, so that the report
looks like this:

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
BUG: KFENCE: memory corruption in <frame that triggered reboot>
Corrupted memory at <end+1>
<stack trace of reboot event>

kfence-#59: <start>-<end>,size=3D100,cache=3Dkmalloc-128
allocated by task 77 on cpu 0 at 28.018073s:
kmem_cache_alloc
__alloc_skb
alloc_skb_with_frags
sock_alloc_send_pskb
unix_stream_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

This report tells us that, on a system that may have been running for
days, a particular skbuff was corrupted by some unknown task at some
unknown point in time.
How do we figure out what exactly caused this corruption?

When we deploy KFENCE at scale, it is rarely possible for the kernel
developer to get access to the host that reported the bug and try to
reproduce it.
With that in mind, the report (plus the kernel source) must contain all
the information necessary to address the bug; otherwise, reporting it
only wastes the developer's time.
Moreover, if we report such bugs too often, our tool loses credibility,
which is hard to regain.

> > > for example, if the application memory is out of bounds and written
> > > to the red zone in the kfence object, the system suddenly panics,
> > > and the following log can be seen during system reset:
> > > BUG: KFENCE: memory corruption in atomic_notifier_call_chain+0x49/0x70
> [...]
>
> thanks,
> ShaoBo Huang
>


--=20
Alexander Potapenko
Software Engineer


--00000000000069dfc805dd29c821--