From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 644EAC433F5 for ; Thu, 21 Apr 2022 13:06:55 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C0B7D6B0074; Thu, 21 Apr 2022 09:06:54 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BBB056B0075; Thu, 21 Apr 2022 09:06:54 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A5CA56B0078; Thu, 21 Apr 2022 09:06:54 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 955CA6B0074 for ; Thu, 21 Apr 2022 09:06:54 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 635782306B for ; Thu, 21 Apr 2022 13:06:54 +0000 (UTC) X-FDA: 79380911148.03.9289F63 Received: from mail-yb1-f170.google.com (mail-yb1-f170.google.com [209.85.219.170]) by imf01.hostedemail.com (Postfix) with ESMTP id 26D7140023 for ; Thu, 21 Apr 2022 13:06:52 +0000 (UTC) Received: by mail-yb1-f170.google.com with SMTP id m132so8634980ybm.4 for ; Thu, 21 Apr 2022 06:06:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=CPNtVPMGS6n30NXy9Kmuz6m/u8UhzbZkV2pRi1jdkeY=; b=N+bbZtE9Yv/pC4Xv0OLC9usoO8whyzxwyPzWWgAv3fam2OO/rdNwQ/+Ax77eSobRs2 GhJ/WJT6tQV3F84wjgJrcimGHE6nTs2MpELIDCKmcfzdaAEh8N24BobQW0EQZzv+G+rK 5sTWvaNiv6JGcD1E/PtHXcspOCTBkxY5yL8FVkvCjaCVU4uG2snOYH9u0ZPKPvBVqsqv Dm4Bg3TGoi0RgNQ+75/i2FKnZ5LCp/01YgQ3iLwIEbDT1Q3eoV7/NKqaAjqd04e71nMt AkZgKQO4OvU49Z8kjZ13myKSM9ItRfnVg3Rtj/7R0IUeq1vVstGIhPeXd1zuSeRF4K01 gBvA== X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=CPNtVPMGS6n30NXy9Kmuz6m/u8UhzbZkV2pRi1jdkeY=; b=FCtYCku+y7iY8hQrQxQ37edn7W+Zz5LFqJtk1iwl4q9YJ7nn6iGqtbeZ6aDYLCnjzS 25DuPmNvWA3eBoxXFMA8PnHMgFGRkXmaWBc/su2VE5Fa6opCpb1n8dKA1cGGuIahGoUJ 9jU6HMu1N63INMpNDk0p8i6ANYXPdvPLw/FUBYeY3ldNd0oJ3tWWGhE1db4gyThlS9Oz Mn3q6c3UaeDgcpQs9FaEDnpe/ILEe/bxp3jkQo0+LECftWA/ozs24mIxlELAYFcv1f9j CIVZ4gMhZWMk7oJrSFleVawrImEyue+17dFHDXlem03XbHT9RG7oDIpQyMuP3KRm+nb5 i0sg== X-Gm-Message-State: AOAM533wOGIQupFrt7wUg/0tAIvZAwIX4nDSM9e0JT0MBQi1scrUrEKk aySRbptzuzW4KrQmV2ugYdPwomUxXkipjPLy5JAWP9ZxXGXWqg== X-Google-Smtp-Source: ABdhPJwa1GTu0X4u3cGJrysF6fBx6QQv2TLJzKSn0j3bFBzRCTJJn4z9ehkeDPgXMcHzjjI6rbd8eL6Q9il6Mc8DBqA= X-Received: by 2002:a25:b19b:0:b0:641:af55:af7 with SMTP id h27-20020a25b19b000000b00641af550af7mr25456838ybj.5.1650546408935; Thu, 21 Apr 2022 06:06:48 -0700 (PDT) MIME-Version: 1.0 References: <20220421121018.60860-1-huangshaobo6@huawei.com> In-Reply-To: <20220421121018.60860-1-huangshaobo6@huawei.com> From: Alexander Potapenko Date: Thu, 21 Apr 2022 15:06:10 +0200 Message-ID: Subject: Re: [PATCH] kfence: check kfence canary in panic and reboot To: Shaobo Huang Cc: Andrew Morton , chenzefeng2@huawei.com, Dmitriy Vyukov , Marco Elver , kasan-dev , LKML , Linux Memory Management List , nixiaoming@huawei.com, wangbing6@huawei.com, wangfangpeng1@huawei.com, young.liuyang@huawei.com, zengweilin@huawei.com, zhongjubin@huawei.com Content-Type: multipart/alternative; boundary="00000000000069dfc805dd29c821" X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 26D7140023 X-Stat-Signature: f4nzijtodgrihgzszytmbxp69f7n4udw Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=N+bbZtE9; spf=pass (imf01.hostedemail.com: domain of glider@google.com designates 209.85.219.170 as permitted sender) smtp.mailfrom=glider@google.com; dmarc=pass 
(policy=reject) header.from=google.com X-HE-Tag: 1650546412-553723 X-Bogosity: Ham, tests=bogofilter, spamicity=0.201244, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --00000000000069dfc805dd29c821 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Thu, Apr 21, 2022 at 2:10 PM Shaobo Huang <huangshaobo6@huawei.com> wrote:

> > > From: huangshaobo <huangshaobo6@huawei.com>
> > >
> > > when writing out of bounds to the red zone, it can only be detected at
> > > kfree. However, there were many scenarios before kfree that caused this
> > > out-of-bounds write to not be detected. Therefore, it is necessary to
> > > provide a method for actively detecting out-of-bounds writing to the red
> > > zone, so that users can actively detect, and can be detected in the
> > > system reboot or panic.
> > >
> > >
> > After having analyzed a couple of KFENCE memory corruption reports in the
> > wild, I have doubts that this approach will be helpful.
> >
> > Note that KFENCE knows nothing about the memory access that performs the
> > actual corruption.
> >
> > It's rather easy to investigate corruptions of short-living objects, e.g.
> > those that are allocated and freed within the same function. In that case,
> > one can examine the region of the code between these two events and try to
> > understand what exactly caused the corruption.
> >
> > But for long-living objects checked at panic/reboot we'll effectively have
> > only the allocation stack and will have to check all the places where the
> > corrupted object was potentially used.
> > Most of the time, such reports won't be actionable.
>
> The detection mechanism of kfence is probabilistic. It is not easy to find
> a bug.
> It is a pity to catch a bug without reporting it. and the cost of panic
> detection is not large, so panic detection is still valuable.
>

I am also a big fan of showing as much information as possible to help the
developers debug a memory corruption.
But I am still struggling to understand how the proposed patch helps.
Assume we have some generic allocation of an skbuff, so the report looks like
this:

=============================================
BUG: KFENCE: memory corruption in <frame that triggered reboot>
Corrupted memory at <end+1>
<stack trace of reboot event>

kfence-#59: <start>-<end>,size=100,cache=kmalloc-128  allocated by task 77 on cpu 0 at 28.018073s:
kmem_cache_alloc
__alloc_skb
alloc_skb_with_frags
sock_alloc_send_pskb
unix_stream_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
=============================================

This report will denote that in a system that could have been running for
days a particular skbuff was corrupted by some unknown task at some unknown
point in time.
How do we figure out what exactly caused this corruption?

When we deploy KFENCE at scale, it is rarely possible for the kernel
developer to get access to the host that reported the bug and try to
reproduce it.
With that in mind, the report (plus the kernel source) must contain all the
necessary information to address the bug, otherwise reporting it will result
in wasting the developer's time.
Moreover, if we report such bugs too often, our tool loses credibility, which
is hard to regain.

> > > for example, if the application memory is out of bounds and written to
> > > the red zone in the kfence object, the system suddenly panics, and the
> > > following log can be seen during system reset:
> > > BUG: KFENCE: memory corruption in atomic_notifier_call_chain+0x49/0x70
> [...]
>
> thanks,
> ShaoBo Huang
>

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten
haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
dass die E-Mail an die falsche Person gesendet wurde.

This e-mail is confidential. If you received this communication by mistake,
please don't forward it to anyone else, please erase all copies and
attachments, and please let me know that it has gone to the wrong person.

--00000000000069dfc805dd29c821 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Thu, Apr 21, 2022 at 2:10 PM Shaobo Huang <huangshaobo6@huawei.com> wrote:
> > > From: huangshaobo <huangshaobo6@huawei.com>
> > >
> > > when writing out of bounds to the red zone, it can only be detected at
> > > kfree. However, there were many scenarios before kfree that caused this
> > > out-of-bounds write to not be detected. Therefore, it is necessary to
> > > provide a method for actively detecting out-of-bounds writing to the red
> > > zone, so that users can actively detect, and can be detected in the
> > > system reboot or panic.
> > >
> > >
> > After having analyzed a couple of KFENCE memory corruption reports in the
> > wild, I have doubts that this approach will be helpful.
> >
> > Note that KFENCE knows nothing about the memory access that performs the
> > actual corruption.
> >
> > It's rather easy to investigate corruptions of short-living objects, e.g.
> > those that are allocated and freed within the same function. In that case,
> > one can examine the region of the code between these two events and try to
> > understand what exactly caused the corruption.
> >
> > But for long-living objects checked at panic/reboot we'll effectively have
> > only the allocation stack and will have to check all the places where the
> > corrupted object was potentially used.
> > Most of the time, such reports won't be actionable.
>
> The detection mechanism of kfence is probabilistic. It is not easy to find a bug.
> It is a pity to catch a bug without reporting it. and the cost of panic detection
> is not large, so panic detection is still valuable.


I am also a big fan of showing as much information as possible to help the developers debug a memory corruption.
But I am still struggling to understand how the proposed patch helps.
Assume we have some generic allocation of an skbuff, so the report looks like this:

=============================================
BUG: KFENCE: memory corruption in <frame that triggered reboot>
Corrupted memory at <end+1>
<stack trace of reboot event>

kfence-#59: <start>-<end>,size=100,cache=kmalloc-128  allocated by task 77 on cpu 0 at 28.018073s:
kmem_cache_alloc
__alloc_skb
alloc_skb_with_frags
sock_alloc_send_pskb
unix_stream_sendmsg
sock_sendmsg
__sys_sendto
__x64_sys_sendto
=============================================

This report will denote that in a system that could have been running for days a particular skbuff was corrupted by some unknown task at some unknown point in time.
How do we figure out what exactly caused this corruption?

When we deploy KFENCE at scale, it is rarely possible for the kernel developer to get access to the host that reported the bug and try to reproduce it.
With that in mind, the report (plus the kernel source) must contain all the necessary information to address the bug, otherwise reporting it will result in wasting the developer's time.
Moreover, if we report such bugs too often, our tool loses credibility, which is hard to regain.

> > > for example, if the application memory is out of bounds and written to
> > > the red zone in the kfence object, the system suddenly panics, and the
> > > following log can be seen during system reset:
> > > BUG: KFENCE: memory corruption in atomic_notifier_call_chain+0x49/0x70
> [...]
>
> thanks,
> ShaoBo Huang


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde.


This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
--00000000000069dfc805dd29c821--