From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.5 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3587AC10F27 for ; Tue, 10 Mar 2020 10:41:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id DF7D62071B for ; Tue, 10 Mar 2020 10:41:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=shutemov-name.20150623.gappssmtp.com header.i=@shutemov-name.20150623.gappssmtp.com header.b="rfN6Lx7B" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DF7D62071B Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=shutemov.name Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7D0666B0005; Tue, 10 Mar 2020 06:41:52 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 780676B0006; Tue, 10 Mar 2020 06:41:52 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 66F016B0007; Tue, 10 Mar 2020 06:41:52 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0049.hostedemail.com [216.40.44.49]) by kanga.kvack.org (Postfix) with ESMTP id 4E05D6B0005 for ; Tue, 10 Mar 2020 06:41:52 -0400 (EDT) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2CF15181AC9CC for ; Tue, 10 Mar 2020 10:41:52 +0000 (UTC) X-FDA: 76579112064.11.house58_84ae4f8789729 X-HE-Tag: house58_84ae4f8789729 X-Filterd-Recvd-Size: 9798 Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Tue, 10 Mar 2020 10:41:51 +0000 (UTC) Received: by mail-lj1-f193.google.com with SMTP id u12so9316192ljo.2 for ; Tue, 10 Mar 2020 03:41:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shutemov-name.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=0vZawaTqWCz2V8Z7mamXXKPhhUtkT0sw/sdFRGO2vlw=; b=rfN6Lx7BOmqQhhOcLZn74EbzqdvStmveWhgz/2wPqRS6Vsou7fptHHWEIrnS4TE5yp JTpNP9cVTh9+ScV8dItrwHNxAxXeH2M366CwIpOP/wfCYI8Lr6gWHzuSx2wtV3+CotGf vS8wvdPooB2/guIp7F2u9RYKDRt/5+8VJfG61EPWc1IiA6LEanZkA62sFK/IyPHDj3c4 XISld2tme2+mLRcphKCh3phxQo89djJLpcsShu4ldtlDr+zRUmlW7K6sHnBNcoi5Waik td79ng96wZGYcrIub2d9UrMG4QLXkEFBlTDCdk8EaNGbiEKhTJBuuRRONadmzh/i9mG0 Nigw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=0vZawaTqWCz2V8Z7mamXXKPhhUtkT0sw/sdFRGO2vlw=; b=hyXViG1P7oTKDjtNfVDCrlvpKHMZgKjbdM9uX1i1OD86OZ7XJ7Zj8R82bMEidFVOvl FsSqZA9tzJwhdkvEFPUQibSflJNkfsg865w2N/szE1W0U0E2I9XAFlczz357rwPEbwMX VQAttsmtgTBNhtdubRyC/JlsfpkTPOeOkbN66sLebfMUQpG/J0AglDEytAJB8SH+mSGY QVMHdz/2tz1qCYU4n9vwRf/Ejo4GYO7Crp2sFkaXv87UfYyZFVvpJc+aFLxY6u6GO1Jk iLAf1en70Dj0d/4baE1NdoO7GJev7K7xVsqMAqyMH0IF8eEYXeOcNFhDiXoBPiVt4/ub YGPw== X-Gm-Message-State: ANhLgQ3IhLMg0ilLmyf/1ArZD3GR0HsmbgEBV2PQ4ydpNQFn7S8P2tgR sbuAUjK1BpOoQ2B5rYmhfFs/zA== X-Google-Smtp-Source: ADFU+vtWe5qAVNKXwQBYPCtcvZJH5it9LZ15Lowysh7WzMF8KSRZeQGnQWR4Gbgv9g0YFkPOui+jIA== X-Received: by 2002:a2e:7c03:: with SMTP id x3mr12742730ljc.104.1583836909643; Tue, 10 Mar 2020 03:41:49 -0700 (PDT) Received: from box.localdomain ([86.57.175.117]) by smtp.gmail.com with ESMTPSA id h2sm2079300ljm.103.2020.03.10.03.41.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Mar 2020 03:41:48 -0700 (PDT) Received: by box.localdomain (Postfix, from userid 1000) id 27FEF1013FA; Tue, 10 Mar 2020 13:41:49 +0300 (+03) Date: Tue, 10 Mar 2020 13:41:49 +0300 From: "Kirill A. Shutemov" To: Michal Hocko Cc: brookxu , hannes@cmpxchg.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event Message-ID: <20200310104149.5c3pc75y6ny5hixb@box> References: <077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com> <20200310094836.GD8447@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20200310094836.GD8447@dhcp22.suse.cz> Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Mar 10, 2020 at 10:48:36AM +0100, Michal Hocko wrote: > [Cc Kirill, I didn't realize he has implemented this code] My first non-trivial mm contribution :P > On Fri 06-03-20 09:02:02, brookxu wrote: > > From: Chunguang Xu > >=20 > > An eventfd monitors multiple memory thresholds of the cgroup, closes = them, > > the kernel deletes all events related to this eventfd. Before all eve= nts > > are deleted, another eventfd monitors the memory threshold of this cg= roup, > > leading to a crash: > >=20 > > [=A0 135.675108] BUG: kernel NULL pointer dereference, address: 00000= 00000000004 > > [=A0 135.675350] #PF: supervisor write access in kernel mode > > [=A0 135.675579] #PF: error_code(0x0002) - not-present page > > [=A0 135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce= 067 PMD 0 > > [=A0 135.676080] Oops: 0002 [#1] SMP PTI > > [=A0 135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded No= t tainted 5.6.0-rc4 #3 > > [=A0 135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GL= ET70WW (2.24 ) 05/21/2014 > > [=A0 135.676909] Workqueue: events memcg_event_remove > > [=A0 135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0= x190 > > [=A0 135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202 > > [=A0 135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 000= 0000000000001 > > [=A0 135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 000= 0000000000001 > > [=A0 135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 000= 0000000000010 > > [=A0 135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: fff= f8bb226aba880 > > [=A0 135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 000= 0000000000000 > > [=A0 135.680066] FS:=A0 0000000000000000(0000) GS:ffff8bb242680000(00= 00) knlGS:0000000000000000 > > [=A0 135.680475] CS:=A0 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [=A0 135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 000= 00000001606e0 > > [=A0 135.681325] Call Trace: > > [=A0 135.681763]=A0 memcg_event_remove+0x32/0x90 > > [=A0 135.682209]=A0 process_one_work+0x172/0x380 > > [=A0 135.682657]=A0 worker_thread+0x49/0x3f0 > > [=A0 135.683111]=A0 kthread+0xf8/0x130 > > [=A0 135.683570]=A0 ? max_active_store+0x80/0x80 > > [=A0 135.684034]=A0 ? kthread_bind+0x10/0x10 > > [=A0 135.684506]=A0 ret_from_fork+0x35/0x40 > > [=A0 135.689733] CR2: 0000000000000004 > >=20 > > We can reproduce this problem in the following ways: > > =A0 > > 1. We create a new cgroup subdirectory and a new eventfd, and then we > > =A0=A0 monitor multiple memory thresholds of the cgroup through this = eventfd. > > 2. closing this eventfd, and __mem_cgroup_usage_unregister_event () w= ill be > > =A0=A0 called multiple times to delete all events related to this eve= ntfd. > >=20 > > The first time __mem_cgroup_usage_unregister_event() is called, the k= ernel > > will clear all items related to this eventfd in thresholds-> primary.= Since > > there is currently only one eventfd, thresholds-> primary becomes emp= ty, > > so the kernel will set thresholds-> primary and hresholds-> spare to = NULL. ^ typo > > If at this time, the user creates a new eventfd and monitor the memor= y > > threshold of this cgroup, kernel will re-initialize thresholds-> prim= ary. > > Then when __mem_cgroup_usage_unregister_event () is called for the se= cond > > time, because thresholds-> primary is not empty, the system will acce= ss > > thresholds-> spare, but thresholds-> spare is NULL, which will trigge= r a > > crash. > >=20 > > In general, the longer it takes to delete all events related to this > > eventfd, the easier it is to trigger this problem. > >=20 > > The solution is to check whether the thresholds associated with the e= ventfd > > has been cleared when deleting the event. If so, we do nothing. > >=20 > > Signed-off-by: Chunguang Xu >=20 > The fix looks reasonable to me > Acked-by: Michal Hocko Agreed. Two typos have to be addressed. Acked-by: Kirill A. Shutemov > It seems that the code has been broken since 2c488db27b61 ("memcg: clea= n > up memory thresholds"). We've had 371528caec55 ("mm: memcg: Correct > unregistring of events attached to the same eventfd") but it didn't > catch this case for some reason. Unless I am missing something the code > was broken back then already. Kirill please double check after me. I think the issue exitsted before 2c488db27b61. The fields had different names back then. The logic to make unregister never-fail is added in 907860ed381a ("cgroups: make cftype.unregister_event() void-returning"). I believe the Fixes should point there. >=20 > So if I am not wrong then we want > Fixes: 2c488db27b61 ("memcg: clean up memory thresholds") > Cc: stable >=20 > sounds appropriate because this seems to be user trigerable. >=20 > Thanks for preparing the patch! >=20 > Btw. you should double check your email sender because it seemed to > whitespace damaged the patch (\t -> spaces). Please use git send-email > instead. >=20 > > --- > > =A0mm/memcontrol.c | 10 ++++++++-- > > =A01 file changed, 8 insertions(+), 2 deletions(-) > >=20 > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index d09776c..4575a58 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event= (struct mem_cgroup *memcg, > > =A0=A0=A0=A0 struct mem_cgroup_thresholds *thresholds; > > =A0=A0=A0=A0 struct mem_cgroup_threshold_ary *new; > > =A0=A0=A0=A0 unsigned long usage; > > -=A0=A0=A0 int i, j, size; > > +=A0=A0=A0 int i, j, size, entries; > > =A0 > > =A0=A0=A0=A0 mutex_lock(&memcg->thresholds_lock); > > =A0 > > @@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_eve= nt(struct mem_cgroup *memcg, > > =A0=A0=A0=A0 __mem_cgroup_threshold(memcg, type =3D=3D _MEMSWAP); > > =A0 > > =A0=A0=A0=A0 /* Calculate new number of threshold */ > > -=A0=A0=A0 size =3D 0; > > +=A0=A0=A0 size =3D entries =3D 0; > > =A0=A0=A0=A0 for (i =3D 0; i < thresholds->primary->size; i++) { > > =A0=A0=A0=A0 =A0=A0=A0 if (thresholds->primary->entries[i].eventfd !=3D= eventfd) > > =A0=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 size++; > > +=A0=A0=A0 =A0=A0=A0 else > > +=A0=A0=A0 =A0=A0=A0 =A0=A0=A0 entries++; > > =A0=A0=A0=A0 } > > =A0 > > +=A0=A0=A0 /* If items related to eventfd have been cleared, nothing = to do */ ^ "no items" ? > > +=A0=A0=A0 if (!entries) > > +=A0=A0=A0 =A0=A0=A0 goto unlock; > > + > > =A0=A0=A0=A0 new =3D thresholds->spare; > > =A0 > > =A0=A0=A0=A0 /* Set thresholds array to NULL if we don't have thresho= lds */ > > --=20 > > 1.8.3.1 >=20 > --=20 > Michal Hocko > SUSE Labs --=20 Kirill A. Shutemov