To: Michal Hocko
Cc: hannes@cmpxchg.org, vdavydov.dev@gmail.com, akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
From: brookxu
Subject: [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
Message-ID: <077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com>
Date: Fri, 6 Mar 2020 09:02:02 +0800

From: Chunguang Xu

If an eventfd monitors multiple memory thresholds of a cgroup and is then
closed, the kernel deletes all events related to this eventfd.
If, before all of these events are deleted, another eventfd starts
monitoring a memory threshold of the same cgroup, the kernel crashes:

[  135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
[  135.675350] #PF: supervisor write access in kernel mode
[  135.675579] #PF: error_code(0x0002) - not-present page
[  135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
[  135.676080] Oops: 0002 [#1] SMP PTI
[  135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
[  135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
[  135.676909] Workqueue: events memcg_event_remove
[  135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
[  135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
[  135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
[  135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
[  135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
[  135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
[  135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
[  135.680066] FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
[  135.680475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
[  135.681325] Call Trace:
[  135.681763]  memcg_event_remove+0x32/0x90
[  135.682209]  process_one_work+0x172/0x380
[  135.682657]  worker_thread+0x49/0x3f0
[  135.683111]  kthread+0xf8/0x130
[  135.683570]  ? max_active_store+0x80/0x80
[  135.684034]  ? kthread_bind+0x10/0x10
[  135.684506]  ret_from_fork+0x35/0x40
[  135.689733] CR2: 0000000000000004

We can reproduce this problem as follows:

1. Create a new cgroup subdirectory and a new eventfd, then monitor
   multiple memory thresholds of the cgroup through this eventfd.
2. Close this eventfd. __mem_cgroup_usage_unregister_event() will be
   called multiple times to delete all events related to this eventfd.

The first time __mem_cgroup_usage_unregister_event() is called, the
kernel clears all items related to this eventfd in thresholds->primary.
Since there is currently only one eventfd, thresholds->primary becomes
empty, so the kernel sets thresholds->primary and thresholds->spare to
NULL. If, at this point, the user creates a new eventfd and monitors a
memory threshold of this cgroup, the kernel re-initializes
thresholds->primary. When __mem_cgroup_usage_unregister_event() is then
called for the second time, thresholds->primary is not empty, so the
kernel accesses thresholds->spare. But thresholds->spare is NULL, which
triggers the crash.

In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.

The solution is to check, when deleting an event, whether the thresholds
associated with the eventfd have already been cleared. If so, we do
nothing.

Signed-off-by: Chunguang Xu
---
 mm/memcontrol.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d09776c..4575a58 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     struct mem_cgroup_thresholds *thresholds;
     struct mem_cgroup_threshold_ary *new;
     unsigned long usage;
-    int i, j, size;
+    int i, j, size, entries;
 
     mutex_lock(&memcg->thresholds_lock);
 
@@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     __mem_cgroup_threshold(memcg, type == _MEMSWAP);
 
     /* Calculate new number of threshold */
-    size = 0;
+    size = entries = 0;
     for (i = 0; i < thresholds->primary->size; i++) {
         if (thresholds->primary->entries[i].eventfd != eventfd)
             size++;
+        else
+            entries++;
     }
 
+    /* If items related to eventfd have been cleared, nothing to do */
+    if (!entries)
+        goto unlock;
+
     new = thresholds->spare;
 
     /* Set thresholds array to NULL if we don't have thresholds */
-- 
1.8.3.1
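
For illustration only, the sequence described in the commit message can
be replayed with a small user-space model in C. This is a sketch under
stated assumptions, not the kernel code: struct ary, struct thresholds,
MAX_ENTRIES, model_register(), model_unregister() and model_replay() are
hypothetical names, locking and RCU are left out, and the model returns
-1 at the point where the kernel would write through the NULL
thresholds->spare pointer instead of actually crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_ENTRIES 8   /* arbitrary capacity, only for this model */

/* Hypothetical, simplified stand-in for the threshold array */
struct ary {
    int size;
    int eventfd[MAX_ENTRIES];   /* owner of each threshold entry */
};

/* Hypothetical stand-in for the primary/spare pair */
struct thresholds {
    struct ary *primary;
    struct ary *spare;
};

/* Add one threshold owned by eventfd; the old primary becomes the spare. */
static void model_register(struct thresholds *t, int eventfd)
{
    struct ary *new = calloc(1, sizeof(*new));

    assert(new);
    if (t->primary)
        *new = *t->primary;
    assert(new->size < MAX_ENTRIES);
    new->eventfd[new->size++] = eventfd;

    free(t->spare);
    t->spare = t->primary;
    t->primary = new;
}

/*
 * Remove all thresholds owned by eventfd. Returns 0 on success and -1
 * where the unpatched kernel function would dereference a NULL spare.
 * "patched" enables the check added by this patch.
 */
static int model_unregister(struct thresholds *t, int eventfd, bool patched)
{
    struct ary *new;
    int i, j, size = 0, entries = 0;

    if (!t->primary)
        return 0;

    for (i = 0; i < t->primary->size; i++) {
        if (t->primary->eventfd[i] != eventfd)
            size++;
        else
            entries++;
    }

    /* The new check: nothing owned by this eventfd is left */
    if (patched && !entries)
        return 0;

    new = t->spare;
    if (!size) {
        free(new);
        new = NULL;
    } else {
        if (!new)
            return -1;  /* the kernel would write new->size here */
        new->size = size;
        for (i = 0, j = 0; i < t->primary->size; i++)
            if (t->primary->eventfd[i] != eventfd)
                new->eventfd[j++] = t->primary->eventfd[i];
    }

    /* Swap primary and spare; drop the spare if nothing is left */
    t->spare = t->primary;
    t->primary = new;
    if (!new) {
        free(t->spare);
        t->spare = NULL;
    }
    return 0;
}

/* Replay: A registers twice, is removed once, B registers, A is removed again. */
int model_replay(bool patched)
{
    struct thresholds t = { NULL, NULL };
    int a = 1, b = 2, ret;

    model_register(&t, a);
    model_register(&t, a);
    model_unregister(&t, a, patched);       /* first call clears everything */
    model_register(&t, b);                  /* new eventfd slips in */
    ret = model_unregister(&t, a, patched); /* second call for the old eventfd */

    free(t.primary);
    free(t.spare);
    return ret;
}
```

In this model, model_replay(false) returns -1 (the spare is NULL when the
second removal runs), while model_replay(true) returns 0 because the
second removal finds no entries for the old eventfd and returns early.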