From: Johannes Weiner
To: Andrew Morton
Cc: Matthew Wilcox, Michal Hocko, Hugh Dickins, Zhou Guanghui, Zi Yan,
    Shakeel Butt, Roman Gushchin, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com
Subject: [PATCH] mm: page_alloc: fix memcg accounting leak in speculative cache lookup
Date: Fri, 19 Mar 2021 03:15:47 -0400
Message-Id: <20210319071547.60973-1-hannes@cmpxchg.org>

When the freeing of a higher-order page block (non-compound) races
with a speculative page cache lookup, __free_pages() needs to leave
the first order-0 page in the chunk to the lookup but free the buddy
pages that the lookup doesn't know about separately.

However, if such a higher-order page is charged to a memcg (e.g. a
!vmap kernel stack), only the first page of the block has page->memcg
set. That means we'll uncharge only one order-0 page from the entire
block, and leak the remainder.

Add a split_page_memcg() to __free_pages() right before it starts
taking the higher-order page apart and freeing its individual
constituent pages. This ensures all of them will have the memcg
linkage set up for correct uncharging.

Also update the comments a bit to clarify what exactly is happening to
the page during that race.

This bug is old and has its roots in the speculative page cache patch
and adding cgroup accounting of kernel pages. There are no known user
reports. A backport to stable is therefore not warranted.

Reported-by: Matthew Wilcox
Signed-off-by: Johannes Weiner
---
 mm/page_alloc.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c53fe4fa10bf..f4bd56656402 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5112,10 +5112,9 @@ static inline void free_the_page(struct page *page, unsigned int order)
  * the allocation, so it is easy to leak memory. Freeing more memory
  * than was allocated will probably emit a warning.
  *
- * If the last reference to this page is speculative, it will be released
- * by put_page() which only frees the first page of a non-compound
- * allocation. To prevent the remaining pages from being leaked, we free
- * the subsequent pages here. If you want to use the page's reference
+ * This function isn't a put_page(). Don't let the put_page_testzero()
+ * fool you, it's only to deal with speculative cache references. It
+ * WILL free pages directly. If you want to use the page's reference
  * count to decide when to free the allocation, you should allocate a
  * compound page, and use put_page() instead of __free_pages().
  *
@@ -5124,11 +5123,33 @@ static inline void free_the_page(struct page *page, unsigned int order)
  */
 void __free_pages(struct page *page, unsigned int order)
 {
-	if (put_page_testzero(page))
+	/*
+	 * Drop the base reference from __alloc_pages and free. In
+	 * case there is an outstanding speculative reference, from
+	 * e.g. the page cache, it will put and free the page later.
+	 */
+	if (likely(put_page_testzero(page))) {
 		free_the_page(page, order);
-	else if (!PageHead(page))
+		return;
+	}
+
+	/*
+	 * The speculative reference will put and free the page.
+	 *
+	 * However, if the speculation was into a higher-order page
+	 * chunk that isn't marked compound, the other side will know
+	 * nothing about our buddy pages and only free the order-0
+	 * page at the start of our chunk! We must split off and free
+	 * the buddy pages here.
+	 *
+	 * The buddy pages aren't individually refcounted, so they
+	 * can't have any pending speculative references themselves.
+	 */
+	if (!PageHead(page) && order > 0) {
+		split_page_memcg(page, 1 << order);
 		while (order-- > 0)
 			free_the_page(page + (1 << order), order);
+	}
 }
 EXPORT_SYMBOL(__free_pages);
 
-- 
2.30.1