From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ECE13C2D0F8 for ; Tue, 12 May 2020 23:58:19 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7DB5620740 for ; Tue, 12 May 2020 23:58:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lca.pw header.i=@lca.pw header.b="NCtjCiXd" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7DB5620740 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=lca.pw Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D979D9000F9; Tue, 12 May 2020 19:58:18 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D482C9000F3; Tue, 12 May 2020 19:58:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C0FFB9000F9; Tue, 12 May 2020 19:58:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0190.hostedemail.com [216.40.44.190]) by kanga.kvack.org (Postfix) with ESMTP id A72CA9000F3 for ; Tue, 12 May 2020 19:58:18 -0400 (EDT) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 651C3180AD806 for ; Tue, 12 May 2020 23:58:18 +0000 (UTC) X-FDA: 76809733476.29.wall94_82aa4ee79414a X-HE-Tag: wall94_82aa4ee79414a X-Filterd-Recvd-Size: 7094 Received: from mail-qv1-f65.google.com (mail-qv1-f65.google.com [209.85.219.65]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 May 2020 23:58:17 +0000 (UTC) Received: by mail-qv1-f65.google.com with SMTP id z5so5413191qvw.4 for ; Tue, 12 May 2020 16:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=lca.pw; s=google; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yD3a12+euav5lEzQ2eD5g0W71Fdr+4HWLSF5wAq+Di4=; b=NCtjCiXdpUxrD3bLzlXKjxlXGtB+pIewyDmhQ6NgTrKvEBiiAO3l064DUq0chV8UqU 3wB2OBgRQ9Iht75ymMDaei3StnLtfND5OXchaakbk/r7q/N8ILNzEior35O7K8va4vj+ BMOiB9k3rO9/RspupD8JjBww3kCaa3IQrVCaRw2F0BIkSgdnctCS9PhcekAiHzOzBLkr nEry2gYpIjfwV5M6ajTLYZLme6Jsn+kCR4KMcN0MXVdjQ+LvwvTXQ0r55weM0847xzkz Mheh2jC4iUU31tpYcrbMBSe8SsSJO3QT5fBSmT+HjN6JfSS/rHlRsDS7n8JNyjTZLX8L ZRdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=yD3a12+euav5lEzQ2eD5g0W71Fdr+4HWLSF5wAq+Di4=; b=dDwj538+LQD7UTVyfu4pC+o0qcLmg/f5egNPxMr+cF9q35TjLJn3S8Eg4PnC6sRfiM CRXkKJkomhssfFWfpvw7NePMbP+cD7UfLT2vLj0JsTwpbe76IhhOHFDNvaw49JV/md26 +ZEH1Th3ocfHWtmkBJf12AN8YGaKLd0cCWzYaxGMxmW79N9ZoU6y+/zJXP0GVxlpW9QR 1jEtyw3Mo14xrljTBmB7KHll3BDNHBGrd+KRzBhecxsofQFHhGXmiLVcnFOkFExufG5W jw7YnLflVvfklCGeCsg63d57Rb94Ute63ZeWeHdfoN1ss3skxGxKxOBqHPLwm9ef2M+9 6Z+g== X-Gm-Message-State: AGi0PuaMQeQgp6xZHX9NKtMljbINa88PsRLFniGbfdJZz730uKq7yJrt TMdPlD/JhcbU8SIw0hF636hgHw== X-Google-Smtp-Source: APiQypLQc2IzguaIlxfjV2xEQaPP5JANj3ncXax4B/oEnympCQO03p058pVnDRHNNYvdeA4bh4hL0w== X-Received: by 2002:ad4:42b1:: with SMTP id e17mr23319979qvr.149.1589327896644; Tue, 12 May 2020 16:58:16 -0700 (PDT) Received: from [192.168.1.153] (pool-71-184-117-43.bstnma.fios.verizon.net. [71.184.117.43]) by smtp.gmail.com with ESMTPSA id j11sm11834028qkk.33.2020.05.12.16.58.15 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 12 May 2020 16:58:15 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.80.23.2.2\)) Subject: Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API From: Qian Cai In-Reply-To: <20200512215813.GA487759@cmpxchg.org> Date: Tue, 12 May 2020 19:58:14 -0400 Cc: Andrew Morton , Alex Shi , Joonsoo Kim , Shakeel Butt , Hugh Dickins , Michal Hocko , "Kirill A. Shutemov" , Roman Gushchin , Linux-MM , cgroups@vger.kernel.org, LKML , kernel-team@fb.com, Johannes Weiner Content-Transfer-Encoding: quoted-printable Message-Id: <25270BAD-CB7E-4E92-96BA-740690744DBA@lca.pw> References: <20200508183105.225460-1-hannes@cmpxchg.org> <20200508183105.225460-13-hannes@cmpxchg.org> <45AA36A9-0C4D-49C2-BA3C-08753BBC30FB@lca.pw> <20200512215813.GA487759@cmpxchg.org> To: Stephen Rothwell X-Mailer: Apple Mail (2.3608.80.23.2.2) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > On May 12, 2020, at 5:58 PM, Johannes Weiner = wrote: >=20 > On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote: >>> On May 8, 2020, at 2:30 PM, Johannes Weiner = wrote: >>>=20 >>> With the page->mapping requirement gone from memcg, we can charge = anon >>> and file-thp pages in one single step, right after they're = allocated. >>>=20 >>> This removes two out of three API calls - especially the tricky = commit >>> step that needed to happen at just the right time between when the >>> page is "set up" and when it's "published" - somewhat vague and = fluid >>> concepts that varied by page type. All we need is a freshly = allocated >>> page and a memcg context to charge. >>>=20 >>> v2: prevent double charges on pre-allocated hugepages in khugepaged >>>=20 >>> Signed-off-by: Johannes Weiner >>> Reviewed-by: Joonsoo Kim >>> --- >>> include/linux/mm.h | 4 +--- >>> kernel/events/uprobes.c | 11 +++-------- >>> mm/filemap.c | 2 +- >>> mm/huge_memory.c | 9 +++------ >>> mm/khugepaged.c | 35 ++++++++++------------------------- >>> mm/memory.c | 36 ++++++++++-------------------------- >>> mm/migrate.c | 5 +---- >>> mm/swapfile.c | 6 +----- >>> mm/userfaultfd.c | 5 +---- >>> 9 files changed, 31 insertions(+), 82 deletions(-) >> [] >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >>>=20 >>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct = mm_struct *mm, >>> out_up_write: >>> up_write(&mm->mmap_sem); >>> out_nolock: >>> + if (*hpage) >>> + mem_cgroup_uncharge(*hpage); >>> trace_mm_collapse_huge_page(mm, isolated, result); >>> return; >>> out: >>> - mem_cgroup_cancel_charge(new_page, memcg); >>> goto out_up_write; >>> } >> [] >>=20 >> Some memory pressure will crash this new code. It looks like somewhat = racy. >>=20 >> if (!page->mem_cgroup) >>=20 >> where page =3D=3D NULL in mem_cgroup_uncharge(). >=20 > Thanks for the report, sorry about the inconvenience. >=20 > Hm, the page is exclusive at this point, nobody else should be > touching it. After all, khugepaged might reuse the preallocated page > for another pmd if this one fails to collapse. >=20 > Looking at the code, I think it's page itself that's garbage, not > page->mem_cgroup changing. If you have CONFIG_NUMA and the allocation > fails, *hpage could contain an ERR_PTR instead of being NULL. >=20 > I think we need the following fixlet: Yes, I have NUMA here. Stephen, can you pick this up for first before Andrew has a chance to = push out the next mmotm hopefully contain this fix? https://lore.kernel.org/lkml/20200512215813.GA487759@cmpxchg.org/ >=20 > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index f2e0a5e5cfbb..f6161e17da26 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct = *mm, > out_up_write: > up_write(&mm->mmap_sem); > out_nolock: > - if (*hpage) > + if (!IS_ERR_OR_NULL(*hpage)) > mem_cgroup_uncharge(*hpage); > trace_mm_collapse_huge_page(mm, isolated, result); > return; > @@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm, > unlock_page(new_page); > out: > VM_BUG_ON(!list_empty(&pagelist)); > - if (*hpage) > + if (!IS_ERR_OR_NULL(*hpage)) > mem_cgroup_uncharge(*hpage); > /* TODO: tracepoints */ > }