From: Johannes Weiner <hannes@cmpxchg.org>
To: Joonsoo Kim, Alex Shi
Cc: Shakeel Butt, Hugh Dickins, Michal Hocko, "Kirill A. Shutemov",
	Roman Gushchin, linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 02/18] mm: memcontrol: fix theoretical race in charge moving
Date: Mon, 20 Apr 2020 18:11:10 -0400
Message-Id: <20200420221126.341272-3-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.26.0
In-Reply-To: <20200420221126.341272-1-hannes@cmpxchg.org>
References: <20200420221126.341272-1-hannes@cmpxchg.org>

The move_lock is a per-memcg lock, but the VM accounting code that
needs to acquire it comes from the page and follows page->mem_cgroup
under RCU protection. That means the page becomes unlocked not when
we drop the move_lock, but when we update page->mem_cgroup. And that
assignment doesn't imply any memory ordering. If the pointer write
gets reordered against the reads of the page state - page_mapped,
PageDirty etc. - the state may change while we rely on it being
stable, and we can end up corrupting the counters.

Place an SMP memory barrier to make sure we're done with all page
state by the time the new page->mem_cgroup becomes visible.

Also replace the open-coded move_lock with a lock_page_memcg() to
make it more obvious what we're serializing against.
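
To make the window concrete, here is a minimal userspace sketch of
the same publish-after-update pattern (illustration only, not part of
the patch): C11's atomic_thread_fence() stands in for smp_mb(), and
the memcg_sketch/page_sketch names are made up for the example.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct memcg_sketch {
	long nr_file_mapped;
};

struct page_sketch {
	_Atomic(struct memcg_sketch *) mem_cgroup;
	atomic_bool mapped;
};

static void move_account_sketch(struct page_sketch *page,
				struct memcg_sketch *from,
				struct memcg_sketch *to)
{
	/* Read page state while we still "own" the page. */
	if (atomic_load(&page->mapped)) {
		from->nr_file_mapped--;
		to->nr_file_mapped++;
	}

	/*
	 * Stand-in for smp_mb(): without it, the compiler or CPU may
	 * let the pointer store below become visible before the state
	 * reads above have completed, so another thread that already
	 * sees the new mem_cgroup can update the same counters
	 * concurrently.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	/* Publishing the new owner is what "unlocks" the page. */
	atomic_store(&page->mem_cgroup, to);
}

int main(void)
{
	struct memcg_sketch from = { .nr_file_mapped = 1 };
	struct memcg_sketch to = { .nr_file_mapped = 0 };
	struct page_sketch page;

	atomic_init(&page.mem_cgroup, &from);
	atomic_init(&page.mapped, true);

	move_account_sketch(&page, &from, &to);
	printf("from: %ld, to: %ld\n", from.nr_file_mapped,
	       to.nr_file_mapped);
	return 0;
}

With the fence, any thread that observes the new mem_cgroup pointer
is guaranteed to also observe the completed counter updates; in the
kernel, the reader side gets its ordering from taking the move_lock
it finds through that same pointer.
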
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5beea03dd58a..41f5ed79272e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5372,7 +5372,6 @@ static int mem_cgroup_move_account(struct page *page,
 {
 	struct lruvec *from_vec, *to_vec;
 	struct pglist_data *pgdat;
-	unsigned long flags;
 	unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
 	int ret;
 	bool anon;
@@ -5399,18 +5398,13 @@ static int mem_cgroup_move_account(struct page *page,
 	from_vec = mem_cgroup_lruvec(from, pgdat);
 	to_vec = mem_cgroup_lruvec(to, pgdat);
 
-	spin_lock_irqsave(&from->move_lock, flags);
+	lock_page_memcg(page);
 
 	if (!anon && page_mapped(page)) {
 		__mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
 		__mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
 	}
 
-	/*
-	 * move_lock grabbed above and caller set from->moving_account, so
-	 * mod_memcg_page_state will serialize updates to PageDirty.
-	 * So mapping should be stable for dirty pages.
-	 */
 	if (!anon && PageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 
@@ -5426,15 +5420,23 @@ static int mem_cgroup_move_account(struct page *page,
 	}
 
 	/*
+	 * All state has been migrated, let's switch to the new memcg.
+	 *
 	 * It is safe to change page->mem_cgroup here because the page
-	 * is referenced, charged, and isolated - we can't race with
-	 * uncharging, charging, migration, or LRU putback.
+	 * is referenced, charged, isolated, and locked: we can't race
+	 * with (un)charging, migration, LRU putback, or anything else
+	 * that would rely on a stable page->mem_cgroup.
+	 *
+	 * Note that lock_page_memcg is a memcg lock, not a page lock,
+	 * to save space. As soon as we switch page->mem_cgroup to a
+	 * new memcg that isn't locked, the above state can change
+	 * concurrently again. Make sure we're truly done with it.
 	 */
+	smp_mb();
 
-	/* caller should have done css_get */
-	page->mem_cgroup = to;
+	page->mem_cgroup = to; 	/* caller should have done css_get */
 
-	spin_unlock_irqrestore(&from->move_lock, flags);
+	__unlock_page_memcg(from);
 
 	ret = 0;
 
-- 
2.26.0