From: Muchun Song
To: guro@fb.com, hannes@cmpxchg.org, mhocko@kernel.org,
	akpm@linux-foundation.org, shakeelb@google.com,
	vdavydov.dev@gmail.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
	bsingharora@gmail.com, shy828301@gmail.com, alexs@kernel.org,
	smuchun@gmail.com, zhengqi.arch@bytedance.com, Muchun Song
Subject: [PATCH v3 03/12] mm: memcontrol: make lruvec lock safe when LRU pages are reparented
Date: Wed, 16 Feb 2022 19:51:23 +0800
Message-Id: <20220216115132.52602-4-songmuchun@bytedance.com>
In-Reply-To: <20220216115132.52602-1-songmuchun@bytedance.com>
References: <20220216115132.52602-1-songmuchun@bytedance.com>

The diagram below shows how to make the folio lruvec lock safe when LRU
pages are reparented.

folio_lruvec_lock(folio)
    retry:
        lruvec = folio_lruvec(folio);

        // The folio is reparented at this time.
        spin_lock(&lruvec->lru_lock);

        if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
            // Acquired the wrong lruvec lock and need to retry,
            // because this folio is now on the parent memcg lruvec list.
            spin_unlock(&lruvec->lru_lock);
            goto retry;
        }

        // If we reach here, it means that folio_memcg(folio) is stable.

memcg_reparent_objcgs(memcg)
    // lruvec belongs to memcg and lruvec_parent belongs to parent memcg.
    spin_lock(&lruvec->lru_lock);
    spin_lock(&lruvec_parent->lru_lock);

    // Move all the pages from the lruvec list to the parent lruvec list.

    spin_unlock(&lruvec_parent->lru_lock);
    spin_unlock(&lruvec->lru_lock);

After acquiring the lruvec lock, we need to check whether the folio has
been reparented. If so, we need to drop the lock and acquire the new
lruvec lock instead. The LRU page reparenting routine will also acquire
the lruvec lock (this will be implemented in a later patch), so
folio_memcg() cannot change while we hold the lruvec lock.

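The lock-then-recheck scheme described above can be sketched in user
space as follows. This is only an illustration under stated assumptions,
not kernel code: struct group, struct item, item_lock_group(),
group_reparent() and run_demo() are hypothetical names, a pthread mutex
stands in for lruvec->lru_lock, and an atomic owner pointer stands in
for folio_memcg(). The groups are never freed here, so the RCU
protection used by the patch is not modelled.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg's lruvec. */
struct group {
	pthread_mutex_t lock;		/* plays the role of lru_lock */
	struct group *parent;
	int nr_items;			/* stands in for the LRU list */
};

/* Hypothetical stand-in for a folio. */
struct item {
	_Atomic(struct group *) owner;	/* plays the role of folio_memcg() */
};

/* Lock the group that owns @it, retrying if the owner changed meanwhile. */
static struct group *item_lock_group(struct item *it)
{
	struct group *g;

	for (;;) {
		g = atomic_load(&it->owner);
		pthread_mutex_lock(&g->lock);
		/* The item may have been reparented before we got the lock. */
		if (g == atomic_load(&it->owner))
			return g;	/* owner is stable while g->lock is held */
		pthread_mutex_unlock(&g->lock);
	}
}

/* Move every item owned by @child to its parent, holding both locks. */
static void group_reparent(struct group *child, struct item *items, size_t n)
{
	struct group *parent = child->parent;

	pthread_mutex_lock(&child->lock);
	pthread_mutex_lock(&parent->lock);
	for (size_t i = 0; i < n; i++) {
		if (atomic_load(&items[i].owner) == child) {
			atomic_store(&items[i].owner, parent);
			child->nr_items--;
			parent->nr_items++;
		}
	}
	pthread_mutex_unlock(&parent->lock);
	pthread_mutex_unlock(&child->lock);
}

/* Single-threaded walk-through; returns 0 when every check passes. */
int run_demo(void)
{
	struct group parent = { .parent = NULL, .nr_items = 0 };
	struct group child = { .parent = &parent, .nr_items = 2 };
	struct item items[2];
	struct group *g;

	pthread_mutex_init(&parent.lock, NULL);
	pthread_mutex_init(&child.lock, NULL);
	atomic_init(&items[0].owner, &child);
	atomic_init(&items[1].owner, &child);

	g = item_lock_group(&items[0]);
	if (g != &child)
		return 1;
	pthread_mutex_unlock(&g->lock);

	group_reparent(&child, items, 2);

	g = item_lock_group(&items[0]);
	if (g != &parent)
		return 2;
	pthread_mutex_unlock(&g->lock);

	return (child.nr_items == 0 && parent.nr_items == 2) ? 0 : 3;
}
```

Because group_reparent() takes the child's lock before changing any
owner, a caller that still sees the same owner after locking knows the
owner cannot change until it unlocks. That is the same argument the
patch relies on for folio_memcg().
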
Since lruvec_memcg(lruvec) is always equal to folio_memcg(folio) once we
hold the lruvec lock, the lruvec_memcg_debug() check is pointless, so
remove it.

This is a preparation for reparenting the LRU pages.

Signed-off-by: Muchun Song
---
 include/linux/memcontrol.h | 18 +++----------
 mm/compaction.c            | 10 +++++++-
 mm/memcontrol.c            | 63 +++++++++++++++++++++++++++++-----------------
 mm/swap.c                  |  4 +++
 4 files changed, 56 insertions(+), 39 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 81a2720653d0..961e9f9b6567 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -737,7 +737,9 @@ static inline struct lruvec *mem_cgroup_lruvec(struct mem_cgroup *memcg,
  * folio_lruvec - return lruvec for isolating/putting an LRU folio
  * @folio: Pointer to the folio.
  *
- * This function relies on folio->mem_cgroup being stable.
+ * The lruvec can be changed to its parent lruvec when the page is reparented.
+ * The caller needs to recheck if it cares about this change (just like
+ * folio_lruvec_lock() does).
  */
 static inline struct lruvec *folio_lruvec(struct folio *folio)
 {
@@ -756,15 +758,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio);
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 						unsigned long *flags);
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio);
-#else
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-#endif
-
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -1227,11 +1220,6 @@ static inline struct lruvec *folio_lruvec(struct folio *folio)
 	return &pgdat->__lruvec;
 }
 
-static inline
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-}
-
 static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
 {
 	return NULL;
diff --git a/mm/compaction.c b/mm/compaction.c
index 58d0e91cde49..eebe55e596fd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -515,6 +515,8 @@ compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags,
 {
 	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
 	lruvec = folio_lruvec(folio);
 
 	/* Track if the lock is contended in async mode */
@@ -527,7 +529,13 @@ compact_folio_lruvec_lock_irqsave(struct folio *folio, unsigned long *flags,
 
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
 out:
-	lruvec_memcg_debug(lruvec, folio);
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
+
+	/* See the comments in folio_lruvec_lock(). */
+	rcu_read_unlock();
 
 	return lruvec;
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6501f5b6df4b..7c7672631456 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1178,23 +1178,6 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	return ret;
 }
 
-#ifdef CONFIG_DEBUG_VM
-void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
-{
-	struct mem_cgroup *memcg;
-
-	if (mem_cgroup_disabled())
-		return;
-
-	memcg = folio_memcg(folio);
-
-	if (!memcg)
-		VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) != root_mem_cgroup, folio);
-	else
-		VM_BUG_ON_FOLIO(lruvec_memcg(lruvec) != memcg, folio);
-}
-#endif
-
 /**
  * folio_lruvec_lock - Lock the lruvec for a folio.
  * @folio: Pointer to the folio.
@@ -1209,10 +1192,24 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
  */
 struct lruvec *folio_lruvec_lock(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
+
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 
 	spin_lock(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock(&lruvec->lru_lock);
+		goto retry;
+	}
+
+	/*
+	 * Preemption is disabled inside spin_lock, which can serve as an
+	 * RCU read-side critical section.
+	 */
+	rcu_read_unlock();
 
 	return lruvec;
 }
@@ -1232,10 +1229,20 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
  */
 struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irq(&lruvec->lru_lock);
-	lruvec_memcg_debug(lruvec, folio);
+
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irq(&lruvec->lru_lock);
+		goto retry;
+	}
+
+	/* See the comments in folio_lruvec_lock(). */
+	rcu_read_unlock();
 
 	return lruvec;
 }
@@ -1257,10 +1264,20 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
 struct lruvec *folio_lruvec_lock_irqsave(struct folio *folio,
 		unsigned long *flags)
 {
-	struct lruvec *lruvec = folio_lruvec(folio);
+	struct lruvec *lruvec;
 
+	rcu_read_lock();
+retry:
+	lruvec = folio_lruvec(folio);
 	spin_lock_irqsave(&lruvec->lru_lock, *flags);
-	lruvec_memcg_debug(lruvec, folio);
+
+	if (unlikely(lruvec_memcg(lruvec) != folio_memcg(folio))) {
+		spin_unlock_irqrestore(&lruvec->lru_lock, *flags);
+		goto retry;
+	}
+
+	/* See the comments in folio_lruvec_lock(). */
+	rcu_read_unlock();
 
 	return lruvec;
 }
diff --git a/mm/swap.c b/mm/swap.c
index bcf3ac288b56..9c2bcc2651c6 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -305,6 +305,10 @@ void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
 
 void lru_note_cost_folio(struct folio *folio)
 {
+	/*
+	 * The rcu read lock is held by the caller, so we do not need to
+	 * care about the lruvec returned by folio_lruvec() being released.
+	 */
 	lru_note_cost(folio_lruvec(folio), folio_is_file_lru(folio),
 			folio_nr_pages(folio));
 }
-- 
2.11.0