From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Feb 2022 19:01:02 +0100
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Johannes Weiner
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
 Michal Hocko, Michal Koutný, Peter Zijlstra, Thomas Gleixner,
 Vladimir Davydov, Waiman Long
Subject: Re: [PATCH v2 3/4] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
References: <20220211223537.2175879-1-bigeasy@linutronix.de>
 <20220211223537.2175879-4-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On 2022-02-14 11:46:00 [-0500], Johannes Weiner wrote:
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c1caa662946dc..466466f285cea 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -705,6 +705,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
> >  	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> >  	memcg = pn->memcg;
> >  
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > +		preempt_disable();
> >  	/* Update memcg */
> >  	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
> >  
> > @@ -712,6 +714,8 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
> >  	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
> >  
> >  	memcg_rstat_updated(memcg, val);
> > +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > +		preempt_enable();
> >  }
> 
> I notice you didn't annotate __mod_memcg_state(). I suppose that is
> because it's called with explicit local_irq_disable(), and that
> disables preemption on rt? And you only need another preempt_disable()
> for stacks that rely on coming from spin_lock_irq(save)?

Correct. The code is not used in_hardirq() on PREEMPT_RT, so
preempt_disable() is sufficient. I didn't bother to replace every
local_irq_save() with preempt_disable() since it is probably not worth
it. And yes: on PREEMPT_RT, spin_lock_irq() does not disable
interrupts, so I need something here to ensure that the RMW operation
on the per-CPU counter is not interrupted.

> That makes sense, but it's difficult to maintain. It'll easily break
> if somebody adds more memory accounting sites that may also rely on an
> irq-disabled spinlock somewhere.
> 
> So better to make this an unconditional locking protocol:
> 
> 	static void memcg_stats_lock(void)
> 	{
> #ifdef CONFIG_PREEMPT_RT
> 		preempt_disable();
> #else
> 		VM_BUG_ON(!irqs_disabled());
> #endif
> 	}
> 
> 	static void memcg_stats_unlock(void)
> 	{
> #ifdef CONFIG_PREEMPT_RT
> 		preempt_enable();
> #endif
> 	}
> 
> and always use these around the counter updates.

Something like the following perhaps? I didn't add anything to
__mod_memcg_state() since it has no users besides the one which does
local_irq_save().
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c1caa662946dc..69130a5fe3d51 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -629,6 +629,28 @@ static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 
+/*
+ * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
+ * not rely on this as part of an acquired spinlock_t lock. These functions are
+ * never used in hardirq context on PREEMPT_RT and therefore disabling
+ * preemption is sufficient.
+ */
+static void memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_disable();
+#else
+	VM_BUG_ON(!irqs_disabled());
+#endif
+}
+
+static void memcg_stats_unlock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	preempt_enable();
+#endif
+}
+
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;
@@ -705,6 +727,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	memcg_stats_lock();
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -712,6 +735,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg, val);
+	memcg_stats_unlock();
 }
 
 /**
@@ -794,8 +818,10 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg, count);
+	memcg_stats_unlock();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7149,8 +7175,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
 	 */
-	VM_BUG_ON(!irqs_disabled());
+	memcg_stats_lock();
 	mem_cgroup_charge_statistics(memcg, -nr_entries);
+	memcg_stats_unlock();
 	memcg_check_events(memcg, page_to_nid(page));
 
 	css_put(&memcg->css);

-- 
2.34.1