From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 470D26453
	for ; Tue, 22 Mar 2022 21:45:48 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 101B4C340EE;
	Tue, 22 Mar 2022 21:45:48 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linux-foundation.org;
	s=korg; t=1647985548;
	bh=nuD6SuBfmNlTRsYolyl+S0xAQThzwrc+qWwimrO1VD8=;
	h=Date:To:From:In-Reply-To:Subject:From;
	b=bAcNaAD2Xq1jBAA3MzY1BWeJLg9DDwuyaUdjcWXRiDIUkifXWHzS+h5qAm6J7rAwS
	 taQusEnhZR9NsqkOUBdeEZ9YKGCXSNL8VlOXPZKyCgOTugrnSqAYT/XnSrD0W9QE5R
	 9roQtvhpItOoFLa07bzoUz+PVUC0tgPOdZkF281w=
Date: Tue, 22 Mar 2022 14:45:47 -0700
To: willy@infradead.org, tglx@linutronix.de, paulmck@kernel.org,
	nsaenzju@redhat.com, minchan@kernel.org, mgorman@techsingularity.net,
	juri.lelli@redhat.com, bigeasy@linutronix.de, mtosatti@redhat.com,
	akpm@linux-foundation.org, patches@lists.linux.dev, linux-mm@kvack.org,
	mm-commits@vger.kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org
From: Andrew Morton
In-Reply-To: <20220322143803.04a5e59a07e48284f196a2f9@linux-foundation.org>
Subject: [patch 143/227] mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu
Message-Id: <20220322214548.101B4C340EE@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: patches@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:

From: Marcelo Tosatti
Subject: mm: lru_cache_disable: replace work queue synchronization with synchronize_rcu

On systems that run FIFO:1 applications that busy loop, any SCHED_OTHER
task that attempts to execute on such a CPU (such as work threads) will
not be scheduled, which leads to system hangs.

Commit d479960e44f27e0e5 ("mm: disable LRU pagevec during the migration
temporarily") relies on queueing work items on all online CPUs to ensure
visibility of lru_disable_count.

To fix this, replace the usage of work items with synchronize_rcu, which
provides the same guarantees.

Readers of lru_disable_count are protected by either disabling preemption
or rcu_read_lock:

	preempt_disable, local_irq_disable	[bh_lru_lock()]
	rcu_read_lock				[rt_spin_lock CONFIG_PREEMPT_RT]
	preempt_disable				[local_lock !CONFIG_PREEMPT_RT]

Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on
preempt_disable() regions of code.  So any CPU which sees
lru_disable_count = 0 will have exited the critical section when
synchronize_rcu() returns.

Link: https://lkml.kernel.org/r/Yin7hDxdt0s/x+fp@fuller.cnet
Signed-off-by: Marcelo Tosatti
Reviewed-by: Nicolas Saenz Julienne
Acked-by: Minchan Kim
Cc: Matthew Wilcox
Cc: Mel Gorman
Cc: Juri Lelli
Cc: Thomas Gleixner
Cc: Sebastian Andrzej Siewior
Cc: Paul E. McKenney
Signed-off-by: Andrew Morton
---

 mm/swap.c |   23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

--- a/mm/swap.c~mm-lru_cache_disable-replace-work-queue-synchronization-with-synchronize_rcu
+++ a/mm/swap.c
@@ -831,8 +831,7 @@ inline void __lru_add_drain_all(bool for
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
-		if (force_all_cpus ||
-		    pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+		if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
 		    data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
@@ -876,15 +875,21 @@ atomic_t lru_disable_count = ATOMIC_INIT
 void lru_cache_disable(void)
 {
 	atomic_inc(&lru_disable_count);
-#ifdef CONFIG_SMP
 	/*
-	 * lru_add_drain_all in the force mode will schedule draining on
-	 * all online CPUs so any calls of lru_cache_disabled wrapped by
-	 * local_lock or preemption disabled would be ordered by that.
-	 * The atomic operation doesn't need to have stronger ordering
-	 * requirements because that is enforced by the scheduling
-	 * guarantees.
+	 * Readers of lru_disable_count are protected by either disabling
+	 * preemption or rcu_read_lock:
+	 *
+	 * preempt_disable, local_irq_disable	[bh_lru_lock()]
+	 * rcu_read_lock			[rt_spin_lock CONFIG_PREEMPT_RT]
+	 * preempt_disable			[local_lock !CONFIG_PREEMPT_RT]
+	 *
+	 * Since v5.1 kernel, synchronize_rcu() is guaranteed to wait on
+	 * preempt_disable() regions of code. So any CPU which sees
+	 * lru_disable_count = 0 will have exited the critical
+	 * section when synchronize_rcu() returns.
 	 */
+	synchronize_rcu();
+#ifdef CONFIG_SMP
 	__lru_add_drain_all(true);
 #else
 	lru_add_and_bh_lrus_drain();
_
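
A minimal user-space sketch of the scheme described in the changelog and in
the new comment above, for readers who want to see the writer/reader ordering
in isolation. This is an illustration only, not kernel code: it stands in
liburcu (userspace RCU) and C11 atomics for the kernel's RCU and atomic_t,
and the names disable_count, cache_disable() and reader() are invented for
the example.

/*
 * Writer: bump the "disable" count, then synchronize_rcu().  When
 * synchronize_rcu() returns, every reader that could still have seen
 * disable_count == 0 has left its read-side critical section, so the
 * writer may safely drain whatever those readers cached.
 *
 * Build (assuming liburcu is installed): cc sketch.c -lurcu -lpthread
 * (the library may be named -lurcu-memb depending on the installed flavour).
 */
#include <urcu.h>		/* liburcu: rcu_read_lock(), synchronize_rcu() */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int disable_count;	/* analogue of lru_disable_count */

static void *reader(void *arg)
{
	rcu_register_thread();

	rcu_read_lock();
	if (atomic_load(&disable_count) == 0) {
		/* fast path: batching allowed (the "pagevec" case) */
		puts("reader: caching enabled");
	} else {
		/* slow path: caching disabled, bypass the batch */
		puts("reader: caching disabled");
	}
	rcu_read_unlock();

	rcu_unregister_thread();
	return NULL;
}

static void cache_disable(void)		/* analogue of lru_cache_disable() */
{
	atomic_fetch_add(&disable_count, 1);
	/*
	 * Wait for all pre-existing read-side critical sections; any
	 * reader that saw disable_count == 0 has finished by now.
	 */
	synchronize_rcu();
	/* ...drain the per-reader caches here... */
}

int main(void)
{
	pthread_t t;

	rcu_register_thread();
	pthread_create(&t, NULL, reader, NULL);
	cache_disable();
	pthread_join(t, NULL);
	rcu_unregister_thread();
	return 0;
}

In the kernel patch itself the readers do not all take rcu_read_lock()
explicitly: on !CONFIG_PREEMPT_RT they rely on local_lock/preempt_disable
plus the v5.1+ guarantee that synchronize_rcu() also waits for
preempt-disabled regions, which is exactly what the new comment in
lru_cache_disable() records.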