From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F2ECC47247 for ; Thu, 7 May 2020 15:49:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E880D20838 for ; Thu, 7 May 2020 15:49:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="WW5lrwo7" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E880D20838 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 76CE7900003; Thu, 7 May 2020 11:49:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6F64A900002; Thu, 7 May 2020 11:49:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5BDD4900003; Thu, 7 May 2020 11:49:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0142.hostedemail.com [216.40.44.142]) by kanga.kvack.org (Postfix) with ESMTP id 401B0900002 for ; Thu, 7 May 2020 11:49:12 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 0550B1309A for ; Thu, 7 May 2020 15:49:12 +0000 (UTC) X-FDA: 76790356944.27.pear63_503ee127c412b X-HE-Tag: pear63_503ee127c412b X-Filterd-Recvd-Size: 7241 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Thu, 7 May 2020 15:49:11 +0000 (UTC) Received: from paulmck-ThinkPad-P72.home (50-39-105-78.bvtn.or.frontiernet.net [50.39.105.78]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 6088420659; Thu, 7 May 2020 15:49:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1588866550; bh=MJDm3l6QbZdH1hJdtfog8c1MYaTiz8HNDScul7E/cA4=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=WW5lrwo75vJsViDkpPtbP7OH4n/PckZizuYNo/yVJgvOun0hEFarqCp0p9B8U/8Xi 1SAPDBZN7DJSivGt1QTO1puJcbyZyEr3LR3RrHOOx5QLrkMeF7EAR+j2q/2ZHoFVpK HMIFEcCszdSaI/27MUkAebytH9oMmCxuCEQ5R96g= Received: by paulmck-ThinkPad-P72.home (Postfix, from userid 1000) id 43B6C35231A1; Thu, 7 May 2020 08:49:10 -0700 (PDT) Date: Thu, 7 May 2020 08:49:10 -0700 From: "Paul E. McKenney" To: Hillf Danton Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, akpm@linux-foundation.org, linux-mm Subject: Re: [PATCH RFC tip/core/rcu] Add shrinker to shift to fast/inefficient GP mode Message-ID: <20200507154910.GO2869@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org References: <20200507004240.GA9156@paulmck-ThinkPad-P72> <20200507093647.11932-1-hdanton@sina.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200507093647.11932-1-hdanton@sina.com> User-Agent: Mutt/1.9.4 (2018-02-28) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, May 07, 2020 at 05:36:47PM +0800, Hillf Danton wrote: > > Hello Paul > > On Wed, 6 May 2020 17:42:40 Paul E. McKenney wrote: > > > > This commit adds a shrinker so as to inform RCU when memory is scarce. > > A simpler hook is added in the logic of kswapd for subscribing the info > that memory pressure is high, and then on top of it make rcu a subscriber > by copying your code for the shrinker, wishing it makes a sense to you. > > What's not yet included is to make the hook per node to help make every > reviewer convinced that memory is becoming tight. Of course without the > cost of making subscribers node aware. > > Hillf I must defer to the MM folks on the MM portion of this patch, but early warning of impending memory pressure would be extremely good. A few RCU-related notes inline below, though. Thanx, Paul > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -49,6 +49,16 @@ static inline void set_max_mapnr(unsigne > static inline void set_max_mapnr(unsigned long limit) { } > #endif > > +/* subscriber of kswapd's memory_pressure_high signal */ > +struct mph_subscriber { > + struct list_head node; > + void (*info) (void *data); > + void *data; > +}; > + > +int mph_subscribe(struct mph_subscriber *ms); > +void mph_unsubscribe(struct mph_subscriber *ms); > + > extern atomic_long_t _totalram_pages; > static inline unsigned long totalram_pages(void) > { > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -3536,6 +3536,40 @@ static bool kswapd_shrink_node(pg_data_t > } > > /* > + * subscribers of kswapd's signal that memory pressure is high > + */ > +static LIST_HEAD(mph_subs); > +static DEFINE_MUTEX(mph_lock); > + > +int mph_subscribe(struct mph_subscriber *ms) > +{ > + if (!ms->info) > + return -EAGAIN; > + > + mutex_lock(&mph_lock); > + list_add_tail(&ms->node, &mph_subs); > + mutex_unlock(&mph_lock); > + return 0; > +} > + > +void mph_unsubscribe(struct mph_subscriber *ms) > +{ > + mutex_lock(&mph_lock); > + list_del(&ms->node); > + mutex_unlock(&mph_lock); > +} > + > +static void kswapd_bbc_mph(void) > +{ > + struct mph_subscriber *ms; > + > + mutex_lock(&mph_lock); > + list_for_each_entry(ms, &mph_subs, node) > + ms->info(ms->data); > + mutex_unlock(&mph_lock); > +} > + > +/* > * For kswapd, balance_pgdat() will reclaim pages across a node from zones > * that are eligible for use by the caller until at least one zone is > * balanced. > @@ -3663,8 +3697,11 @@ restart: > * If we're getting trouble reclaiming, start doing writepage > * even in laptop mode. > */ > - if (sc.priority < DEF_PRIORITY - 2) > + if (sc.priority < DEF_PRIORITY - 2) { > sc.may_writepage = 1; > + if (sc.priority == DEF_PRIORITY - 3) > + kswapd_bbc_mph(); > + } > > /* Call soft limit reclaim before calling shrink_node. */ > sc.nr_scanned = 0; > --- a/kernel/rcu/tree.h > +++ b/kernel/rcu/tree.h > @@ -325,6 +325,8 @@ struct rcu_state { > int ncpus_snap; /* # CPUs seen last time. */ > u8 cbovld; /* Callback overload now? */ > u8 cbovldnext; /* ^ ^ next time? */ > + u8 mph; /* mm pressure high signal from kswapd */ > + unsigned long mph_end; /* time stamp in jiffies */ > > unsigned long jiffies_force_qs; /* Time at which to invoke */ > /* force_quiescent_state(). */ > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -2314,8 +2315,15 @@ static void force_qs_rnp(int (*f)(struct > struct rcu_data *rdp; > struct rcu_node *rnp; > > - rcu_state.cbovld = rcu_state.cbovldnext; > + rcu_state.cbovld = smp_load_acquire(&rcu_state.mph) || > + rcu_state.cbovldnext; > rcu_state.cbovldnext = false; > + > + if (READ_ONCE(rcu_state.mph) && > + time_after(jiffies, READ_ONCE(rcu_state.mph_end))) { > + WRITE_ONCE(rcu_state.mph, false); > + pr_info("%s: Ending OOM-mode grace periods.\n", __func__); > + } > rcu_for_each_leaf_node(rnp) { > cond_resched_tasks_rcu_qs(); > mask = 0; > @@ -2643,6 +2651,20 @@ static void check_cb_ovld(struct rcu_dat > raw_spin_unlock_rcu_node(rnp); > } > > +static void rcu_mph_info(void *data) This pointer will always be &rcu_state, so why not ignore the pointer and use "rcu_state" below? RCU grace periods are inherently global, so I don't know of any way for RCU to focus on a given NUMA node. All or nothing. But on the other hand, speeding up RCU grace periods will also help specific NUMA nodes, so I believe that it is all good. > +{ > + struct rcu_state *state = data; > + > + WRITE_ONCE(state->mph_end, jiffies + HZ / 10); > + smp_store_release(&state->mph, true); > + rcu_force_quiescent_state(); > +} > + > +static struct mph_subscriber rcu_mph_subscriber = { > + .info = rcu_mph_info, > + .data = &rcu_state, Then this ".data" entry can be omitted, correct? > +}; > + > /* Helper function for call_rcu() and friends. */ > static void > __call_rcu(struct rcu_head *head, rcu_callback_t func) > @@ -4036,6 +4058,8 @@ void __init rcu_init(void) > qovld_calc = DEFAULT_RCU_QOVLD_MULT * qhimark; > else > qovld_calc = qovld; > + > + mph_subscribe(&rcu_mph_subscriber); > } > > #include "tree_stall.h" >