* Re: [RFC v3] memcg: add memcg lru
@ 2019-11-18 12:50 Hillf Danton
  2019-11-18 14:04 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Hillf Danton @ 2019-11-18 12:50 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Hillf Danton, linux-mm, Rong Chen, linux-kernel


On Mon, 18 Nov 2019 11:29:50 +0100 Michal Hocko wrote:
> 
> On Sun 17-11-19 19:35:26, Hillf Danton wrote:
> > 
> > Currently soft limit reclaim (slr) is frozen, see
> > Documentation/admin-guide/cgroup-v2.rst for the reasons.
> > 
> > This work adds a memcg hook into kswapd's logic to bypass slr, paving
> > the way for its cleanup later.
> > 
> > After b23afb93d317 ("memcg: punt high overage reclaim to
> > return-to-userland path"), high limit breachers (hlb) are reclaimed
> > one after another, spiraling up through the memcg hierarchy, before
> > returning to userspace.
> > 
> > The current memcg high work makes it easy to add the lru because hlb
> > can be collected at essentially zero cost, and in particular without
> > changing the high work's behavior.
> > 
> > A fifo list, essentially a simple copy of the page lru, is then added
> > to queue up hlb and reclaim pages from them in round robin once kswapd
> > starts doing its job.
> > 
> > Finally, the new hook is added with slr's two problems addressed, i.e.
> > hierarchy-unaware reclaim and overreclaim.
> > 
> > Thanks to Rong Chen for testing.
> 
Hey Michal,

Thanks for your comments, both this time and previously.

> You have ignored the previous review feedback again [1]. I have nacked
> the patch on the grounds that it is completely missing any real use case
> scenario or any numbers suggesting there is an actual improvement.
> 
You are about half right.

After another look at your comment on v2, I see that you did not
approve, with good reasoning, the change added to the high work that
deferred reclaim until kswapd becomes active. That deferral is dropped
in v3.

The added lru will take the place of the current slr, so slr's use
cases apply to it without exception, yes? Please feel free to let us
know what other use cases you are interested in.

As for the improvement in question: if no regressions follow from the
lru, the gain is already on the table, not least because the price we
pay for maintaining hlb is close to zero.

Hillf

> Please do not post new versions until you make those things clear.
> 
> [1] http://lkml.kernel.org/r/20191029083730.GC31513@dhcp22.suse.cz
> -- 
> Michal Hocko
> SUSE Labs




* Re: [RFC v3] memcg: add memcg lru
  2019-11-18 12:50 [RFC v3] memcg: add memcg lru Hillf Danton
@ 2019-11-18 14:04 ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2019-11-18 14:04 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Rong Chen, linux-kernel

On Mon 18-11-19 20:50:14, Hillf Danton wrote:
> 
> On Mon, 18 Nov 2019 11:29:50 +0100 Michal Hocko wrote:
> > 
> > On Sun 17-11-19 19:35:26, Hillf Danton wrote:
> > > 
> > > Currently soft limit reclaim (slr) is frozen, see
> > > Documentation/admin-guide/cgroup-v2.rst for the reasons.
> > > 
> > > This work adds a memcg hook into kswapd's logic to bypass slr, paving
> > > the way for its cleanup later.
> > > 
> > > After b23afb93d317 ("memcg: punt high overage reclaim to
> > > return-to-userland path"), high limit breachers (hlb) are reclaimed
> > > one after another, spiraling up through the memcg hierarchy, before
> > > returning to userspace.
> > > 
> > > The current memcg high work makes it easy to add the lru because hlb
> > > can be collected at essentially zero cost, and in particular without
> > > changing the high work's behavior.
> > > 
> > > A fifo list, essentially a simple copy of the page lru, is then added
> > > to queue up hlb and reclaim pages from them in round robin once kswapd
> > > starts doing its job.
> > > 
> > > Finally, the new hook is added with slr's two problems addressed, i.e.
> > > hierarchy-unaware reclaim and overreclaim.
> > > 
> > > Thanks to Rong Chen for testing.
> > 
> Hey Michal,
> 
> Thanks for your comments, both this time and previously.
> 
> > You have ignored the previous review feedback again [1]. I have nacked
> > the patch on the grounds that it is completely missing any real use case
> > scenario or any numbers suggesting there is an actual improvement.
> > 
> You are about half right.
> 
> After another look at your comment on v2, I see that you did not
> approve, with good reasoning, the change added to the high work that
> deferred reclaim until kswapd becomes active. That deferral is dropped
> in v3.

OK, that part was obviously broken in the previous version. But please
read the whole feedback I (and Johannes) have provided.
Besides that, I would consider it polite to summarize the previous
version, which received two NAKs from maintainers, and explain why you
believe the new code addresses that problem.

> The added lru will take the place of the current slr, so slr's use
> cases apply to it without exception, yes? Please feel free to let us
> know what other use cases you are interested in.

Let me ask differently. There must be a reason you have spent time
developing this feature. There must be a use case you are targeting.
Can you describe it so that we can evaluate the pros and cons?
-- 
Michal Hocko
SUSE Labs


* Re: [RFC v3] memcg: add memcg lru
  2019-11-17 11:35 Hillf Danton
@ 2019-11-18 10:29 ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2019-11-18 10:29 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-mm, Rong Chen, linux-kernel

On Sun 17-11-19 19:35:26, Hillf Danton wrote:
> 
> Currently soft limit reclaim (slr) is frozen, see
> Documentation/admin-guide/cgroup-v2.rst for the reasons.
> 
> This work adds a memcg hook into kswapd's logic to bypass slr, paving
> the way for its cleanup later.
> 
> After b23afb93d317 ("memcg: punt high overage reclaim to
> return-to-userland path"), high limit breachers (hlb) are reclaimed
> one after another, spiraling up through the memcg hierarchy, before
> returning to userspace.
> 
> The current memcg high work makes it easy to add the lru because hlb
> can be collected at essentially zero cost, and in particular without
> changing the high work's behavior.
> 
> A fifo list, essentially a simple copy of the page lru, is then added
> to queue up hlb and reclaim pages from them in round robin once kswapd
> starts doing its job.
> 
> Finally, the new hook is added with slr's two problems addressed, i.e.
> hierarchy-unaware reclaim and overreclaim.
> 
> Thanks to Rong Chen for testing.

You have ignored the previous review feedback again [1]. I have nacked
the patch on the grounds that it is completely missing any real use case
scenario or any numbers suggesting there is an actual improvement.

Please do not post new versions until you make those things clear.

[1] http://lkml.kernel.org/r/20191029083730.GC31513@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs


* [RFC v3] memcg: add memcg lru
@ 2019-11-17 11:35 Hillf Danton
  2019-11-18 10:29 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Hillf Danton @ 2019-11-17 11:35 UTC (permalink / raw)
  To: linux-mm; +Cc: Rong Chen, linux-kernel, Hillf Danton


Currently soft limit reclaim (slr) is frozen, see
Documentation/admin-guide/cgroup-v2.rst for the reasons.

This work adds a memcg hook into kswapd's logic to bypass slr, paving
the way for its cleanup later.

After b23afb93d317 ("memcg: punt high overage reclaim to
return-to-userland path"), high limit breachers (hlb) are reclaimed
one after another, spiraling up through the memcg hierarchy, before
returning to userspace.

The current memcg high work makes it easy to add the lru because hlb
can be collected at essentially zero cost, and in particular without
changing the high work's behavior.

A fifo list, essentially a simple copy of the page lru, is then added
to queue up hlb and reclaim pages from them in round robin once kswapd
starts doing its job.

Finally, the new hook is added with slr's two problems addressed, i.e.
hierarchy-unaware reclaim and overreclaim.

Thanks to Rong Chen for testing.

V3 is based on next-20191115.

Changes since v2
- fix a build error reported by the kbuild test robot <lkp@intel.com>
- split the hook function into two parts for better round robin
- make the memcg lru able to reclaim dirty pages, to cut the risk of
  premature OOM reported by the kernel test robot <rong.a.chen@intel.com>
- drop the change to the high work's behavior that deferred reclaim

Changes since v1
- drop MEMCG_LRU
- add a hook into kswapd's logic to bypass slr

Changes since v0
- add MEMCG_LRU in init/Kconfig
- drop changes in mm/vmscan.c
- make the memcg lru work in parallel to slr

Signed-off-by: Hillf Danton <hdanton@sina.com>
---

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -218,6 +218,8 @@ struct mem_cgroup {
 	/* Upper bound of normal memory consumption range */
 	unsigned long high;
 
+	struct list_head lru_node;
+
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
 
@@ -732,6 +734,9 @@ static inline void mod_lruvec_page_state
 	local_irq_restore(flags);
 }
 
+struct mem_cgroup *mem_cgroup_reclaim_high_begin(void);
+void mem_cgroup_reclaim_high_end(struct mem_cgroup *memcg);
+
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 						gfp_t gfp_mask,
 						unsigned long *total_scanned);
@@ -1123,6 +1128,14 @@ static inline void __mod_lruvec_slab_sta
 	__mod_node_page_state(page_pgdat(page), idx, val);
 }
 
+static inline struct mem_cgroup *mem_cgroup_reclaim_high_begin(void)
+{
+	return NULL;
+}
+static inline void mem_cgroup_reclaim_high_end(struct mem_cgroup *memcg)
+{
+}
+
 static inline
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					    gfp_t gfp_mask,
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2221,15 +2221,81 @@ static int memcg_hotplug_cpu_dead(unsign
 	return 0;
 }
 
+static DEFINE_SPINLOCK(memcg_lru_lock);
+static LIST_HEAD(memcg_lru);	/* a copy of page lru */
+
+static void memcg_add_lru(struct mem_cgroup *memcg)
+{
+	spin_lock_irq(&memcg_lru_lock);
+	if (list_empty(&memcg->lru_node))
+		list_add_tail(&memcg->lru_node, &memcg_lru);
+	spin_unlock_irq(&memcg_lru_lock);
+}
+
+static struct mem_cgroup *memcg_pinch_lru(void)
+{
+	struct mem_cgroup *memcg, *next;
+
+	spin_lock_irq(&memcg_lru_lock);
+
+	list_for_each_entry_safe(memcg, next, &memcg_lru, lru_node) {
+		list_del_init(&memcg->lru_node);
+
+		if (page_counter_read(&memcg->memory) > memcg->high) {
+			spin_unlock_irq(&memcg_lru_lock);
+			return memcg;
+		}
+	}
+	spin_unlock_irq(&memcg_lru_lock);
+
+	return NULL;
+}
+
+struct mem_cgroup *mem_cgroup_reclaim_high_begin(void)
+{
+	struct mem_cgroup *memcg, *victim;
+
+	memcg = victim = memcg_pinch_lru();
+	if (!memcg)
+		return NULL;
+
+	while ((memcg = parent_mem_cgroup(memcg)))
+		if (page_counter_read(&memcg->memory) > memcg->high) {
+			memcg_memory_event(memcg, MEMCG_HIGH);
+			memcg_add_lru(memcg);
+			break;
+		}
+
+	return victim;
+}
+
+void mem_cgroup_reclaim_high_end(struct mem_cgroup *memcg)
+{
+	while (memcg) {
+		if (page_counter_read(&memcg->memory) > memcg->high) {
+			memcg_memory_event(memcg, MEMCG_HIGH);
+			memcg_add_lru(memcg);
+			return;
+		}
+		memcg = parent_mem_cgroup(memcg);
+	}
+}
+
 static void reclaim_high(struct mem_cgroup *memcg,
 			 unsigned int nr_pages,
 			 gfp_t gfp_mask)
 {
+	bool add_lru = true;
 	do {
 		if (page_counter_read(&memcg->memory) <= memcg->high)
 			continue;
 		memcg_memory_event(memcg, MEMCG_HIGH);
 		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
+		if (add_lru &&
+		    page_counter_read(&memcg->memory) > memcg->high) {
+			memcg_add_lru(memcg);
+			add_lru = false;
+		}
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
 
@@ -4953,6 +5019,7 @@ static struct mem_cgroup *mem_cgroup_all
 	if (memcg_wb_domain_init(memcg, GFP_KERNEL))
 		goto fail;
 
+	INIT_LIST_HEAD(&memcg->lru_node);
 	INIT_WORK(&memcg->high_work, high_work_func);
 	INIT_LIST_HEAD(&memcg->oom_notify);
 	mutex_init(&memcg->thresholds_lock);
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2910,6 +2910,30 @@ static inline bool compaction_ready(stru
 	return zone_watermark_ok_safe(zone, 0, watermark, sc->reclaim_idx);
 }
 
+#ifdef CONFIG_MEMCG
+static void mem_cgroup_reclaim_high(struct pglist_data *pgdat,
+					struct scan_control *sc)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = mem_cgroup_reclaim_high_begin();
+	if (memcg) {
+		unsigned long ntr = sc->nr_to_reclaim;
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		sc->nr_to_reclaim = SWAP_CLUSTER_MAX;
+		shrink_lruvec(lruvec, sc);
+		sc->nr_to_reclaim = ntr;
+	}
+	mem_cgroup_reclaim_high_end(memcg);
+}
+#else
+static void mem_cgroup_reclaim_high(struct pglist_data *pgdat,
+					struct scan_control *sc)
+{
+}
+#endif
+
 /*
  * This is the direct reclaim path, for page-allocating processes.  We only
  * try to reclaim pages from zones which will satisfy the caller's allocation
@@ -2974,6 +2998,9 @@ static void shrink_zones(struct zonelist
 			if (zone->zone_pgdat == last_pgdat)
 				continue;
 
+			mem_cgroup_reclaim_high(zone->zone_pgdat, sc);
+			continue;
+
 			/*
 			 * This steals pages from memory cgroups over softlimit
 			 * and returns the number of reclaimed pages and
@@ -3690,12 +3717,16 @@ restart:
 		if (sc.priority < DEF_PRIORITY - 2)
 			sc.may_writepage = 1;
 
+		mem_cgroup_reclaim_high(pgdat, &sc);
+		goto soft_limit_reclaim_end;
+
 		/* Call soft limit reclaim before calling shrink_node. */
 		sc.nr_scanned = 0;
 		nr_soft_scanned = 0;
 		nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(pgdat, sc.order,
 						sc.gfp_mask, &nr_soft_scanned);
 		sc.nr_reclaimed += nr_soft_reclaimed;
+soft_limit_reclaim_end:
 
 		/*
 		 * There should be no need to raise the scanning priority if
--
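For illustration only, here is a minimal userspace sketch of the behaviour
described above: a fifo of high limit breachers, one bounded reclaim batch
per kswapd pass, and a requeue of the nearest group still above its high
limit. It is not part of the patch, and the names (queue_group, pick_group,
kswapd_pass) are hypothetical simplifications of the hooks added above.

	#include <stdio.h>

	#define BATCH	32	/* stand-in for SWAP_CLUSTER_MAX */
	#define QSIZE	16

	struct group {
		const char *name;
		long usage, high;
		struct group *parent;
		int queued;
	};

	static struct group *fifo[QSIZE];
	static int head, tail;

	/* queue a breacher once, as memcg_add_lru() does */
	static void queue_group(struct group *g)
	{
		if (!g->queued) {
			g->queued = 1;
			fifo[tail] = g;
			tail = (tail + 1) % QSIZE;
		}
	}

	/* pop entries until one is still over its high limit, as memcg_pinch_lru() does */
	static struct group *pick_group(void)
	{
		while (head != tail) {
			struct group *g = fifo[head];

			head = (head + 1) % QSIZE;
			g->queued = 0;
			if (g->usage > g->high)
				return g;
		}
		return NULL;
	}

	/* one kswapd pass: reclaim a bounded batch, then requeue if needed */
	static void kswapd_pass(void)
	{
		struct group *victim = pick_group(), *g;
		long nr;

		if (!victim)
			return;
		nr = victim->usage > BATCH ? BATCH : victim->usage;
		victim->usage -= nr;
		printf("reclaimed %ld from %s, usage now %ld/%ld\n",
		       nr, victim->name, victim->usage, victim->high);
		/* requeue the victim or the nearest ancestor still above its limit */
		for (g = victim; g; g = g->parent)
			if (g->usage > g->high) {
				queue_group(g);
				break;
			}
	}

	int main(void)
	{
		static struct group root = { "root", 900, 1000, NULL,  0 };
		static struct group a    = { "A",    180,  100, &root, 0 };
		static struct group b    = { "B",    150,  100, &root, 0 };
		int i;

		/* the high work would queue breachers as they are noticed */
		queue_group(&a);
		queue_group(&b);

		for (i = 0; i < 10; i++)
			kswapd_pass();
		return 0;
	}

Running this shows A and B being shrunk alternately in BATCH-sized steps,
which models the round robin and bounded per-pass reclaim the changelog
claims for the real hooks.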



