From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"nishimura@mxp.nes.nec.co.jp" <nishimura@mxp.nes.nec.co.jp>,
	"balbir@linux.vnet.ibm.com" <balbir@linux.vnet.ibm.com>,
	Ying Han <yinghan@google.com>,
	hannes@cmpxchg.org, Michal Hocko <mhocko@suse.cz>
Subject: Re: [PATCH 8/8] memcg asyncrhouns reclaim workqueue
Date: Mon, 23 May 2011 09:25:57 +0900
Message-ID: <20110523092557.30d322aa.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20110520182640.7e71af33.akpm@linux-foundation.org>

On Fri, 20 May 2011 18:26:40 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Sat, 21 May 2011 09:41:50 +0900 Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com> wrote:
> 
> > 2011/5/21 Andrew Morton <akpm@linux-foundation.org>:
> > > On Fri, 20 May 2011 12:48:37 +0900
> > > KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > >
> > >> workqueue for memory cgroup asynchronous memory shrinker.
> > >>
> > >> This patch implements the workqueue of async shrinker routine. each
> > >> memcg has a work and only one work can be scheduled at the same time.
> > >>
> > >> If shrinking memory doesn't goes well, delay will be added to the work.
> > >>
> > >
> > > When this code explodes (as it surely will), users will see large
> > > amounts of CPU consumption in the work queue thread.  We want to make
> > > this as easy to debug as possible, so we should try to make the
> > > workqueue's names mappable back onto their memcg's.  And anything else
> > > we can think of to help?
> > >
> > 
> > I had a patch for showing per-memcg reclaim latency stats. It should help.
> > I'll add it to this set again. I dropped it only because many patches
> > touching memory.stat are in flight.
> 
> Will that patch help us when users report the memcg equivalent of
> "kswapd uses 99% of CPU"?
> 
I think so. Each memcg would show how much CPU time it has used.

But that may not be an easy interface to use. I have several ideas.


One idea is to overwrite task->comm, renaming the worker from kworker/u:%d
to memcg/%d when the work is scheduled. I think this can be implemented with a
very simple interface and a flag on the workqueue. Then ps -elf would show what
was going on. If necessary, I'll add a hard limit on CPU usage per work item, or
limit the number of threads for the memcg workqueue.
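
As a rough userspace analogue of the renaming idea (not the workqueue
interface itself, which does not exist yet), a thread can overwrite its own
comm with prctl(PR_SET_NAME). The helper names below are hypothetical; the
16-byte buffer matches the kernel's TASK_COMM_LEN, so longer names would be
truncated.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Same size as the kernel's TASK_COMM_LEN (includes the trailing NUL). */
#define COMM_LEN 16

/*
 * Rename the calling thread to "memcg/<id>" so that tools reading comm
 * (ps, top) show which memcg the thread is working for.
 * Returns 0 on success, -1 on error (as prctl() does).
 */
int set_memcg_comm(int memcg_id)
{
	char name[COMM_LEN];

	/* snprintf() truncates safely if the id has too many digits. */
	snprintf(name, sizeof(name), "memcg/%d", memcg_id);
	return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/* Read the calling thread's comm back; buf must hold COMM_LEN bytes. */
int get_comm(char *buf)
{
	return prctl(PR_GET_NAME, (unsigned long)buf, 0, 0, 0);
}
```

After set_memcg_comm(42), the thread's /proc/<pid>/task/<tid>/comm reads
"memcg/42". An in-kernel version would presumably have to restore the
kworker name when the work item finishes.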

Considering that there are users who run 2000+ memcgs on a system, a thread per
memcg was not an option for me. The alternatives were a thread pool or a
workqueue. Because a thread pool would end up as a poor reimplementation of
workqueue, I used workqueue.

I'll implement some of the ideas above in the next version.
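
For reference, the arithmetic in the hunk quoted next can be tried out in
userspace. This is only a sketch under assumptions: the 4 KiB page size and
the 4 MiB margin are made-up values (MEMCG_ASYNC_MARGIN's real value is not
shown in this mail), and all names carry a SKETCH_ prefix or are otherwise
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12                        /* assumed: 4 KiB pages */
#define SKETCH_PAGE_SIZE    (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_ASYNC_MARGIN (4ULL << 20)              /* hypothetical: 4 MiB */

/*
 * Number of pages the async shrinker would be asked to scan, following the
 * quoted patch: shrink down to (limit - margin - one page), and round the
 * byte distance up by always adding one page.
 * Returns 0 when usage is already below the target.
 */
uint64_t pages_to_reclaim(uint64_t limit, uint64_t usage)
{
	uint64_t shrink_to;

	/* Guard added for the sketch: avoid unsigned wrap-around. */
	assert(limit >= SKETCH_ASYNC_MARGIN + SKETCH_PAGE_SIZE);

	shrink_to = limit - SKETCH_ASYNC_MARGIN - SKETCH_PAGE_SIZE;
	if (shrink_to > usage)
		return 0;
	return ((usage - shrink_to) >> SKETCH_PAGE_SHIFT) + 1;
}
```

With a 100 MiB limit and 99 MiB in use, the target is 4 MiB plus one page
below the limit, so the distance is 3 MiB plus one page (769 pages) and the
function returns 770. Note that usage == shrink_to still yields 1 page.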


> > >
> > >> +	limit = res_counter_read_u64(&mem->res, RES_LIMIT);
> > >> +	shrink_to = limit - MEMCG_ASYNC_MARGIN - PAGE_SIZE;
> > >> +	usage = res_counter_read_u64(&mem->res, RES_USAGE);
> > >> +	if (shrink_to <= usage) {
> > >> +		required = usage - shrink_to;
> > >> +		required = (required >> PAGE_SHIFT) + 1;
> > >> +		/*
> > >> +		 * This scans some number of pages and returns whether memory
> > >> +		 * reclaim was slow or not. If slow, we add a delay like
> > >> +		 * congestion_wait() in vmscan.c
> > >> +		 */
> > >> +		congested = mem_cgroup_shrink_static_scan(mem, (long)required);
> > >> +	}
> > >> +	if (test_bit(ASYNC_NORESCHED, &mem->async_flags)
> > >> +	    || mem_cgroup_async_should_stop(mem))
> > >> +		goto finish_scan;
> > >> +	/* If memory reclaim couldn't go well, add delay */
> > >> +	if (congested)
> > >> +		delay = HZ/10;
> > >
> > > Another magic number.
> > >
> > > If Moore's law holds, we need to reduce this number by 1.4 each year.
> > > Is this good?
> > >
> > 
> > not good.  I just used the same magic number now used with wait_iff_congested.
> > Other than a timer, I can use the pagein/pageout event counter. If we have
> > dirty_ratio, I may be able to link this to dirty_ratio and wait until
> > dirty_ratio is low enough. Or, wake up again when the limit is hit.
> > 
> > Do you have any suggestions?
> > 
> 
> mm..  It would be pretty easy to generate an estimate of "pages scanned
> per second" from the contents of (and changes in) the scan_control. 

Hmm.

> Knowing that datum and knowing the number of pages in the memcg, we
> should be able to come up with a delay period which scales
> appropriately with CPU speed and with memory size?
> 
> Such a thing could be used to rationalise magic delays in other places,
> hopefully.
> 

Ok, I'll consider that. Thank you for the nice idea.
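
One possible reading of this suggestion, as a userspace sketch only: measure
a scan rate from the pages scanned over an interval, then wait roughly as
long as it would take to scan a fixed fraction of the memcg at that rate.
The 1/64 fraction, the clamp values and every name below are assumptions
made for illustration; nothing here is taken from the patch except the
100 ms ceiling, which equals the current HZ/10.

```c
#include <assert.h>
#include <stdint.h>

#define SCAN_FRACTION_SHIFT 6    /* hypothetical: 1/64 of the memcg */
#define DELAY_MIN_MS        1    /* hypothetical lower clamp */
#define DELAY_MAX_MS        100  /* HZ/10 expressed in milliseconds */

/* Estimate "pages scanned per second" from one measurement interval. */
uint64_t scan_rate_pages_per_sec(uint64_t pages_scanned, uint64_t elapsed_ms)
{
	assert(elapsed_ms > 0);
	return pages_scanned * 1000 / elapsed_ms;
}

/*
 * Delay in milliseconds that shrinks on faster machines (higher rate)
 * and grows with memcg size, clamped to [DELAY_MIN_MS, DELAY_MAX_MS].
 */
uint64_t scaled_delay_ms(uint64_t memcg_pages, uint64_t rate)
{
	uint64_t delay;

	if (rate == 0)		/* no progress measured: back off fully */
		return DELAY_MAX_MS;

	delay = (memcg_pages >> SCAN_FRACTION_SHIFT) * 1000 / rate;
	if (delay < DELAY_MIN_MS)
		delay = DELAY_MIN_MS;
	if (delay > DELAY_MAX_MS)
		delay = DELAY_MAX_MS;
	return delay;
}
```

For a memcg of 262144 pages (1 GiB with 4 KiB pages) scanned at 500000
pages/s, this gives 8 ms; doubling the scan rate halves it to 4 ms, so the
delay follows CPU speed instead of staying fixed at HZ/10.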


> > 
> > >> +	queue_delayed_work(memcg_async_shrinker, &mem->async_work, delay);
> > >> +	return;
> > >> +finish_scan:
> > >> +	cgroup_release_and_wakeup_rmdir(&mem->css);
> > >> +	clear_bit(ASYNC_RUNNING, &mem->async_flags);
> > >> +	return;
> > >> +}
> > >> +
> > >> +static void run_mem_cgroup_async_shrinker(struct mem_cgroup *mem)
> > >> +{
> > >> +	if (test_bit(ASYNC_NORESCHED, &mem->async_flags))
> > >> +		return;
> > >
> > > I can't work out what ASYNC_NORESCHED does.  Is its name well-chosen?
> > >
> > how about BLOCK/STOP_ASYNC_RECLAIM ?
> 
> I can't say - I don't know what it does!  Or maybe I did, and immediately
> forgot ;)
> 

I'll find a better name ;)

Thanks,
-Kame



