From: MinChan Kim <minchan.kim@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>, Rik van Riel <riel@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 3/3][RFC] swsusp: shrink file cache first
Date: Fri, 6 Feb 2009 22:35:26 +0900
Message-ID: <28c262360902060535g22facdd0tf082ca0abaec3f80@mail.gmail.com>
In-Reply-To: <20090206122417.GB1580@cmpxchg.org>
Thanks for the kind explanation and the good discussion, Hannes and Kosaki-san.
I always learn a lot from discussions like this. :)
On Fri, Feb 6, 2009 at 9:24 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Feb 06, 2009 at 02:59:35PM +0900, KOSAKI Motohiro wrote:
>> Hi
>>
>> > > if we think suspend performance, we should consider swap device and file-backed device
>> > > are different block device.
>> > > the interleave of file-backed page out and swap out can improve total write out performance.
>> >
>> > Hm, good point. We could probably improve that but I don't think it's
>> > too pressing because at least on my test boxen, actual shrinking time
>> > is really short compared to the total of suspending to disk.
>>
>> ok.
>> the only remaining problem is posting the measurement results :)
>>
>>
>> > > if we think resume performance, we should how think the on-disk contiguousness of the swap consist
>> > > process's virtual address contiguousness.
>> > > it cause to reduce unnecessary seek.
>> > > but your patch doesn't this.
>> > >
>> > > Could you explain this patch benefit?
>> >
>> > The patch tries to shrink those pages first that are most unlikely to
>> > be needed again after resume. It assumes that active anon pages are
>> > immediately needed after resume while inactive file pages are not. So
>> > it defers shrinking anon pages after file cache.
>>
>> hmm, I'm confusing.
>> I agree active anon is important than inactive file.
>> but I don't understand why scanning order at suspend change resume order.
>
> This is the problem: on suspend, we can only save about 50% of memory
> through the suspend image because of the snapshotting. So we have to
> shrink memory before suspend. Since you probably always have more RAM
> used than 50%, you always have to shrink. And the image is always the
> same size.
>
> After restoring the image, resuming processes want to continue their
> work immediately and the user wants to use the applications again as
> soon as possible.
>
> Everything that is saved in the suspend image is restored and back in
> memory when the processes resume their work.
>
> Everything that is NOT saved in the suspend image is still on swap or
> not yet in the page cache when the processes resume their work.
>
> So if we shrink the memory in the wrong order, after restoring the
> image we have page cache in memory that is not needed and those anon
> pages that are needed are swapped out.
That makes sense.
> And the goal is that after restoring the image we have as much of the
> working set back in memory and those pages in swap and on disk-only
> that are unlikely to be used immediately by the resumed processes, so
> they can continue their work without much disk io.
>
Your intention sounds good to me.
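
To check my understanding of the 50% limit, here is a minimal sketch.
pages_to_shrink() is a hypothetical helper, not a kernel function, and it
treats "about 50%" as exactly half of RAM, which is only an approximation:

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel function: how many pages must be
 * freed before the snapshot if the image can hold at most half of RAM.
 * The 50% limit is the rough figure quoted above, taken here as exact.
 */
static unsigned long pages_to_shrink(unsigned long total_pages,
                                     unsigned long used_pages)
{
    unsigned long image_limit = total_pages / 2;

    assert(used_pages <= total_pages);

    /* Everything already fits into the image: nothing to shrink. */
    if (used_pages <= image_limit)
        return 0;

    /* Only the excess over the image limit has to go. */
    return used_pages - image_limit;
}
```

Under that assumption, a box with 1000 pages of RAM and 800 in use has to
shrink 300 pages, and the order of shrinking decides which 300 are missing
after resume.
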
>> > But I just noticed that the old behaviour defers it as well, because
>> > even if it does scan anon pages from the beginning, it allows writing
>> > only starting from pass 3.
>>
>> Ah, I see.
>> it's obviously wrong.
>>
>> > I couldn't quite understand what you wrote about on-disk
>> > contiguousness, but that claim still stands: faulting in contiguous
>> > pages from swap can be much slower than faulting file pages. And my
>> > patch prefers mapped file pages over anon pages. This is probably
>> > where I have seen the improvements after resume in my tests.
>>
>> sorry, I don't understand yet.
>> Why "prefers mapped file pages over anon pages" makes large improvement?
>
> Because contiguously mapped file pages are faster to read in than a
> group of anon pages. Or at least that is my claim.
That makes sense if the faults that happen after resume generally have a
locality pattern,
so you should prove this.
>
> And if we have to evict some of the working set just because the
> working set is bigger than 50% of memory, then it's better to evict
> those pages that are cheaper to refault.
>
> Does that make sense?
Indeed!
>> > Yes, I'm still thinking about ideas how to quantify it properly. I
>> > have not yet found a reliable way to check for whether the working set
>> > is intact besides seeing whether the resumed applications are
>> > responsive right away or if they first have to swap in their pages
>> > again.
>>
>> thanks.
>> I'm looking for this :)
>
> Thanks to YOU, also for reviewing!
>
>> > > > @@ -2134,17 +2144,17 @@ unsigned long shrink_all_memory(unsigned
>> > > >
>> > > > /*
>> > > > * We try to shrink LRUs in 5 passes:
>> > > > - * 0 = Reclaim from inactive_list only
>> > > > - * 1 = Reclaim from active list but don't reclaim mapped
>> > > > - * 2 = 2nd pass of type 1
>> > > > - * 3 = Reclaim mapped (normal reclaim)
>> > > > - * 4 = 2nd pass of type 3
>> > > > + * 0 = Reclaim unmapped inactive file pages
>> > > > + * 1 = Reclaim unmapped file pages
>> > >
>> > > I think your patch reclaim mapped file at priority 0 and 1 too.
>> >
>> > Doesn't the following check in shrink_page_list prevent this:
>> >
>> > if (!sc->may_swap && page_mapped(page))
>> > goto keep_locked;
>> >
>> > ?
>>
>> Grr, you are right.
>> I agree, currently may_swap doesn't control swap out or not.
>> so I think we should change it correct name ;)
>
> Agreed. What do you think about the following patch?
As for me, I can't agree with you.
There are two kinds of file-mapped pages:
1. file-mapped and dirty pages
2. file-mapped and clean pages
Neither kind is swapped out.
A file-mapped dirty page is written back to its original file.
A file-mapped clean page is simply discarded from the reclaim point of view.
So may_swap relates only to anon pages.
Thus, I think the name may_swap is reasonable.
What do you think?
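
Put as a small sketch, the classification I have in mind looks roughly like
this. All types and names here (toy_page, toy_action, toy_reclaim_action) are
hypothetical and only model where the page contents end up; this is not the
code in mm/vmscan.c and it ignores the page_mapped() check discussed above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page; these are not struct page fields. */
struct toy_page {
    bool anon;   /* anonymous memory, no backing file */
    bool dirty;  /* modified since it was last written back */
};

enum toy_action {
    TOY_SWAP_OUT,   /* contents go to the swap device */
    TOY_WRITEBACK,  /* contents are synced to the backing file */
    TOY_DISCARD     /* page is dropped, it can be re-read from the file */
};

/* Simplified: every anon page is treated as needing swap. */
static enum toy_action toy_reclaim_action(const struct toy_page *page)
{
    assert(page != NULL);

    if (page->anon)
        return TOY_SWAP_OUT;

    return page->dirty ? TOY_WRITEBACK : TOY_DISCARD;
}
```

In this model only the anon case ever touches swap, which is why the name
may_swap looks reasonable to me.
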
>
> ---
> Subject: vmscan: rename may_swap scan control knob
>
> may_swap applies not only to anon pages but to mapped file pages as
> well. Rename it to may_unmap which is the actual meaning.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9a27c44..2523600 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -60,8 +60,8 @@ struct scan_control {
>
> int may_writepage;
>
> - /* Can pages be swapped as part of reclaim? */
> - int may_swap;
> + /* Reclaim mapped pages */
> + int may_unmap;
>
> /* This context's SWAP_CLUSTER_MAX. If freeing memory for
> * suspend, we effectively ignore SWAP_CLUSTER_MAX.
> @@ -606,7 +606,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> if (unlikely(!page_evictable(page, NULL)))
> goto cull_mlocked;
>
> - if (!sc->may_swap && page_mapped(page))
> + if (!sc->may_unmap && page_mapped(page))
> goto keep_locked;
>
> /* Double the slab pressure for mapped and swapcache pages */
> @@ -1694,7 +1694,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> .gfp_mask = gfp_mask,
> .may_writepage = !laptop_mode,
> .swap_cluster_max = SWAP_CLUSTER_MAX,
> - .may_swap = 1,
> + .may_unmap = 1,
> .swappiness = vm_swappiness,
> .order = order,
> .mem_cgroup = NULL,
> @@ -1713,7 +1713,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> {
> struct scan_control sc = {
> .may_writepage = !laptop_mode,
> - .may_swap = 1,
> + .may_unmap = 1,
> .swap_cluster_max = SWAP_CLUSTER_MAX,
> .swappiness = swappiness,
> .order = 0,
> @@ -1723,7 +1723,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> struct zonelist *zonelist;
>
> if (noswap)
> - sc.may_swap = 0;
> + sc.may_unmap = 0;
>
> sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> @@ -1762,7 +1762,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order)
> struct reclaim_state *reclaim_state = current->reclaim_state;
> struct scan_control sc = {
> .gfp_mask = GFP_KERNEL,
> - .may_swap = 1,
> + .may_unmap = 1,
> .swap_cluster_max = SWAP_CLUSTER_MAX,
> .swappiness = vm_swappiness,
> .order = order,
> @@ -2109,7 +2109,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
> struct reclaim_state reclaim_state;
> struct scan_control sc = {
> .gfp_mask = GFP_KERNEL,
> - .may_swap = 0,
> + .may_unmap = 0,
> .swap_cluster_max = nr_pages,
> .may_writepage = 1,
> .swappiness = vm_swappiness,
> @@ -2147,7 +2147,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
>
> /* Force reclaiming mapped pages in the passes #3 and #4 */
> if (pass > 2) {
> - sc.may_swap = 1;
> + sc.may_unmap = 1;
> sc.swappiness = 100;
> }
>
> @@ -2292,7 +2292,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
> int priority;
> struct scan_control sc = {
> .may_writepage = !!(zone_reclaim_mode & RECLAIM_WRITE),
> - .may_swap = !!(zone_reclaim_mode & RECLAIM_SWAP),
> + .may_unmap = !!(zone_reclaim_mode & RECLAIM_SWAP),
> .swap_cluster_max = max_t(unsigned long, nr_pages,
> SWAP_CLUSTER_MAX),
> .gfp_mask = gfp_mask,
>
--
Kind regards,
MinChan Kim