linux-kernel.vger.kernel.org archive mirror
From: Minchan Kim <minchan@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andi Kleen <andi@firstfloor.org>, Michal Hocko <mhocko@suse.cz>,
	Tim Chen <tim.c.chen@linux.intel.com>, <kernel-team@fb.com>
Subject: Re: [PATCH 01/10] mm: allow swappiness that prefers anon over file
Date: Thu, 9 Jun 2016 10:01:07 +0900
Message-ID: <20160609010107.GF28620@bbox>
In-Reply-To: <20160608155812.GC6727@cmpxchg.org>

On Wed, Jun 08, 2016 at 11:58:12AM -0400, Johannes Weiner wrote:
> On Wed, Jun 08, 2016 at 09:06:32AM +0900, Minchan Kim wrote:
> > On Tue, Jun 07, 2016 at 10:18:18AM -0400, Johannes Weiner wrote:
> > > On Tue, Jun 07, 2016 at 09:25:50AM +0900, Minchan Kim wrote:
> > > > On Mon, Jun 06, 2016 at 03:48:27PM -0400, Johannes Weiner wrote:
> > > > > --- a/Documentation/sysctl/vm.txt
> > > > > +++ b/Documentation/sysctl/vm.txt
> > > > > @@ -771,14 +771,20 @@ with no ill effects: errors and warnings on these stats are suppressed.)
> > > > >  
> > > > >  swappiness
> > > > >  
> > > > > -This control is used to define how aggressive the kernel will swap
> > > > > -memory pages.  Higher values will increase agressiveness, lower values
> > > > > -decrease the amount of swap.  A value of 0 instructs the kernel not to
> > > > > -initiate swap until the amount of free and file-backed pages is less
> > > > > -than the high water mark in a zone.
> > > > > +This control is used to define the relative IO cost of cache misses
> > > > > +between the swap device and the filesystem as a value between 0 and
> > > > > +200. At 100, the VM assumes equal IO cost and will thus apply memory
> > > > > +pressure to the page cache and swap-backed pages equally. At 0, the
> > > > > +kernel will not initiate swap until the amount of free and file-backed
> > > > > +pages is less than the high watermark in a zone.
> > > > 
> > > > Generally, I agree that extending the swappiness range is good, but
> > > > I am not sure 200 is enough to represent the speed gap between file
> > > > and swap storage in every case. Just a nitpick.
> > > 
> > > How so? You can't give swap more weight than 100%. 200 is the maximum
> > > possible value.
> > 
> > In the old model, swappiness is how aggressively to reclaim anonymous
> > pages in favour of page cache. But when I read your description and
> > changes about swappiness in vm.txt, esp. *relative IO cost*, I feel you
> > changed the definition of swappiness to represent the relative IO cost
> > between swap storage and file storage.
> > Then, with that, we could balance the anonymous and file LRUs with the
> > weight.
> > 
> > For example, let's assume that in-memory swap storage is 10x faster
> > than a slow thumb drive. In that case, the IO cost of 5 anonymous pages
> > swapping-in/out is equal to 1 file-backed page-discard/read.
> > 
> > I thought it makes sense because measuring the speed gap between
> > those storages is easier than selecting a vague swappiness tendency.
> > 
> > With such an approach, I thought 200 is not enough to show the gap
> > because the gap starts from 100.
> > Isn't that your intention? If so, to me, the description was rather
> > misleading. :(
> 
> The way swappiness works never actually changed.
> 
> The only thing that changed is that we used to look at referenced
> pages (recent_rotated) and *assumed* they would likely cause IO when
> reclaimed, whereas with my patches we actually know whether they are.
> But swappiness has always been about relative IO cost of the LRUs.
> 
> Swappiness defines relative IO cost between file and swap on a scale
> from 0 to 200, where 100 is the point of equality. The scale factors
> are calculated in get_scan_count() like this:
> 
>   anon_prio = swappiness
>   file_prio = 200 - swappiness
> 
> and those are applied to the recorded cost/value ratios like this:
> 
>   ap = anon_prio * scanned / rotated
>   fp = file_prio * scanned / rotated
> 
> That means if your swap device is 10 times faster than your filesystem
> device, and you thus want anon to receive 10x the refaults when the
> anon and file pages are used equally, you do this:
> 
>   x + 10x = 200
>         x = 18 (ish)
> 
> So your file priority is ~18 and your swap priority is the remainder
> of the range, 200 - 18. You set swappiness to 182.
> 
> Now fill in the numbers while assuming all pages on both lists have
> been referenced before and will likely refault (or in the new model,
> all pages are refaulting):
> 
>   fraction[anon] = ap      = 182 * 1 / 1 = 182
>   fraction[file] = fp      =  18 * 1 / 1 =  18
>      denominator = ap + fp =    182 + 18 = 200
> 
> and then calculate the scan target like this:
> 
>   scan[type] = (lru_size() >> priority) * fraction[type] / denominator
> 
> This will scan and reclaim 9% of the file pages and 90% of the anon
> pages. On refault, 9% of the IO will be from the filesystem and 90%
> from the swap device.
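
The arithmetic quoted above can be sketched in a few lines of Python for
anyone who wants to plug in other numbers. This is only a model of the
formulas as written in the mail, not the kernel's get_scan_count(), which
is C and uses integer counters. The function name scan_targets and the
dict-based arguments are made up for illustration.

```python
def scan_targets(swappiness, lru_size, priority, scanned, rotated):
    """Model of the quoted scan-target arithmetic (illustrative only).

    lru_size, scanned and rotated are dicts keyed by "anon" and "file".
    """
    # Scale factors as described in the mail.
    anon_prio = swappiness
    file_prio = 200 - swappiness

    # Apply them to the recorded scanned/rotated ratios.
    ap = anon_prio * scanned["anon"] / rotated["anon"]
    fp = file_prio * scanned["file"] / rotated["file"]
    fraction = {"anon": ap, "file": fp}
    denominator = ap + fp

    # scan[type] = (lru_size() >> priority) * fraction[type] / denominator
    return {
        lru: (lru_size[lru] >> priority) * fraction[lru] / denominator
        for lru in ("anon", "file")
    }


# The example from the mail: swappiness 182, every page referenced once.
sizes = {"anon": 200 << 12, "file": 200 << 12}  # chosen so size >> 12 == 200
ones = {"anon": 1, "file": 1}
print(scan_targets(182, sizes, 12, ones, ones))
# prints {'anon': 182.0, 'file': 18.0}
```

With both lists the same size, 182 of every 200 scanned pages come from
the anon list and 18 from the file list, which is the 91%/9% split the
mail rounds to 90%/9%.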

Thanks for the detailed example. Then, let's change the example a little bit.

A system has big HDD storage and SSD swap.

HDD:    200 IOPS
SSD: 100000 IOPS
From https://en.wikipedia.org/wiki/IOPS

So, the speed gap is 500x.
x + 500x = 200
If we use a PCIe SSD, the gap will be larger.
That's why I said 200 is not enough to represent the speed gap.
Or is such a system configuration already nonsense, so it is okay to ignore
such use cases?
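
To make the limit concrete, here is a small Python sketch that solves
x + ratio * x = 200 for a given speed ratio. It assumes the file priority
x is rounded to the nearest integer, since swappiness is an integer
sysctl. The rounding rule and both function names are illustrative
assumptions, not anything from the patch.

```python
def swappiness_for_ratio(ratio):
    """Return (file_prio, swappiness) for a swap device that is
    `ratio` times faster than the filesystem device.

    Solves x + ratio * x = 200 and rounds x to an integer (assumption).
    """
    x = 200 / (1 + ratio)
    file_prio = round(x)
    return file_prio, 200 - file_prio


def effective_ratio(swappiness):
    """Anon:file weight ratio that a given swappiness actually encodes."""
    file_prio = 200 - swappiness
    if file_prio == 0:
        return float("inf")  # file LRU gets no weight at all
    return swappiness / file_prio


print(swappiness_for_ratio(10))   # prints (18, 182)
print(swappiness_for_ratio(500))  # prints (0, 200)
print(effective_ratio(199))       # prints 199.0
```

Under that rounding assumption, a 500x gap gives x of about 0.4, which
rounds to a file priority of 0 and a swappiness of 200. The largest
finite ratio an integer swappiness can express is 199:1.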


Thread overview: 67+ messages
2016-06-06 19:48 [PATCH 00/10] mm: balance LRU lists based on relative thrashing Johannes Weiner
2016-06-06 19:48 ` [PATCH 01/10] mm: allow swappiness that prefers anon over file Johannes Weiner
2016-06-07  0:25   ` Minchan Kim
2016-06-07 14:18     ` Johannes Weiner
2016-06-08  0:06       ` Minchan Kim
2016-06-08 15:58         ` Johannes Weiner
2016-06-09  1:01           ` Minchan Kim [this message]
2016-06-09 13:32             ` Johannes Weiner
2016-06-06 19:48 ` [PATCH 02/10] mm: swap: unexport __pagevec_lru_add() Johannes Weiner
2016-06-06 21:32   ` Rik van Riel
2016-06-07  9:07   ` Michal Hocko
2016-06-08  7:14   ` Minchan Kim
2016-06-06 19:48 ` [PATCH 03/10] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() Johannes Weiner
2016-06-06 21:33   ` Rik van Riel
2016-06-07  9:12   ` Michal Hocko
2016-06-08  7:24   ` Minchan Kim
2016-06-06 19:48 ` [PATCH 04/10] mm: fix LRU balancing effect of new transparent huge pages Johannes Weiner
2016-06-06 21:36   ` Rik van Riel
2016-06-07  9:19   ` Michal Hocko
2016-06-08  7:28   ` Minchan Kim
2016-06-06 19:48 ` [PATCH 05/10] mm: remove LRU balancing effect of temporary page isolation Johannes Weiner
2016-06-06 21:56   ` Rik van Riel
2016-06-06 22:15     ` Johannes Weiner
2016-06-07  1:11       ` Rik van Riel
2016-06-07 13:57         ` Johannes Weiner
2016-06-07  9:26       ` Michal Hocko
2016-06-07 14:06         ` Johannes Weiner
2016-06-07  9:49   ` Michal Hocko
2016-06-08  7:39   ` Minchan Kim
2016-06-08 16:02     ` Johannes Weiner
2016-06-06 19:48 ` [PATCH 06/10] mm: remove unnecessary use-once cache bias from LRU balancing Johannes Weiner
2016-06-07  2:20   ` Rik van Riel
2016-06-07 14:11     ` Johannes Weiner
2016-06-08  8:03   ` Minchan Kim
2016-06-08 12:31   ` Michal Hocko
2016-06-06 19:48 ` [PATCH 07/10] mm: base LRU balancing on an explicit cost model Johannes Weiner
2016-06-06 19:13   ` kbuild test robot
2016-06-07  2:34   ` Rik van Riel
2016-06-07 14:12     ` Johannes Weiner
2016-06-08  8:14   ` Minchan Kim
2016-06-08 16:06     ` Johannes Weiner
2016-06-08 12:51   ` Michal Hocko
2016-06-08 16:16     ` Johannes Weiner
2016-06-09 12:18       ` Michal Hocko
2016-06-09 13:33         ` Johannes Weiner
2016-06-06 19:48 ` [PATCH 08/10] mm: deactivations shouldn't bias the LRU balance Johannes Weiner
2016-06-08  8:15   ` Minchan Kim
2016-06-08 12:57   ` Michal Hocko
2016-06-06 19:48 ` [PATCH 09/10] mm: only count actual rotations as LRU reclaim cost Johannes Weiner
2016-06-08  8:19   ` Minchan Kim
2016-06-08 13:18   ` Michal Hocko
2016-06-06 19:48 ` [PATCH 10/10] mm: balance LRU lists based on relative thrashing Johannes Weiner
2016-06-06 19:22   ` kbuild test robot
2016-06-06 23:50   ` Tim Chen
2016-06-07 16:23     ` Johannes Weiner
2016-06-07 19:56       ` Tim Chen
2016-06-08 13:58   ` Michal Hocko
2016-06-10  2:19   ` Minchan Kim
2016-06-13 15:52     ` Johannes Weiner
2016-06-15  2:23       ` Minchan Kim
2016-06-16 15:12         ` Johannes Weiner
2016-06-17  7:49           ` Minchan Kim
2016-06-17 17:01             ` Johannes Weiner
2016-06-20  7:42               ` Minchan Kim
2016-06-22 21:56                 ` Johannes Weiner
2016-06-24  6:22                   ` Minchan Kim
2016-06-07  9:51 ` [PATCH 00/10] " Michal Hocko
