From: Wu Fengguang <fengguang.wu@intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Barnes, Jesse" <jesse.barnes@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Andi Kleen <andi@firstfloor.org>,
	Minchan Kim <minchan.kim@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [patch v3] swap: virtual swap readahead
Date: Fri, 19 Jun 2009 11:30:04 +0800
Message-ID: <20090619033004.GB5603@localhost>
In-Reply-To: <20090618130121.GA1817@cmpxchg.org>

On Thu, Jun 18, 2009 at 09:01:21PM +0800, Johannes Weiner wrote:
> On Thu, Jun 18, 2009 at 05:19:49PM +0800, Wu Fengguang wrote:
> > On Tue, Jun 16, 2009 at 02:22:17AM +0800, Johannes Weiner wrote:
> > > On Fri, Jun 12, 2009 at 09:59:27AM +0800, Wu Fengguang wrote:
> > > > On Thu, Jun 11, 2009 at 06:17:42PM +0800, Johannes Weiner wrote:
> > > > > On Thu, Jun 11, 2009 at 01:22:28PM +0800, Wu Fengguang wrote:
> > > > > > Unfortunately, after fixing it up the swap readahead patch still performs slowly
> > > > > > (even worse this time):
> > > > > 
> > > > > Thanks for doing the tests.  Do you know if the time difference comes
> > > > > from IO or CPU time?
> > > > > 
> > > > > Because one reason I could think of is that the original code walks
> > > > > the readaround window in two directions, starting from the target each
> > > > > time, but stops immediately when it encounters a hole, whereas the new
> > > > > code just skips holes without aborting readaround and thus might
> > > > > indeed read more slots.
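To make that difference concrete, here is a toy sketch (with made-up
slot_in_use()/read_slot() helpers, not the actual kernel code):

  #include <stdbool.h>
  #include <stdio.h>

  /* Toy swap area: slots 2-4, 6 and 9 are in use, the rest are holes. */
  static bool map[16] = { [2] = 1, [3] = 1, [4] = 1, [6] = 1, [9] = 1 };

  static bool slot_in_use(int i) { return map[i]; }
  static void read_slot(int i)   { printf("read slot %d\n", i); }

  int main(void)
  {
          int target = 4, start = 2, end = 9, i;

          /* Old behaviour: walk outward from the target and abort a
           * direction at the first hole (here it reads slots 3 and 2
           * only, since slot 5 is already a hole). */
          for (i = target + 1; i <= end && slot_in_use(i); i++)
                  read_slot(i);
          for (i = target - 1; i >= start && slot_in_use(i); i--)
                  read_slot(i);

          /* New behaviour: scan the whole window and merely skip the
           * holes, so slots beyond a hole (6 and 9) are read as well. */
          for (i = start; i <= end; i++)
                  if (i != target && slot_in_use(i))
                          read_slot(i);
          return 0;
  }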
> > > > > 
> > > > > I have an old patch flying around that changed the physical ra code to
> > > > > use a bitmap that is able to represent holes.  If the increased time
> > > > > is waiting for IO, I would be interested if that patch has the same
> > > > > negative impact.
> > > > 
> > > > You can send me the patch :)
> > > 
> > > Okay, attached is a rebase against latest -mmotm.
> > > 
> > > > But for this patch it is IO bound. The CPU iowait field is actually
> > > > going up as the test goes on:
> > > 
> > > It's probably the larger ra window then which takes away the bandwidth
> > > needed to load the new executables.  This sucks.  Would be nice to
> > > have 'optional IO' for readahead that is dropped when normal-priority
> > > IO requests are coming in...  Oh, we have READA for bios.  But it
> > > doesn't seem to implement dropping requests on load (or I am blind).
> > 
> > Hi Hannes,
> > 
> > Sorry for the long delay! The bad news is that I get many OOMs with this patch:
> 
> Okay, evaluating this test patch any further probably isn't worth it.
> It's too aggressive; I think readahead is stealing pages reclaimed by
> other allocations, which then OOM.

OK.

> Back to the original problem: you detected increased latency for
> launching new applications, so they get less share of the IO bandwidth

There is no "launch new app" phase. The test flow works like this:

  for all apps {
        for all started apps {
                activate its GUI window
        }
        start one new app
  }
        
But yes, as time goes by, the test becomes more and more about
switching between existing windows under high memory pressure.

> than without the patch.
> 
> I can see two reasons for this:
> 
>   a) the new heuristics don't work out and we read more unrelated
>   pages than before
> 
>   b) we readahead more pages in total as the old code would stop at
>   holes, as described above
> 
> We can verify a) by comparing major fault numbers between the two

Plus pswpin numbers :) I found they decreased significantly when we do
pte swap readahead.  See my other email.
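For completeness, a minimal way to snapshot those counters around a
test run is to parse /proc/vmstat, e.g.:

  #include <stdio.h>
  #include <string.h>

  /* Print the counters of interest; run once before and once after
   * the workload and compare the deltas. */
  int main(void)
  {
          char name[64];
          unsigned long long val;
          FILE *f = fopen("/proc/vmstat", "r");

          if (!f)
                  return 1;
          while (fscanf(f, "%63s %llu", name, &val) == 2)
                  if (!strcmp(name, "pgmajfault") ||
                      !strcmp(name, "pswpin") ||
                      !strcmp(name, "pswpout"))
                          printf("%s %llu\n", name, val);
          fclose(f);
          return 0;
  }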

> kernels with your testload.  If they increase with my patch, we
> anticipate the wrong slots and every fault has to do the reading itself.
> 
> b) seems to be a trade-off.  After all, the IO bandwidth that new
> applications are missing in your test is the bandwidth being used by
> swapping applications.  My qsbench numbers are a sign of this, as the
> only IO going on is swap.
> 
> Of course, the theory is not to improve swap performance by increasing
> the readahead window but to choose better readahead candidates.  So I
> will run your tests and qsbench with a smaller page cluster and see if
> this improves both loads.
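The knob in question is vm.page_cluster: swap readahead reads up to
2^page_cluster pages, with a default of 3, i.e. 8 pages. A minimal
sketch of lowering it from a test harness (needs root):

  #include <stdio.h>

  int main(void)
  {
          /* Shrink the swap readahead window from the default
           * 2^3 = 8 pages to 2^2 = 4 pages. */
          FILE *f = fopen("/proc/sys/vm/page-cluster", "w");

          if (!f)
                  return 1;
          fputs("2\n", f);
          return fclose(f) ? 1 : 0;
  }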

The general principle is that any readahead not based on sector numbers
has to be really accurate in order to be a net gain, because each
readahead page miss will lead to one disk seek, which is much more
costly than wasting a memory page.
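Back-of-envelope, assuming a disk with ~8 ms average seek time and
~60 MB/s sequential throughput:

  time for one seek            ~ 8 ms
  pages streamed in that time  ~ 60 MB/s * 8 ms / 4 KB ~ 120 pages

So every mispredicted slot that costs a seek burns the bandwidth of
roughly a hundred correctly predicted contiguous pages.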

Thanks,
Fengguang

