From: Johannes Weiner <hannes@cmpxchg.org>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.org.uk>,
	Andi Kleen <andi@firstfloor.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch v3] swap: virtual swap readahead
Date: Thu, 18 Jun 2009 00:41:49 +0200
Message-ID: <20090617224149.GA16104@cmpxchg.org>
In-Reply-To: <20090611143122.108468f1.kamezawa.hiroyu@jp.fujitsu.com>

On Thu, Jun 11, 2009 at 02:31:22PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 9 Jun 2009 21:01:28 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> > [resend with lists cc'd, sorry]
> > 
> > +static int swap_readahead_ptes(struct mm_struct *mm,
> > +			unsigned long addr, pmd_t *pmd,
> > +			swp_entry_t *entries,
> > +			unsigned long cluster)
> > +{
> > +	unsigned long window, min, max, limit;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +	int i, nr;
> > +
> > +	window = cluster << PAGE_SHIFT;
> > +	min = addr & ~(window - 1);
> > +	max = min + cluster;
> 
> Johannes, I wonder there is no reason to use "alignment".

I am wondering too.  I dug into the archives, but the alignment
comes from a change older than what history.git documents, so I wasn't
able to find a written-down justification for it.

> I think we just need to read "nearby" pages. Then, this function's
> scan range should be
> 
> 	[addr - window/2, addr + window/2)
> or some.
> 
> And here, too
> > +	if (!entries)	/* XXX: shmem case */
> > +		return swapin_readahead_phys(entry, gfp_mask, vma, addr);
> > +	pmin = swp_offset(entry) & ~(cluster - 1);
> > +	pmax = pmin + cluster;
> 
> pmin = swp_offset(entry) - cluster/2.
> pmax = swp_offset(entry) + cluster/2.
> 
> I'm sorry if I miss a reason for using "alignment".

Perhaps someone else knows a good reason for it, but I think it could
even be harmful.

Chances are that several processes fault around the same slots
simultaneously.  By letting them all start at the same aligned offset,
we maximize the race between them: they all allocate pages for the
same slots concurrently.

Placing the window unaligned decreases this overlap, so it sounds
like a good idea.

It would increase the amount of readahead done even more, though, and
Fengguang already measured degradation in IO latency with my patch, so
this probably needs more changes to work well.
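
As a rough illustration of the trade-off described above (a standalone
userspace sketch, not code from the patch): the helpers below compare an
aligned window with one centered on the faulting swap offset.  The names
`struct window`, `aligned_window()`, `centered_window()`, `overlap()` and
`distinct_slots()` are hypothetical and exist only for this example; the
clamping at offset 0 is likewise an assumption, since the thread does not
say how the lower bound should be handled.

```c
#include <assert.h>

/* Hypothetical half-open range [start, end) of swap slots to read ahead. */
struct window {
	unsigned long start;
	unsigned long end;
};

/* Aligned variant: round the offset down to a cluster boundary,
 * as in "pmin = swp_offset(entry) & ~(cluster - 1)". */
static struct window aligned_window(unsigned long off, unsigned long cluster)
{
	struct window w;

	/* The mask trick only works for a power-of-two cluster size. */
	assert(cluster && !(cluster & (cluster - 1)));
	w.start = off & ~(cluster - 1);
	w.end = w.start + cluster;
	return w;
}

/* Centered variant: [off - cluster/2, off + cluster/2).
 * Assumption: clamp at 0 instead of letting the subtraction wrap. */
static struct window centered_window(unsigned long off, unsigned long cluster)
{
	struct window w;
	unsigned long half = cluster / 2;

	w.start = off > half ? off - half : 0;
	w.end = off + half;
	return w;
}

/* Number of slots that two windows have in common. */
static unsigned long overlap(struct window a, struct window b)
{
	unsigned long lo = a.start > b.start ? a.start : b.start;
	unsigned long hi = a.end < b.end ? a.end : b.end;

	return hi > lo ? hi - lo : 0;
}

/* Number of distinct slots covered by two windows together. */
static unsigned long distinct_slots(struct window a, struct window b)
{
	return (a.end - a.start) + (b.end - b.start) - overlap(a, b);
}
```

With a cluster of 8 and two tasks faulting at offsets 17 and 22, the
aligned windows are both [16, 24): they overlap in all 8 slots and cover
8 distinct slots.  The centered windows are [13, 21) and [18, 26): they
overlap in only 3 slots but cover 13 distinct slots, which is the extra
readahead mentioned above.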
