Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, kernel-team <kernel-team@lge.com>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	Ilya Dryomov <idryomov@gmail.com>, Jens Axboe <axboe@kernel.dk>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH] mm for mmotm: Revert skip swap cache feture for synchronous device
Date: Wed, 3 Jan 2018 09:24:22 +0900
Message-ID: <20180103002422.GA20500@bbox> (raw)
In-Reply-To: <1514938277.4018.18.camel@HansenPartnership.com>

On Tue, Jan 02, 2018 at 04:11:17PM -0800, James Bottomley wrote:
> On Wed, 2018-01-03 at 08:56 +0900, Minchan Kim wrote:
> > On Tue, Jan 02, 2018 at 02:42:21PM -0800, James Bottomley wrote:
> > > 
> > > On Tue, 2018-01-02 at 13:22 -0800, Andrew Morton wrote:
> > > > 
> > > > On Fri, 29 Dec 2017 09:55:07 +0900 Minchan Kim <minchan@kernel.or
> > > > g>
> > > > wrote:
> > > > 
> > > > > 
> > > > > 
> > > > > James reported a bug of swap paging-in for his testing and
> > > > > found it
> > > > > at rc5, soon to be -rc5.
> > > > > 
> > > > > Although we can fix the specific problem at the moment, it may
> > > > > have other lurkig bugs so want to have one more cycle in -next
> > > > > before merging.
> > > > > 
> > > > > This patchset reverts 23c47d2ada9f, 08fa93021d80, 8e31f339295f
> > > > > completely
> > > > > but 79b5f08fa34e partially because the swp_swap_info function
> > > > > that
> > > > > 79b5f08fa34e introduced is used by [1].
> > > > 
> > > > Gets a significant reject in do_swap_page().  Could you please
> > > > take a
> > > > look, redo against current mainline?
> > > > 
> > > > Or not.  We had a bug and James fixed it.  That's what -rc is
> > > > for.  Why not fix the thing and proceed?
> > > 
> > > My main worry was lack of testing at -rc5, since the bug could
> > > essentially be excited by pushing pages out to swap and then trying
> > > to
> > > access them again ... plus since one serious bug was discovered it
> > > wouldn't be unusual for there to be others.  However, because of
> > > the
> > > IPT stuff, I think Linus is going to take 4.15 over a couple of
> > > extra
> > > -rc releases, so this is less of a problem.
> > 
> > Then, Here is right fix patch against current mainline.
> > 
> > 
> > From 012bdb0774744455ab7aa8abd74c8b9ca1cdc009 Mon Sep 17 00:00:00
> > 2001
> > From: Minchan Kim <minchan@kernel.org>
> > Date: Wed, 3 Jan 2018 08:25:15 +0900
> > Subject: [PATCH] mm: release locked page in do_swap_page
> > 
> > James reported a bug of swap paging-in for his testing. It is that
> > do_swap_page doesn't release locked page so system hang-up happens
> > by deadlock of PG_locked.
> > 
> > It was introduced by [1] because I missed swap cache hit places to
> > update swapcache variable to work well with other logics against
> > swapcache in do_swap_page.
> > 
> > This patch fixes it.
> > 
> > [1] 0bcac06f27d7, mm, swap: skip swapcache for swapin of synchronous
> > device
> > 
> > Link: http://lkml.kernel.org/r/<1514407817.4169.4.camel@HansenPartner
> > ship.com>;
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Cc: Huang Ying <ying.huang@intel.com>
> > Debugged-by: James Bottomley <James.Bottomley@hansenpartnership.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  mm/memory.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index ca5674cbaff2..793004608332 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2857,8 +2857,11 @@ int do_swap_page(struct vm_fault *vmf)
> >  	int ret = 0;
> >  	bool vma_readahead = swap_use_vma_readahead();
> >  
> > -	if (vma_readahead)
> > +	if (vma_readahead) {
> >  		page = swap_readahead_detect(vmf, &swap_ra);
> > +		swapcache = page;
> > +	}
> > +
> >  	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf-
> > >orig_pte)) {
> >  		if (page)
> >  			put_page(page);
> > @@ -2889,9 +2892,12 @@ int do_swap_page(struct vm_fault *vmf)
> >  
> >  
> >  	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> > -	if (!page)
> > +	if (!page) {
> >  		page = lookup_swap_cache(entry, vma_readahead ? vma
> > : NULL,
> >  					 vmf->address);
> > +		swapcache = page;
> > +	}
> > +
> 
> I've got to say I prefer my version.  The problem with the above is

Unfortunately, it was not right fix because with synchronous device,
we skip swapcache but your patch set the swapcache variable
unconditionally. So, it doesn't work by below logic, IOW, it goes
with out_page jump.

        /*
         * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
         * release the swapcache from under us.  The page pin, and pte_same
         * test below, are not enough to exclude that.  Even if it is still
         * swapcache, we need to check that the page's swap has not changed.
         */
        if (unlikely((!PageSwapCache(page) ||
                        page_private(page) != entry.val)) && swapcache)
                goto out_page;


> that if something else gets added to this path and forgets to set
> swapcache = page you'll get the locked pages problem back.
> 
> Instead of setting swapcache to NULL at the top, don't set it until it
> matters, which is just before the second if (!page).  It doesn't matter
> before this because you're using it as a signal for the synchronous I/O
> path, so why have a whole section of code where you invite people to
> get it wrong for no benefit.
> 

Yub, it doesn't looks neat. But please see below patch I am planning
to send to Andrew for current mmotm. It looks better because in the
head of do_swap_page, we set swapcache unconditionally and others
logic work well due to patchset vma-based readahead cleanup.

If you don't against, I want to go my patch into current mainline
and Andrew can apply below patch into current mmots.

Thanks.

      reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29  0:55 Minchan Kim
2018-01-02 21:22 ` Andrew Morton
2018-01-02 22:42   ` James Bottomley
2018-01-02 23:56     ` Minchan Kim
2018-01-03  0:11       ` James Bottomley
2018-01-03  0:24         ` Minchan Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180103002422.GA20500@bbox \
    --to=minchan@kernel.org \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=idryomov@gmail.com \
    --cc=kernel-team@lge.com \
    --cc=linux-mm@kvack.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git