From: Hugh Dickins <hughd@google.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Shaohua Li <shli@kernel.org>,
	sasha.levin@oracle.com, linux-mm@kvack.org,
	akpm@linux-foundation.org, mpm@selenic.com, cpw@sgi.com,
	kosaki.motohiro@jp.fujitsu.com, hannes@cmpxchg.org,
	kamezawa.hiroyu@jp.fujitsu.com, mhocko@suse.cz,
	aneesh.kumar@linux.vnet.ibm.com, xemul@parallels.com,
	riel@redhat.com, kirill.shutemov@linux.intel.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] madvise: fix locking in force_swapin_readahead() (Re: [PATCH 08/11] madvise: redefine callback functions for page table walker)
Date: Thu, 20 Mar 2014 22:16:21 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1403202159190.1488@eggly.anvils> (raw)
In-Reply-To: <532ba74e.48c70e0a.7b9e.119cSMTPIN_ADDED_BROKEN@mx.google.com>

On Thu, 20 Mar 2014, Naoya Horiguchi wrote:
> On Thu, Mar 20, 2014 at 09:47:04PM -0400, Sasha Levin wrote:
> > On 02/10/2014 04:44 PM, Naoya Horiguchi wrote:
> > >swapin_walk_pmd_entry() is defined as a pmd_entry(), but it has no code
> > >for pmd handling (except pmd_none_or_trans_huge_or_clear_bad, and the
> > >same check is now done in the core page table walk code).
> > >So let's move this function to pte_entry() as swapin_walk_pte_entry().
> > >
> > >Signed-off-by: Naoya Horiguchi<n-horiguchi@ah.jp.nec.com>
> > 
> > This patch seems to generate:
> 
> Sasha, thank you for reporting.
> I forgot to unlock ptlock before entering read_swap_cache_async(), which
> takes the page lock internally; as a result the lock ordering rule
> (documented in mm/rmap.c) was violated (locks should be taken in the
> order mmap_sem -> page lock -> ptlock).
> The following patch should fix this. Could you test with it?
> 
> ---
> From c0d56af5874dc40467c9b3a0f9e53b39b3c4f1c5 Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Date: Thu, 20 Mar 2014 22:30:51 -0400
> Subject: [PATCH] madvise: fix locking in force_swapin_readahead()
> 
> We take mmap_sem and ptlock while walking over ptes in
> swapin_walk_pte_entry(), but inside it we call read_swap_cache_async(),
> which takes the page lock. So we should unlock ptlock before calling
> read_swap_cache_async() to meet the lock ordering rule
> (mmap_sem -> page lock -> ptlock).
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

NAK.  You are now unlocking and relocking the spinlock, good; but on
arm, frv, or i386 with CONFIG_HIGHPTE you are leaving the page table
atomically kmapped across read_swap_cache_async(), which (never mind
lock ordering) is quite likely to block waiting to allocate memory.
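
[Editor's note: the pattern that avoids both problems -- roughly what
Shaohua's original pmd_entry-based walker did, sketched here from memory
rather than quoted from the tree -- copies each pte out under
pte_offset_map_lock() and drops both the spinlock and the atomic kmap
before calling anything that can sleep:]

```c
static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
	unsigned long end, struct mm_walk *walk)
{
	struct vm_area_struct *vma = walk->private;
	unsigned long index;

	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
		return 0;

	for (index = start; index != end; index += PAGE_SIZE) {
		pte_t pte;
		pte_t *orig_pte;
		spinlock_t *ptl;
		swp_entry_t entry;
		struct page *page;

		/* map and lock the page table just long enough to copy one pte */
		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
		/* drop the spinlock AND the atomic kmap before anything that sleeps */
		pte_unmap_unlock(orig_pte, ptl);

		if (pte_present(pte) || pte_none(pte) || pte_file(pte))
			continue;
		entry = pte_to_swp_entry(pte);
		if (unlikely(non_swap_entry(entry)))
			continue;

		/* may allocate and block: safe now that no spinlock/kmap is held */
		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
					     vma, index);
		if (page)
			page_cache_release(page);
	}
	return 0;
}
```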

I do not see
madvise-redefine-callback-functions-for-page-table-walker.patch
as an improvement.  I can see what's going on in Shaohua's original
code, whereas this style makes bugs more likely.  Please drop it.

Hugh

> ---
>  mm/madvise.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 5e957b984c14..ed9c31e3b5ff 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -141,24 +141,35 @@ static int swapin_walk_pte_entry(pte_t *pte, unsigned long start,
>  	swp_entry_t entry;
>  	struct page *page;
>  	struct vm_area_struct *vma = walk->vma;
> +	spinlock_t *ptl = (spinlock_t *)walk->private;
>  
>  	if (pte_present(*pte) || pte_none(*pte) || pte_file(*pte))
>  		return 0;
>  	entry = pte_to_swp_entry(*pte);
>  	if (unlikely(non_swap_entry(entry)))
>  		return 0;
> +	spin_unlock(ptl);
>  	page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
>  				     vma, start);
> +	spin_lock(ptl);
>  	if (page)
>  		page_cache_release(page);
>  	return 0;
>  }
>  
> +static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
> +	unsigned long end, struct mm_walk *walk)
> +{
> +	walk->private = pte_lockptr(walk->mm, pmd);
> +	return 0;
> +}
> +
>  static void force_swapin_readahead(struct vm_area_struct *vma,
>  		unsigned long start, unsigned long end)
>  {
>  	struct mm_walk walk = {
>  		.mm = vma->vm_mm,
> +		.pmd_entry = swapin_walk_pmd_entry,
>  		.pte_entry = swapin_walk_pte_entry,
>  	};
>  
> -- 
> 1.8.5.3
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
> 
