linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	torvalds@linux-foundation.org, kirill.shutemov@linux.intel.com,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	iamjoonsoo.kim@lge.com, mgorman@techsingularity.net,
	tony.luck@intel.com, vbabka@suse.cz, aarcange@redhat.com,
	hillf.zj@alibaba-inc.com, hughd@google.com, oleg@redhat.com,
	peterz@infradead.org, riel@redhat.com, srikar@linux.vnet.ibm.com,
	vdavydov.dev@gmail.com, mingo@kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.
Date: Fri, 19 Jan 2018 13:07:47 +0100	[thread overview]
Message-ID: <20180119120747.GV6584@dhcp22.suse.cz> (raw)
In-Reply-To: <20180119114917.rvghcgexgbm73xkq@node.shutemov.name>

On Fri 19-01-18 14:49:17, Kirill A. Shutemov wrote:
> On Fri, Jan 19, 2018 at 11:33:42AM +0100, Michal Hocko wrote:
> > On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote:
> > > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote:
> > > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote:
> > > > [...]
> > > > > +	/*
> > > > > +	 * Make sure that pages are in the same section before doing pointer
> > > > > +	 * arithmetics.
> > > > > +	 */
> > > > > +	if (page_to_section(pvmw->page) != page_to_section(page))
> > > > > +		return false;
> > > > 
> > > > OK, THPs shouldn't cross memory sections AFAIK. My brain is meltdown
> > > > these days so this might be a completely stupid question. But why don't
> > > > you simply compare pfns? This would be just simpler, no?
> > > 
> > > In original code, we already had pvmw->page around and I thought it would
> > > be easier to get page for the pte intead of looking for pfn for both
> > > sides.
> > > 
> > > We these changes it's no longer the case.
> > > 
> > > Do you care enough to send a patch? :)
> > 
> > Well, memory sections are sparsemem concept IIRC. Unless I've missed
> > something page_to_section is quarded by SECTION_IN_PAGE_FLAGS and that
> > is conditional to CONFIG_SPARSEMEM. THP is a generic code so using it
> > there is wrong unless I miss some subtle detail here.
> > 
> > Comparing pfn should be generic enough.
> 
> Good point.
> 
> What about something like this?
> 
> >From 861f68c555b87fd6c0ccc3428ace91b7e185b73a Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Thu, 18 Jan 2018 18:24:07 +0300
> Subject: [PATCH] mm, page_vma_mapped: Drop faulty pointer arithmetics in
>  check_pte()
> 
> Tetsuo reported random crashes under memory pressure on 32-bit x86
> system and tracked down to change that introduced
> page_vma_mapped_walk().
> 
> The root cause of the issue is the faulty pointer math in check_pte().
> As ->pte may point to an arbitrary page we have to check that they are
> belong to the section before doing math. Otherwise it may lead to weird
> results.
> 
> It wasn't noticed until now as mem_map[] is virtually contiguous on flatmem or
> vmemmap sparsemem. Pointer arithmetic just works against all 'struct page'
> pointers. But with classic sparsemem, it doesn't.

it doesn't because each section memap is allocated separately and so
consecutive pfns crossing two sections might have struct pages at
completely unrelated addresses.

> Let's restructure code a bit and replace pointer arithmetic with
> operations on pfns.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

The patch makes sense but there is one more thing to fix here.

[...]
>  static bool check_pte(struct page_vma_mapped_walk *pvmw)
>  {
> +	unsigned long pfn;
> +
>  	if (pvmw->flags & PVMW_MIGRATION) {
>  #ifdef CONFIG_MIGRATION
>  		swp_entry_t entry;
> @@ -41,37 +61,34 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
>  
>  		if (!is_migration_entry(entry))
>  			return false;
> -		if (migration_entry_to_page(entry) - pvmw->page >=
> -				hpage_nr_pages(pvmw->page)) {
> -			return false;
> -		}
> -		if (migration_entry_to_page(entry) < pvmw->page)
> -			return false;
> +
> +		pfn = migration_entry_to_pfn(entry);
>  #else
>  		WARN_ON_ONCE(1);
>  #endif
> -	} else {

now you allow to pass through with uninitialized pfn. We used to return
true in that case so we should probably keep it in this WARN_ON_ONCE
case. Please note that I haven't studied this particular case and the
ifdef is definitely not an act of art but that is a separate topic.

> -		if (is_swap_pte(*pvmw->pte)) {
> -			swp_entry_t entry;
> +	} else if (is_swap_pte(*pvmw->pte)) {
> +		swp_entry_t entry;
>  
> -			entry = pte_to_swp_entry(*pvmw->pte);
> -			if (is_device_private_entry(entry) &&
> -			    device_private_entry_to_page(entry) == pvmw->page)
> -				return true;
> -		}
> +		/* Handle un-addressable ZONE_DEVICE memory */
> +		entry = pte_to_swp_entry(*pvmw->pte);
> +		if (!is_device_private_entry(entry))
> +			return false;
>  
> +		pfn = device_private_entry_to_pfn(entry);
> +	} else {
>  		if (!pte_present(*pvmw->pte))
>  			return false;
>  
> -		/* THP can be referenced by any subpage */
> -		if (pte_page(*pvmw->pte) - pvmw->page >=
> -				hpage_nr_pages(pvmw->page)) {
> -			return false;
> -		}
> -		if (pte_page(*pvmw->pte) < pvmw->page)
> -			return false;
> +		pfn = pte_pfn(*pvmw->pte);
>  	}
>  
> +	if (pfn < page_to_pfn(pvmw->page))
> +		return false;
> +
> +	/* THP can be referenced by any subpage */
> +	if (pfn - page_to_pfn(pvmw->page) >= hpage_nr_pages(pvmw->page))
> +		return false;
> +
>  	return true;
>  }
>  
> -- 
>  Kirill A. Shutemov

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2018-01-19 12:07 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-05 14:45 [x86? mm? fs? 4.15-rc6] Random oopses by simple write under memory pressure Tetsuo Handa
2018-01-09 10:39 ` [mm? 4.15-rc7] " Tetsuo Handa
2018-01-10 11:49   ` [mm? 4.15-rc7] Random oopses " Tetsuo Handa
2018-01-10 12:45     ` Michal Hocko
2018-01-10 13:37       ` Tetsuo Handa
2018-01-11 13:57         ` Michal Hocko
2018-01-11 14:11           ` Tetsuo Handa
2018-01-11 14:21             ` Michal Hocko
2018-01-11 14:37               ` Tetsuo Handa
2018-01-12  1:31               ` [mm " Tetsuo Handa
2018-01-12  1:42                 ` Linus Torvalds
2018-01-12 11:22                   ` Tetsuo Handa
2018-01-14 11:54                     ` Tetsuo Handa
2018-01-15 23:05                       ` Linus Torvalds
2018-01-16  1:15                         ` [mm 4.15-rc8] " Tetsuo Handa
2018-01-16  2:14                           ` Linus Torvalds
2018-01-16  8:06                             ` Dave Hansen
2018-01-16  8:37                               ` Ingo Molnar
2018-01-16 19:30                               ` Linus Torvalds
2018-01-16 17:33                             ` Tetsuo Handa
2018-01-16 19:34                               ` Linus Torvalds
2018-01-17 11:08                                 ` Tetsuo Handa
2018-01-17 21:39                                   ` Linus Torvalds
2018-01-17 21:51                                     ` Linus Torvalds
2018-01-17 22:04                                       ` Dave Hansen
2018-01-17 22:00                                     ` Dave Hansen
2018-01-17 22:15                                       ` Linus Torvalds
2018-01-18  8:12                                   ` Tetsuo Handa
2018-01-18 12:25                                     ` Kirill A. Shutemov
2018-01-18 13:12                                       ` Kirill A. Shutemov
2018-01-18 14:34                                         ` Kirill A. Shutemov
2018-01-18 14:38                                         ` Dave Hansen
2018-01-18 14:45                                           ` Kirill A. Shutemov
2018-01-18 14:51                                             ` Dave Hansen
2018-01-18 16:58                                           ` Linus Torvalds
2018-01-18 14:45                                       ` Dave Hansen
2018-01-18 14:58                                         ` Andrea Arcangeli
2018-01-18 16:56                                           ` Kirill A. Shutemov
2018-01-18 17:26                                             ` Luck, Tony
2018-01-18 17:28                                               ` Linus Torvalds
2018-01-18 17:26                                             ` Linus Torvalds
2018-01-18 23:49                                               ` Kirill A. Shutemov
2018-01-19 12:55                                                 ` Matthew Wilcox
2018-01-19 18:42                                                   ` Linus Torvalds
2018-01-19 22:12                                                     ` Al Viro
2018-01-19 22:53                                                       ` Linus Torvalds
2018-01-20  2:02                                                         ` Al Viro
2018-01-20  5:24                                                           ` Al Viro
2018-01-20  9:38                                                             ` Luc Van Oostenryck
2018-01-18 15:40                                         ` Kirill A. Shutemov
2018-01-18 17:22                                           ` Michal Hocko
2018-01-19 10:02                                             ` Kirill A. Shutemov
2018-01-19 10:33                                               ` Michal Hocko
2018-01-19 11:49                                                 ` Kirill A. Shutemov
2018-01-19 12:07                                                   ` Michal Hocko [this message]
2018-01-19 12:30                                                     ` Kirill A. Shutemov
2018-01-19  2:01                                           ` Tetsuo Handa
2018-01-11 18:11             ` [mm? 4.15-rc7] " Linus Torvalds
2018-01-11 20:59               ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180119120747.GV6584@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=penguin-kernel@I-love.SAKURA.ne.jp \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=vdavydov.dev@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).