All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kani, Toshi" <toshi.kani@hpe.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"gratian.crisan@ni.com" <gratian.crisan@ni.com>
Cc: "mingo@kernel.org" <mingo@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"julia.cartwright@ni.com" <julia.cartwright@ni.com>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"bp@suse.de" <bp@suse.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"brgerst@gmail.com" <brgerst@gmail.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>,
	"dvlasenk@redhat.com" <dvlasenk@redhat.com>,
	"gratian@gmail.com" <gratian@gmail.com>
Subject: Re: Kernel page fault in vmalloc_fault() after a preempted ioremap
Date: Thu, 8 Mar 2018 21:43:25 +0000	[thread overview]
Message-ID: <1520548101.2693.106.camel@hpe.com> (raw)
In-Reply-To: <87a7vi1f3h.fsf@kerf.amer.corp.natinst.com>

On Thu, 2018-03-08 at 14:34 -0600, Gratian Crisan wrote:
> Hi all,
> 
> We are seeing kernel page faults happening on module loads with certain
> drivers like the i915 video driver[1]. This was initially discovered on
> a 4.9 PREEMPT_RT kernel. It takes 5 days on average to reproduce using a
> simple reboot loop test. Looking at the code paths involved I believe
> the issue is still present in the latest vanilla kernel.
> 
> Some relevant points are:
> 
>   * x86_64 CPU: Intel Atom E3940
> 
>   * CONFIG_HUGETLBFS is not set (which also gates CONFIG_HUGETLB_PAGE)
> 
> Based on function traces I was able to gather the sequence of events is:
> 
>   1. Driver starts a ioremap operation for a region that is PMD_SIZE in
>   size (or PUD_SIZE).
> 
>   2. The ioremap() operation is preempted while it's in the middle of
>   setting up the page mappings:
>   ioremap_page_range->...->ioremap_pmd_range->pmd_set_huge <<preempted>>
> 
>   3. Unrelated tasks run. Traces also include some cross core scheduling
>   IPI calls.
> 
>   4. Driver resumes execution finishes the ioremap operation and tries to
>   access the newly mapped IO region. This triggers a vmalloc fault.
> 
>   5. The vmalloc_fault() function hits a kernel page fault when trying to
>   dereference a non-existent *pte_ref.
> 
> The reason this happens is the code paths called from ioremap_page_range()
> make different assumptions about when a large page (pud/pmd) mapping can be
> used versus the code paths in vmalloc_fault().
> 
> Using the PMD sized ioremap case as an example (the PUD case is similar):
> ioremap_pmd_range() calls ioremap_pmd_enabled() which is gated by
> CONFIG_HAVE_ARCH_HUGE_VMAP. On x86_64 this will return true unless the
> "nohugeiomap" kernel boot parameter is passed in.
> 
> On the other hand, in the rare case when a page fault happens in the
> ioremap'ed region, vmalloc_fault() calls the pmd_huge() function to check
> if a PMD page is marked huge or if it should go on and get a reference to
> the PTE. However pmd_huge() is conditionally compiled based on the user
> configured CONFIG_HUGETLB_PAGE selected by CONFIG_HUGETLBFS. If the
> CONFIG_HUGETLBFS option is not enabled pmd_huge() is always defined to be
> 0.
> 
> The end result is an OOPS in vmalloc_fault() when the non-existent pte_ref
> is dereferenced because the test for pmd_huge() failed.
> 
> Commit f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages
> properly") attempted to fix the mismatch between ioremap() and
> vmalloc_fault() with regards to huge page handling but it missed this use
> case.
> 
> I am working on a simpler reproducing case however so far I've been
> unsuccessful in re-creating the conditions that trigger the vmalloc fault
> in the first place. Adding explicit scheduling points in
> ioremap_pmd_range/pmd_set_huge doesn't seem to be sufficient. Ideas
> appreciated.
> 
> Any thoughts on what a correct fix would look like? Should the ioremap
> code paths respect the HUGETLBFS config or would it be better for the
> vmalloc fault code paths to match the tests used in ioremap and not rely
> on the HUGETLBFS option being enabled?

Thanks for the report and analysis!  I believe pud_large() and
pmd_large() should have been used here.  I will try to reproduce the
issue and verify the fix.

-Toshi

  reply	other threads:[~2018-03-08 21:43 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-08 20:34 Kernel page fault in vmalloc_fault() after a preempted ioremap Gratian Crisan
2018-03-08 21:43 ` Kani, Toshi [this message]
2018-03-08 22:38   ` Andy Lutomirski
2018-03-08 23:08     ` Kani, Toshi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1520548101.2693.106.camel@hpe.com \
    --to=toshi.kani@hpe.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@suse.de \
    --cc=brgerst@gmail.com \
    --cc=dave.hansen@intel.com \
    --cc=dvlasenk@redhat.com \
    --cc=gratian.crisan@ni.com \
    --cc=gratian@gmail.com \
    --cc=hpa@zytor.com \
    --cc=julia.cartwright@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.