linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Jones <davej@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: pipe/page fault oddness.
Date: Tue, 30 Sep 2014 11:58:00 -0700
Message-ID: <CA+55aFzfvXHd2LUhQ5OiV1H1Oq2y3PL8hX_Hrv-C907PyDNugA@mail.gmail.com>
In-Reply-To: <20140930182059.GA24431@redhat.com>

On Tue, Sep 30, 2014 at 11:20 AM, Dave Jones <davej@redhat.com> wrote:
>
> page_fault_kernel:    address=__per_cpu_end ip=copy_page_to_iter error_code=0x2

Interesting. "error_code" in particular. The value "2" means that the
CPU thinks that the page is not present (bit zero is clear).

(That "address" is useless - the tracing code has tried to turn a user
address into a kernel symbol, and the percpu symbols are zero-based, so
it picks the last of them. The "ip" is useless too, since it doesn't
give the offset within the function)

So the CPU thinks it's a write to a not-present page, which means that
_PAGE_PRESENT bit is clear.
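
For reference, the decoding of that error code can be sketched as
below. This is only a minimal standalone illustration of the low three
bits of the x86 page fault error code, not the kernel's own code: the
SK_PF_* macro names, the struct and the function name are made up for
this example.

```c
#include <stdbool.h>

/* Low bits of the x86 page fault error code (names are illustrative). */
#define SK_PF_PROT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define SK_PF_WRITE (1u << 1)  /* 0: read access, 1: write access */
#define SK_PF_USER  (1u << 2)  /* 0: fault in kernel mode, 1: in user mode */

/* Hypothetical helper type: the decoded view of an error code. */
struct sk_pf_info {
	bool not_present;  /* CPU saw a not-present page table entry */
	bool write;        /* the faulting access was a write */
	bool user;         /* the access came from user mode */
};

static struct sk_pf_info sk_decode_pf_error(unsigned int error_code)
{
	struct sk_pf_info info = {
		.not_present = !(error_code & SK_PF_PROT),
		.write       = (error_code & SK_PF_WRITE) != 0,
		.user        = (error_code & SK_PF_USER) != 0,
	};
	return info;
}
```

With error_code=0x2 this gives a kernel-mode write to a page the CPU
considers not present, which matches the reading above.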

Now the *kernel* thinks a page is present not just if _PAGE_PRESENT is
set, but also if _PAGE_PROTNONE or _PAGE_NUMA are set. Sadly, your
trace is not very useful, because inlining has caused pretty much all
the cases to be in "handle_mm_fault()", so the trace doesn't really
tell which path this all takes.

But we can still do *some* analysis on the trace: do_wp_page()
shouldn't have been inlined, so it would have shown up in the trace if
it had been called. So I think we can be pretty confident that the
ptep_set_access_flags() we see is the one from handle_pte_fault().

And if that is the case, then we know that "pte_present()" is indeed
true as far as the kernel is concerned. So with _PAGE_PRESENT not being
set (based on the error code), we know that _PAGE_PROTNONE must be
set, otherwise we'd have triggered the pte_numa() check and exited
through do_numa_page().
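
The deduction above can be checked mechanically with a small sketch.
Everything here is an assumption made for illustration: the SK_* flag
values are placeholders rather than the real x86 bit positions, and
sk_pte_present() / sk_pte_numa() are simplified stand-ins for the
kernel helpers, modelling only the three flags discussed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits, NOT the real x86 page table bit positions. */
#define SK_PAGE_PRESENT  (UINT64_C(1) << 0)
#define SK_PAGE_PROTNONE (UINT64_C(1) << 1)
#define SK_PAGE_NUMA     (UINT64_C(1) << 2)

/* What the CPU looks at: only the hardware present bit. */
static bool sk_hw_present(uint64_t flags)
{
	return (flags & SK_PAGE_PRESENT) != 0;
}

/* The kernel's wider notion of "present", as described in the mail. */
static bool sk_pte_present(uint64_t flags)
{
	return (flags & (SK_PAGE_PRESENT | SK_PAGE_PROTNONE | SK_PAGE_NUMA)) != 0;
}

/* Simplified NUMA hinting check: NUMA bit set, hardware present bit clear. */
static bool sk_pte_numa(uint64_t flags)
{
	return (flags & (SK_PAGE_NUMA | SK_PAGE_PRESENT)) == SK_PAGE_NUMA;
}

/*
 * Walk all eight flag combinations. Whenever the CPU sees "not present",
 * the kernel sees "present" and the NUMA check does not fire, PROTNONE
 * has to be set. Returns true if that holds for every combination.
 */
static bool sk_deduction_holds(void)
{
	for (uint64_t flags = 0; flags < 8; flags++) {
		if (!sk_hw_present(flags) && sk_pte_present(flags) &&
		    !sk_pte_numa(flags) && !(flags & SK_PAGE_PROTNONE))
			return false;
	}
	return true;
}
```

Under these simplified definitions the only combination that survives
all three conditions is PROTNONE set with PRESENT and NUMA clear.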

So it smells like we have a PROT_NONE VM area (at least the page table
entries imply that). But "access_error()" should have flagged that (it
checks "vma->vm_flags & VM_WRITE"). How do we have a page table entry
marked _PAGE_PROTNONE, but VM_WRITE set in the vma?
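
To spell out why that combination is surprising, here is a rough
standalone sketch of the kind of check "access_error()" performs. It is
a simplification written for this note, not a copy of the kernel
function: the SK_* names and the sk_access_error() signature are
hypothetical, and only the write/read distinction is modelled.

```c
#include <stdbool.h>

/* Illustrative error code bits and vma permission flags. */
#define SK_PF_PROT  (1u << 0)  /* set: protection violation on a present page */
#define SK_PF_WRITE (1u << 1)  /* set: the faulting access was a write */

#define SK_VM_READ  (1ul << 0)
#define SK_VM_WRITE (1ul << 1)
#define SK_VM_EXEC  (1ul << 2)

/*
 * Returns true if the access described by error_code is not allowed by
 * the vma permissions in vm_flags, so the fault should be treated as a
 * bad access instead of being handed to the page fault handler proper.
 */
static bool sk_access_error(unsigned int error_code, unsigned long vm_flags)
{
	if (error_code & SK_PF_WRITE) {
		/* A write fault is only legitimate in a writable mapping. */
		return !(vm_flags & SK_VM_WRITE);
	}

	/* A read hit a present page and still faulted: bad access. */
	if (error_code & SK_PF_PROT)
		return true;

	/* A read in a mapping with no permissions at all (PROT_NONE). */
	return !(vm_flags & (SK_VM_READ | SK_VM_EXEC | SK_VM_WRITE));
}
```

In this sketch a write fault (error_code=0x2) against a PROT_NONE
mapping (vm_flags of 0) is reported as an error, so reaching the page
table code at all implies the vma had the write permission set.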

Or, possibly, we have some confusion about the page tables themselves
(corruption, wrong %cr3 value, whatever), explaining why the CPU
thinks one thing, but our software page table walker thinks another.

I'm not seeing how this all happens. But I'm adding Kirill to the cc,
since he might see something I missed, and he touched some of this
code last ("tag, you're it").

Kirill: the thread is on lkml, but basically it boils down to the
second byte write in fault_in_pages_writeable() faulting forever,
despite handle_mm_fault() apparently thinking that everything is fine.

Also adding Hugh Dickins, just because the more people who know this
code get involved, the better.

              Linus
