linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>,
	hch@lst.de, airlied@linux.ie, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, ak@suse.de, pq@iki.fi,
	jbeulich@novell.com, linux-kernel@vger.kernel.org,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	Masami Hiramatsu <mhiramat@redhat.com>
Subject: Re: [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft"
Date: Wed, 9 Jan 2008 11:52:54 -0500	[thread overview]
Message-ID: <20080109165254.GA4912@Krystal> (raw)
In-Reply-To: <20080109104149.GB8041@elte.hu>

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> (kprobes folks Cc:-ed)
> 
> * David Miller <davem@davemloft.net> wrote:
> 
> > From: Christoph Hellwig <hch@lst.de>
> > Date: Wed, 9 Jan 2008 08:19:45 +0100
> > 
> > > On Wed, Jan 09, 2008 at 03:55:20AM +0000, Dave Airlie wrote:
> > > > now because Linus said send him a patch to revert regressions rather than 
> > > > just complain,
> > > 
> > > this is not a regression by any definition.  You were abusing 
> > > exported symbols for out of tree junk, so you'll lose.
> > 
> > And furthermore, they don't even need it, use a kprobe.
> 
> i agree. There a few practical complication on x86: the do_page_fault() 
> function is currently excluded from kprobe probing, for recursion 
> reasons. handle_mm_fault() can be probed OTOH - but that does not catch 
> vmalloc()-ed faults. The middle of do_page_fault() [line 348] should 
> work better [the point after notify_page_fault()] - but it's usually 
> more fragile to insert probes to such middle-of-the-function places.
> 
> So probing pagefaults is not as easy as it should/could be. We should 
> put a practical NOP marker to around line 348, to make it easier (and 
> faster) for systemtap to probe there.
> 
> (__kprobes is a highly confusing newspeak name btw - it should be 
> __noprobe instead.)
> 
> 	Ingo


Probing vmalloc faults is _really_ tricky : it also implies that the
handler (let's call it probe) connected to the probe point (marker or
kprobe) should _never_ cause a vmalloc page fault, it should therefore
never touch vmalloc'd memory, which is a very restrictive constraint,
especially for tracing which may need large buffers (the only sane
alternative is to allocate the buffers statically before the kernel
boots).

As for the location of the probe point, we have to determine how we want
to handle a OOPSing probe. If we put the probe point too soon in the
do_page_fault function, we will end up doing recursive page_fault rather
than a OOPS, which may make things harder to debug.

In the LTTng instrumentation, I volountarily excluded the
bad_area and bad_area_nosemaphore paths from the page fault
instrumentation for this exact reason.

Currently, I have markers around the handle_mm_fault call :

        trace_mark(kernel_arch_trap_entry, "trap_id %d ip #p%ld",
                14, instruction_pointer(regs));
        fault = handle_mm_fault(mm, vma, address, write);
        trace_mark(kernel_arch_trap_exit, MARK_NOARGS);

I also instrument handle_mm_fault, but I leave these markers in
do_page_fault to get the architecture specific trap id (the "trap_entry"
and "trap_exit" events) and the instruction pointer causing the fault.


My handle_mm_fault instrumentation :

(note that handle_mm_fault is also called by get_user_pages, not only
do_page_fault)

int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
                unsigned long address, int write_access)
{
        int res;
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

        trace_mark(mm_handle_fault_entry,
                "address %lu ip #p%ld write_access %d",
                address, KSTK_EIP(current), write_access);

        __set_current_state(TASK_RUNNING);

        count_vm_event(PGFAULT);

        if (unlikely(is_vm_hugetlb_page(vma))) {
                res = hugetlb_fault(mm, vma, address, write_access);
                goto end;
        }

        pgd = pgd_offset(mm, address);
        pud = pud_alloc(mm, pgd, address);
        if (!pud) {
                res = VM_FAULT_OOM;
                goto end;
        }
        pmd = pmd_alloc(mm, pud, address);
        if (!pmd) {
                res = VM_FAULT_OOM;
                goto end;
        }
        pte = pte_alloc_map(mm, pmd, address);
        if (!pte) {
                res = VM_FAULT_OOM;
                goto end;
        }

        res = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
end:
        trace_mark(mm_handle_fault_exit, MARK_NOARGS);
        return res;
}

Mathieu


-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-01-09 17:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-09  2:34 [PATCH] Revert "x86: optimize page faults like all other achitectures and kill notifier cruft" Dave Airlie
2008-01-09  2:37 ` Andi Kleen
2008-01-09  3:06 ` Andrew Morton
2008-01-09  3:17   ` Dave Airlie
2008-01-09  3:29     ` Andrew Morton
2008-01-09  3:55       ` Dave Airlie
2008-01-09  7:19         ` Christoph Hellwig
2008-01-09  7:23           ` David Miller
2008-01-09 10:41             ` Ingo Molnar
2008-01-09 16:52               ` Mathieu Desnoyers [this message]
2008-01-09 19:07                 ` Andi Kleen
2008-01-09 20:01               ` Pekka Paalanen
2008-01-09 20:08                 ` Andi Kleen
2008-01-09  4:17       ` Andi Kleen
2008-01-09  9:12       ` Jan Beulich
2008-01-09 15:18     ` Arjan van de Ven
2008-01-09 15:21       ` Andi Kleen
2008-01-09 15:26         ` Ingo Molnar
2008-01-09  7:17 ` Christoph Hellwig
2008-01-09  7:22   ` David Miller
2008-01-09 13:19     ` Jiri Kosina
2008-01-09 13:48       ` David Miller
2008-01-09 14:23     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080109165254.GA4912@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=airlied@linux.ie \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=ananth@in.ibm.com \
    --cc=davem@davemloft.net \
    --cc=hch@lst.de \
    --cc=jbeulich@novell.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@redhat.com \
    --cc=mingo@elte.hu \
    --cc=pq@iki.fi \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).