From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>, Andi Kleen <ak@muc.de>,
	systemtap@sources.redhat.com, prasanna@in.ibm.com,
	anil.s.keshavamurthy@intel.com, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	richardj_moore@uk.ibm.com, suparna@in.ibm.com
Subject: Re: [patch 05/10] Linux Kernel Markers - i386 optimized version
Date: Fri, 11 May 2007 14:55:14 -0400
Message-ID: <20070511185514.GA29945@Krystal>
In-Reply-To: <20070511045729.GA8143@in.ibm.com>

* Ananth N Mavinakayanahalli (ananth@in.ibm.com) wrote:
> On Thu, May 10, 2007 at 12:59:18PM -0400, Mathieu Desnoyers wrote:
> > * Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> 
> ...
> > > > * Third issue : Scalability. Changing code will stop every CPU on the
> > > >   system for a while. Compared to this, the int3-based approach will run
> > > >   through the breakpoint handler "if" one of the CPUs happens to execute
> > > >   this code at the wrong time. The standard case is just an IPI (to
> > > 
> > > If I read the errata right then patching in an int3 will itself trigger
> > > the errata so anything could happen.
> > > 
> > > I believe there are other safe sequences for doing code patching - perhaps
> > > one of the Intel folk can advise ?
> 
> IIRC, when the first implementation of what exists now as kprobes was
> done (as part of the dprobes framework), this question did come up. I
> think the conclusion was that the errata applies only to multi-byte
> modifications and single-byte changes are guaranteed to be atomic.
> Given int3 on Intel is just 1-byte, we are safe.
> 
> > I'll let the Intel guys confirm this, I don't have the reference nearby
> > (I got this information by talking with the kprobe team members, and
> > they got this information directly from Intel developers) but the
> > int3 is the one special case to which the errata does not apply.
> > Otherwise, kprobes and gdb would have a big, big issue.
> 
> Perhaps Richard/Suparna can confirm.
> 

Ha-ha! I found the reference. It's worth quoting in full :
http://sourceware.org/ml/systemtap/2005-q3/msg00208.html
------
From: Richard J Moore <richardj_moore at uk dot ibm dot com>

There is another issue to consider when looking into using probes other
than int3:

Intel erratum 54 - Unsynchronized Cross-modifying code - refers to the
practice of modifying code on one processor where another has prefetched
the unmodified version of the code. Intel states that unpredictable
general protection faults may result if a synchronizing instruction
(iret, int, int3, cpuid, etc ) is not executed on the second processor
before it executes the pre-fetched out-of-date copy of the instruction.

When we became aware of this I had a long discussion with Intel's
microarchitecture guys. It turns out that the reason for this erratum
(which incidentally Intel does not intend to fix) is because the trace
cache - the stream of micro-ops resulting from instruction interpretation
- cannot be guaranteed to be valid. Reading between the lines I assume this
issue arises because of optimization done in the trace cache, where it
is no longer possible to identify the original instruction boundaries.
If the CPU discovers that the trace cache has been invalidated because
of unsynchronized cross-modification then instruction execution will be
aborted with a GPF. Further discussion with Intel revealed that
replacing the first opcode byte with an int3 would not be subject to
this erratum.

So, is cmpxchg reliable? One has to guarantee more than mere atomicity.

-----

That is exactly what my implementation does: I make sure that no CPU
sees an out-of-date copy of a pre-fetched instruction by:

1 - putting a breakpoint on the 1st byte, so that any CPU hitting it
    skips the instruction that is going to be modified,
2 - issuing an IPI to every CPU to execute a sync_core(), so that even
    once the breakpoint is removed, no CPU could possibly still have the
    out-of-date copy of the instruction,
3 - modifying the now unused 2nd byte of the instruction,
4 - putting back the original 1st byte of the instruction.
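
As a rough illustration of that ordering, here is a user-space sketch
that replays the sequence on a plain byte buffer. It is not the code from
the patch: patch_second_byte(), sync_all_cpus_fn, struct sync_log and
record_sync() are names made up for this example, the callback only
stands in for the "IPI + sync_core()" step (it synchronizes nothing), and
the breakpoint handler that skips the instruction is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC	/* x86 one-byte breakpoint instruction */

/* Hypothetical stand-in for "send an IPI so every CPU runs sync_core()". */
typedef void (*sync_all_cpus_fn)(void *ctx);

/*
 * Replace the 2nd byte of a 2-byte instruction, in the order described
 * above. Simulation only: insn points at ordinary memory, not kernel text.
 */
static void patch_second_byte(volatile uint8_t *insn, uint8_t new_byte,
			      sync_all_cpus_fn sync_all_cpus, void *ctx)
{
	uint8_t first;

	assert(insn != NULL && sync_all_cpus != NULL);

	first = insn[0];		/* remember the original 1st byte */
	insn[0] = INT3_OPCODE;		/* 1: breakpoint on the 1st byte */
	sync_all_cpus(ctx);		/* 2: IPI + sync_core() stand-in */
	insn[1] = new_byte;		/* 3: modify the now unused 2nd byte */
	insn[0] = first;		/* 4: put back the original 1st byte */
}

/* Example callback: records what the "other CPUs" would see at sync time. */
struct sync_log {
	const volatile uint8_t *insn;	/* instruction being patched */
	int calls;			/* number of sync requests seen */
	uint8_t first_byte_seen;	/* 1st byte at the last sync */
};

static void record_sync(void *ctx)
{
	struct sync_log *log = ctx;

	log->calls++;
	log->first_byte_seen = log->insn[0];
}
```

Calling patch_second_byte(insn, 0x01, record_sync, &log) on a buffer
holding { 0xB0, 0x00 } (an illustrative "mov $0x00,%al" encoding, chosen
only for the example) leaves the buffer as { 0xB0, 0x01 }, and the log
shows that the breakpoint byte was in place when the sync was requested.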

It has exactly the same intent as the algorithm proposed by Intel, but
it has fewer side effects, scales better and supports NMI, SMI and MCE.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
