* Why do kprobes and uprobes singlestep?
@ 2021-02-23 23:24 Andy Lutomirski
  2021-02-24  1:17 ` Masami Hiramatsu
  2021-03-01 16:51 ` Oleg Nesterov
  0 siblings, 2 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-02-23 23:24 UTC (permalink / raw)
  To: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

A while back, I let myself be convinced that kprobes genuinely need to
single-step the kernel on occasion, and I decided that this sucked but
I could live with it.  It would, however, be Really Really Nice (tm)
if we could have a rule that anyone running x86 Linux who single-steps
the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
when the system falls apart around them.  Specifically, if we don't
allow kernel single-stepping and if we suitably limit kernel
instruction breakpoints (the latter isn't actually a major problem),
then we don't really really need to use IRET to return to the kernel,
and that means we can avoid some massive NMI nastiness.

But I was contemplating the code, and I'm no longer convinced.
Uprobes seem to single-step user code for no discernable reason.
(They want to trap after executing an out of line instruction, AFAICT.
Surely INT3 or even CALL after the out-of-line insn would work as well
or better.)  Why does kprobe single-step?  I spent a while staring at
the code, and it was entirely unclear to me what the purpose of the
single-step is.

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-23 23:24 Why do kprobes and uprobes singlestep? Andy Lutomirski
@ 2021-02-24  1:17 ` Masami Hiramatsu
  2021-02-24 19:45   ` Andy Lutomirski
  2021-03-01 16:51 ` Oleg Nesterov
  1 sibling, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-02-24  1:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On Tue, 23 Feb 2021 15:24:19 -0800
Andy Lutomirski <luto@kernel.org> wrote:

> A while back, I let myself be convinced that kprobes genuinely need to
> single-step the kernel on occasion, and I decided that this sucked but
> I could live with it.  it would, however, be Really Really Nice (tm)
> if we could have a rule that anyone running x86 Linux who single-steps
> the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> when the system falls apart around them.  Specifically, if we don't
> allow kernel single-stepping and if we suitably limit kernel
> instruction breakpoints (the latter isn't actually a major problem),
> then we don't really really need to use IRET to return to the kernel,
> and that means we can avoid some massive NMI nastiness.

Do you mean using "pop regs + popf + ret" instead of IRET after the
int3 is handled, to avoid IRET releasing the NMI mask? Yeah, that is
possible. I have no objection to that.

However, what is the relationship between IRET and single-stepping?
I think we can do the same thing in do_debug...

> But I was contemplating the code, and I'm no longer convinced.
> Uprobes seem to single-step user code for no discernable reason.
> (They want to trap after executing an out of line instruction, AFAICT.
> Surely INT3 or even CALL after the out-of-line insn would work as well
> or better.)  Why does kprobe single-step?  I spend a while staring at
> the code, and it was entirely unclear to me what the purpose of the
> single-step is.

For kprobes, there are two major reasons for (still relying on) single-stepping.
One is to provide the post_handler; the other is executing the original code,
which is replaced by int3, without modifying the code or emulating it.
Indeed, most instructions do not actually depend on the ip register;
in that case (and when the user doesn't set a post_handler), kprobes already
skip single-stepping (a.k.a. the kprobe booster: jump back to the kernel code
after executing the out-of-line instruction).
However, since some instructions, e.g. jmp, call and ret, change the ip
register (and the stack), we have to do a fixup afterwards.

But yes, it is possible to emulate them, the same as arm/arm64 does. I'm just
concerned about the side effects of the emulation; it needs to be carefully
implemented.
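
To illustrate what the booster does (just a rough sketch with made-up names,
not the real arch/x86/kernel/kprobes/core.c code), the out-of-line slot is the
copied instruction followed by a relative jump back into the original
instruction stream:

#include <linux/types.h>
#include <linux/string.h>

/*
 * Sketch only: a "boosted" slot is the copied instruction followed by
 * "jmp <next original instruction>", so no trap is needed afterwards.
 *
 *   p->addr:        int3                       <- patched into the kernel text
 *   p->ainsn.insn:  INSN' (displacement fixed up if RIP-relative)
 *                   jmp  <p->addr + insn_len>  <- resume in the original text
 */
static void sketch_prepare_boost(u8 *slot, const u8 *orig_insn, int insn_len,
				 unsigned long resume_ip)
{
	s32 rel;

	memcpy(slot, orig_insn, insn_len);		/* the copied instruction */
	slot[insn_len] = 0xe9;				/* JMP rel32 opcode */
	rel = (s32)(resume_ip - ((unsigned long)slot + insn_len + 5));
	memcpy(&slot[insn_len + 1], &rel, sizeof(rel));	/* back to the kernel text */
}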

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-24  1:17 ` Masami Hiramatsu
@ 2021-02-24 19:45   ` Andy Lutomirski
  2021-02-25  2:22     ` Masami Hiramatsu
  2021-02-25  9:59     ` Why do kprobes and uprobes singlestep? Peter Zijlstra
  0 siblings, 2 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-02-24 19:45 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

On Tue, Feb 23, 2021 at 5:18 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 23 Feb 2021 15:24:19 -0800
> Andy Lutomirski <luto@kernel.org> wrote:
>
> > A while back, I let myself be convinced that kprobes genuinely need to
> > single-step the kernel on occasion, and I decided that this sucked but
> > I could live with it.  it would, however, be Really Really Nice (tm)
> > if we could have a rule that anyone running x86 Linux who single-steps
> > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > when the system falls apart around them.  Specifically, if we don't
> > allow kernel single-stepping and if we suitably limit kernel
> > instruction breakpoints (the latter isn't actually a major problem),
> > then we don't really really need to use IRET to return to the kernel,
> > and that means we can avoid some massive NMI nastiness.
>
> Would you mean using "pop regs + popf + ret" instead of IRET after
> int3 handled for avoiding IRET releasing the NMI mask? Yeah, it is
> possible. I don't complain about that.

Yes, more or less.

>
> However, what is the relationship between the IRET and single-stepping?
> I think we can do same thing in do_debug...

Because there is no way to single-step without using IRET.  POPF; RET
will trap after RET and you won't make forward progress.

>
> > But I was contemplating the code, and I'm no longer convinced.
> > Uprobes seem to single-step user code for no discernable reason.
> > (They want to trap after executing an out of line instruction, AFAICT.
> > Surely INT3 or even CALL after the out-of-line insn would work as well
> > or better.)  Why does kprobe single-step?  I spend a while staring at
> > the code, and it was entirely unclear to me what the purpose of the
> > single-step is.
>
> For kprobes, there are 2 major reasons for (still relaying on) single stepping.
> One is to provide post_handler, another is executing the original code,
> which is replaced by int3, without modifying code nor emulation.

I don't follow.  Suppose we execute out of line.  If we originally have:

INSN

we replace it with:

INT3

and we have, out of line:

INSN [but with displacement modified if it's RIP-relative]

Right now, we single-step the out-of-line copy.  But couldn't we instead do:

INSN [but with displacement modified if it's RIP-relative]
INT3

or even

INSN [but with displacement modified if it's RIP-relative]
JMP kprobe_post_handler

and avoid single-stepping?

I guess I see the point for CALL, JMP and RET, but it seems like we
could emulate those cases instead fairly easily.
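
Roughly like this, as a sketch (using the existing int3_emulate_*() helpers
from <asm/text-patching.h>; the "opcode"/"insn_len"/"rel32" parameters stand in
for per-probe data that kprobes would have to record; they are not existing
fields):

#include <linux/types.h>
#include <asm/ptrace.h>
#include <asm/text-patching.h>

/*
 * Sketch only: emulate RET and rel32 CALL/JMP from the int3 handler instead
 * of single-stepping the out-of-line copy.
 */
static void sketch_emulate(struct pt_regs *regs, u8 opcode, int insn_len, s32 rel32)
{
	/* regs->ip points just past the INT3 at the probed address */
	unsigned long next_ip = regs->ip - INT3_INSN_SIZE + insn_len;

	switch (opcode) {
	case 0xc3:				/* ret */
		int3_emulate_ret(regs);
		break;
	case 0xe8:				/* call rel32 */
		int3_emulate_call(regs, next_ip + rel32);
		break;
	case 0xe9:				/* jmp rel32 */
		int3_emulate_jmp(regs, next_ip + rel32);
		break;
	}
}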

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-24 19:45   ` Andy Lutomirski
@ 2021-02-25  2:22     ` Masami Hiramatsu
  2021-02-25  6:03       ` Andy Lutomirski
                         ` (2 more replies)
  2021-02-25  9:59     ` Why do kprobes and uprobes singlestep? Peter Zijlstra
  1 sibling, 3 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-02-25  2:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On Wed, 24 Feb 2021 11:45:10 -0800
Andy Lutomirski <luto@kernel.org> wrote:

> On Tue, Feb 23, 2021 at 5:18 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Tue, 23 Feb 2021 15:24:19 -0800
> > Andy Lutomirski <luto@kernel.org> wrote:
> >
> > > A while back, I let myself be convinced that kprobes genuinely need to
> > > single-step the kernel on occasion, and I decided that this sucked but
> > > I could live with it.  it would, however, be Really Really Nice (tm)
> > > if we could have a rule that anyone running x86 Linux who single-steps
> > > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > > when the system falls apart around them.  Specifically, if we don't
> > > allow kernel single-stepping and if we suitably limit kernel
> > > instruction breakpoints (the latter isn't actually a major problem),
> > > then we don't really really need to use IRET to return to the kernel,
> > > and that means we can avoid some massive NMI nastiness.
> >
> > Would you mean using "pop regs + popf + ret" instead of IRET after
> > int3 handled for avoiding IRET releasing the NMI mask? Yeah, it is
> > possible. I don't complain about that.
> 
> Yes, more or less.
> 
> >
> > However, what is the relationship between the IRET and single-stepping?
> > I think we can do same thing in do_debug...
> 
> Because there is no way to single-step without using IRET.  POPF; RET
> will trap after RET and you won't make forward progress.

Ah, indeed. "POPF; RET" is not executed atomically.

> > > But I was contemplating the code, and I'm no longer convinced.
> > > Uprobes seem to single-step user code for no discernable reason.
> > > (They want to trap after executing an out of line instruction, AFAICT.
> > > Surely INT3 or even CALL after the out-of-line insn would work as well
> > > or better.)  Why does kprobe single-step?  I spend a while staring at
> > > the code, and it was entirely unclear to me what the purpose of the
> > > single-step is.
> >
> > For kprobes, there are 2 major reasons for (still relaying on) single stepping.
> > One is to provide post_handler, another is executing the original code,
> > which is replaced by int3, without modifying code nor emulation.
> 
> I don't follow.  Suppose we execute out of line.  If we originally have:
> 
> INSN
> 
> we replace it with:
> 
> INT3
> 
> and we have, out of line:
> 
> INSN [but with displacement modified if it's RIP-relative]
> 
> right now, we single-step the out of line copy.  But couldn't we instead do:
> 
> INSN [but with displacement modified if it's RIP-relative]
> INT3

If the INSN is "jmp +127", it will skip the INT3. So those instructions
must be identified and emulated. We already did this on arm64 (see commit
7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing
 instructions out-of-line")), because arm64 already emulated the branch
instructions. I have to check whether the x86 insns can be emulated without
side effects.
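
For example (illustration only):

  xol_slot:      jmp +127   <- branches away, relative to the slot
  xol_slot + 2:  int3       <- never reached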

> 
> or even
> 
> INSN [but with displacement modified if it's RIP-relative]
> JMP kprobe_post_handler

This needs a sequence of push-regs etc. ;)

> 
> and avoid single-stepping?
> 
> I guess I see the point for CALL, JMP and RET, but it seems like we
> could emulate those cases instead fairly easily.

OK, let's try to do it. I think it should be possible because, even in the
current code, the resume fixup code (which adjusts the IP register) only works
on a few groups of instructions.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-25  2:22     ` Masami Hiramatsu
@ 2021-02-25  6:03       ` Andy Lutomirski
  2021-02-25  9:11         ` Masami Hiramatsu
  2021-03-01 14:08       ` [RFC PATCH 0/1] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  2 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2021-02-25  6:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

On Wed, Feb 24, 2021 at 6:22 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Wed, 24 Feb 2021 11:45:10 -0800
> Andy Lutomirski <luto@kernel.org> wrote:
>
> > On Tue, Feb 23, 2021 at 5:18 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > On Tue, 23 Feb 2021 15:24:19 -0800
> > > Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > > A while back, I let myself be convinced that kprobes genuinely need to
> > > > single-step the kernel on occasion, and I decided that this sucked but
> > > > I could live with it.  it would, however, be Really Really Nice (tm)
> > > > if we could have a rule that anyone running x86 Linux who single-steps
> > > > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > > > when the system falls apart around them.  Specifically, if we don't
> > > > allow kernel single-stepping and if we suitably limit kernel
> > > > instruction breakpoints (the latter isn't actually a major problem),
> > > > then we don't really really need to use IRET to return to the kernel,
> > > > and that means we can avoid some massive NMI nastiness.
> > >
> > > Would you mean using "pop regs + popf + ret" instead of IRET after
> > > int3 handled for avoiding IRET releasing the NMI mask? Yeah, it is
> > > possible. I don't complain about that.
> >
> > Yes, more or less.
> >
> > >
> > > However, what is the relationship between the IRET and single-stepping?
> > > I think we can do same thing in do_debug...
> >
> > Because there is no way to single-step without using IRET.  POPF; RET
> > will trap after RET and you won't make forward progress.
>
> Ah, indeed. "POPF; RET" is not atomically exceute.
>
> > > > But I was contemplating the code, and I'm no longer convinced.
> > > > Uprobes seem to single-step user code for no discernable reason.
> > > > (They want to trap after executing an out of line instruction, AFAICT.
> > > > Surely INT3 or even CALL after the out-of-line insn would work as well
> > > > or better.)  Why does kprobe single-step?  I spend a while staring at
> > > > the code, and it was entirely unclear to me what the purpose of the
> > > > single-step is.
> > >
> > > For kprobes, there are 2 major reasons for (still relaying on) single stepping.
> > > One is to provide post_handler, another is executing the original code,
> > > which is replaced by int3, without modifying code nor emulation.
> >
> > I don't follow.  Suppose we execute out of line.  If we originally have:
> >
> > INSN
> >
> > we replace it with:
> >
> > INT3
> >
> > and we have, out of line:
> >
> > INSN [but with displacement modified if it's RIP-relative]
> >
> > right now, we single-step the out of line copy.  But couldn't we instead do:
> >
> > INSN [but with displacement modified if it's RIP-relative]
> > INT3
>
> If the INSN is "jmp +127", it will skip the INT3. So those instructions
> must be identified and emulated. We did it already in the arm64 (see commit
> 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing
>  instructions out-of-line")), because arm64 already emulated the branch
> instructions. I have to check x86 insns can be emulated without side-effects.

Off the top of my head:

JMP changes RIP but has no other side effects.  Jcc is the same except
that the condition needs checking, which would be a bit tedious.

CALL changes RIP and does a push but has no other side effects.  We
have special infrastructure to emulate CALL from int3 context:
int3_emulate_call().

RET pops and changes RIP.  No other side effects.

RET imm is rare.  I don't think it occurs in the kernel at all.

LRET is rare.  I don't think kprobe needs to support it.

IRET is rare, and trying to kprobe it seems likely to cause a
disaster, although it's within the realm of possibility that the IRET
in sync_core() could work.

JMP FAR and CALL FAR are rare.  I see no reason to support them.
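
For reference, the Jcc condition check is tedious but mechanical.  A
self-contained sketch (not kernel code, just the SDM condition table spelled
out; "cc" is the low nibble of the Jcc opcode):

#include <linux/types.h>
#include <asm/processor-flags.h>

/* Sketch only: would this Jcc be taken, given the saved flags? */
static bool sketch_jcc_taken(unsigned long flags, u8 cc)
{
	bool of = flags & X86_EFLAGS_OF;
	bool cf = flags & X86_EFLAGS_CF;
	bool zf = flags & X86_EFLAGS_ZF;
	bool sf = flags & X86_EFLAGS_SF;
	bool pf = flags & X86_EFLAGS_PF;
	bool cond;

	switch (cc >> 1) {
	case 0: cond = of; break;			/* JO  / JNO */
	case 1: cond = cf; break;			/* JB  / JAE */
	case 2: cond = zf; break;			/* JE  / JNE */
	case 3: cond = cf || zf; break;			/* JBE / JA  */
	case 4: cond = sf; break;			/* JS  / JNS */
	case 5: cond = pf; break;			/* JP  / JNP */
	case 6: cond = sf != of; break;			/* JL  / JGE */
	default: cond = zf || (sf != of); break;	/* JLE / JG  */
	}

	return (cc & 1) ? !cond : cond;	/* odd condition codes are the negated forms */
}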

>
> >
> > or even
> >
> > INSN [but with displacement modified if it's RIP-relative]
> > JMP kprobe_post_handler
>
> This needs a sequence of push-regs etc. ;)
>
> >
> > and avoid single-stepping?
> >
> > I guess I see the point for CALL, JMP and RET, but it seems like we
> > could emulate those cases instead fairly easily.
>
> OK, let's try to do it. I think it should be possible because even in the
> current code, resume fixup code (adjust IP register) works only for a few
> groups of instructions.

I suspect that emulating them would give a nice performance boost,
too.  Single-stepping is very slow on x86.

I should let you know, though, that I might have found a sneaky
alternative solution to handling NMIs, so this is a bit lower priority
from my perspective than I thought it was.  I'm not quite 100%
convinced my idea works, but I'll play with it.

--Andy

>
> Thank you,
>
> --
> Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-25  6:03       ` Andy Lutomirski
@ 2021-02-25  9:11         ` Masami Hiramatsu
  0 siblings, 0 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-02-25  9:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Oleg Nesterov, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On Wed, 24 Feb 2021 22:03:12 -0800
Andy Lutomirski <luto@kernel.org> wrote:

> On Wed, Feb 24, 2021 at 6:22 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Wed, 24 Feb 2021 11:45:10 -0800
> > Andy Lutomirski <luto@kernel.org> wrote:
> >
> > > On Tue, Feb 23, 2021 at 5:18 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > >
> > > > On Tue, 23 Feb 2021 15:24:19 -0800
> > > > Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > > A while back, I let myself be convinced that kprobes genuinely need to
> > > > > single-step the kernel on occasion, and I decided that this sucked but
> > > > > I could live with it.  it would, however, be Really Really Nice (tm)
> > > > > if we could have a rule that anyone running x86 Linux who single-steps
> > > > > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > > > > when the system falls apart around them.  Specifically, if we don't
> > > > > allow kernel single-stepping and if we suitably limit kernel
> > > > > instruction breakpoints (the latter isn't actually a major problem),
> > > > > then we don't really really need to use IRET to return to the kernel,
> > > > > and that means we can avoid some massive NMI nastiness.
> > > >
> > > > Would you mean using "pop regs + popf + ret" instead of IRET after
> > > > int3 handled for avoiding IRET releasing the NMI mask? Yeah, it is
> > > > possible. I don't complain about that.
> > >
> > > Yes, more or less.
> > >
> > > >
> > > > However, what is the relationship between the IRET and single-stepping?
> > > > I think we can do same thing in do_debug...
> > >
> > > Because there is no way to single-step without using IRET.  POPF; RET
> > > will trap after RET and you won't make forward progress.
> >
> > Ah, indeed. "POPF; RET" is not atomically exceute.
> >
> > > > > But I was contemplating the code, and I'm no longer convinced.
> > > > > Uprobes seem to single-step user code for no discernable reason.
> > > > > (They want to trap after executing an out of line instruction, AFAICT.
> > > > > Surely INT3 or even CALL after the out-of-line insn would work as well
> > > > > or better.)  Why does kprobe single-step?  I spend a while staring at
> > > > > the code, and it was entirely unclear to me what the purpose of the
> > > > > single-step is.
> > > >
> > > > For kprobes, there are 2 major reasons for (still relaying on) single stepping.
> > > > One is to provide post_handler, another is executing the original code,
> > > > which is replaced by int3, without modifying code nor emulation.
> > >
> > > I don't follow.  Suppose we execute out of line.  If we originally have:
> > >
> > > INSN
> > >
> > > we replace it with:
> > >
> > > INT3
> > >
> > > and we have, out of line:
> > >
> > > INSN [but with displacement modified if it's RIP-relative]
> > >
> > > right now, we single-step the out of line copy.  But couldn't we instead do:
> > >
> > > INSN [but with displacement modified if it's RIP-relative]
> > > INT3
> >
> > If the INSN is "jmp +127", it will skip the INT3. So those instructions
> > must be identified and emulated. We did it already in the arm64 (see commit
> > 7ee31a3aa8f4 ("arm64: kprobes: Use BRK instead of single-step when executing
> >  instructions out-of-line")), because arm64 already emulated the branch
> > instructions. I have to check x86 insns can be emulated without side-effects.
> 
> Off the top of my head:
> 
> JMP changes RIP but has no other side effects.  Jcc is the same except
> that the condition needs checking, which would be a bit tedious.
> 
> CALL changes RIP and does a push but has no other side effects.  We
> have special infrastructure to emulate CALL from int3 context:
> int3_emulate_call().

Yeah, I remember that a gap was introduced for int3_emulate_call().
These will help me implement the emulation.

> 
> RET pops and changes RIP.  No other side effects.
> 
> RET imm is rare.  I don't think it occurs in the kernel at all.
> 
> LRET is rare.  I don't think kprobe needs to support it.
> 
> JMP FAR and CALL FAR are rare.  I see no reason to support them.

I see that those are rare, but supporting them is not hard.

> 
> IRET is rare, and trying to kprobe it seems likely to cause a
> disaster, although it's within the realm of possibility that the IRET
> in sync_core() could work.

Agreed. IRET should not be probed.


> > > or even
> > >
> > > INSN [but with displacement modified if it's RIP-relative]
> > > JMP kprobe_post_handler
> >
> > This needs a sequence of push-regs etc. ;)
> >
> > >
> > > and avoid single-stepping?
> > >
> > > I guess I see the point for CALL, JMP and RET, but it seems like we
> > > could emulate those cases instead fairly easily.
> >
> > OK, let's try to do it. I think it should be possible because even in the
> > current code, resume fixup code (adjust IP register) works only for a few
> > groups of instructions.
> 
> I suspect that emulating them would give a nice performance boost,
> too.  Single-stepping is very slow on x86.

Yeah, it's the same on arm64. Jean reported that eliminating single-stepping
improved performance.

> 
> I should let you know, though, that I might have found a sneaky
> alternative solution to handling NMIs, so this is a bit lower priority
> from my perspective than I thought it was.  I'm not quite 100%
> convinced my idea works, but I'll play with it.

Does that involve kprobes? Anyway, I'll try to remove single-stepping by
using emulation and int3.

Thank you,


> 
> --Andy
> 
> >
> > Thank you,
> >
> > --
> > Masami Hiramatsu <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-24 19:45   ` Andy Lutomirski
  2021-02-25  2:22     ` Masami Hiramatsu
@ 2021-02-25  9:59     ` Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2021-02-25  9:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Masami Hiramatsu, Oleg Nesterov, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On Wed, Feb 24, 2021 at 11:45:10AM -0800, Andy Lutomirski wrote:
> I guess I see the point for CALL, JMP and RET, but it seems like we
> could emulate those cases instead fairly easily.

Today, yes. CALL emulation was 'recently' made possible by giving #BP a
stack gap. We have emulation for all 3 of those instructions implemented in
asm/text-patching.h, see int3_emulate_$insn().
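
Roughly, paraphrasing those helpers (a sketch, not the verbatim header):

#include <asm/ptrace.h>
#include <asm/text-patching.h>

/*
 * Sketch of the idea behind int3_emulate_push()/int3_emulate_call(): the #BP
 * entry code leaves a gap below the saved pt_regs, so the handler can "push"
 * onto the interrupted context's stack by moving regs->sp down and writing.
 */
static inline void sketch_int3_push(struct pt_regs *regs, unsigned long val)
{
	regs->sp -= sizeof(unsigned long);	/* extend the probed context's stack */
	*(unsigned long *)regs->sp = val;	/* safe because of the entry-code gap */
}

static inline void sketch_int3_call(struct pt_regs *regs, unsigned long func)
{
	/* return address = just past the 5-byte CALL being emulated */
	sketch_int3_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
	regs->ip = func;			/* "jump" to the call target */
}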

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC PATCH 0/1] x86/kprobes: Remove single-step trap from x86 kprobes
  2021-02-25  2:22     ` Masami Hiramatsu
  2021-02-25  6:03       ` Andy Lutomirski
@ 2021-03-01 14:08       ` Masami Hiramatsu
  2021-03-01 14:08         ` [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  2 siblings, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-01 14:08 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Hi Andy,

Here is the patch to remove the single-step debug trap from x86 kprobes.
As you suggested, this uses int3 instead of the debug trap, which removes the
IRET that returns to the kernel.
Some instructions must be emulated and some instructions can no longer be
probed, but as far as I can see those are rare cases.

TODOs/discussions
 - Extend the kprobe debugfs interface to show the arch-dependent flags, like
   [BOOST], [EMULATE].
 - Consider splitting this emulation code into its own file and sharing it with
   uprobes (uprobes already have their own emulator) and alternatives (which
   have int3_emulate_*), but that may be over-engineering.
 - Add test cases for the emulator. The current kprobe smoke test is not arch
   specific. It may be better to probe an assembly target so that it can test
   the "boosted", "emulated" and "int3-single-stepped" cases.

Thank you,

---

Masami Hiramatsu (1):
      x86/kprobes: Use int3 instead of debug trap for single-step


 arch/x86/include/asm/kprobes.h |   21 +-
 arch/x86/kernel/kprobes/core.c |  515 ++++++++++++++++++++++++++--------------
 arch/x86/kernel/traps.c        |    3 
 3 files changed, 351 insertions(+), 188 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-01 14:08       ` [RFC PATCH 0/1] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
@ 2021-03-01 14:08         ` Masami Hiramatsu
  2021-03-02  8:06           ` Peter Zijlstra
                             ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-01 14:08 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Use int3 instead of the debug trap exception for single-stepping the
probed instructions. Some instructions which change the ip register
or modify the IF flag are emulated, because they can not be
single-stepped by int3 or may allow interrupts while single-stepping.

This actually changes the kprobes behavior.

- kprobes can not probe the following instructions: int3, iret,
  far jmp/call which take an absolute address as an immediate,
  indirect far jmp/call, indirect near jmp/call with memory
  addressing (register-based indirect jmp/call are OK), and
  vmcall/vmlaunch/vmresume/vmxoff.

- If the kprobe's post_handler is not set before registering,
  it may not be called in some cases even if you set it afterwards.
  (IOW, the kprobe booster is enabled at registration; the user can
   not change it afterwards.)

But both are rare issues: unsupported instructions will not be
used in the kernel (or are rarely used), and post_handlers are
rarely used (I don't see them except in the test code).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/kprobes.h |   21 +-
 arch/x86/kernel/kprobes/core.c |  515 ++++++++++++++++++++++++++--------------
 arch/x86/kernel/traps.c        |    3 
 3 files changed, 351 insertions(+), 188 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index d20a3d6be36e..bd7f5886a789 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,10 +65,22 @@ struct arch_specific_insn {
 	 * a post_handler).
 	 */
 	unsigned boostable:1;
-	unsigned if_modifier:1;
-	unsigned is_call:1;
-	unsigned is_pushf:1;
-	unsigned is_abs_ip:1;
+	unsigned char size;	/* The size of insn */
+	union {
+		unsigned char opcode;
+		struct {
+			unsigned char type;
+		} jcc;
+		struct {
+			unsigned char type;
+			unsigned char asize;
+		} loop;
+		struct {
+			unsigned char reg;
+		} indirect;
+	};
+	s32 rel32;	/* relative offset must be s32, s16, or s8 */
+	void (*emulate_op)(struct kprobe *p, struct pt_regs *regs);
 	/* Number of bytes of text poked */
 	int tp_len;
 };
@@ -107,7 +119,6 @@ extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
 extern int kprobe_int3_handler(struct pt_regs *regs);
-extern int kprobe_debug_handler(struct pt_regs *regs);
 
 #else
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index be76568d57a5..30c34797933a 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -177,6 +177,9 @@ int can_boost(struct insn *insn, void *addr)
 	case 0xf0:
 		/* clear and set flags are boostable */
 		return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
+	case 0xff:
+		/* indirect jmp is boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
 	default:
 		/* CS override prefix and call are not boostable */
 		return (opcode != 0x2e && opcode != 0x9a);
@@ -357,13 +360,14 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
 	return insn->length;
 }
 
-/* Prepare reljump right after instruction to boost */
-static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
-			  struct insn *insn)
+/* Prepare reljump or int3 right after instruction */
+static int prepare_singlestep(kprobe_opcode_t *buf, struct kprobe *p,
+			      struct insn *insn)
 {
 	int len = insn->length;
 
-	if (can_boost(insn, p->addr) &&
+	if (!IS_ENABLED(CONFIG_PREEMPTION) &&
+	    !p->post_handler && can_boost(insn, p->addr) &&
 	    MAX_INSN_SIZE - len >= JMP32_INSN_SIZE) {
 		/*
 		 * These instructions can be executed directly if it
@@ -374,7 +378,12 @@ static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
 		len += JMP32_INSN_SIZE;
 		p->ainsn.boostable = 1;
 	} else {
-		p->ainsn.boostable = 0;
+		/* Otherwise, put an int3 for trapping singlestep */
+		if (MAX_INSN_SIZE - len < INT3_INSN_SIZE)
+			return -ENOSPC;
+
+		buf[len] = INT3_INSN_OPCODE;
+		len += INT3_INSN_SIZE;
 	}
 
 	return len;
@@ -411,42 +420,225 @@ void free_insn_page(void *page)
 	module_memfree(page);
 }
 
-static void set_resume_flags(struct kprobe *p, struct insn *insn)
+/* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
+
+static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
+{
+	switch (p->ainsn.opcode) {
+	case 0xfa:	/* cli */
+		regs->flags &= ~(X86_EFLAGS_IF);
+		break;
+	case 0xfb:	/* sti */
+		regs->flags |= X86_EFLAGS_IF;
+		break;
+	case 0x9c:	/* pushf */
+		int3_emulate_push(regs, regs->flags);
+		break;
+	case 0x9d:	/* popf */
+		regs->flags = int3_emulate_pop(regs);
+		break;
+	}
+	regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ifmodifiers);
+
+static void kprobe_emulate_ret(struct kprobe *p, struct pt_regs *regs)
+{
+	int3_emulate_ret(regs);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ret);
+
+static void kprobe_emulate_call(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long func = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	func += p->ainsn.rel32;
+	int3_emulate_call(regs, func);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call);
+
+static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	ip += p->ainsn.rel32;
+	int3_emulate_jmp(regs, ip);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp);
+
+static const unsigned long jcc_mask[6] = {
+	[0] = X86_EFLAGS_OF,
+	[1] = X86_EFLAGS_CF,
+	[2] = X86_EFLAGS_ZF,
+	[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
+	[4] = X86_EFLAGS_SF,
+	[5] = X86_EFLAGS_PF,
+};
+
+static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
+{
+	bool invert = p->ainsn.jcc.type & 1;
+	bool match;
+
+	if (p->ainsn.jcc.type < 0xc) {
+		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
+	} else {
+		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+		if (p->ainsn.jcc.type >= 0xe)
+			match = match || (regs->flags & X86_EFLAGS_ZF);
+	}
+	if ((match && !invert) || (!match && invert))
+		kprobe_emulate_jmp(p, regs);
+	else
+		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jcc);
+
+static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
+{
+	bool match;
+
+	if (p->ainsn.loop.type != 3) {	/* LOOP* */
+		if (p->ainsn.loop.asize == 32)
+			match = ((*(u32 *)&regs->cx)--) != 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = ((*(u64 *)&regs->cx)--) != 0;
+#endif
+		else
+			match = ((*(u16 *)&regs->cx)--) != 0;
+	} else {			/* JCXZ */
+		if (p->ainsn.loop.asize == 32)
+			match = *(u32 *)(&regs->cx) == 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = *(u64 *)(&regs->cx) == 0;
+#endif
+		else
+			match = *(u16 *)(&regs->cx) == 0;
+	}
+
+	if (p->ainsn.loop.type == 0)	/* LOOPNE */
+		match = match && !(regs->flags & X86_EFLAGS_ZF);
+	else if (p->ainsn.loop.type == 1)	/* LOOPE */
+		match = match && (regs->flags & X86_EFLAGS_ZF);
+
+	if (match)
+		kprobe_emulate_jmp(p, regs);
+	else
+		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+}
+NOKPROBE_SYMBOL(kprobe_emulate_loop);
+
+static const int addrmode_regoffs[] = {
+	offsetof(struct pt_regs, ax),
+	offsetof(struct pt_regs, cx),
+	offsetof(struct pt_regs, dx),
+	offsetof(struct pt_regs, bx),
+	offsetof(struct pt_regs, sp),
+	offsetof(struct pt_regs, bp),
+	offsetof(struct pt_regs, si),
+	offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+	offsetof(struct pt_regs, r8),
+	offsetof(struct pt_regs, r9),
+	offsetof(struct pt_regs, r10),
+	offsetof(struct pt_regs, r11),
+	offsetof(struct pt_regs, r12),
+	offsetof(struct pt_regs, r13),
+	offsetof(struct pt_regs, r14),
+	offsetof(struct pt_regs, r15),
+#endif
+};
+
+static void kprobe_emulate_call_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_call(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call_indirect);
+
+static void kprobe_emulate_jmp_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_jmp(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp_indirect);
+
+static int prepare_emulation(struct kprobe *p, struct insn *insn)
 {
 	insn_byte_t opcode = insn->opcode.bytes[0];
 
 	switch (opcode) {
 	case 0xfa:		/* cli */
 	case 0xfb:		/* sti */
+	case 0x9c:		/* pushfl */
 	case 0x9d:		/* popf/popfd */
-		/* Check whether the instruction modifies Interrupt Flag or not */
-		p->ainsn.if_modifier = 1;
-		break;
-	case 0x9c:	/* pushfl */
-		p->ainsn.is_pushf = 1;
+		/*
+		 * IF modifiers must be emulated since they may enable interrupts while
+		 * int3 single-stepping.
+		 */
+		p->ainsn.emulate_op = kprobe_emulate_ifmodifiers;
+		p->ainsn.opcode = opcode;
 		break;
-	case 0xcf:	/* iret */
-		p->ainsn.if_modifier = 1;
-		fallthrough;
 	case 0xc2:	/* ret/lret */
 	case 0xc3:
 	case 0xca:
 	case 0xcb:
-	case 0xea:	/* jmp absolute -- ip is correct */
-		/* ip is already adjusted, no more changes required */
-		p->ainsn.is_abs_ip = 1;
-		/* Without resume jump, this is boostable */
-		p->ainsn.boostable = 1;
+		p->ainsn.emulate_op = kprobe_emulate_ret;
 		break;
-	case 0xe8:	/* call relative - Fix return addr */
-		p->ainsn.is_call = 1;
+	case 0x9a:	/* far call absolute -- segment is not supported */
+	case 0xea:	/* far jmp absolute -- segment is not supported */
+	case 0xcc:	/* int3 */
+	case 0xcf:	/* iret -- in-kernel IRET is not supported */
+		return -EOPNOTSUPP;
 		break;
-#ifdef CONFIG_X86_32
-	case 0x9a:	/* call absolute -- same as call absolute, indirect */
-		p->ainsn.is_call = 1;
-		p->ainsn.is_abs_ip = 1;
+	case 0xe8:	/* near call relative */
+		p->ainsn.emulate_op = kprobe_emulate_call;
+		if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0xeb:	/* short jump relative */
+	case 0xe9:	/* near jump relative */
+		p->ainsn.emulate_op = kprobe_emulate_jmp;
+		if (insn->immediate.nbytes == 1)
+			p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
+		else if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0x0f:
+		opcode = insn->opcode.bytes[1];
+		if ((opcode & 0xf0) == 0x80) {
+			/* 2 bytes Conditional Jump */
+			p->ainsn.emulate_op = kprobe_emulate_jcc;
+			p->ainsn.jcc.type = opcode & 0xf;
+			if (insn->immediate.nbytes == 2)
+				p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+			else
+				p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		} else if (opcode == 0x01 &&
+			   X86_MODRM_REG(insn->modrm.bytes[0]) == 0 &&
+			   X86_MODRM_MOD(insn->modrm.bytes[0]) == 3) {
+			/* VM extensions - not supported */
+			return -EOPNOTSUPP;
+		}
+		break;
+	case 0xe0:	/* Loop NZ */
+	case 0xe1:	/* Loop */
+	case 0xe2:	/* Loop */
+	case 0xe3:	/* J*CXZ */
+		p->ainsn.emulate_op = kprobe_emulate_loop;
+		p->ainsn.loop.type = opcode & 0x3;
+		p->ainsn.loop.asize = insn->addr_bytes * 8;
+		p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
 		break;
-#endif
 	case 0xff:
 		/*
 		 * Since the 0xff is an extended group opcode, the instruction
@@ -454,46 +646,62 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 		 */
 		opcode = insn->modrm.bytes[0];
 		if ((opcode & 0x30) == 0x10) {
-			/*
-			 * call absolute, indirect
-			 * Fix return addr; ip is correct.
-			 * But this is not boostable
-			 */
-			p->ainsn.is_call = 1;
-			p->ainsn.is_abs_ip = 1;
-			break;
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far call */
+			/* call absolute, indirect */
+			p->ainsn.emulate_op = kprobe_emulate_call_indirect;
 		} else if ((opcode & 0x30) == 0x20) {
-			/*
-			 * jmp near and far, absolute indirect
-			 * ip is correct.
-			 */
-			p->ainsn.is_abs_ip = 1;
-			/* Without resume jump, this is boostable */
-			p->ainsn.boostable = 1;
-		}
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far jmp */
+			/* jmp near absolute indirect */
+			p->ainsn.emulate_op = kprobe_emulate_jmp_indirect;
+		} else
+			break;
+
+		if (insn->addr_bytes != sizeof(unsigned long))
+			return -EOPNOTSUPP;	/* Don't support different size */
+		if (X86_MODRM_MOD(opcode) != 3)
+			return -EOPNOTSUPP;	/* TODO: support memory addressing */
+
+		p->ainsn.indirect.reg = X86_MODRM_RM(opcode);
+#ifdef CONFIG_X86_64
+		if (X86_REX_B(insn->rex_prefix.value))
+			p->ainsn.indirect.reg += 8;
+#endif
 		break;
+	default:
+		if ((opcode & 0xf0) == 0x70) {
+			/* 1 byte conditional jump */
+			p->ainsn.emulate_op = kprobe_emulate_jcc;
+			p->ainsn.jcc.type = opcode & 0xf;
+			p->ainsn.rel32 = *(char *)insn->immediate.bytes;
+		}
 	}
+	p->ainsn.size = insn->length;
+
+	return 0;
 }
 
 static int arch_copy_kprobe(struct kprobe *p)
 {
 	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];
-	int len;
+	int ret, len;
 
 	/* Copy an instruction with recovering if other optprobe modifies it.*/
 	len = __copy_instruction(buf, p->addr, p->ainsn.insn, &insn);
 	if (!len)
 		return -EINVAL;
 
-	/*
-	 * __copy_instruction can modify the displacement of the instruction,
-	 * but it doesn't affect boostable check.
-	 */
-	len = prepare_boost(buf, p, &insn);
+	/* Analyze the opcode and setup emulate functions */
+	ret = prepare_emulation(p, &insn);
+	if (ret < 0)
+		return ret;
 
-	/* Analyze the opcode and set resume flags */
-	set_resume_flags(p, &insn);
+	/* Add int3 for single-step or booster jmp */
+	len = prepare_singlestep(buf, p, &insn);
+	if (len < 0)
+		return len;
 
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
@@ -586,29 +794,7 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
 {
 	__this_cpu_write(current_kprobe, p);
 	kcb->kprobe_saved_flags = kcb->kprobe_old_flags
-		= (regs->flags & (X86_EFLAGS_TF | X86_EFLAGS_IF));
-	if (p->ainsn.if_modifier)
-		kcb->kprobe_saved_flags &= ~X86_EFLAGS_IF;
-}
-
-static nokprobe_inline void clear_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl &= ~DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
-}
-
-static nokprobe_inline void restore_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl |= DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
+		= (regs->flags & X86_EFLAGS_IF);
 }
 
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
@@ -623,6 +809,22 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 
+static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
+			       struct kprobe_ctlblk *kcb)
+{
+	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+		kcb->kprobe_status = KPROBE_HIT_SSDONE;
+		cur->post_handler(cur, regs, 0);
+	}
+
+	/* Restore back the original saved kprobes variables and continue. */
+	if (kcb->kprobe_status == KPROBE_REENTER)
+		restore_previous_kprobe(kcb);
+	else
+		reset_current_kprobe();
+}
+NOKPROBE_SYMBOL(kprobe_post_process);
+
 static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 			     struct kprobe_ctlblk *kcb, int reenter)
 {
@@ -630,7 +832,7 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		return;
 
 #if !defined(CONFIG_PREEMPTION)
-	if (p->ainsn.boostable && !p->post_handler) {
+	if (p->ainsn.boostable) {
 		/* Boost up -- we can execute copied instructions directly */
 		if (!reenter)
 			reset_current_kprobe();
@@ -649,18 +851,50 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		kcb->kprobe_status = KPROBE_REENTER;
 	} else
 		kcb->kprobe_status = KPROBE_HIT_SS;
-	/* Prepare real single stepping */
-	clear_btf();
-	regs->flags |= X86_EFLAGS_TF;
+
+	if (p->ainsn.emulate_op) {
+		p->ainsn.emulate_op(p, regs);
+		kprobe_post_process(p, regs, kcb);
+		return;
+	}
+
+	/* Disable interrupt, and set ip register on trampoline */
 	regs->flags &= ~X86_EFLAGS_IF;
-	/* single step inline if the instruction is an int3 */
-	if (p->opcode == INT3_INSN_OPCODE)
-		regs->ip = (unsigned long)p->addr;
-	else
-		regs->ip = (unsigned long)p->ainsn.insn;
+	regs->ip = (unsigned long)p->ainsn.insn;
 }
 NOKPROBE_SYMBOL(setup_singlestep);
 
+/*
+ * Called after single-stepping.  p->addr is the address of the
+ * instruction whose first byte has been replaced by the "int3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is p->ainsn.insn. We also don't use a trap, but another "int3"
+ * placed right after the copied instruction.
+ * Unlike the trap single-step, the "int3" single-step can not
+ * handle instructions which change the ip register, e.g. jmp,
+ * call, conditional jmp, nor instructions which change the IF
+ * flag, because interrupts must be disabled around the single-stepping.
+ * Such instructions are emulated in software; the others are single-stepped
+ * using "int3".
+ *
+ * When the 2nd "int3" is handled, regs->ip and regs->flags need to
+ * be adjusted so that we can resume execution at the correct code.
+ */
+static void resume_singlestep(struct kprobe *p, struct pt_regs *regs,
+			      struct kprobe_ctlblk *kcb)
+{
+	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
+	unsigned long orig_ip = (unsigned long)p->addr;
+
+	/* Restore saved interrupt flag and ip register */
+	regs->flags |= kcb->kprobe_saved_flags;
+	/* Note that regs->ip has already executed the int3, so step back by its size */
+	regs->ip += (orig_ip - copy_ip) - INT3_INSN_SIZE;
+}
+NOKPROBE_SYMBOL(resume_singlestep);
+
 /*
  * We have reentered the kprobe_handler(), since another probe was hit while
  * within the handler. We save the original kprobes variables and just single
@@ -696,6 +930,12 @@ static int reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
 }
 NOKPROBE_SYMBOL(reenter_kprobe);
 
+static int nokprobe_inline kprobe_is_ss(struct kprobe_ctlblk *kcb)
+{
+	return (kcb->kprobe_status == KPROBE_HIT_SS ||
+		kcb->kprobe_status == KPROBE_REENTER);
+}
+
 /*
  * Interrupts are disabled on entry as trap3 is an interrupt gate and they
  * remain disabled throughout this function.
@@ -740,7 +980,18 @@ int kprobe_int3_handler(struct pt_regs *regs)
 				reset_current_kprobe();
 			return 1;
 		}
-	} else if (*addr != INT3_INSN_OPCODE) {
+	} else if (kprobe_is_ss(kcb)) {
+		p = kprobe_running();
+		if ((unsigned long)p->ainsn.insn < regs->ip &&
+		    (unsigned long)p->ainsn.insn + MAX_INSN_SIZE > regs->ip) {
+			/* Most probably this is the second int3 for single-step */
+			resume_singlestep(p, regs, kcb);
+			kprobe_post_process(p, regs, kcb);
+			return 1;
+		}
+	}
+
+	if (*addr != INT3_INSN_OPCODE) {
 		/*
 		 * The breakpoint instruction was removed right
 		 * after we hit it.  Another cpu has removed
@@ -813,91 +1064,6 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 
-/*
- * Called after single-stepping.  p->addr is the address of the
- * instruction whose first byte has been replaced by the "int 3"
- * instruction.  To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction.  The address of this
- * copy is p->ainsn.insn.
- *
- * This function prepares to return from the post-single-step
- * interrupt.  We have to fix up the stack as follows:
- *
- * 0) Except in the case of absolute or indirect jump or call instructions,
- * the new ip is relative to the copied instruction.  We need to make
- * it relative to the original instruction.
- *
- * 1) If the single-stepped instruction was pushfl, then the TF and IF
- * flags are set in the just-pushed flags, and may need to be cleared.
- *
- * 2) If the single-stepped instruction was a call, the return address
- * that is atop the stack is the address following the copied instruction.
- * We need to make it the address following the original instruction.
- */
-static void resume_execution(struct kprobe *p, struct pt_regs *regs,
-			     struct kprobe_ctlblk *kcb)
-{
-	unsigned long *tos = stack_addr(regs);
-	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
-	unsigned long orig_ip = (unsigned long)p->addr;
-
-	regs->flags &= ~X86_EFLAGS_TF;
-
-	/* Fixup the contents of top of stack */
-	if (p->ainsn.is_pushf) {
-		*tos &= ~(X86_EFLAGS_TF | X86_EFLAGS_IF);
-		*tos |= kcb->kprobe_old_flags;
-	} else if (p->ainsn.is_call) {
-		*tos = orig_ip + (*tos - copy_ip);
-	}
-
-	if (!p->ainsn.is_abs_ip)
-		regs->ip += orig_ip - copy_ip;
-
-	restore_btf();
-}
-NOKPROBE_SYMBOL(resume_execution);
-
-/*
- * Interrupts are disabled on entry as trap1 is an interrupt gate and they
- * remain disabled throughout this function.
- */
-int kprobe_debug_handler(struct pt_regs *regs)
-{
-	struct kprobe *cur = kprobe_running();
-	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
-	if (!cur)
-		return 0;
-
-	resume_execution(cur, regs, kcb);
-	regs->flags |= kcb->kprobe_saved_flags;
-
-	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
-		kcb->kprobe_status = KPROBE_HIT_SSDONE;
-		cur->post_handler(cur, regs, 0);
-	}
-
-	/* Restore back the original saved kprobes variables and continue. */
-	if (kcb->kprobe_status == KPROBE_REENTER) {
-		restore_previous_kprobe(kcb);
-		goto out;
-	}
-	reset_current_kprobe();
-out:
-	/*
-	 * if somebody else is singlestepping across a probe point, flags
-	 * will have TF set, in which case, continue the remaining processing
-	 * of do_debug, as if this is not a probe hit.
-	 */
-	if (regs->flags & X86_EFLAGS_TF)
-		return 0;
-
-	return 1;
-}
-NOKPROBE_SYMBOL(kprobe_debug_handler);
-
 int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 {
 	struct kprobe *cur = kprobe_running();
@@ -915,20 +1081,9 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 		 * normal page fault.
 		 */
 		regs->ip = (unsigned long)cur->addr;
-		/*
-		 * Trap flag (TF) has been set here because this fault
-		 * happened where the single stepping will be done.
-		 * So clear it by resetting the current kprobe:
-		 */
-		regs->flags &= ~X86_EFLAGS_TF;
-		/*
-		 * Since the single step (trap) has been cancelled,
-		 * we need to restore BTF here.
-		 */
-		restore_btf();
 
 		/*
-		 * If the TF flag was set before the kprobe hit,
+		 * If the IF flag was set before the kprobe hit,
 		 * don't touch it:
 		 */
 		regs->flags |= kcb->kprobe_old_flags;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec758f0e..1235c85c8b3a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -890,9 +890,6 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
 		dr6 &= ~DR_STEP;
 
-	if (kprobe_debug_handler(regs))
-		goto out;
-
 	/*
 	 * The kernel doesn't use INT1
 	 */


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-02-23 23:24 Why do kprobes and uprobes singlestep? Andy Lutomirski
  2021-02-24  1:17 ` Masami Hiramatsu
@ 2021-03-01 16:51 ` Oleg Nesterov
  2021-03-02  1:36   ` Andy Lutomirski
  2021-03-02  2:22   ` Masami Hiramatsu
  1 sibling, 2 replies; 41+ messages in thread
From: Oleg Nesterov @ 2021-03-01 16:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Masami Hiramatsu, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

Hi Andy,

sorry for the delay.

On 02/23, Andy Lutomirski wrote:
>
> A while back, I let myself be convinced that kprobes genuinely need to
> single-step the kernel on occasion, and I decided that this sucked but
> I could live with it.  it would, however, be Really Really Nice (tm)
> if we could have a rule that anyone running x86 Linux who single-steps
> the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> when the system falls apart around them.  Specifically, if we don't
> allow kernel single-stepping and if we suitably limit kernel
> instruction breakpoints (the latter isn't actually a major problem),
> then we don't really really need to use IRET to return to the kernel,
> and that means we can avoid some massive NMI nastiness.

Not sure I understand you correctly; I know almost nothing about low-level
x86 magic.

But I guess this has nothing to do with uprobes; they do not single-step
in kernel mode, right?

> Uprobes seem to single-step user code for no discernable reason.
> (They want to trap after executing an out of line instruction, AFAICT.
> Surely INT3 or even CALL after the out-of-line insn would work as well
> or better.)

Uprobes have used single-stepping from the very beginning, probably because
this is the simplest and most "standard" way to implement xol.

And please note that CALL/JMP/etc emulation was added much later to fix the
problems with non-canonical addresses, and this emulation is still incomplete.

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-01 16:51 ` Oleg Nesterov
@ 2021-03-02  1:36   ` Andy Lutomirski
  2021-03-02 20:24     ` Alexei Starovoitov
                       ` (2 more replies)
  2021-03-02  2:22   ` Masami Hiramatsu
  1 sibling, 3 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-02  1:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

On Mon, Mar 1, 2021 at 8:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Hi Andy,
>
> sorry for delay.
>
> On 02/23, Andy Lutomirski wrote:
> >
> > A while back, I let myself be convinced that kprobes genuinely need to
> > single-step the kernel on occasion, and I decided that this sucked but
> > I could live with it.  it would, however, be Really Really Nice (tm)
> > if we could have a rule that anyone running x86 Linux who single-steps
> > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > when the system falls apart around them.  Specifically, if we don't
> > allow kernel single-stepping and if we suitably limit kernel
> > instruction breakpoints (the latter isn't actually a major problem),
> > then we don't really really need to use IRET to return to the kernel,
> > and that means we can avoid some massive NMI nastiness.
>
> Not sure I understand you correctly, I know almost nothing about low-level
> x86  magic.
>
> But I guess this has nothing to do with uprobes, they do not single-step
> in kernel mode, right?

They single-step user code, though, and the code that makes this work
is quite ugly.  Single-stepping on x86 is a mess.

>
> > Uprobes seem to single-step user code for no discernable reason.
> > (They want to trap after executing an out of line instruction, AFAICT.
> > Surely INT3 or even CALL after the out-of-line insn would work as well
> > or better.)
>
> Uprobes use single-step from the very beginning, probably because this
> is the most simple and "standard" way to implement xol.
>
> And please note that CALL/JMP/etc emulation was added much later to fix the
> problems with non-canonical addresses, and this emulation it still incomplete.

Is there something like a uprobe test suite?  How maintained /
actively used is uprobe?

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-01 16:51 ` Oleg Nesterov
  2021-03-02  1:36   ` Andy Lutomirski
@ 2021-03-02  2:22   ` Masami Hiramatsu
  2021-03-02  2:48     ` Andy Lutomirski
  2021-03-02 20:31     ` Oleg Nesterov
  1 sibling, 2 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02  2:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

Hi Oleg and Andy,

On Mon, 1 Mar 2021 17:51:31 +0100
Oleg Nesterov <oleg@redhat.com> wrote:

> Hi Andy,
> 
> sorry for delay.
> 
> On 02/23, Andy Lutomirski wrote:
> >
> > A while back, I let myself be convinced that kprobes genuinely need to
> > single-step the kernel on occasion, and I decided that this sucked but
> > I could live with it.  it would, however, be Really Really Nice (tm)
> > if we could have a rule that anyone running x86 Linux who single-steps
> > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > when the system falls apart around them.  Specifically, if we don't
> > allow kernel single-stepping and if we suitably limit kernel
> > instruction breakpoints (the latter isn't actually a major problem),
> > then we don't really really need to use IRET to return to the kernel,
> > and that means we can avoid some massive NMI nastiness.
> 
> Not sure I understand you correctly, I know almost nothing about low-level
> x86  magic.

x86 has normal interrupts and NMI. When an NMI occurs, the CPU masks NMIs
(the mask itself is a hidden status) and IRET releases the mask. The problem
is that if an INT3 is hit in the NMI handler and does single-stepping,
it has to use IRET to atomically set TF and return.

> 
> But I guess this has nothing to do with uprobes, they do not single-step
> in kernel mode, right?

Agreed. If the problematic case is IRET from the NMI handler, uprobes don't
hit it because they are only invoked from user space.
Andy, what do you think?

> > Uprobes seem to single-step user code for no discernable reason.
> > (They want to trap after executing an out of line instruction, AFAICT.
> > Surely INT3 or even CALL after the out-of-line insn would work as well
> > or better.)
> 
> Uprobes use single-step from the very beginning, probably because this
> is the most simple and "standard" way to implement xol.
> 
> And please note that CALL/JMP/etc emulation was added much later to fix the
> problems with non-canonical addresses, and this emulation is still incomplete.

Yeah, I found the other implementation of the emulation afterwards. Of course,
since uprobes only deal with user space, it may need more care.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02  2:22   ` Masami Hiramatsu
@ 2021-03-02  2:48     ` Andy Lutomirski
  2021-03-02 20:31     ` Oleg Nesterov
  1 sibling, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-02  2:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Oleg Nesterov, Andy Lutomirski, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

On Mon, Mar 1, 2021 at 6:22 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Oleg and Andy,
>
> On Mon, 1 Mar 2021 17:51:31 +0100
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Hi Andy,
> >
> > sorry for delay.
> >
> > On 02/23, Andy Lutomirski wrote:
> > >
> > > A while back, I let myself be convinced that kprobes genuinely need to
> > > single-step the kernel on occasion, and I decided that this sucked but
> > > I could live with it.  it would, however, be Really Really Nice (tm)
> > > if we could have a rule that anyone running x86 Linux who single-steps
> > > the kernel (e.g. kgdb and nothing else) gets to keep all the pieces
> > > when the system falls apart around them.  Specifically, if we don't
> > > allow kernel single-stepping and if we suitably limit kernel
> > > instruction breakpoints (the latter isn't actually a major problem),
> > > then we don't really really need to use IRET to return to the kernel,
> > > and that means we can avoid some massive NMI nastiness.
> >
> > Not sure I understand you correctly, I know almost nothing about low-level
> > x86  magic.
>
> x86 has normal interrupts and NMIs. When an NMI occurs, the CPU masks further
> NMIs (the mask itself is hidden state) and IRET releases the mask. The problem
> is that if an INT3 is hit in the NMI handler and single-stepping is used, the
> handler has to use IRET to atomically set TF and return.
>
> >
> > But I guess this has nothing to do with uprobes, they do not single-step
> > in kernel mode, right?
>
> Agreed. If the problematic case is IRET from the NMI handler, uprobes are not
> affected because they are only invoked from user space.
> Andy, what do you think?

Indeed, this isn't a problem for uprobes.  The problem for uprobes is
that all the notifiers from #DB are kind of messy, and I would like to
get rid of them if possible.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-01 14:08         ` [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
@ 2021-03-02  8:06           ` Peter Zijlstra
  2021-03-02  8:38           ` Peter Zijlstra
  2021-03-02  8:41           ` Peter Zijlstra
  2 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-02  8:06 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Mon, Mar 01, 2021 at 11:08:15PM +0900, Masami Hiramatsu wrote:

> +static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
> +{
> +	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
> +
> +	ip += p->ainsn.rel32;
> +	int3_emulate_jmp(regs, ip);
> +}
> +NOKPROBE_SYMBOL(kprobe_emulate_jmp);

Would it make sense to have this be something like:

static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
{
	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;

	if (cond)
		ip += p->ainsn.rel32;
	int3_emulate_jmp(regs, ip);
}

> +static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
> +{
> +	bool invert = p->ainsn.jcc.type & 1;
> +	bool match;
> +
> +	if (p->ainsn.jcc.type < 0xc) {
> +		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
> +	} else {
> +		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
> +			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
> +		if (p->ainsn.jcc.type >= 0xe)
> +			match = match && (regs->flags & X86_EFLAGS_ZF);
> +	}
> +	if ((match && !invert) || (!match && invert))
> +		kprobe_emulate_jmp(p, regs);
> +	else
> +		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;

Then you can do:

	kprobe_emulate_jmp(p, regs, match);

> +}
> +NOKPROBE_SYMBOL(kprobe_emulate_jcc);
> +
> +static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
> +{
> +	bool match;
> +
> +	if (p->ainsn.loop.type != 3) {	/* LOOP* */
> +		if (p->ainsn.loop.asize == 32)
> +			match = ((*(u32 *)&regs->cx)--) != 0;
> +#ifdef CONFIG_X86_64
> +		else if (p->ainsn.loop.asize == 64)
> +			match = ((*(u64 *)&regs->cx)--) != 0;
> +#endif
> +		else
> +			match = ((*(u16 *)&regs->cx)--) != 0;
> +	} else {			/* JCXZ */
> +		if (p->ainsn.loop.asize == 32)
> +			match = *(u32 *)(&regs->cx) == 0;
> +#ifdef CONFIG_X86_64
> +		else if (p->ainsn.loop.asize == 64)
> +			match = *(u64 *)(&regs->cx) == 0;
> +#endif
> +		else
> +			match = *(u16 *)(&regs->cx) == 0;
> +	}
> +
> +	if (p->ainsn.loop.type == 0)	/* LOOPNE */
> +		match = match && !(regs->flags & X86_EFLAGS_ZF);
> +	else if (p->ainsn.loop.type == 1)	/* LOOPE */
> +		match = match && (regs->flags & X86_EFLAGS_ZF);
> +
> +	if (match)
> +		kprobe_emulate_jmp(p, regs);
> +	else
> +		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;

and here.

> +}
> +NOKPROBE_SYMBOL(kprobe_emulate_loop);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-01 14:08         ` [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
  2021-03-02  8:06           ` Peter Zijlstra
@ 2021-03-02  8:38           ` Peter Zijlstra
  2021-03-02  8:41           ` Peter Zijlstra
  2 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-02  8:38 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Mon, Mar 01, 2021 at 11:08:15PM +0900, Masami Hiramatsu wrote:

> +	default:
> +		if ((opcode & 0xf0) == 0x70) {
> +			/* 1 byte conditional jump */
> +			p->ainsn.emulate_op = kprobe_emulate_jcc;
> +			p->ainsn.jcc.type = opcode & 0xf;
> +			p->ainsn.rel32 = *(char *)insn->immediate.bytes;
> +		}
>  	}

Would it make sense to write that as:

	case 0x70 ... 0x7f:
		/* 1 byte conditional jump */
		p->ainsn.emulate_op = kprobe_emulate_jcc;
		p->ainsn.jcc.type = opcode & 0xf;
		p->ainsn.rel32 = *(char *)insn->immediate.bytes;
		break;

instead? Preferably right before the 0x0f case :-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-01 14:08         ` [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
  2021-03-02  8:06           ` Peter Zijlstra
  2021-03-02  8:38           ` Peter Zijlstra
@ 2021-03-02  8:41           ` Peter Zijlstra
  2021-03-02  8:54             ` Peter Zijlstra
  2 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-02  8:41 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Mon, Mar 01, 2021 at 11:08:15PM +0900, Masami Hiramatsu wrote:
> Use int3 instead of the debug trap exception for single-stepping the
> probed instructions. Some instructions which change the ip register
> or modify the IF flag are emulated, because they cannot be
> single-stepped by int3 or may enable interrupts while single-stepping.
> 
> This actually changes the kprobes behavior.
> 
> - kprobes can no longer probe the following instructions: int3, iret,
>   far jmp/call which take an absolute address as an immediate,
>   indirect far jmp/call, indirect near jmp/call with memory
>   addressing (register-based indirect jmp/call are OK), and
>   vmcall/vmlaunch/vmresume/vmxoff.
> 
> - If the kprobe post_handler is not set before registering, it may
>   not be called in some cases even if you set it afterwards.
>   (IOW, the kprobe booster is enabled at registration; the user can
>    not change it afterwards.)
> 
> But both are rare issues: the unsupported instructions will not be
> used in the kernel (or are rarely used), and post_handlers are
> rarely used (I don't see them except in the test code).
> 
> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

Very nice!

Aside of a few nits:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-02  8:41           ` Peter Zijlstra
@ 2021-03-02  8:54             ` Peter Zijlstra
  2021-03-02 12:51               ` Masami Hiramatsu
  2021-03-02 13:58               ` Peter Zijlstra
  0 siblings, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-02  8:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Tue, Mar 02, 2021 at 09:41:32AM +0100, Peter Zijlstra wrote:

> Aside of a few nits:

something like so..

--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -457,13 +457,20 @@ static void kprobe_emulate_call(struct k
 }
 NOKPROBE_SYMBOL(kprobe_emulate_call);
 
-static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
+static __always_inline 
+void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
 {
 	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
 
-	ip += p->ainsn.rel32;
+	if (cond)
+		ip += p->ainsn.rel32;
 	int3_emulate_jmp(regs, ip);
 }
+
+static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
+{
+	__kprobe_emulate_jmp(p, regs, true);
+}
 NOKPROBE_SYMBOL(kprobe_emulate_jmp);
 
 static const unsigned long jcc_mask[6] = {
@@ -488,10 +495,7 @@ static void kprobe_emulate_jcc(struct kp
 		if (p->ainsn.jcc.type >= 0xe)
 			match = match && (regs->flags & X86_EFLAGS_ZF);
 	}
-	if ((match && !invert) || (!match && invert))
-		kprobe_emulate_jmp(p, regs);
-	else
-		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
 }
 NOKPROBE_SYMBOL(kprobe_emulate_jcc);
 
@@ -524,10 +528,7 @@ static void kprobe_emulate_loop(struct k
 	else if (p->ainsn.loop.type == 1)	/* LOOPE */
 		match = match && (regs->flags & X86_EFLAGS_ZF);
 
-	if (match)
-		kprobe_emulate_jmp(p, regs);
-	else
-		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+	__kprobe_emulate_jmp(p, regs, match);
 }
 NOKPROBE_SYMBOL(kprobe_emulate_loop);
 
@@ -613,6 +614,12 @@ static int prepare_emulation(struct kpro
 		else
 			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
 		break;
+	case 0x70 ... 0x7f:
+		/* 1 byte conditional jump */
+		p->ainsn.emulate_op = kprobe_emulate_jcc;
+		p->ainsn.jcc.type = opcode & 0xf;
+		p->ainsn.rel32 = *(char *)insn->immediate.bytes;
+		break;
 	case 0x0f:
 		opcode = insn->opcode.bytes[1];
 		if ((opcode & 0xf0) == 0x80) {
@@ -667,12 +674,7 @@ static int prepare_emulation(struct kpro
 #endif
 		break;
 	default:
-		if ((opcode & 0xf0) == 0x70) {
-			/* 1 byte conditional jump */
-			p->ainsn.emulate_op = kprobe_emulate_jcc;
-			p->ainsn.jcc.type = opcode & 0xf;
-			p->ainsn.rel32 = *(char *)insn->immediate.bytes;
-		}
+		break;
 	}
 	p->ainsn.size = insn->length;
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-02  8:54             ` Peter Zijlstra
@ 2021-03-02 12:51               ` Masami Hiramatsu
  2021-03-02 13:58               ` Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02 12:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Tue, 2 Mar 2021 09:54:43 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 02, 2021 at 09:41:32AM +0100, Peter Zijlstra wrote:
> 
> > Aside of a few nits:
> 
> something like so..

Thanks Peter,
I completely agree with that change!
Let me update the patch.


> 
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -457,13 +457,20 @@ static void kprobe_emulate_call(struct k
>  }
>  NOKPROBE_SYMBOL(kprobe_emulate_call);
>  
> -static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
> +static __always_inline 
> +void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
>  {
>  	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
>  
> -	ip += p->ainsn.rel32;
> +	if (cond)
> +		ip += p->ainsn.rel32;
>  	int3_emulate_jmp(regs, ip);
>  }
> +
> +static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
> +{
> +	__kprobe_emulate_jmp(p, regs, true);
> +}
>  NOKPROBE_SYMBOL(kprobe_emulate_jmp);
>  
>  static const unsigned long jcc_mask[6] = {
> @@ -488,10 +495,7 @@ static void kprobe_emulate_jcc(struct kp
>  		if (p->ainsn.jcc.type >= 0xe)
>  			match = match && (regs->flags & X86_EFLAGS_ZF);
>  	}
> -	if ((match && !invert) || (!match && invert))
> -		kprobe_emulate_jmp(p, regs);
> -	else
> -		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
> +	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
>  }
>  NOKPROBE_SYMBOL(kprobe_emulate_jcc);
>  
> @@ -524,10 +528,7 @@ static void kprobe_emulate_loop(struct k
>  	else if (p->ainsn.loop.type == 1)	/* LOOPE */
>  		match = match && (regs->flags & X86_EFLAGS_ZF);
>  
> -	if (match)
> -		kprobe_emulate_jmp(p, regs);
> -	else
> -		regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
> +	__kprobe_emulate_jmp(p, regs, match);
>  }
>  NOKPROBE_SYMBOL(kprobe_emulate_loop);
>  
> @@ -613,6 +614,12 @@ static int prepare_emulation(struct kpro
>  		else
>  			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
>  		break;
> +	case 0x70 ... 0x7f:
> +		/* 1 byte conditional jump */
> +		p->ainsn.emulate_op = kprobe_emulate_jcc;
> +		p->ainsn.jcc.type = opcode & 0xf;
> +		p->ainsn.rel32 = *(char *)insn->immediate.bytes;
> +		break;
>  	case 0x0f:
>  		opcode = insn->opcode.bytes[1];
>  		if ((opcode & 0xf0) == 0x80) {
> @@ -667,12 +674,7 @@ static int prepare_emulation(struct kpro
>  #endif
>  		break;
>  	default:
> -		if ((opcode & 0xf0) == 0x70) {
> -			/* 1 byte conditional jump */
> -			p->ainsn.emulate_op = kprobe_emulate_jcc;
> -			p->ainsn.jcc.type = opcode & 0xf;
> -			p->ainsn.rel32 = *(char *)insn->immediate.bytes;
> -		}
> +		break;
>  	}
>  	p->ainsn.size = insn->length;
>  


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-02  8:54             ` Peter Zijlstra
  2021-03-02 12:51               ` Masami Hiramatsu
@ 2021-03-02 13:58               ` Peter Zijlstra
  1 sibling, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-02 13:58 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Tue, Mar 02, 2021 at 09:54:43AM +0100, Peter Zijlstra wrote:


> +static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
> +{
> +	bool invert = p->ainsn.jcc.type & 1;
> +	bool match;
> +
> +	if (p->ainsn.jcc.type < 0xc) {
> +		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
> +	} else {
> +		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
> +			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
> +		if (p->ainsn.jcc.type >= 0xe)
> +			match = match && (regs->flags & X86_EFLAGS_ZF);
> +	}
> +	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));

Also, isn't that: 'match ^ invert' ?

> +}
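
For C bool operands the two forms are indeed equivalent, since &&, ||, ! and ^
all work on values already normalized to 0 or 1. A stand-alone check
(illustration only, not part of the patch):

#include <assert.h>
#include <stdbool.h>

int main(void)
{
	for (int m = 0; m <= 1; m++) {
		for (int i = 0; i <= 1; i++) {
			bool match = m, invert = i;

			/* the expression in the patch vs. the XOR form */
			assert(((match && !invert) || (!match && invert)) ==
			       (match ^ invert));
		}
	}
	return 0;
}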

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes
  2021-02-25  2:22     ` Masami Hiramatsu
  2021-02-25  6:03       ` Andy Lutomirski
  2021-03-01 14:08       ` [RFC PATCH 0/1] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
@ 2021-03-02 15:25       ` Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction Masami Hiramatsu
                           ` (3 more replies)
  2 siblings, 4 replies; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02 15:25 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Hi,

Here is a series of patches to remove the single-step debug trap from
x86 kprobes.

The first 2 patches ([1/3][2/3]) are bugfixes which I've sent recently;

 https://lore.kernel.org/lkml/161425451732.93763.18329509061375062554.stgit@devnote2/

And [3/3] is actually the patch to remove single-step from kprobes. The RFC
version is here;

 https://lore.kernel.org/lkml/161460768474.430263.18425867006584111900.stgit@devnote2/

This uses int3, as Andy suggested, instead of the debug trap, to remove the
IRET which returns to the kernel.
Some instructions must be emulated and some instructions can no longer be
probed, but as far as I can see those are rare cases.
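
The idea, roughly, is that the out-of-line buffer ends with an int3 instead of
relying on TF, something like this sketch (illustrative only, not the actual
kprobes code; the helper name is made up):

#include <string.h>

#define INT3_OPCODE 0xcc

/*
 * Copy the probed instruction into the out-of-line buffer and place an
 * int3 right behind it, so execution traps back into the int3 handler
 * as soon as the copied instruction retires -- no TF, no IRET from #DB.
 */
int lay_out_xol_slot(unsigned char *buf, const unsigned char *insn, int len)
{
	memcpy(buf, insn, len);
	buf[len] = INT3_OPCODE;
	return len + 1;
}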

Thank you,

---

Masami Hiramatsu (3):
      x86/kprobes: Retrieve correct opcode for group instruction
      x86/kprobes: Identify far indirect JMP correctly
      x86/kprobes: Use int3 instead of debug trap for single-step


 arch/x86/include/asm/kprobes.h |   21 +-
 arch/x86/kernel/kprobes/core.c |  524 ++++++++++++++++++++++++++--------------
 arch/x86/kernel/traps.c        |    3 
 3 files changed, 358 insertions(+), 190 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
@ 2021-03-02 15:25         ` Masami Hiramatsu
  2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 2/3] x86/kprobes: Identify far indirect JMP correctly Masami Hiramatsu
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02 15:25 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Since opcode 0xff identifies the group 5 instruction group, the actual
instruction is not determined by a 2-byte opcode but by the extension
encoded in the MOD/RM byte.

The commit abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()")
used insn->opcode.bytes[1], but that is not correct. We have to refer to
insn->modrm.bytes[0] instead.

Fixes: abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/kprobes/core.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index df776cdca327..08674e7a5d7b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -448,7 +448,11 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 		break;
 #endif
 	case 0xff:
-		opcode = insn->opcode.bytes[1];
+		/*
+		 * Since the 0xff is an extended group opcode, the instruction
+		 * is determined by the MOD/RM byte.
+		 */
+		opcode = insn->modrm.bytes[0];
 		if ((opcode & 0x30) == 0x10) {
 			/*
 			 * call absolute, indirect


^ permalink raw reply related	[flat|nested] 41+ messages in thread
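
For reference, a minimal sketch of the group-5 decoding (illustration only,
using the standard ModRM bit layout rather than the kernel's insn helpers):

#include <stdio.h>

int main(void)
{
	unsigned char modrm = 0xd0;              /* e.g. "ff d0" = call *%rax        */
	unsigned char reg   = (modrm >> 3) & 7;  /* what the patch's X86_MODRM_REG() yields */

	/* Grp5: reg 2 = call near, 3 = call far, 4 = jmp near, 5 = jmp far */
	printf("ModRM 0x%02x -> reg field %u\n", modrm, reg);
	return 0;
}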

* [PATCH -tip 2/3] x86/kprobes: Identify far indirect JMP correctly
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction Masami Hiramatsu
@ 2021-03-02 15:25         ` Masami Hiramatsu
  2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 3/3] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
  2021-03-17 14:55         ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  3 siblings, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02 15:25 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Since the Grp5 far indirect JMP is FF "mod 101 r/m", it should be
(modrm & 0x38) == 0x28, and the near indirect JMP is (modrm & 0x38) == 0x20.
So we can mask modrm with 0x30 and check for 0x20.
This is actually what the original code does; it also doesn't care about
the last bit, so the resulting code is the same.

Thus, I think this is just a cosmetic cleanup.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/kernel/kprobes/core.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 08674e7a5d7b..be76568d57a5 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -462,8 +462,7 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 			p->ainsn.is_call = 1;
 			p->ainsn.is_abs_ip = 1;
 			break;
-		} else if (((opcode & 0x31) == 0x20) ||
-			   ((opcode & 0x31) == 0x21)) {
+		} else if ((opcode & 0x30) == 0x20) {
 			/*
 			 * jmp near and far, absolute indirect
 			 * ip is correct.


^ permalink raw reply related	[flat|nested] 41+ messages in thread
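
A quick stand-alone check of the masking described above (illustration only):

#include <assert.h>

int main(void)
{
	unsigned char jmp_near = 4 << 3;   /* FF /4: (modrm & 0x38) == 0x20 */
	unsigned char jmp_far  = 5 << 3;   /* FF /5: (modrm & 0x38) == 0x28 */

	/* masking with 0x30 ignores the low bit of the reg field,
	 * so it matches both the near and the far indirect JMP */
	assert((jmp_near & 0x30) == 0x20);
	assert((jmp_far  & 0x30) == 0x20);
	return 0;
}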

* [PATCH -tip 3/3] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction Masami Hiramatsu
  2021-03-02 15:25         ` [PATCH -tip 2/3] x86/kprobes: Identify far indirect JMP correctly Masami Hiramatsu
@ 2021-03-02 15:25         ` Masami Hiramatsu
  2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
  2021-03-17 14:55         ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
  3 siblings, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-02 15:25 UTC (permalink / raw)
  To: Andy Lutomirski, Ingo Molnar
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

Use int3 instead of the debug trap exception for single-stepping the
probed instructions. Some instructions which change the ip register
or modify the IF flag are emulated, because they cannot be
single-stepped by int3 or may enable interrupts while single-stepping.

This actually changes the kprobes behavior.

- kprobes can no longer probe the following instructions: int3, iret,
  far jmp/call which take an absolute address as an immediate,
  indirect far jmp/call, indirect near jmp/call with memory
  addressing (register-based indirect jmp/call are OK), and
  vmcall/vmlaunch/vmresume/vmxoff.

- If the kprobe post_handler is not set before registering, it may
  not be called in some cases even if you set it afterwards.
  (IOW, the kprobe booster is enabled at registration; the user can
   not change it afterwards.)

But both are rare issues: the unsupported instructions will not be
used in the kernel (or are rarely used), and post_handlers are
rarely used (I don't see them except in the test code).

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v1:
  - According to Peter's suggestion, merge the conditional jump
    emulation code and use range style of switch-case for 1 byte
    conditional jump
---
 arch/x86/include/asm/kprobes.h |   21 +-
 arch/x86/kernel/kprobes/core.c |  517 ++++++++++++++++++++++++++--------------
 arch/x86/kernel/traps.c        |    3 
 3 files changed, 353 insertions(+), 188 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index d20a3d6be36e..bd7f5886a789 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,10 +65,22 @@ struct arch_specific_insn {
 	 * a post_handler).
 	 */
 	unsigned boostable:1;
-	unsigned if_modifier:1;
-	unsigned is_call:1;
-	unsigned is_pushf:1;
-	unsigned is_abs_ip:1;
+	unsigned char size;	/* The size of insn */
+	union {
+		unsigned char opcode;
+		struct {
+			unsigned char type;
+		} jcc;
+		struct {
+			unsigned char type;
+			unsigned char asize;
+		} loop;
+		struct {
+			unsigned char reg;
+		} indirect;
+	};
+	s32 rel32;	/* relative offset must be s32, s16, or s8 */
+	void (*emulate_op)(struct kprobe *p, struct pt_regs *regs);
 	/* Number of bytes of text poked */
 	int tp_len;
 };
@@ -107,7 +119,6 @@ extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
 extern int kprobe_int3_handler(struct pt_regs *regs);
-extern int kprobe_debug_handler(struct pt_regs *regs);
 
 #else
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index be76568d57a5..1f58e89eeccd 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -177,6 +177,9 @@ int can_boost(struct insn *insn, void *addr)
 	case 0xf0:
 		/* clear and set flags are boostable */
 		return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
+	case 0xff:
+		/* indirect jmp is boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
 	default:
 		/* CS override prefix and call are not boostable */
 		return (opcode != 0x2e && opcode != 0x9a);
@@ -357,13 +360,14 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
 	return insn->length;
 }
 
-/* Prepare reljump right after instruction to boost */
-static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
-			  struct insn *insn)
+/* Prepare reljump or int3 right after instruction */
+static int prepare_singlestep(kprobe_opcode_t *buf, struct kprobe *p,
+			      struct insn *insn)
 {
 	int len = insn->length;
 
-	if (can_boost(insn, p->addr) &&
+	if (!IS_ENABLED(CONFIG_PREEMPTION) &&
+	    !p->post_handler && can_boost(insn, p->addr) &&
 	    MAX_INSN_SIZE - len >= JMP32_INSN_SIZE) {
 		/*
 		 * These instructions can be executed directly if it
@@ -374,7 +378,12 @@ static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
 		len += JMP32_INSN_SIZE;
 		p->ainsn.boostable = 1;
 	} else {
-		p->ainsn.boostable = 0;
+		/* Otherwise, put an int3 for trapping singlestep */
+		if (MAX_INSN_SIZE - len < INT3_INSN_SIZE)
+			return -ENOSPC;
+
+		buf[len] = INT3_INSN_OPCODE;
+		len += INT3_INSN_SIZE;
 	}
 
 	return len;
@@ -411,42 +420,232 @@ void free_insn_page(void *page)
 	module_memfree(page);
 }
 
-static void set_resume_flags(struct kprobe *p, struct insn *insn)
+/* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
+
+static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
+{
+	switch (p->ainsn.opcode) {
+	case 0xfa:	/* cli */
+		regs->flags &= ~(X86_EFLAGS_IF);
+		break;
+	case 0xfb:	/* sti */
+		regs->flags |= X86_EFLAGS_IF;
+		break;
+	case 0x9c:	/* pushf */
+		int3_emulate_push(regs, regs->flags);
+		break;
+	case 0x9d:	/* popf */
+		regs->flags = int3_emulate_pop(regs);
+		break;
+	}
+	regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ifmodifiers);
+
+static void kprobe_emulate_ret(struct kprobe *p, struct pt_regs *regs)
+{
+	int3_emulate_ret(regs);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ret);
+
+static void kprobe_emulate_call(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long func = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	func += p->ainsn.rel32;
+	int3_emulate_call(regs, func);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call);
+
+static nokprobe_inline
+void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
+{
+	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	if (cond)
+		ip += p->ainsn.rel32;
+	int3_emulate_jmp(regs, ip);
+}
+
+static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
+{
+	__kprobe_emulate_jmp(p, regs, true);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp);
+
+static const unsigned long jcc_mask[6] = {
+	[0] = X86_EFLAGS_OF,
+	[1] = X86_EFLAGS_CF,
+	[2] = X86_EFLAGS_ZF,
+	[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
+	[4] = X86_EFLAGS_SF,
+	[5] = X86_EFLAGS_PF,
+};
+
+static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
+{
+	bool invert = p->ainsn.jcc.type & 1;
+	bool match;
+
+	if (p->ainsn.jcc.type < 0xc) {
+		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
+	} else {
+		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+		if (p->ainsn.jcc.type >= 0xe)
+			match = match && (regs->flags & X86_EFLAGS_ZF);
+	}
+	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jcc);
+
+static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
+{
+	bool match;
+
+	if (p->ainsn.loop.type != 3) {	/* LOOP* */
+		if (p->ainsn.loop.asize == 32)
+			match = ((*(u32 *)&regs->cx)--) != 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = ((*(u64 *)&regs->cx)--) != 0;
+#endif
+		else
+			match = ((*(u16 *)&regs->cx)--) != 0;
+	} else {			/* JCXZ */
+		if (p->ainsn.loop.asize == 32)
+			match = *(u32 *)(&regs->cx) == 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = *(u64 *)(&regs->cx) == 0;
+#endif
+		else
+			match = *(u16 *)(&regs->cx) == 0;
+	}
+
+	if (p->ainsn.loop.type == 0)	/* LOOPNE */
+		match = match && !(regs->flags & X86_EFLAGS_ZF);
+	else if (p->ainsn.loop.type == 1)	/* LOOPE */
+		match = match && (regs->flags & X86_EFLAGS_ZF);
+
+	__kprobe_emulate_jmp(p, regs, match);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_loop);
+
+static const int addrmode_regoffs[] = {
+	offsetof(struct pt_regs, ax),
+	offsetof(struct pt_regs, cx),
+	offsetof(struct pt_regs, dx),
+	offsetof(struct pt_regs, bx),
+	offsetof(struct pt_regs, sp),
+	offsetof(struct pt_regs, bp),
+	offsetof(struct pt_regs, si),
+	offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+	offsetof(struct pt_regs, r8),
+	offsetof(struct pt_regs, r9),
+	offsetof(struct pt_regs, r10),
+	offsetof(struct pt_regs, r11),
+	offsetof(struct pt_regs, r12),
+	offsetof(struct pt_regs, r13),
+	offsetof(struct pt_regs, r14),
+	offsetof(struct pt_regs, r15),
+#endif
+};
+
+static void kprobe_emulate_call_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_call(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call_indirect);
+
+static void kprobe_emulate_jmp_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_jmp(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp_indirect);
+
+static int prepare_emulation(struct kprobe *p, struct insn *insn)
 {
 	insn_byte_t opcode = insn->opcode.bytes[0];
 
 	switch (opcode) {
 	case 0xfa:		/* cli */
 	case 0xfb:		/* sti */
+	case 0x9c:		/* pushfl */
 	case 0x9d:		/* popf/popfd */
-		/* Check whether the instruction modifies Interrupt Flag or not */
-		p->ainsn.if_modifier = 1;
-		break;
-	case 0x9c:	/* pushfl */
-		p->ainsn.is_pushf = 1;
+		/*
+		 * IF modifiers must be emulated since they will enable interrupts while
+		 * int3 single stepping.
+		 */
+		p->ainsn.emulate_op = kprobe_emulate_ifmodifiers;
+		p->ainsn.opcode = opcode;
 		break;
-	case 0xcf:	/* iret */
-		p->ainsn.if_modifier = 1;
-		fallthrough;
 	case 0xc2:	/* ret/lret */
 	case 0xc3:
 	case 0xca:
 	case 0xcb:
-	case 0xea:	/* jmp absolute -- ip is correct */
-		/* ip is already adjusted, no more changes required */
-		p->ainsn.is_abs_ip = 1;
-		/* Without resume jump, this is boostable */
-		p->ainsn.boostable = 1;
+		p->ainsn.emulate_op = kprobe_emulate_ret;
 		break;
-	case 0xe8:	/* call relative - Fix return addr */
-		p->ainsn.is_call = 1;
+	case 0x9a:	/* far call absolute -- segment is not supported */
+	case 0xea:	/* far jmp absolute -- segment is not supported */
+	case 0xcc:	/* int3 */
+	case 0xcf:	/* iret -- in-kernel IRET is not supported */
+		return -EOPNOTSUPP;
 		break;
-#ifdef CONFIG_X86_32
-	case 0x9a:	/* call absolute -- same as call absolute, indirect */
-		p->ainsn.is_call = 1;
-		p->ainsn.is_abs_ip = 1;
+	case 0xe8:	/* near call relative */
+		p->ainsn.emulate_op = kprobe_emulate_call;
+		if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0xeb:	/* short jump relative */
+	case 0xe9:	/* near jump relative */
+		p->ainsn.emulate_op = kprobe_emulate_jmp;
+		if (insn->immediate.nbytes == 1)
+			p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
+		else if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0x70 ... 0x7f:
+		/* 1 byte conditional jump */
+		p->ainsn.emulate_op = kprobe_emulate_jcc;
+		p->ainsn.jcc.type = opcode & 0xf;
+		p->ainsn.rel32 = *(char *)insn->immediate.bytes;
+		break;
+	case 0x0f:
+		opcode = insn->opcode.bytes[1];
+		if ((opcode & 0xf0) == 0x80) {
+			/* 2 bytes Conditional Jump */
+			p->ainsn.emulate_op = kprobe_emulate_jcc;
+			p->ainsn.jcc.type = opcode & 0xf;
+			if (insn->immediate.nbytes == 2)
+				p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+			else
+				p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		} else if (opcode == 0x01 &&
+			   X86_MODRM_REG(insn->modrm.bytes[0]) == 0 &&
+			   X86_MODRM_MOD(insn->modrm.bytes[0]) == 3) {
+			/* VM extensions - not supported */
+			return -EOPNOTSUPP;
+		}
+		break;
+	case 0xe0:	/* Loop NZ */
+	case 0xe1:	/* Loop */
+	case 0xe2:	/* Loop */
+	case 0xe3:	/* J*CXZ */
+		p->ainsn.emulate_op = kprobe_emulate_loop;
+		p->ainsn.loop.type = opcode & 0x3;
+		p->ainsn.loop.asize = insn->addr_bytes * 8;
+		p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
 		break;
-#endif
 	case 0xff:
 		/*
 		 * Since the 0xff is an extended group opcode, the instruction
@@ -454,46 +653,57 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 		 */
 		opcode = insn->modrm.bytes[0];
 		if ((opcode & 0x30) == 0x10) {
-			/*
-			 * call absolute, indirect
-			 * Fix return addr; ip is correct.
-			 * But this is not boostable
-			 */
-			p->ainsn.is_call = 1;
-			p->ainsn.is_abs_ip = 1;
-			break;
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far call */
+			/* call absolute, indirect */
+			p->ainsn.emulate_op = kprobe_emulate_call_indirect;
 		} else if ((opcode & 0x30) == 0x20) {
-			/*
-			 * jmp near and far, absolute indirect
-			 * ip is correct.
-			 */
-			p->ainsn.is_abs_ip = 1;
-			/* Without resume jump, this is boostable */
-			p->ainsn.boostable = 1;
-		}
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far jmp */
+			/* jmp near absolute indirect */
+			p->ainsn.emulate_op = kprobe_emulate_jmp_indirect;
+		} else
+			break;
+
+		if (insn->addr_bytes != sizeof(unsigned long))
+			return -EOPNOTSUPP;	/* Don't support different size */
+		if (X86_MODRM_MOD(opcode) != 3)
+			return -EOPNOTSUPP;	/* TODO: support memory addressing */
+
+		p->ainsn.indirect.reg = X86_MODRM_RM(opcode);
+#ifdef CONFIG_X86_64
+		if (X86_REX_B(insn->rex_prefix.value))
+			p->ainsn.indirect.reg += 8;
+#endif
+		break;
+	default:
 		break;
 	}
+	p->ainsn.size = insn->length;
+
+	return 0;
 }
 
 static int arch_copy_kprobe(struct kprobe *p)
 {
 	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];
-	int len;
+	int ret, len;
 
 	/* Copy an instruction with recovering if other optprobe modifies it.*/
 	len = __copy_instruction(buf, p->addr, p->ainsn.insn, &insn);
 	if (!len)
 		return -EINVAL;
 
-	/*
-	 * __copy_instruction can modify the displacement of the instruction,
-	 * but it doesn't affect boostable check.
-	 */
-	len = prepare_boost(buf, p, &insn);
+	/* Analyze the opcode and setup emulate functions */
+	ret = prepare_emulation(p, &insn);
+	if (ret < 0)
+		return ret;
 
-	/* Analyze the opcode and set resume flags */
-	set_resume_flags(p, &insn);
+	/* Add int3 for single-step or booster jmp */
+	len = prepare_singlestep(buf, p, &insn);
+	if (len < 0)
+		return len;
 
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
@@ -586,29 +796,7 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
 {
 	__this_cpu_write(current_kprobe, p);
 	kcb->kprobe_saved_flags = kcb->kprobe_old_flags
-		= (regs->flags & (X86_EFLAGS_TF | X86_EFLAGS_IF));
-	if (p->ainsn.if_modifier)
-		kcb->kprobe_saved_flags &= ~X86_EFLAGS_IF;
-}
-
-static nokprobe_inline void clear_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl &= ~DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
-}
-
-static nokprobe_inline void restore_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl |= DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
+		= (regs->flags & X86_EFLAGS_IF);
 }
 
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
@@ -623,6 +811,22 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 
+static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
+			       struct kprobe_ctlblk *kcb)
+{
+	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+		kcb->kprobe_status = KPROBE_HIT_SSDONE;
+		cur->post_handler(cur, regs, 0);
+	}
+
+	/* Restore back the original saved kprobes variables and continue. */
+	if (kcb->kprobe_status == KPROBE_REENTER)
+		restore_previous_kprobe(kcb);
+	else
+		reset_current_kprobe();
+}
+NOKPROBE_SYMBOL(kprobe_post_process);
+
 static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 			     struct kprobe_ctlblk *kcb, int reenter)
 {
@@ -630,7 +834,7 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		return;
 
 #if !defined(CONFIG_PREEMPTION)
-	if (p->ainsn.boostable && !p->post_handler) {
+	if (p->ainsn.boostable) {
 		/* Boost up -- we can execute copied instructions directly */
 		if (!reenter)
 			reset_current_kprobe();
@@ -649,18 +853,50 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		kcb->kprobe_status = KPROBE_REENTER;
 	} else
 		kcb->kprobe_status = KPROBE_HIT_SS;
-	/* Prepare real single stepping */
-	clear_btf();
-	regs->flags |= X86_EFLAGS_TF;
+
+	if (p->ainsn.emulate_op) {
+		p->ainsn.emulate_op(p, regs);
+		kprobe_post_process(p, regs, kcb);
+		return;
+	}
+
+	/* Disable interrupt, and set ip register on trampoline */
 	regs->flags &= ~X86_EFLAGS_IF;
-	/* single step inline if the instruction is an int3 */
-	if (p->opcode == INT3_INSN_OPCODE)
-		regs->ip = (unsigned long)p->addr;
-	else
-		regs->ip = (unsigned long)p->ainsn.insn;
+	regs->ip = (unsigned long)p->ainsn.insn;
 }
 NOKPROBE_SYMBOL(setup_singlestep);
 
+/*
+ * Called after single-stepping.  p->addr is the address of the
+ * instruction whose first byte has been replaced by the "int3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is p->ainsn.insn. We also don't use the trap, but "int3" again,
+ * right after the copied instruction.
+ * Unlike the trap single-step, the "int3" single-step can not handle
+ * the instructions which change the ip register, e.g. jmp, call, and
+ * conditional jmp, nor the instructions which change the IF flag,
+ * because interrupts must be disabled around the single-stepping.
+ * Such instructions are emulated in software; the others are
+ * single-stepped using "int3".
+ *
+ * When the 2nd "int3" is handled, regs->ip and regs->flags need to
+ * be adjusted so that we can resume execution at the correct code.
+ */
+static void resume_singlestep(struct kprobe *p, struct pt_regs *regs,
+			      struct kprobe_ctlblk *kcb)
+{
+	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
+	unsigned long orig_ip = (unsigned long)p->addr;
+
+	/* Restore saved interrupt flag and ip register */
+	regs->flags |= kcb->kprobe_saved_flags;
+	/* Note that regs->ip has executed the int3, so step back over it */
+	regs->ip += (orig_ip - copy_ip) - INT3_INSN_SIZE;
+}
+NOKPROBE_SYMBOL(resume_singlestep);
+
 /*
  * We have reentered the kprobe_handler(), since another probe was hit while
  * within the handler. We save the original kprobes variables and just single
@@ -696,6 +932,12 @@ static int reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
 }
 NOKPROBE_SYMBOL(reenter_kprobe);
 
+static int nokprobe_inline kprobe_is_ss(struct kprobe_ctlblk *kcb)
+{
+	return (kcb->kprobe_status == KPROBE_HIT_SS ||
+		kcb->kprobe_status == KPROBE_REENTER);
+}
+
 /*
  * Interrupts are disabled on entry as trap3 is an interrupt gate and they
  * remain disabled throughout this function.
@@ -740,7 +982,18 @@ int kprobe_int3_handler(struct pt_regs *regs)
 				reset_current_kprobe();
 			return 1;
 		}
-	} else if (*addr != INT3_INSN_OPCODE) {
+	} else if (kprobe_is_ss(kcb)) {
+		p = kprobe_running();
+		if ((unsigned long)p->ainsn.insn < regs->ip &&
+		    (unsigned long)p->ainsn.insn + MAX_INSN_SIZE > regs->ip) {
+			/* Most probably this is the second int3 for single-stepping */
+			resume_singlestep(p, regs, kcb);
+			kprobe_post_process(p, regs, kcb);
+			return 1;
+		}
+	}
+
+	if (*addr != INT3_INSN_OPCODE) {
 		/*
 		 * The breakpoint instruction was removed right
 		 * after we hit it.  Another cpu has removed
@@ -813,91 +1066,6 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 
-/*
- * Called after single-stepping.  p->addr is the address of the
- * instruction whose first byte has been replaced by the "int 3"
- * instruction.  To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction.  The address of this
- * copy is p->ainsn.insn.
- *
- * This function prepares to return from the post-single-step
- * interrupt.  We have to fix up the stack as follows:
- *
- * 0) Except in the case of absolute or indirect jump or call instructions,
- * the new ip is relative to the copied instruction.  We need to make
- * it relative to the original instruction.
- *
- * 1) If the single-stepped instruction was pushfl, then the TF and IF
- * flags are set in the just-pushed flags, and may need to be cleared.
- *
- * 2) If the single-stepped instruction was a call, the return address
- * that is atop the stack is the address following the copied instruction.
- * We need to make it the address following the original instruction.
- */
-static void resume_execution(struct kprobe *p, struct pt_regs *regs,
-			     struct kprobe_ctlblk *kcb)
-{
-	unsigned long *tos = stack_addr(regs);
-	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
-	unsigned long orig_ip = (unsigned long)p->addr;
-
-	regs->flags &= ~X86_EFLAGS_TF;
-
-	/* Fixup the contents of top of stack */
-	if (p->ainsn.is_pushf) {
-		*tos &= ~(X86_EFLAGS_TF | X86_EFLAGS_IF);
-		*tos |= kcb->kprobe_old_flags;
-	} else if (p->ainsn.is_call) {
-		*tos = orig_ip + (*tos - copy_ip);
-	}
-
-	if (!p->ainsn.is_abs_ip)
-		regs->ip += orig_ip - copy_ip;
-
-	restore_btf();
-}
-NOKPROBE_SYMBOL(resume_execution);
-
-/*
- * Interrupts are disabled on entry as trap1 is an interrupt gate and they
- * remain disabled throughout this function.
- */
-int kprobe_debug_handler(struct pt_regs *regs)
-{
-	struct kprobe *cur = kprobe_running();
-	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
-	if (!cur)
-		return 0;
-
-	resume_execution(cur, regs, kcb);
-	regs->flags |= kcb->kprobe_saved_flags;
-
-	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
-		kcb->kprobe_status = KPROBE_HIT_SSDONE;
-		cur->post_handler(cur, regs, 0);
-	}
-
-	/* Restore back the original saved kprobes variables and continue. */
-	if (kcb->kprobe_status == KPROBE_REENTER) {
-		restore_previous_kprobe(kcb);
-		goto out;
-	}
-	reset_current_kprobe();
-out:
-	/*
-	 * if somebody else is singlestepping across a probe point, flags
-	 * will have TF set, in which case, continue the remaining processing
-	 * of do_debug, as if this is not a probe hit.
-	 */
-	if (regs->flags & X86_EFLAGS_TF)
-		return 0;
-
-	return 1;
-}
-NOKPROBE_SYMBOL(kprobe_debug_handler);
-
 int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 {
 	struct kprobe *cur = kprobe_running();
@@ -915,20 +1083,9 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 		 * normal page fault.
 		 */
 		regs->ip = (unsigned long)cur->addr;
-		/*
-		 * Trap flag (TF) has been set here because this fault
-		 * happened where the single stepping will be done.
-		 * So clear it by resetting the current kprobe:
-		 */
-		regs->flags &= ~X86_EFLAGS_TF;
-		/*
-		 * Since the single step (trap) has been cancelled,
-		 * we need to restore BTF here.
-		 */
-		restore_btf();
 
 		/*
-		 * If the TF flag was set before the kprobe hit,
+		 * If the IF flag was set before the kprobe hit,
 		 * don't touch it:
 		 */
 		regs->flags |= kcb->kprobe_old_flags;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec758f0e..1235c85c8b3a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -890,9 +890,6 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
 		dr6 &= ~DR_STEP;
 
-	if (kprobe_debug_handler(regs))
-		goto out;
-
 	/*
 	 * The kernel doesn't use INT1
 	 */


^ permalink raw reply related	[flat|nested] 41+ messages in thread
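
To see how the emulation helpers in the patch compute the fall-through
address (regs->ip - INT3_INSN_SIZE + p->ainsn.size), a worked example with
made-up numbers (illustration only):

#include <stdio.h>

#define INT3_INSN_SIZE 1

int main(void)
{
	unsigned long probe_addr = 0x1000;  /* hypothetical probe site           */
	unsigned long insn_size  = 5;       /* hypothetical probed insn length   */
	unsigned long trap_ip    = probe_addr + INT3_INSN_SIZE; /* regs->ip at the #BP */

	/* what kprobe_emulate_call()/kprobe_emulate_jmp() use as the address
	 * of the instruction following the probed one (before adding rel32) */
	unsigned long next = trap_ip - INT3_INSN_SIZE + insn_size;

	printf("next insn at %#lx\n", next);  /* prints 0x1005 */
	return 0;
}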

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02  1:36   ` Andy Lutomirski
@ 2021-03-02 20:24     ` Alexei Starovoitov
  2021-03-02 21:02       ` Andy Lutomirski
  2021-03-02 20:25     ` Oleg Nesterov
  2021-03-02 20:28     ` Oleg Nesterov
  2 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2021-03-02 20:24 UTC (permalink / raw)
  To: Andy Lutomirski, bpf
  Cc: Oleg Nesterov, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper

On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
>
> Is there something like a uprobe test suite?  How maintained /
> actively used is uprobe?

uprobe+bpf is heavily used in production.
selftests/bpf has only one test for it though.

Why are you asking?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02  1:36   ` Andy Lutomirski
  2021-03-02 20:24     ` Alexei Starovoitov
@ 2021-03-02 20:25     ` Oleg Nesterov
  2021-03-02 20:35       ` Andy Lutomirski
  2021-03-02 20:28     ` Oleg Nesterov
  2 siblings, 1 reply; 41+ messages in thread
From: Oleg Nesterov @ 2021-03-02 20:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Masami Hiramatsu, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On 03/01, Andy Lutomirski wrote:
>
> On Mon, Mar 1, 2021 at 8:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But I guess this has nothing to do with uprobes, they do not single-step
> > in kernel mode, right?
>
> They single-step user code, though, and the code that makes this work
> is quite ugly.  Single-stepping on x86 is a mess.

But this doesn't really differ from, say, gdb doing si ? OK, except uprobes
have to hook DIE_DEBUG. Nevermind...

> > > Uprobes seem to single-step user code for no discernable reason.
> > > (They want to trap after executing an out of line instruction, AFAICT.
> > > Surely INT3 or even CALL after the out-of-line insn would work as well
> > > or better.)
> >
> > Uprobes use single-step from the very beginning, probably because this
> > is the most simple and "standard" way to implement xol.
> >
> > And please note that CALL/JMP/etc emulation was added much later to fix the
> > problems with non-canonical addresses, and this emulation is still incomplete.
>
> Is there something like a uprobe test suite?

Afaik, no.

> How maintained /

Adding Srikar, who sent the initial implementation. I can only say that I am glad that
./scripts/get_maintainer.pl no longer mentions me ;) I did some changes (including
emulation) but a) this was long ago and b) only because I was forced^W asked to
fix the numerous bugs in this code.

> actively used is uprobe?

I have no idea, sorry ;)

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02  1:36   ` Andy Lutomirski
  2021-03-02 20:24     ` Alexei Starovoitov
  2021-03-02 20:25     ` Oleg Nesterov
@ 2021-03-02 20:28     ` Oleg Nesterov
  2 siblings, 0 replies; 41+ messages in thread
From: Oleg Nesterov @ 2021-03-02 20:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Masami Hiramatsu, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper, Srikar Dronamraju

forgot to add Srikar, sorry for resend...

On 03/01, Andy Lutomirski wrote:
>
> On Mon, Mar 1, 2021 at 8:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But I guess this has nothing to do with uprobes, they do not single-step
> > in kernel mode, right?
>
> They single-step user code, though, and the code that makes this work
> is quite ugly.  Single-stepping on x86 is a mess.

But this doesn't really differ from, say, gdb doing si ? OK, except uprobes
have to hook DIE_DEBUG. Nevermind...

> > > Uprobes seem to single-step user code for no discernable reason.
> > > (They want to trap after executing an out of line instruction, AFAICT.
> > > Surely INT3 or even CALL after the out-of-line insn would work as well
> > > or better.)
> >
> > Uprobes use single-step from the very beginning, probably because this
> > is the most simple and "standard" way to implement xol.
> >
> > And please note that CALL/JMP/etc emulation was added much later to fix the
> > problems with non-canonical addresses, and this emulation is still incomplete.
>
> Is there something like a uprobe test suite?

Afaik, no.

> How maintained /

Adding Srikar, who sent the initial implementation. I can only say that I am glad that
./scripts/get_maintainer.pl no longer mentions me ;) I did some changes (including
emulation) but a) this was long ago and b) only because I was forced^W asked to
fix the numerous bugs in this code.

> actively used is uprobe?

I have no idea, sorry ;)

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02  2:22   ` Masami Hiramatsu
  2021-03-02  2:48     ` Andy Lutomirski
@ 2021-03-02 20:31     ` Oleg Nesterov
  1 sibling, 0 replies; 41+ messages in thread
From: Oleg Nesterov @ 2021-03-02 20:31 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On 03/02, Masami Hiramatsu wrote:
>
> > Not sure I understand you correctly, I know almost nothing about low-level
> > x86  magic.
>
> x86 has normal interrupt and NMI. When an NMI occurs the CPU masks NMI
> (the mask itself is hidden status) and IRET releases the mask. The problem
> is that if an INT3 is hit in the NMI handler and does a single-stepping,
> it has to use IRET for atomically setting TF and return.

Ah, thanks a lot,

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02 20:25     ` Oleg Nesterov
@ 2021-03-02 20:35       ` Andy Lutomirski
  0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-02 20:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andy Lutomirski, Masami Hiramatsu, Peter Zijlstra, LKML,
	Anil S Keshavamurthy, David S. Miller, X86 ML, Andrew Cooper



> On Mar 2, 2021, at 12:25 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 03/01, Andy Lutomirski wrote:
>> 
>>> On Mon, Mar 1, 2021 at 8:51 AM Oleg Nesterov <oleg@redhat.com> wrote:
>>> 
>>> But I guess this has nothing to do with uprobes, they do not single-step
>>> in kernel mode, right?
>> 
>> They single-step user code, though, and the code that makes this work
>> is quite ugly.  Single-stepping on x86 is a mess.
> 
> But this doesn't really differ from, say, gdb doing si ? OK, except uprobes
> have to hook DIE_DEBUG. Nevermind...

Also, gdb doing so isn’t great either.  Single stepping over a pushf instruction, signal delivery, or a syscall on x86 is a mess.

> 
>>>> Uprobes seem to single-step user code for no discernable reason.
>>>> (They want to trap after executing an out of line instruction, AFAICT.
>>>> Surely INT3 or even CALL after the out-of-line insn would work as well
>>>> or better.)
>>> 
>>> Uprobes use single-step from the very beginning, probably because this
>>> is the most simple and "standard" way to implement xol.
>>> 
>>> And please note that CALL/JMP/etc emulation was added much later to fix the
>>> problems with non-canonical addresses, and this emulation is still incomplete.
>> 
>> Is there something like a uprobe test suite?
> 
> Afaik, no.
> 
>> How maintained /
> 
> Adding Srikar, who sent the initial implementation. I can only say that I am glad that
> ./scripts/get_maintainer.pl no longer mentions me ;) I did some changes (including
> emulation) but a) this was long ago and b) only because I was forced^W asked to
> fix the numerous bugs in this code.
> 
>> actively used is uprobe?
> 
> I have no idea, sorry ;)
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02 20:24     ` Alexei Starovoitov
@ 2021-03-02 21:02       ` Andy Lutomirski
  2021-03-03  1:22         ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-02 21:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, bpf, Oleg Nesterov, Masami Hiramatsu,
	Peter Zijlstra, LKML, Anil S Keshavamurthy, David S. Miller,
	X86 ML, Andrew Cooper


> On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
>> 
>> Is there something like a uprobe test suite?  How maintained /
>> actively used is uprobe?
> 
> uprobe+bpf is heavily used in production.
> selftests/bpf has only one test for it though.
> 
> Why are you asking?

Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure that any cleanups actually work.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-02 21:02       ` Andy Lutomirski
@ 2021-03-03  1:22         ` Alexei Starovoitov
  2021-03-03  1:46           ` Andy Lutomirski
  0 siblings, 1 reply; 41+ messages in thread
From: Alexei Starovoitov @ 2021-03-03  1:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, bpf, Oleg Nesterov, Masami Hiramatsu,
	Peter Zijlstra, LKML, Anil S Keshavamurthy, David S. Miller,
	X86 ML, Andrew Cooper

On Tue, Mar 2, 2021 at 1:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
> >>
> >> Is there something like a uprobe test suite?  How maintained /
> >> actively used is uprobe?
> >
> > uprobe+bpf is heavily used in production.
> > selftests/bpf has only one test for it though.
> >
> > Why are you asking?
>
> Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure that any cleanups actually work.

Any test case to repro the issue you found?
Is it a bug or just messy code?
Nowadays a good chunk of popular applications (python, mysql, etc) has
USDTs in them.
Issues reported with bcc:
https://github.com/iovisor/bcc/issues?q=is%3Aissue+USDT
Similar thing with bpftrace.
Both standard USDT and semaphore based are used in the wild.
uprobe for containers has been a long standing feature request.
If you can improve uprobe performance that would be awesome.
That's another thing that people report often. We optimized it a bit.
More can be done.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-03  1:22         ` Alexei Starovoitov
@ 2021-03-03  1:46           ` Andy Lutomirski
  2021-03-03  2:18             ` Alexei Starovoitov
  0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-03  1:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, bpf, Oleg Nesterov, Masami Hiramatsu,
	Peter Zijlstra, LKML, Anil S Keshavamurthy, David S. Miller,
	X86 ML, Andrew Cooper


> On Mar 2, 2021, at 5:22 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Tue, Mar 2, 2021 at 1:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>> 
>> 
>>>> On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
>>>> 
>>>> Is there something like a uprobe test suite?  How maintained /
>>>> actively used is uprobe?
>>> 
>>> uprobe+bpf is heavily used in production.
>>> selftests/bpf has only one test for it though.
>>> 
>>> Why are you asking?
>> 
>> Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure the any cleanups actually work.
> 
> Any test case to repro the issue you found?
> Is it a bug or just messy code?

Just messy code.

> Nowadays a good chunk of popular applications (python, mysql, etc) has
> USDTs in them.
> Issues reported with bcc:
> https://github.com/iovisor/bcc/issues?q=is%3Aissue+USDT
> Similar thing with bpftrace.
> Both standard USDT and semaphore based are used in the wild.
> uprobe for containers has been a long standing feature request.
> If you can improve uprobe performance that would be awesome.
> That's another thing that people report often. We optimized it a bit.
> More can be done.


Wait... USDT is much easier to implement well.  Are we talking just USDT or are we talking about general uprobes in which almost any instruction can get probed?  If the only users that care about uprobes are doing USDT, we could vastly simplify the implementation and probably make it faster, too.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-03  1:46           ` Andy Lutomirski
@ 2021-03-03  2:18             ` Alexei Starovoitov
  2021-03-03 13:27               ` Oleg Nesterov
  2021-03-03 18:11               ` Daniel Xu
  0 siblings, 2 replies; 41+ messages in thread
From: Alexei Starovoitov @ 2021-03-03  2:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, bpf, Oleg Nesterov, Masami Hiramatsu,
	Peter Zijlstra, LKML, Anil S Keshavamurthy, David S. Miller,
	X86 ML, Andrew Cooper

On Tue, Mar 2, 2021 at 5:46 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Mar 2, 2021, at 5:22 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Mar 2, 2021 at 1:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >>
> >>>> On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >>>
> >>> On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
> >>>>
> >>>> Is there something like a uprobe test suite?  How maintained /
> >>>> actively used is uprobe?
> >>>
> >>> uprobe+bpf is heavily used in production.
> >>> selftests/bpf has only one test for it though.
> >>>
> >>> Why are you asking?
> >>
> >> Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure the any cleanups actually work.
> >
> > Any test case to repro the issue you found?
> > Is it a bug or just messy code?
>
> Just messy code.
>
> > Nowadays a good chunk of popular applications (python, mysql, etc) has
> > USDTs in them.
> > Issues reported with bcc:
> > https://github.com/iovisor/bcc/issues?q=is%3Aissue+USDT
> > Similar thing with bpftrace.
> > Both standard USDT and semaphore based are used in the wild.
> > uprobe for containers has been a long standing feature request.
> > If you can improve uprobe performance that would be awesome.
> > That's another thing that people report often. We optimized it a bit.
> > More can be done.
>
>
> Wait... USDT is much easier to implement well.  Are we talking just USDT or are we talking about general uprobes in which almost any instruction can get probed?  If the only users that care about uprobes are doing USDT, we could vastly simplify the implementation and probably make it faster, too.

USDTs are driving the majority of uprobe usage.
If they can get faster it will increase their adoption even more.
There are certainly cases of normal uprobes.
They are at the start of the function 99% of the time.
Like the following:
"uprobe:/lib64/libc.so:malloc(u64 size):size:size,_ret",
"uprobe:/lib64/libc.so:free(void *ptr)::ptr",
is common despite its overhead.

Here is the most interesting and practical usage of uprobes:
https://github.com/iovisor/bcc/blob/master/tools/sslsniff.py
and the manpage for the tool:
https://github.com/iovisor/bcc/blob/master/tools/sslsniff_example.txt

uprobe in the middle of the function is very rare.
If the kernel starts rejecting uprobes on some weird instructions
I suspect no one will complain.
Especially if such tightening will come with a performance boost for
uprobes on a nop and uprobes at the start of a function (which is
typically a push or an alu op on %sp).
That would be a great step forward.
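
For reference, a USDT probe site is normally just a single nop plus an
ELF note describing the probe and its arguments, which is why the
"uprobe on a nop" case dominates.  A minimal sketch, assuming the
systemtap sys/sdt.h macros (the provider/probe/function names here are
made up for illustration):

  #include <sys/sdt.h>	/* DTRACE_PROBEn() from systemtap-sdt-devel */

  void handle_request(int id)
  {
  	/*
  	 * Expands to a one-byte nop (on x86) at this point plus a
  	 * .note.stapsdt ELF note recording the probe name and where to
  	 * find 'id'; a uprobe attached to this USDT always lands on
  	 * that nop.
  	 */
  	DTRACE_PROBE1(myapp, request__start, id);
  }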

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-03  2:18             ` Alexei Starovoitov
@ 2021-03-03 13:27               ` Oleg Nesterov
  2021-03-03 18:11               ` Daniel Xu
  1 sibling, 0 replies; 41+ messages in thread
From: Oleg Nesterov @ 2021-03-03 13:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Andy Lutomirski, bpf, Masami Hiramatsu,
	Peter Zijlstra, LKML, Anil S Keshavamurthy, David S. Miller,
	X86 ML, Andrew Cooper

On 03/02, Alexei Starovoitov wrote:
>
> Especially if such tightening will come with performance boost for
> uprobe on a nop and unprobe at the start (which is typically push or
> alu on %sp).
> That would be a great step forward.

Just in case, nop and push are emulated without additional overhead.

Oleg.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-03  2:18             ` Alexei Starovoitov
  2021-03-03 13:27               ` Oleg Nesterov
@ 2021-03-03 18:11               ` Daniel Xu
  2021-03-03 19:14                 ` Andy Lutomirski
  1 sibling, 1 reply; 41+ messages in thread
From: Daniel Xu @ 2021-03-03 18:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Andy Lutomirski, bpf, Oleg Nesterov,
	Masami Hiramatsu, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper

On Tue, Mar 02, 2021 at 06:18:23PM -0800, Alexei Starovoitov wrote:
> On Tue, Mar 2, 2021 at 5:46 PM Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >
> > > On Mar 2, 2021, at 5:22 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Mar 2, 2021 at 1:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
> > >>
> > >>
> > >>>> On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >>>
> > >>> On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
> > >>>>
> > >>>> Is there something like a uprobe test suite?  How maintained /
> > >>>> actively used is uprobe?
> > >>>
> > >>> uprobe+bpf is heavily used in production.
> > >>> selftests/bpf has only one test for it though.
> > >>>
> > >>> Why are you asking?
> > >>
> > >> Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure the any cleanups actually work.
> > >
> > > Any test case to repro the issue you found?
> > > Is it a bug or just messy code?
> >
> > Just messy code.
> >
> > > Nowadays a good chunk of popular applications (python, mysql, etc) has
> > > USDTs in them.
> > > Issues reported with bcc:
> > > https://github.com/iovisor/bcc/issues?q=is%3Aissue+USDT
> > > Similar thing with bpftrace.
> > > Both standard USDT and semaphore based are used in the wild.
> > > uprobe for containers has been a long standing feature request.
> > > If you can improve uprobe performance that would be awesome.
> > > That's another thing that people report often. We optimized it a bit.
> > > More can be done.
> >
> >
> > Wait... USDT is much easier to implement well.  Are we talking just USDT or are we talking about general uprobes in which almost any instruction can get probed?  If the only users that care about uprobes are doing USDT, we could vastly simplify the implementation and probably make it faster, too.
> 
> USDTs are driving the majority of uprobe usage.

I'd say 50/50 in my experience. Larger userspace applications using bpf
for production monitoring tend to use USDT for stability and ABI reasons
(hard for bpf to read C++ classes). Bare uprobes (ie not USDT) are used
quite often for ad-hoc production debugging.

> If they can get faster it will increase their adoption even more.
> There are certainly cases of normal uprobes.
> They are at the start of the function 99% of the time.
> Like the following:
> "uprobe:/lib64/libc.so:malloc(u64 size):size:size,_ret",
> "uprobe:/lib64/libc.so:free(void *ptr)::ptr",
> is common despite its overhead.
> 
> Here is the most interesting and practical usage of uprobes:
> https://github.com/iovisor/bcc/blob/master/tools/sslsniff.py
> and the manpage for the tool:
> https://github.com/iovisor/bcc/blob/master/tools/sslsniff_example.txt
> 
> uprobe in the middle of the function is very rare.
> If the kernel starts rejecting uprobes on some weird instructions
> I suspect no one will complain.

I think it would be great if the kernel could reject mid-instruction
uprobes. Unlike with kprobes, you can place uprobes on immediate
operands which can cause silent data corruption. See
https://github.com/iovisor/bpftrace/pull/803#issuecomment-507693933
for a funny example.
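
As a minimal illustration of the failure mode (a made-up instruction,
not the example from that issue): the int3 byte written for the uprobe
lands inside the immediate of the enclosing instruction, so the
breakpoint never fires and the program just computes with a different
constant.

  /* 'mov $0x1,%eax' is b8 01 00 00 00.  Placing a uprobe one byte into
   * it overwrites the first immediate byte with int3 (0xcc):          */
  static const unsigned char orig[]    = { 0xb8, 0x01, 0x00, 0x00, 0x00 }; /* mov $0x1,%eax  */
  static const unsigned char patched[] = { 0xb8, 0xcc, 0x00, 0x00, 0x00 }; /* mov $0xcc,%eax */
  /* Offset 1 is never an instruction boundary, so the int3 never
   * executes -- the load silently becomes 0xcc instead of 0x1.        */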

To prevent accidental (and silent) data corruption, bpftrace uses a
disassembler to ensure uprobes are placed on instruction boundaries.
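
A rough sketch of such a boundary check, assuming the Capstone C API
(bpftrace's actual implementation may differ):

  #include <capstone/capstone.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* True iff 'offset' is an instruction boundary when the target
   * function's bytes are decoded linearly from its entry point. */
  static bool on_insn_boundary(const uint8_t *code, size_t len, uint64_t offset)
  {
  	csh handle;
  	cs_insn *insn;
  	size_t count, i;
  	bool ok = false;

  	if (cs_open(CS_ARCH_X86, CS_MODE_64, &handle) != CS_ERR_OK)
  		return false;

  	count = cs_disasm(handle, code, len, 0, 0, &insn);
  	for (i = 0; i < count; i++) {
  		if (insn[i].address == offset) {
  			ok = true;
  			break;
  		}
  	}
  	if (count)
  		cs_free(insn, count);
  	cs_close(&handle);
  	return ok;
  }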

<...>

Daniel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Why do kprobes and uprobes singlestep?
  2021-03-03 18:11               ` Daniel Xu
@ 2021-03-03 19:14                 ` Andy Lutomirski
  0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-03 19:14 UTC (permalink / raw)
  To: Daniel Xu
  Cc: Alexei Starovoitov, Andy Lutomirski, bpf, Oleg Nesterov,
	Masami Hiramatsu, Peter Zijlstra, LKML, Anil S Keshavamurthy,
	David S. Miller, X86 ML, Andrew Cooper


> On Mar 3, 2021, at 10:11 AM, Daniel Xu <dxu@dxuuu.xyz> wrote:
> 
> On Tue, Mar 02, 2021 at 06:18:23PM -0800, Alexei Starovoitov wrote:
>>> On Tue, Mar 2, 2021 at 5:46 PM Andy Lutomirski <luto@amacapital.net> wrote:
>>> 
>>> 
>>>> On Mar 2, 2021, at 5:22 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>> 
>>>> On Tue, Mar 2, 2021 at 1:02 PM Andy Lutomirski <luto@amacapital.net> wrote:
>>>>> 
>>>>> 
>>>>>>> On Mar 2, 2021, at 12:24 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>>>>>> 
>>>>>> On Tue, Mar 2, 2021 at 10:38 AM Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>> 
>>>>>>> Is there something like a uprobe test suite?  How maintained /
>>>>>>> actively used is uprobe?
>>>>>> 
>>>>>> uprobe+bpf is heavily used in production.
>>>>>> selftests/bpf has only one test for it though.
>>>>>> 
>>>>>> Why are you asking?
>>>>> 
>>>>> Because the integration with the x86 entry code is a mess, and I want to know whether to mark it BROKEN or how to make sure the any cleanups actually work.
>>>> 
>>>> Any test case to repro the issue you found?
>>>> Is it a bug or just messy code?
>>> 
>>> Just messy code.
>>> 
>>>> Nowadays a good chunk of popular applications (python, mysql, etc) has
>>>> USDTs in them.
>>>> Issues reported with bcc:
>>>> https://github.com/iovisor/bcc/issues?q=is%3Aissue+USDT
>>>> Similar thing with bpftrace.
>>>> Both standard USDT and semaphore based are used in the wild.
>>>> uprobe for containers has been a long standing feature request.
>>>> If you can improve uprobe performance that would be awesome.
>>>> That's another thing that people report often. We optimized it a bit.
>>>> More can be done.
>>> 
>>> 
>>> Wait... USDT is much easier to implement well.  Are we talking just USDT or are we talking about general uprobes in which almost any instruction can get probed?  If the only users that care about uprobes are doing USDT, we could vastly simplify the implementation and probably make it faster, too.
>> 
>> USDTs are driving the majority of uprobe usage.
> 
> I'd say 50/50 in my experience. Larger userspace applications using bpf
> for production monitoring tend to use USDT for stability and ABI reasons
> (hard for bpf to read C++ classes). Bare uprobes (ie not USDT) are used
> quite often for ad-hoc production debugging.
> 
>> If they can get faster it will increase their adoption even more.
>> There are certainly cases of normal uprobes.
>> They are at the start of the function 99% of the time.
>> Like the following:
>> "uprobe:/lib64/libc.so:malloc(u64 size):size:size,_ret",
>> "uprobe:/lib64/libc.so:free(void *ptr)::ptr",
>> is common despite its overhead.
>> 
>> Here is the most interesting and practical usage of uprobes:
>> https://github.com/iovisor/bcc/blob/master/tools/sslsniff.py
>> and the manpage for the tool:
>> https://github.com/iovisor/bcc/blob/master/tools/sslsniff_example.txt
>> 
>> uprobe in the middle of the function is very rare.
>> If the kernel starts rejecting uprobes on some weird instructions
>> I suspect no one will complain.
> 
> I think it would be great if the kernel could reject mid-instruction
> uprobes. Unlike with kprobes, you can place uprobes on immediate
> operands which can cause silent data corruption. See
> https://github.com/iovisor/bpftrace/pull/803#issuecomment-507693933
> for a funny example.

This can’t be done in general on x86. One cannot look at code and find the instruction boundaries.
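
For example, the same bytes can form two different valid instruction
streams depending on where decoding starts, so no static scan of the
text can prove which offsets are boundaries.  A tiny illustration:

  /* Three bytes, two overlapping decodings: */
  static const unsigned char buf[] = { 0xeb, 0xff, 0xc0 };
  /* starting at buf+0:  eb ff    jmp .-1   (jumps into its own 2nd byte) */
  /* starting at buf+1:  ff c0    inc %eax                                */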

> 
> To prevent accidental (and silent) data corruption, bpftrace uses a
> disassembler to ensure uprobes are placed on instruction boundaries.
> 
> <...>
> 
> Daniel

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes
  2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
                           ` (2 preceding siblings ...)
  2021-03-02 15:25         ` [PATCH -tip 3/3] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
@ 2021-03-17 14:55         ` Masami Hiramatsu
  2021-03-17 16:26           ` Peter Zijlstra
  3 siblings, 1 reply; 41+ messages in thread
From: Masami Hiramatsu @ 2021-03-17 14:55 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, Peter Zijlstra,
	LKML, Anil S Keshavamurthy, David S . Miller, X86 ML,
	Andrew Cooper, Steven Rostedt

Hi Andy,

Do you think you still need this series in order to remove the IRET to the kernel?

Thank you,

On Wed,  3 Mar 2021 00:25:12 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Hi,
> 
> Here is a series of patches to remove the single-step debug trap from the
> x86 kprobe.
> 
> The first 2 patches ([1/3][2/3]) are bugfixes which I've sent recently;
> 
>  https://lore.kernel.org/lkml/161425451732.93763.18329509061375062554.stgit@devnote2/
> 
> And [3/3] is actually the patch to remove single-step from kprobes. The RFC
> version is here;
> 
>  https://lore.kernel.org/lkml/161460768474.430263.18425867006584111900.stgit@devnote2/
> 
> This uses int3 as Andy suggested instead of the debug trap, for removing the
> IRET which returns to kernel.
> Some instructions must be emulated and some instructions become unable
> to be probed, but as far as I can see those are not common cases.
> 
> Thank you,
> 
> ---
> 
> Masami Hiramatsu (3):
>       x86/kprobes: Retrieve correct opcode for group instruction
>       x86/kprobes: Identify far indirect JMP correctly
>       x86/kprobes: Use int3 instead of debug trap for single-step
> 
> 
>  arch/x86/include/asm/kprobes.h |   21 +-
>  arch/x86/kernel/kprobes/core.c |  524 ++++++++++++++++++++++++++--------------
>  arch/x86/kernel/traps.c        |    3 
>  3 files changed, 358 insertions(+), 190 deletions(-)
> 
> --
> Masami Hiramatsu (Linaro) <mhiramat@kernel.org>


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes
  2021-03-17 14:55         ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
@ 2021-03-17 16:26           ` Peter Zijlstra
  2021-03-17 17:45             ` Andy Lutomirski
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2021-03-17 16:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andy Lutomirski, Ingo Molnar, Oleg Nesterov, LKML,
	Anil S Keshavamurthy, David S . Miller, X86 ML, Andrew Cooper,
	Steven Rostedt

On Wed, Mar 17, 2021 at 11:55:22PM +0900, Masami Hiramatsu wrote:
> Hi Andy,
> 
> Would you think you still need this series to remove iret to kernel?

They're an improvement anyway, let me queue them so that they don't get
lost.

I'll line them up for tip/x86/core unless anybody else thinks of a
better place for them and tells me in time ;-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes
  2021-03-17 16:26           ` Peter Zijlstra
@ 2021-03-17 17:45             ` Andy Lutomirski
  0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2021-03-17 17:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masami Hiramatsu, Andy Lutomirski, Ingo Molnar, Oleg Nesterov,
	LKML, Anil S Keshavamurthy, David S . Miller, X86 ML,
	Andrew Cooper, Steven Rostedt

On Wed, Mar 17, 2021 at 9:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 17, 2021 at 11:55:22PM +0900, Masami Hiramatsu wrote:
> > Hi Andy,
> >
> > Would you think you still need this series to remove iret to kernel?
>
> They're an improvement anyway, let me queue them so that they don't get
> lost.
>
> I'll line them up for tip/x86/core unless anybody else thinks of a
> better place for them and tells me in time ;-)

I agree.  The new code is quite nice.  I'm still not quite sure what
I'm doing with IRET, but thank you very much for the kprobe
improvements.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [tip: x86/core] x86/kprobes: Use int3 instead of debug trap for single-step
  2021-03-02 15:25         ` [PATCH -tip 3/3] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
@ 2021-03-23 15:15           ` tip-bot2 for Masami Hiramatsu
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Masami Hiramatsu @ 2021-03-23 15:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Andy Lutomirski, Masami Hiramatsu, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     6256e668b7af9d81472e03c6a171630c08f8858a
Gitweb:        https://git.kernel.org/tip/6256e668b7af9d81472e03c6a171630c08f8858a
Author:        Masami Hiramatsu <mhiramat@kernel.org>
AuthorDate:    Wed, 03 Mar 2021 00:25:46 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 23 Mar 2021 16:07:56 +01:00

x86/kprobes: Use int3 instead of debug trap for single-step

Use int3 instead of the debug trap exception for single-stepping the
probed instructions. Some instructions which change the ip register
or modify the IF flag are emulated, because they can not be
single-stepped by int3 or may allow interrupts while single-stepping.

This actually changes the kprobes behavior.

- kprobes can not probe the following instructions: int3, iret,
  far jmp/call which take an absolute address as an immediate,
  indirect far jmp/call, indirect near jmp/call with memory
  addressing (register-based indirect jmp/call are OK), and
  vmcall/vmlaunch/vmresume/vmxoff.

- If the kprobe post_handler isn't set before registering, it may
  not be called in some cases even if you set it afterwards.
  (IOW, the kprobe booster is enabled at registration; the user can
   not change it.)

But both are rare issues: unsupported instructions will not be
used in the kernel (or are rarely used), and post_handlers are
rarely used (I don't see them except in the test code).
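
Conceptually, for an instruction that is neither emulated nor boosted,
the single-step slot prepared by prepare_singlestep() now ends in an
int3 instead of relying on the TF flag (illustrative sketch, not a
literal excerpt from the patch):

  /*
   *  p->ainsn.insn (out-of-line buffer):
   *
   *      [ copied instruction ][ int3 ]
   *
   * Hitting the probe's int3 redirects execution to the copy with
   * interrupts disabled; the trailing int3 traps straight back into
   * kprobe_int3_handler(), which moves regs->ip to just after the
   * original instruction (resume_singlestep()) and restores flags.
   */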

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469874601.49483.11985325887166921076.stgit@devnote2
---
 arch/x86/include/asm/kprobes.h |  21 +-
 arch/x86/kernel/kprobes/core.c | 517 ++++++++++++++++++++------------
 arch/x86/kernel/traps.c        |   3 +-
 3 files changed, 353 insertions(+), 188 deletions(-)

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index d20a3d6..bd7f588 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -65,10 +65,22 @@ struct arch_specific_insn {
 	 * a post_handler).
 	 */
 	unsigned boostable:1;
-	unsigned if_modifier:1;
-	unsigned is_call:1;
-	unsigned is_pushf:1;
-	unsigned is_abs_ip:1;
+	unsigned char size;	/* The size of insn */
+	union {
+		unsigned char opcode;
+		struct {
+			unsigned char type;
+		} jcc;
+		struct {
+			unsigned char type;
+			unsigned char asize;
+		} loop;
+		struct {
+			unsigned char reg;
+		} indirect;
+	};
+	s32 rel32;	/* relative offset must be s32, s16, or s8 */
+	void (*emulate_op)(struct kprobe *p, struct pt_regs *regs);
 	/* Number of bytes of text poked */
 	int tp_len;
 };
@@ -107,7 +119,6 @@ extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_exceptions_notify(struct notifier_block *self,
 				    unsigned long val, void *data);
 extern int kprobe_int3_handler(struct pt_regs *regs);
-extern int kprobe_debug_handler(struct pt_regs *regs);
 
 #else
 
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index f6ec57f..3a14e8a 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -177,6 +177,9 @@ int can_boost(struct insn *insn, void *addr)
 	case 0xf0:
 		/* clear and set flags are boostable */
 		return (opcode == 0xf5 || (0xf7 < opcode && opcode < 0xfe));
+	case 0xff:
+		/* indirect jmp is boostable */
+		return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
 	default:
 		/* CS override prefix and call are not boostable */
 		return (opcode != 0x2e && opcode != 0x9a);
@@ -362,13 +365,14 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
 	return insn->length;
 }
 
-/* Prepare reljump right after instruction to boost */
-static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
-			  struct insn *insn)
+/* Prepare reljump or int3 right after instruction */
+static int prepare_singlestep(kprobe_opcode_t *buf, struct kprobe *p,
+			      struct insn *insn)
 {
 	int len = insn->length;
 
-	if (can_boost(insn, p->addr) &&
+	if (!IS_ENABLED(CONFIG_PREEMPTION) &&
+	    !p->post_handler && can_boost(insn, p->addr) &&
 	    MAX_INSN_SIZE - len >= JMP32_INSN_SIZE) {
 		/*
 		 * These instructions can be executed directly if it
@@ -379,7 +383,12 @@ static int prepare_boost(kprobe_opcode_t *buf, struct kprobe *p,
 		len += JMP32_INSN_SIZE;
 		p->ainsn.boostable = 1;
 	} else {
-		p->ainsn.boostable = 0;
+		/* Otherwise, put an int3 for trapping singlestep */
+		if (MAX_INSN_SIZE - len < INT3_INSN_SIZE)
+			return -ENOSPC;
+
+		buf[len] = INT3_INSN_OPCODE;
+		len += INT3_INSN_SIZE;
 	}
 
 	return len;
@@ -416,42 +425,232 @@ void free_insn_page(void *page)
 	module_memfree(page);
 }
 
-static void set_resume_flags(struct kprobe *p, struct insn *insn)
+/* Kprobe x86 instruction emulation - only regs->ip or IF flag modifiers */
+
+static void kprobe_emulate_ifmodifiers(struct kprobe *p, struct pt_regs *regs)
+{
+	switch (p->ainsn.opcode) {
+	case 0xfa:	/* cli */
+		regs->flags &= ~(X86_EFLAGS_IF);
+		break;
+	case 0xfb:	/* sti */
+		regs->flags |= X86_EFLAGS_IF;
+		break;
+	case 0x9c:	/* pushf */
+		int3_emulate_push(regs, regs->flags);
+		break;
+	case 0x9d:	/* popf */
+		regs->flags = int3_emulate_pop(regs);
+		break;
+	}
+	regs->ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ifmodifiers);
+
+static void kprobe_emulate_ret(struct kprobe *p, struct pt_regs *regs)
+{
+	int3_emulate_ret(regs);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_ret);
+
+static void kprobe_emulate_call(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long func = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	func += p->ainsn.rel32;
+	int3_emulate_call(regs, func);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call);
+
+static nokprobe_inline
+void __kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs, bool cond)
+{
+	unsigned long ip = regs->ip - INT3_INSN_SIZE + p->ainsn.size;
+
+	if (cond)
+		ip += p->ainsn.rel32;
+	int3_emulate_jmp(regs, ip);
+}
+
+static void kprobe_emulate_jmp(struct kprobe *p, struct pt_regs *regs)
+{
+	__kprobe_emulate_jmp(p, regs, true);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp);
+
+static const unsigned long jcc_mask[6] = {
+	[0] = X86_EFLAGS_OF,
+	[1] = X86_EFLAGS_CF,
+	[2] = X86_EFLAGS_ZF,
+	[3] = X86_EFLAGS_CF | X86_EFLAGS_ZF,
+	[4] = X86_EFLAGS_SF,
+	[5] = X86_EFLAGS_PF,
+};
+
+static void kprobe_emulate_jcc(struct kprobe *p, struct pt_regs *regs)
+{
+	bool invert = p->ainsn.jcc.type & 1;
+	bool match;
+
+	if (p->ainsn.jcc.type < 0xc) {
+		match = regs->flags & jcc_mask[p->ainsn.jcc.type >> 1];
+	} else {
+		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+		if (p->ainsn.jcc.type >= 0xe)
+			match = match && (regs->flags & X86_EFLAGS_ZF);
+	}
+	__kprobe_emulate_jmp(p, regs, (match && !invert) || (!match && invert));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jcc);
+
+static void kprobe_emulate_loop(struct kprobe *p, struct pt_regs *regs)
+{
+	bool match;
+
+	if (p->ainsn.loop.type != 3) {	/* LOOP* */
+		if (p->ainsn.loop.asize == 32)
+			match = ((*(u32 *)&regs->cx)--) != 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = ((*(u64 *)&regs->cx)--) != 0;
+#endif
+		else
+			match = ((*(u16 *)&regs->cx)--) != 0;
+	} else {			/* JCXZ */
+		if (p->ainsn.loop.asize == 32)
+			match = *(u32 *)(&regs->cx) == 0;
+#ifdef CONFIG_X86_64
+		else if (p->ainsn.loop.asize == 64)
+			match = *(u64 *)(&regs->cx) == 0;
+#endif
+		else
+			match = *(u16 *)(&regs->cx) == 0;
+	}
+
+	if (p->ainsn.loop.type == 0)	/* LOOPNE */
+		match = match && !(regs->flags & X86_EFLAGS_ZF);
+	else if (p->ainsn.loop.type == 1)	/* LOOPE */
+		match = match && (regs->flags & X86_EFLAGS_ZF);
+
+	__kprobe_emulate_jmp(p, regs, match);
+}
+NOKPROBE_SYMBOL(kprobe_emulate_loop);
+
+static const int addrmode_regoffs[] = {
+	offsetof(struct pt_regs, ax),
+	offsetof(struct pt_regs, cx),
+	offsetof(struct pt_regs, dx),
+	offsetof(struct pt_regs, bx),
+	offsetof(struct pt_regs, sp),
+	offsetof(struct pt_regs, bp),
+	offsetof(struct pt_regs, si),
+	offsetof(struct pt_regs, di),
+#ifdef CONFIG_X86_64
+	offsetof(struct pt_regs, r8),
+	offsetof(struct pt_regs, r9),
+	offsetof(struct pt_regs, r10),
+	offsetof(struct pt_regs, r11),
+	offsetof(struct pt_regs, r12),
+	offsetof(struct pt_regs, r13),
+	offsetof(struct pt_regs, r14),
+	offsetof(struct pt_regs, r15),
+#endif
+};
+
+static void kprobe_emulate_call_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_call(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_call_indirect);
+
+static void kprobe_emulate_jmp_indirect(struct kprobe *p, struct pt_regs *regs)
+{
+	unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
+
+	int3_emulate_jmp(regs, regs_get_register(regs, offs));
+}
+NOKPROBE_SYMBOL(kprobe_emulate_jmp_indirect);
+
+static int prepare_emulation(struct kprobe *p, struct insn *insn)
 {
 	insn_byte_t opcode = insn->opcode.bytes[0];
 
 	switch (opcode) {
 	case 0xfa:		/* cli */
 	case 0xfb:		/* sti */
+	case 0x9c:		/* pushfl */
 	case 0x9d:		/* popf/popfd */
-		/* Check whether the instruction modifies Interrupt Flag or not */
-		p->ainsn.if_modifier = 1;
-		break;
-	case 0x9c:	/* pushfl */
-		p->ainsn.is_pushf = 1;
+		/*
+		 * IF modifiers must be emulated since they would enable interrupts while
+		 * int3 single stepping.
+		 */
+		p->ainsn.emulate_op = kprobe_emulate_ifmodifiers;
+		p->ainsn.opcode = opcode;
 		break;
-	case 0xcf:	/* iret */
-		p->ainsn.if_modifier = 1;
-		fallthrough;
 	case 0xc2:	/* ret/lret */
 	case 0xc3:
 	case 0xca:
 	case 0xcb:
-	case 0xea:	/* jmp absolute -- ip is correct */
-		/* ip is already adjusted, no more changes required */
-		p->ainsn.is_abs_ip = 1;
-		/* Without resume jump, this is boostable */
-		p->ainsn.boostable = 1;
+		p->ainsn.emulate_op = kprobe_emulate_ret;
 		break;
-	case 0xe8:	/* call relative - Fix return addr */
-		p->ainsn.is_call = 1;
+	case 0x9a:	/* far call absolute -- segment is not supported */
+	case 0xea:	/* far jmp absolute -- segment is not supported */
+	case 0xcc:	/* int3 */
+	case 0xcf:	/* iret -- in-kernel IRET is not supported */
+		return -EOPNOTSUPP;
 		break;
-#ifdef CONFIG_X86_32
-	case 0x9a:	/* call absolute -- same as call absolute, indirect */
-		p->ainsn.is_call = 1;
-		p->ainsn.is_abs_ip = 1;
+	case 0xe8:	/* near call relative */
+		p->ainsn.emulate_op = kprobe_emulate_call;
+		if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0xeb:	/* short jump relative */
+	case 0xe9:	/* near jump relative */
+		p->ainsn.emulate_op = kprobe_emulate_jmp;
+		if (insn->immediate.nbytes == 1)
+			p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
+		else if (insn->immediate.nbytes == 2)
+			p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+		else
+			p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		break;
+	case 0x70 ... 0x7f:
+		/* 1 byte conditional jump */
+		p->ainsn.emulate_op = kprobe_emulate_jcc;
+		p->ainsn.jcc.type = opcode & 0xf;
+		p->ainsn.rel32 = *(char *)insn->immediate.bytes;
+		break;
+	case 0x0f:
+		opcode = insn->opcode.bytes[1];
+		if ((opcode & 0xf0) == 0x80) {
+			/* 2 bytes Conditional Jump */
+			p->ainsn.emulate_op = kprobe_emulate_jcc;
+			p->ainsn.jcc.type = opcode & 0xf;
+			if (insn->immediate.nbytes == 2)
+				p->ainsn.rel32 = *(s16 *)&insn->immediate.value;
+			else
+				p->ainsn.rel32 = *(s32 *)&insn->immediate.value;
+		} else if (opcode == 0x01 &&
+			   X86_MODRM_REG(insn->modrm.bytes[0]) == 0 &&
+			   X86_MODRM_MOD(insn->modrm.bytes[0]) == 3) {
+			/* VM extensions - not supported */
+			return -EOPNOTSUPP;
+		}
+		break;
+	case 0xe0:	/* Loop NZ */
+	case 0xe1:	/* Loop */
+	case 0xe2:	/* Loop */
+	case 0xe3:	/* J*CXZ */
+		p->ainsn.emulate_op = kprobe_emulate_loop;
+		p->ainsn.loop.type = opcode & 0x3;
+		p->ainsn.loop.asize = insn->addr_bytes * 8;
+		p->ainsn.rel32 = *(s8 *)&insn->immediate.value;
 		break;
-#endif
 	case 0xff:
 		/*
 		 * Since the 0xff is an extended group opcode, the instruction
@@ -459,46 +658,57 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 		 */
 		opcode = insn->modrm.bytes[0];
 		if ((opcode & 0x30) == 0x10) {
-			/*
-			 * call absolute, indirect
-			 * Fix return addr; ip is correct.
-			 * But this is not boostable
-			 */
-			p->ainsn.is_call = 1;
-			p->ainsn.is_abs_ip = 1;
-			break;
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far call */
+			/* call absolute, indirect */
+			p->ainsn.emulate_op = kprobe_emulate_call_indirect;
 		} else if ((opcode & 0x30) == 0x20) {
-			/*
-			 * jmp near and far, absolute indirect
-			 * ip is correct.
-			 */
-			p->ainsn.is_abs_ip = 1;
-			/* Without resume jump, this is boostable */
-			p->ainsn.boostable = 1;
-		}
+			if ((opcode & 0x8) == 0x8)
+				return -EOPNOTSUPP;	/* far jmp */
+			/* jmp near absolute indirect */
+			p->ainsn.emulate_op = kprobe_emulate_jmp_indirect;
+		} else
+			break;
+
+		if (insn->addr_bytes != sizeof(unsigned long))
+			return -EOPNOTSUPP;	/* Don't support different size */
+		if (X86_MODRM_MOD(opcode) != 3)
+			return -EOPNOTSUPP;	/* TODO: support memory addressing */
+
+		p->ainsn.indirect.reg = X86_MODRM_RM(opcode);
+#ifdef CONFIG_X86_64
+		if (X86_REX_B(insn->rex_prefix.value))
+			p->ainsn.indirect.reg += 8;
+#endif
+		break;
+	default:
 		break;
 	}
+	p->ainsn.size = insn->length;
+
+	return 0;
 }
 
 static int arch_copy_kprobe(struct kprobe *p)
 {
 	struct insn insn;
 	kprobe_opcode_t buf[MAX_INSN_SIZE];
-	int len;
+	int ret, len;
 
 	/* Copy an instruction with recovering if other optprobe modifies it.*/
 	len = __copy_instruction(buf, p->addr, p->ainsn.insn, &insn);
 	if (!len)
 		return -EINVAL;
 
-	/*
-	 * __copy_instruction can modify the displacement of the instruction,
-	 * but it doesn't affect boostable check.
-	 */
-	len = prepare_boost(buf, p, &insn);
+	/* Analyze the opcode and setup emulate functions */
+	ret = prepare_emulation(p, &insn);
+	if (ret < 0)
+		return ret;
 
-	/* Analyze the opcode and set resume flags */
-	set_resume_flags(p, &insn);
+	/* Add int3 for single-step or booster jmp */
+	len = prepare_singlestep(buf, p, &insn);
+	if (len < 0)
+		return len;
 
 	/* Also, displacement change doesn't affect the first byte */
 	p->opcode = buf[0];
@@ -591,29 +801,7 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
 {
 	__this_cpu_write(current_kprobe, p);
 	kcb->kprobe_saved_flags = kcb->kprobe_old_flags
-		= (regs->flags & (X86_EFLAGS_TF | X86_EFLAGS_IF));
-	if (p->ainsn.if_modifier)
-		kcb->kprobe_saved_flags &= ~X86_EFLAGS_IF;
-}
-
-static nokprobe_inline void clear_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl &= ~DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
-}
-
-static nokprobe_inline void restore_btf(void)
-{
-	if (test_thread_flag(TIF_BLOCKSTEP)) {
-		unsigned long debugctl = get_debugctlmsr();
-
-		debugctl |= DEBUGCTLMSR_BTF;
-		update_debugctlmsr(debugctl);
-	}
+		= (regs->flags & X86_EFLAGS_IF);
 }
 
 void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
@@ -628,6 +816,22 @@ void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(arch_prepare_kretprobe);
 
+static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
+			       struct kprobe_ctlblk *kcb)
+{
+	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
+		kcb->kprobe_status = KPROBE_HIT_SSDONE;
+		cur->post_handler(cur, regs, 0);
+	}
+
+	/* Restore back the original saved kprobes variables and continue. */
+	if (kcb->kprobe_status == KPROBE_REENTER)
+		restore_previous_kprobe(kcb);
+	else
+		reset_current_kprobe();
+}
+NOKPROBE_SYMBOL(kprobe_post_process);
+
 static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 			     struct kprobe_ctlblk *kcb, int reenter)
 {
@@ -635,7 +839,7 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		return;
 
 #if !defined(CONFIG_PREEMPTION)
-	if (p->ainsn.boostable && !p->post_handler) {
+	if (p->ainsn.boostable) {
 		/* Boost up -- we can execute copied instructions directly */
 		if (!reenter)
 			reset_current_kprobe();
@@ -654,19 +858,51 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		kcb->kprobe_status = KPROBE_REENTER;
 	} else
 		kcb->kprobe_status = KPROBE_HIT_SS;
-	/* Prepare real single stepping */
-	clear_btf();
-	regs->flags |= X86_EFLAGS_TF;
+
+	if (p->ainsn.emulate_op) {
+		p->ainsn.emulate_op(p, regs);
+		kprobe_post_process(p, regs, kcb);
+		return;
+	}
+
+	/* Disable interrupt, and set ip register on trampoline */
 	regs->flags &= ~X86_EFLAGS_IF;
-	/* single step inline if the instruction is an int3 */
-	if (p->opcode == INT3_INSN_OPCODE)
-		regs->ip = (unsigned long)p->addr;
-	else
-		regs->ip = (unsigned long)p->ainsn.insn;
+	regs->ip = (unsigned long)p->ainsn.insn;
 }
 NOKPROBE_SYMBOL(setup_singlestep);
 
 /*
+ * Called after single-stepping.  p->addr is the address of the
+ * instruction whose first byte has been replaced by the "int3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is p->ainsn.insn. We also don't use the trap, but another "int3"
+ * placed right after the copied instruction.
+ * Unlike trap-based single-step, "int3" single-step can not handle
+ * instructions which change the ip register, e.g. jmp, call and
+ * conditional jmp, nor instructions which change the IF flag,
+ * because interrupts must be disabled around the single-stepping.
+ * Such instructions are software emulated, but others are single-stepped
+ * using "int3".
+ *
+ * When the 2nd "int3" is handled, regs->ip and regs->flags need to
+ * be adjusted so that we can resume execution at the correct code.
+ */
+static void resume_singlestep(struct kprobe *p, struct pt_regs *regs,
+			      struct kprobe_ctlblk *kcb)
+{
+	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
+	unsigned long orig_ip = (unsigned long)p->addr;
+
+	/* Restore saved interrupt flag and ip register */
+	regs->flags |= kcb->kprobe_saved_flags;
+	/* Note that regs->ip is executed int3 so must be a step back */
+	regs->ip += (orig_ip - copy_ip) - INT3_INSN_SIZE;
+}
+NOKPROBE_SYMBOL(resume_singlestep);
+
+/*
  * We have reentered the kprobe_handler(), since another probe was hit while
  * within the handler. We save the original kprobes variables and just single
  * step on the instruction of the new probe without calling any user handlers.
@@ -701,6 +937,12 @@ static int reenter_kprobe(struct kprobe *p, struct pt_regs *regs,
 }
 NOKPROBE_SYMBOL(reenter_kprobe);
 
+static int nokprobe_inline kprobe_is_ss(struct kprobe_ctlblk *kcb)
+{
+	return (kcb->kprobe_status == KPROBE_HIT_SS ||
+		kcb->kprobe_status == KPROBE_REENTER);
+}
+
 /*
  * Interrupts are disabled on entry as trap3 is an interrupt gate and they
  * remain disabled throughout this function.
@@ -745,7 +987,18 @@ int kprobe_int3_handler(struct pt_regs *regs)
 				reset_current_kprobe();
 			return 1;
 		}
-	} else if (*addr != INT3_INSN_OPCODE) {
+	} else if (kprobe_is_ss(kcb)) {
+		p = kprobe_running();
+		if ((unsigned long)p->ainsn.insn < regs->ip &&
+		    (unsigned long)p->ainsn.insn + MAX_INSN_SIZE > regs->ip) {
+			/* Most probably this is the second int3 for singlestep */
+			resume_singlestep(p, regs, kcb);
+			kprobe_post_process(p, regs, kcb);
+			return 1;
+		}
+	}
+
+	if (*addr != INT3_INSN_OPCODE) {
 		/*
 		 * The breakpoint instruction was removed right
 		 * after we hit it.  Another cpu has removed
@@ -818,91 +1071,6 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(trampoline_handler);
 
-/*
- * Called after single-stepping.  p->addr is the address of the
- * instruction whose first byte has been replaced by the "int 3"
- * instruction.  To avoid the SMP problems that can occur when we
- * temporarily put back the original opcode to single-step, we
- * single-stepped a copy of the instruction.  The address of this
- * copy is p->ainsn.insn.
- *
- * This function prepares to return from the post-single-step
- * interrupt.  We have to fix up the stack as follows:
- *
- * 0) Except in the case of absolute or indirect jump or call instructions,
- * the new ip is relative to the copied instruction.  We need to make
- * it relative to the original instruction.
- *
- * 1) If the single-stepped instruction was pushfl, then the TF and IF
- * flags are set in the just-pushed flags, and may need to be cleared.
- *
- * 2) If the single-stepped instruction was a call, the return address
- * that is atop the stack is the address following the copied instruction.
- * We need to make it the address following the original instruction.
- */
-static void resume_execution(struct kprobe *p, struct pt_regs *regs,
-			     struct kprobe_ctlblk *kcb)
-{
-	unsigned long *tos = stack_addr(regs);
-	unsigned long copy_ip = (unsigned long)p->ainsn.insn;
-	unsigned long orig_ip = (unsigned long)p->addr;
-
-	regs->flags &= ~X86_EFLAGS_TF;
-
-	/* Fixup the contents of top of stack */
-	if (p->ainsn.is_pushf) {
-		*tos &= ~(X86_EFLAGS_TF | X86_EFLAGS_IF);
-		*tos |= kcb->kprobe_old_flags;
-	} else if (p->ainsn.is_call) {
-		*tos = orig_ip + (*tos - copy_ip);
-	}
-
-	if (!p->ainsn.is_abs_ip)
-		regs->ip += orig_ip - copy_ip;
-
-	restore_btf();
-}
-NOKPROBE_SYMBOL(resume_execution);
-
-/*
- * Interrupts are disabled on entry as trap1 is an interrupt gate and they
- * remain disabled throughout this function.
- */
-int kprobe_debug_handler(struct pt_regs *regs)
-{
-	struct kprobe *cur = kprobe_running();
-	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
-
-	if (!cur)
-		return 0;
-
-	resume_execution(cur, regs, kcb);
-	regs->flags |= kcb->kprobe_saved_flags;
-
-	if ((kcb->kprobe_status != KPROBE_REENTER) && cur->post_handler) {
-		kcb->kprobe_status = KPROBE_HIT_SSDONE;
-		cur->post_handler(cur, regs, 0);
-	}
-
-	/* Restore back the original saved kprobes variables and continue. */
-	if (kcb->kprobe_status == KPROBE_REENTER) {
-		restore_previous_kprobe(kcb);
-		goto out;
-	}
-	reset_current_kprobe();
-out:
-	/*
-	 * if somebody else is singlestepping across a probe point, flags
-	 * will have TF set, in which case, continue the remaining processing
-	 * of do_debug, as if this is not a probe hit.
-	 */
-	if (regs->flags & X86_EFLAGS_TF)
-		return 0;
-
-	return 1;
-}
-NOKPROBE_SYMBOL(kprobe_debug_handler);
-
 int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 {
 	struct kprobe *cur = kprobe_running();
@@ -920,20 +1088,9 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 		 * normal page fault.
 		 */
 		regs->ip = (unsigned long)cur->addr;
-		/*
-		 * Trap flag (TF) has been set here because this fault
-		 * happened where the single stepping will be done.
-		 * So clear it by resetting the current kprobe:
-		 */
-		regs->flags &= ~X86_EFLAGS_TF;
-		/*
-		 * Since the single step (trap) has been cancelled,
-		 * we need to restore BTF here.
-		 */
-		restore_btf();
 
 		/*
-		 * If the TF flag was set before the kprobe hit,
+		 * If the IF flag was set before the kprobe hit,
 		 * don't touch it:
 		 */
 		regs->flags |= kcb->kprobe_old_flags;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0da8d2a..a5d2540 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -890,9 +890,6 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
 		dr6 &= ~DR_STEP;
 
-	if (kprobe_debug_handler(regs))
-		goto out;
-
 	/*
 	 * The kernel doesn't use INT1
 	 */

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: x86/core] x86/kprobes: Identify far indirect JMP correctly
  2021-03-02 15:25         ` [PATCH -tip 2/3] x86/kprobes: Identify far indirect JMP correctly Masami Hiramatsu
@ 2021-03-23 15:15           ` tip-bot2 for Masami Hiramatsu
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Masami Hiramatsu @ 2021-03-23 15:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Masami Hiramatsu, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     a194acd316f93f3435a64de3b37dca2b5a77b338
Gitweb:        https://git.kernel.org/tip/a194acd316f93f3435a64de3b37dca2b5a77b338
Author:        Masami Hiramatsu <mhiramat@kernel.org>
AuthorDate:    Wed, 03 Mar 2021 00:25:34 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 23 Mar 2021 16:07:56 +01:00

x86/kprobes: Identify far indirect JMP correctly

Since the Grp5 far indirect JMP is FF "mod 101 r/m", it should be
(modrm & 0x38) == 0x28, and the near indirect JMP is
(modrm & 0x38) == 0x20. So we can mask modrm with 0x30 and check
for 0x20. This is actually what the original code does; it also
doesn't care about the last bit, so the resulting code is the same.

Thus, I think this is just a cosmetic cleanup.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469873475.49483.13257083019966335137.stgit@devnote2
---
 arch/x86/kernel/kprobes/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 9b31790..f6ec57f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -467,8 +467,7 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 			p->ainsn.is_call = 1;
 			p->ainsn.is_abs_ip = 1;
 			break;
-		} else if (((opcode & 0x31) == 0x20) ||
-			   ((opcode & 0x31) == 0x21)) {
+		} else if ((opcode & 0x30) == 0x20) {
 			/*
 			 * jmp near and far, absolute indirect
 			 * ip is correct.

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [tip: x86/core] x86/kprobes: Retrieve correct opcode for group instruction
  2021-03-02 15:25         ` [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction Masami Hiramatsu
@ 2021-03-23 15:15           ` tip-bot2 for Masami Hiramatsu
  0 siblings, 0 replies; 41+ messages in thread
From: tip-bot2 for Masami Hiramatsu @ 2021-03-23 15:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Masami Hiramatsu, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     d60ad3d46f1d04a282c56159f1deb675c12733fd
Gitweb:        https://git.kernel.org/tip/d60ad3d46f1d04a282c56159f1deb675c12733fd
Author:        Masami Hiramatsu <mhiramat@kernel.org>
AuthorDate:    Wed, 03 Mar 2021 00:25:24 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 23 Mar 2021 16:07:55 +01:00

x86/kprobes: Retrieve correct opcode for group instruction

Opcodes starting at 0xff belong to the Group5 instruction group, which
is not a 2-byte opcode but an extended opcode determined by the MOD/RM
byte.

Commit abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()")
used insn->opcode.bytes[1], but that is not correct. We have to refer
to insn->modrm.bytes[0] instead.

Fixes: abd82e533d88 ("x86/kprobes: Do not decode opcode in resume_execution()")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/161469872400.49483.18214724458034233166.stgit@devnote2
---
 arch/x86/kernel/kprobes/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 60a540f..9b31790 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -453,7 +453,11 @@ static void set_resume_flags(struct kprobe *p, struct insn *insn)
 		break;
 #endif
 	case 0xff:
-		opcode = insn->opcode.bytes[1];
+		/*
+		 * Since the 0xff is an extended group opcode, the instruction
+		 * is determined by the MOD/RM byte.
+		 */
+		opcode = insn->modrm.bytes[0];
 		if ((opcode & 0x30) == 0x10) {
 			/*
 			 * call absolute, indirect

^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2021-03-23 15:15 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23 23:24 Why do kprobes and uprobes singlestep? Andy Lutomirski
2021-02-24  1:17 ` Masami Hiramatsu
2021-02-24 19:45   ` Andy Lutomirski
2021-02-25  2:22     ` Masami Hiramatsu
2021-02-25  6:03       ` Andy Lutomirski
2021-02-25  9:11         ` Masami Hiramatsu
2021-03-01 14:08       ` [RFC PATCH 0/1] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
2021-03-01 14:08         ` [RFC PATCH 1/1] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
2021-03-02  8:06           ` Peter Zijlstra
2021-03-02  8:38           ` Peter Zijlstra
2021-03-02  8:41           ` Peter Zijlstra
2021-03-02  8:54             ` Peter Zijlstra
2021-03-02 12:51               ` Masami Hiramatsu
2021-03-02 13:58               ` Peter Zijlstra
2021-03-02 15:25       ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
2021-03-02 15:25         ` [PATCH -tip 1/3] x86/kprobes: Retrieve correct opcode for group instruction Masami Hiramatsu
2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
2021-03-02 15:25         ` [PATCH -tip 2/3] x86/kprobes: Identify far indirect JMP correctly Masami Hiramatsu
2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
2021-03-02 15:25         ` [PATCH -tip 3/3] x86/kprobes: Use int3 instead of debug trap for single-step Masami Hiramatsu
2021-03-23 15:15           ` [tip: x86/core] " tip-bot2 for Masami Hiramatsu
2021-03-17 14:55         ` [PATCH -tip 0/3] x86/kprobes: Remove single-step trap from x86 kprobes Masami Hiramatsu
2021-03-17 16:26           ` Peter Zijlstra
2021-03-17 17:45             ` Andy Lutomirski
2021-02-25  9:59     ` Why do kprobes and uprobes singlestep? Peter Zijlstra
2021-03-01 16:51 ` Oleg Nesterov
2021-03-02  1:36   ` Andy Lutomirski
2021-03-02 20:24     ` Alexei Starovoitov
2021-03-02 21:02       ` Andy Lutomirski
2021-03-03  1:22         ` Alexei Starovoitov
2021-03-03  1:46           ` Andy Lutomirski
2021-03-03  2:18             ` Alexei Starovoitov
2021-03-03 13:27               ` Oleg Nesterov
2021-03-03 18:11               ` Daniel Xu
2021-03-03 19:14                 ` Andy Lutomirski
2021-03-02 20:25     ` Oleg Nesterov
2021-03-02 20:35       ` Andy Lutomirski
2021-03-02 20:28     ` Oleg Nesterov
2021-03-02  2:22   ` Masami Hiramatsu
2021-03-02  2:48     ` Andy Lutomirski
2021-03-02 20:31     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).