* [BUG] kprobes crashing because of preempt count
@ 2011-06-30 13:23 Steven Rostedt
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
  2011-07-01  1:12 ` [BUG] kprobes crashing because of preempt count Masami Hiramatsu
  0 siblings, 2 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-06-30 13:23 UTC (permalink / raw)
  To: LKML
  Cc: Masami Hiramatsu, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar

Hi Masami,

While testing some changes in -rt against kprobes, I hit a crash that
also exists in mainline. If we stick a probe at a location that reads the
preempt count, we end up corrupting the preempt count itself.

The reason is that the kprobe breakpoint handler disables preemption, sets
up the single step, returns to the code to execute that single step,
takes the single-step trap, enables preemption, and continues.

The issue is that because we disabled preemption in the first trap,
returned, and then enabled it again in the second trap, we changed the
value of the preempt count as seen by the code being single stepped.

If we add a kprobe on an inc_preempt_count() call:

	[ preempt_count = 0 ]

	ld preempt_count, %eax	<<--- trap

		<trap>
		preempt_disable();
		[ preempt_count = 1 ]
		setup_singlestep();
		<trap return>

	[ preempt_count = 1 ]

	ld preempt_count, %eax

	[ %eax = 1 ]

		<trap>
		post_kprobe_handler()
			preempt_enable_no_resched();
			[ preempt_count = 0 ]
		<trap return>

	[ %eax = 1 ]

	add %eax,1

	[ %eax = 2 ]

	st %eax, preempt_count

	[ preempt_count = 2 ]


We just caused preempt count to increment twice when it should have only
incremented once, and this screws everything else up.
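
For completeness, here's a minimal userspace sketch (plain C, obviously
not kernel code; preempt_count is just a stand-in global) that mimics the
same lost update. Running it prints 2 even though the "probed" code only
incremented once:

	#include <stdio.h>

	static int preempt_count;

	int main(void)
	{
		int eax;

		preempt_count += 1;   /* kprobe trap: preempt_disable() */
		eax = preempt_count;  /* probed code: ld preempt_count, %eax */
		preempt_count -= 1;   /* kprobe trap: preempt_enable_no_resched() */
		eax += 1;             /* probed code: add %eax,1 */
		preempt_count = eax;  /* probed code: st %eax, preempt_count */

		/* prints "preempt_count = 2", not 1 */
		printf("preempt_count = %d\n", preempt_count);
		return 0;
	}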

Do we really need to have preemption disabled throughout this? Is it
because we don't want to migrate or call schedule? Not sure what the
best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
is checked as well?

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 13:23 [BUG] kprobes crashing because of preempt count Steven Rostedt
@ 2011-06-30 15:51 ` Steven Rostedt
  2011-06-30 16:14   ` Frederic Weisbecker
                     ` (3 more replies)
  2011-07-01  1:12 ` [BUG] kprobes crashing because of preempt count Masami Hiramatsu
  1 sibling, 4 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-06-30 15:51 UTC (permalink / raw)
  To: LKML
  Cc: Masami Hiramatsu, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton

Kprobes requires preemption to be disabled while it single steps the code
it replaced with a breakpoint. But because the code being single stepped
could be reading the preempt count, kprobes' own manipulation of the
preempt count can cause the wrong value to end up being stored back.
Here's an example:

If we add a kprobe on an inc_preempt_count() call:

	[ preempt_count = 0 ]

	ld preempt_count, %eax	<<--- trap

		<trap>
		preempt_disable();
		[ preempt_count = 1 ]
		setup_singlestep();
		<trap return>

	[ preempt_count = 1 ]

	ld preempt_count, %eax

	[ %eax = 1 ]

		<trap>
		post_kprobe_handler()
			preempt_enable_no_resched();
			[ preempt_count = 0 ]
		<trap return>

	[ %eax = 1 ]

	add %eax,1

	[ %eax = 2 ]

	st %eax, preempt_count

	[ preempt_count = 2 ]


We just caused preempt count to increment twice when it should have only
incremented once, and this screws everything else up.

To solve this, I've added a per_cpu variable called
kprobe_preempt_disabled, which is set by the kprobe code. If it is set,
preempt_schedule() will not preempt the code.

I did not add a separate field to task_struct for kprobes to test,
because that would just bloat the structure. A per-cpu flag is much
better than a per-task one. Kprobes are slow anyway, so making them do a
little more work is not a problem.

I tested this code against the probes that caused my previous crashes,
and it works fine.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index f1a6244..2ed67e6 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -475,7 +475,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		 * stepping.
 		 */
 		regs->ip = (unsigned long)p->ainsn.insn;
-		preempt_enable_no_resched();
+		kprobe_preempt_enable();
 		return;
 	}
 #endif
@@ -547,7 +547,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 	 * re-enable preemption at the end of this function,
 	 * and also in reenter_kprobe() and setup_singlestep().
 	 */
-	preempt_disable();
+	kprobe_preempt_disable();
 
 	kcb = get_kprobe_ctlblk();
 	p = get_kprobe(addr);
@@ -583,7 +583,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 		 * the original instruction.
 		 */
 		regs->ip = (unsigned long)addr;
-		preempt_enable_no_resched();
+		kprobe_preempt_enable();
 		return 1;
 	} else if (kprobe_running()) {
 		p = __this_cpu_read(current_kprobe);
@@ -593,7 +593,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 		}
 	} /* else: not a kprobe fault; let the kernel handle it */
 
-	preempt_enable_no_resched();
+	kprobe_preempt_enable();
 	return 0;
 }
 
@@ -917,7 +917,7 @@ static int __kprobes post_kprobe_handler(struct pt_regs *regs)
 	}
 	reset_current_kprobe();
 out:
-	preempt_enable_no_resched();
+	kprobe_preempt_enable();
 
 	/*
 	 * if somebody else is singlestepping across a probe point, flags
@@ -951,7 +951,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 			restore_previous_kprobe(kcb);
 		else
 			reset_current_kprobe();
-		preempt_enable_no_resched();
+		kprobe_preempt_enable();
 		break;
 	case KPROBE_HIT_ACTIVE:
 	case KPROBE_HIT_SSDONE:
@@ -1098,7 +1098,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 		memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
 		       kcb->jprobes_stack,
 		       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
-		preempt_enable_no_resched();
+		kprobe_preempt_enable();
 		return 1;
 	}
 	return 0;
@@ -1530,7 +1530,7 @@ static int  __kprobes setup_detour_execution(struct kprobe *p,
 		regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
 		if (!reenter)
 			reset_current_kprobe();
-		preempt_enable_no_resched();
+		kprobe_preempt_enable();
 		return 1;
 	}
 	return 0;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a837b20..c7c423e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -258,6 +258,24 @@ extern spinlock_t mmlist_lock;
 
 struct task_struct;
 
+#ifdef CONFIG_KPROBES
+DECLARE_PER_CPU(int, kprobe_preempt_disabled);
+
+static inline void kprobe_preempt_disable(void)
+{
+	preempt_disable();
+	__get_cpu_var(kprobe_preempt_disabled) = 1;
+	preempt_enable_no_resched();
+}
+
+static inline void kprobe_preempt_enable(void)
+{
+	preempt_disable();
+	__get_cpu_var(kprobe_preempt_disabled) = 0;
+	preempt_enable_no_resched();
+}
+#endif
+
 #ifdef CONFIG_PROVE_RCU
 extern int lockdep_tasklist_lock_is_held(void);
 #endif /* #ifdef CONFIG_PROVE_RCU */
diff --git a/kernel/sched.c b/kernel/sched.c
index 3f2e502..990d53d 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -443,6 +443,30 @@ static struct root_domain def_root_domain;
 
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_KPROBES
+/*
+ * Kprobes can not disable preempting with the preempt_count.
+ * If it does, and it single steps kernel code that modifies
+ * the preempt_count, it will corrupt the result.
+ *
+ * Since kprobes is slow anyways, we do not need to bloat the
+ * task_struct or thread_struct with another variable to test.
+ * Simply use a per_cpu variable and require the kprobe code
+ * to disable preemption normally to set it.
+ */
+DEFINE_PER_CPU(int, kprobe_preempt_disabled);
+
+static inline int preempt_disabled_kprobe(void)
+{
+	return __get_cpu_var(kprobe_preempt_disabled);
+}
+#else
+static inline int preempt_disabled_kprobe(void)
+{
+	return 0;
+}
+#endif
+
 /*
  * This is the main, per-CPU runqueue data structure.
  *
@@ -4106,7 +4130,7 @@ void __kprobes add_preempt_count(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	if (preempt_count() == val)
+	if (preempt_count() == val && !preempt_disabled_kprobe())
 		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 }
 EXPORT_SYMBOL(add_preempt_count);
@@ -4127,7 +4151,7 @@ void __kprobes sub_preempt_count(int val)
 		return;
 #endif
 
-	if (preempt_count() == val)
+	if (preempt_count() == val && !preempt_disabled_kprobe())
 		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
 	preempt_count() -= val;
 }
@@ -4373,6 +4397,10 @@ asmlinkage void __sched notrace preempt_schedule(void)
 
 	do {
 		add_preempt_count_notrace(PREEMPT_ACTIVE);
+		if (unlikely(preempt_disabled_kprobe())) {
+			sub_preempt_count_notrace(PREEMPT_ACTIVE);
+			break;
+		}
 		schedule();
 		sub_preempt_count_notrace(PREEMPT_ACTIVE);
 



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
@ 2011-06-30 16:14   ` Frederic Weisbecker
  2011-06-30 16:46     ` Steven Rostedt
  2011-06-30 19:40   ` Jason Baron
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2011-06-30 16:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

On Thu, Jun 30, 2011 at 11:51:57AM -0400, Steven Rostedt wrote:
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
> 
> If we add a kprobe on a inc_preempt_count() call:
> 
> 	[ preempt_count = 0 ]
> 
> 	ld preempt_count, %eax	<<--- trap
> 
> 		<trap>
> 		preempt_disable();
> 		[ preempt_count = 1]
> 		setup_singlestep();
> 		<trap return>
> 
> 	[ preempt_count = 1 ]
> 
> 	ld preempt_count, %eax
> 
> 	[ %eax = 1 ]
> 
> 		<trap>
> 		post_kprobe_handler()
> 			preempt_enable_no_resched();
> 			[ preempt_count = 0 ]
> 		<trap return>
> 
> 	[ %eax = 1 ]
> 
> 	add %eax,1
> 
> 	[ %eax = 2 ]
> 
> 	st %eax, preempt_count
> 
> 	[ preempt_count = 2 ]
> 
> 
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
> 
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
> 
> I did not update task_struct for a separate variable for kprobes to test
> because it would just bloat that structure with one more test. Adding a
> flag per cpu is much better than adding one per task. The kprobes are
> slow anyway, so causing it to do a little bit more work is not a
> problem.
> 
> I tested this code against the probes that caused my previous crashes,
> and it works fine.

It's a bit sad that we need to bloat preempt_schedule() with a new test,
especially as kprobes should not add overhead in the off case. And I guess
many distros enable kprobes by default, so it's probably not just enabled
on some debug kernels.

Another idea could be to turn current_thread_info()->preempt_count into a
local_t variable, which ensures the increment is performed in a single
instruction.

Well, in the hope that every arch can do something like:

inc (mem_addr)

IIRC, I believe arm can't... And the default local_inc is probably not helpful
in our case...
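
Just to illustrate the idea on x86 (a rough sketch only, not the real
thread_info layout; demo_preempt_count and demo_preempt_count_inc() are
made-up names for this example):

	#include <asm/local.h>

	/* A local_t counter is updated with a single read-modify-write
	 * instruction on x86 (an inc on memory), so there is no separate
	 * load and store window for a trap handler to race with. */
	static local_t demo_preempt_count = LOCAL_INIT(0);

	static void demo_preempt_count_inc(void)
	{
		local_inc(&demo_preempt_count);
	}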

> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index f1a6244..2ed67e6 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -475,7 +475,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
>  		 * stepping.
>  		 */
>  		regs->ip = (unsigned long)p->ainsn.insn;
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return;
>  	}
>  #endif
> @@ -547,7 +547,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  	 * re-enable preemption at the end of this function,
>  	 * and also in reenter_kprobe() and setup_singlestep().
>  	 */
> -	preempt_disable();
> +	kprobe_preempt_disable();
>  
>  	kcb = get_kprobe_ctlblk();
>  	p = get_kprobe(addr);
> @@ -583,7 +583,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  		 * the original instruction.
>  		 */
>  		regs->ip = (unsigned long)addr;
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	} else if (kprobe_running()) {
>  		p = __this_cpu_read(current_kprobe);
> @@ -593,7 +593,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  		}
>  	} /* else: not a kprobe fault; let the kernel handle it */
>  
> -	preempt_enable_no_resched();
> +	kprobe_preempt_enable();
>  	return 0;
>  }
>  
> @@ -917,7 +917,7 @@ static int __kprobes post_kprobe_handler(struct pt_regs *regs)
>  	}
>  	reset_current_kprobe();
>  out:
> -	preempt_enable_no_resched();
> +	kprobe_preempt_enable();
>  
>  	/*
>  	 * if somebody else is singlestepping across a probe point, flags
> @@ -951,7 +951,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
>  			restore_previous_kprobe(kcb);
>  		else
>  			reset_current_kprobe();
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		break;
>  	case KPROBE_HIT_ACTIVE:
>  	case KPROBE_HIT_SSDONE:
> @@ -1098,7 +1098,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
>  		memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
>  		       kcb->jprobes_stack,
>  		       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	}
>  	return 0;
> @@ -1530,7 +1530,7 @@ static int  __kprobes setup_detour_execution(struct kprobe *p,
>  		regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
>  		if (!reenter)
>  			reset_current_kprobe();
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	}
>  	return 0;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a837b20..c7c423e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -258,6 +258,24 @@ extern spinlock_t mmlist_lock;
>  
>  struct task_struct;
>  
> +#ifdef CONFIG_KPROBES
> +DECLARE_PER_CPU(int, kprobe_preempt_disabled);
> +
> +static inline void kprobe_preempt_disable(void)
> +{
> +	preempt_disable();
> +	__get_cpu_var(kprobe_preempt_disabled) = 1;
> +	preempt_enable_no_resched();
> +}
> +
> +static inline void kprobe_preempt_enable(void)
> +{
> +	preempt_disable();
> +	__get_cpu_var(kprobe_preempt_disabled) = 0;
> +	preempt_enable_no_resched();
> +}
> +#endif
> +
>  #ifdef CONFIG_PROVE_RCU
>  extern int lockdep_tasklist_lock_is_held(void);
>  #endif /* #ifdef CONFIG_PROVE_RCU */
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3f2e502..990d53d 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -443,6 +443,30 @@ static struct root_domain def_root_domain;
>  
>  #endif /* CONFIG_SMP */
>  
> +#ifdef CONFIG_KPROBES
> +/*
> + * Kprobes can not disable preempting with the preempt_count.
> + * If it does, and it single steps kernel code that modifies
> + * the preempt_count, it will corrupt the result.
> + *
> + * Since kprobes is slow anyways, we do not need to bloat the
> + * task_struct or thread_struct with another variable to test.
> + * Simply use a per_cpu variable and require the kprobe code
> + * to disable preemption normally to set it.
> + */
> +DEFINE_PER_CPU(int, kprobe_preempt_disabled);
> +
> +static inline int preempt_disabled_kprobe(void)
> +{
> +	return __get_cpu_var(kprobe_preempt_disabled);
> +}
> +#else
> +static inline int preempt_disabled_kprobe(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * This is the main, per-CPU runqueue data structure.
>   *
> @@ -4106,7 +4130,7 @@ void __kprobes add_preempt_count(int val)
>  	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
>  				PREEMPT_MASK - 10);
>  #endif
> -	if (preempt_count() == val)
> +	if (preempt_count() == val && !preempt_disabled_kprobe())
>  		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
>  }
>  EXPORT_SYMBOL(add_preempt_count);
> @@ -4127,7 +4151,7 @@ void __kprobes sub_preempt_count(int val)
>  		return;
>  #endif
>  
> -	if (preempt_count() == val)
> +	if (preempt_count() == val && !preempt_disabled_kprobe())
>  		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
>  	preempt_count() -= val;
>  }
> @@ -4373,6 +4397,10 @@ asmlinkage void __sched notrace preempt_schedule(void)
>  
>  	do {
>  		add_preempt_count_notrace(PREEMPT_ACTIVE);
> +		if (unlikely(preempt_disabled_kprobe())) {
> +			sub_preempt_count_notrace(PREEMPT_ACTIVE);
> +			break;
> +		}
>  		schedule();
>  		sub_preempt_count_notrace(PREEMPT_ACTIVE);
>  
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 16:14   ` Frederic Weisbecker
@ 2011-06-30 16:46     ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-06-30 16:46 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Masami Hiramatsu, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

On Thu, 2011-06-30 at 18:14 +0200, Frederic Weisbecker wrote:


> That's a bit sad we need to bloat preempt_schedule() with a new test, especially
> as kprobes should not add overhead in the off case and then I guess many distros
> enable kprobes by default, so it's probably not just enabled on some debug kernel.

A simple per_cpu var test is not that bad, and that's also why I put it
where I did: it only gets checked after all the other tests have failed. I
doubt this adds any measurable overhead. Note that most distros don't even
enable CONFIG_PREEMPT, so this isn't even touched by them.

> 
> Another idea could be to turn current_thread_info()->preempt_count into a local_t
> var which ensures a single incrementation is performed in a single instruction.

Not all archs support such a thing.

> 
> Well, in the hope that every arch can do something like:
> 
> inc (mem_addr)
> 
> IIRC, I believe arm can't... And the default local_inc is probably not helpful
> in our case...

Some archs (I believe) perform local_inc() more slowly than i++, in which
case this solution would likely slow things down even more than what I
proposed, as the impact would be on every preempt_disable() and
preempt_enable().

And it would not even help the place where I actually hit the crash,
which was in the scheduler. It wasn't the addition to the preempt count
that caused my crash, it was just reading it.

	if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
		__schedule_bug(prev);

The in_atomic_preempt_off() is what blew up on me.

#define in_atomic_preempt_off() \
		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)

What happened was that the read of preempt_count was single stepped, and
at that point preempt_count was one higher due to the preempt_disable()
in kprobes. Even if preempt_count were a local_t, this bug would still
have triggered, and my system would have crashed anyway (every schedule
caused a printk to happen).


-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
  2011-06-30 16:14   ` Frederic Weisbecker
@ 2011-06-30 19:40   ` Jason Baron
  2011-06-30 19:42     ` Steven Rostedt
  2011-06-30 21:56   ` Peter Zijlstra
  2011-07-01  5:09   ` Masami Hiramatsu
  3 siblings, 1 reply; 27+ messages in thread
From: Jason Baron @ 2011-06-30 19:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton

On Thu, Jun 30, 2011 at 11:51:57AM -0400, Steven Rostedt wrote:
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
> 
> If we add a kprobe on a inc_preempt_count() call:
> 
> 	[ preempt_count = 0 ]
> 
> 	ld preempt_count, %eax	<<--- trap
> 
> 		<trap>
> 		preempt_disable();
> 		[ preempt_count = 1]
> 		setup_singlestep();
> 		<trap return>
> 
> 	[ preempt_count = 1 ]
> 
> 	ld preempt_count, %eax
> 
> 	[ %eax = 1 ]
> 
> 		<trap>
> 		post_kprobe_handler()
> 			preempt_enable_no_resched();
> 			[ preempt_count = 0 ]
> 		<trap return>
> 
> 	[ %eax = 1 ]
> 
> 	add %eax,1
> 
> 	[ %eax = 2 ]
> 
> 	st %eax, preempt_count
> 
> 	[ preempt_count = 2 ]
> 
> 
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
> 
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
> 
> I did not update task_struct for a separate variable for kprobes to test
> because it would just bloat that structure with one more test. Adding a
> flag per cpu is much better than adding one per task. The kprobes are
> slow anyway, so causing it to do a little bit more work is not a
> problem.
> 
> I tested this code against the probes that caused my previous crashes,
> and it works fine.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index f1a6244..2ed67e6 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c

Since this is an issue for all arches, shouldn't this include the
transform:

preempt_enable_no_resched() -> kprobe_preempt_enable()

and

preempt_disable() -> kprobe_preempt_disable()

for all arches?

Or, is that a follow-up patch?

Thanks,

-Jason


> @@ -475,7 +475,7 @@ static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
>  		 * stepping.
>  		 */
>  		regs->ip = (unsigned long)p->ainsn.insn;
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return;
>  	}
>  #endif
> @@ -547,7 +547,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  	 * re-enable preemption at the end of this function,
>  	 * and also in reenter_kprobe() and setup_singlestep().
>  	 */
> -	preempt_disable();
> +	kprobe_preempt_disable();
>  
>  	kcb = get_kprobe_ctlblk();
>  	p = get_kprobe(addr);
> @@ -583,7 +583,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  		 * the original instruction.
>  		 */
>  		regs->ip = (unsigned long)addr;
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	} else if (kprobe_running()) {
>  		p = __this_cpu_read(current_kprobe);
> @@ -593,7 +593,7 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
>  		}
>  	} /* else: not a kprobe fault; let the kernel handle it */
>  
> -	preempt_enable_no_resched();
> +	kprobe_preempt_enable();
>  	return 0;
>  }
>  
> @@ -917,7 +917,7 @@ static int __kprobes post_kprobe_handler(struct pt_regs *regs)
>  	}
>  	reset_current_kprobe();
>  out:
> -	preempt_enable_no_resched();
> +	kprobe_preempt_enable();
>  
>  	/*
>  	 * if somebody else is singlestepping across a probe point, flags
> @@ -951,7 +951,7 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
>  			restore_previous_kprobe(kcb);
>  		else
>  			reset_current_kprobe();
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		break;
>  	case KPROBE_HIT_ACTIVE:
>  	case KPROBE_HIT_SSDONE:
> @@ -1098,7 +1098,7 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
>  		memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
>  		       kcb->jprobes_stack,
>  		       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	}
>  	return 0;
> @@ -1530,7 +1530,7 @@ static int  __kprobes setup_detour_execution(struct kprobe *p,
>  		regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
>  		if (!reenter)
>  			reset_current_kprobe();
> -		preempt_enable_no_resched();
> +		kprobe_preempt_enable();
>  		return 1;
>  	}
>  	return 0;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a837b20..c7c423e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -258,6 +258,24 @@ extern spinlock_t mmlist_lock;
>  
>  struct task_struct;
>  
> +#ifdef CONFIG_KPROBES
> +DECLARE_PER_CPU(int, kprobe_preempt_disabled);
> +
> +static inline void kprobe_preempt_disable(void)
> +{
> +	preempt_disable();
> +	__get_cpu_var(kprobe_preempt_disabled) = 1;
> +	preempt_enable_no_resched();
> +}
> +
> +static inline void kprobe_preempt_enable(void)
> +{
> +	preempt_disable();
> +	__get_cpu_var(kprobe_preempt_disabled) = 0;
> +	preempt_enable_no_resched();
> +}
> +#endif
> +
>  #ifdef CONFIG_PROVE_RCU
>  extern int lockdep_tasklist_lock_is_held(void);
>  #endif /* #ifdef CONFIG_PROVE_RCU */
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3f2e502..990d53d 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -443,6 +443,30 @@ static struct root_domain def_root_domain;
>  
>  #endif /* CONFIG_SMP */
>  
> +#ifdef CONFIG_KPROBES
> +/*
> + * Kprobes can not disable preempting with the preempt_count.
> + * If it does, and it single steps kernel code that modifies
> + * the preempt_count, it will corrupt the result.
> + *
> + * Since kprobes is slow anyways, we do not need to bloat the
> + * task_struct or thread_struct with another variable to test.
> + * Simply use a per_cpu variable and require the kprobe code
> + * to disable preemption normally to set it.
> + */
> +DEFINE_PER_CPU(int, kprobe_preempt_disabled);
> +
> +static inline int preempt_disabled_kprobe(void)
> +{
> +	return __get_cpu_var(kprobe_preempt_disabled);
> +}
> +#else
> +static inline int preempt_disabled_kprobe(void)
> +{
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * This is the main, per-CPU runqueue data structure.
>   *
> @@ -4106,7 +4130,7 @@ void __kprobes add_preempt_count(int val)
>  	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
>  				PREEMPT_MASK - 10);
>  #endif
> -	if (preempt_count() == val)
> +	if (preempt_count() == val && !preempt_disabled_kprobe())
>  		trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
>  }
>  EXPORT_SYMBOL(add_preempt_count);
> @@ -4127,7 +4151,7 @@ void __kprobes sub_preempt_count(int val)
>  		return;
>  #endif
>  
> -	if (preempt_count() == val)
> +	if (preempt_count() == val && !preempt_disabled_kprobe())
>  		trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
>  	preempt_count() -= val;
>  }
> @@ -4373,6 +4397,10 @@ asmlinkage void __sched notrace preempt_schedule(void)
>  
>  	do {
>  		add_preempt_count_notrace(PREEMPT_ACTIVE);
> +		if (unlikely(preempt_disabled_kprobe())) {
> +			sub_preempt_count_notrace(PREEMPT_ACTIVE);
> +			break;
> +		}
>  		schedule();
>  		sub_preempt_count_notrace(PREEMPT_ACTIVE);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 19:40   ` Jason Baron
@ 2011-06-30 19:42     ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-06-30 19:42 UTC (permalink / raw)
  To: Jason Baron
  Cc: LKML, Masami Hiramatsu, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton

On Thu, 2011-06-30 at 15:40 -0400, Jason Baron wrote:

> Since this is an issue for all arches, shouldn't this include the
> transform:
> 
> preempt_enable_no_resched() -> kprobe_preempt_enable()
> 
> and
> 
> preempt_disable() -> kprobe_preempt_disable()
> 
> for all arches?
> 
> Or, is that a follow-up patch?
> 

I'm cross compiling all my changes as I type this ;)

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
  2011-06-30 16:14   ` Frederic Weisbecker
  2011-06-30 19:40   ` Jason Baron
@ 2011-06-30 21:56   ` Peter Zijlstra
  2011-07-01  1:22     ` Masami Hiramatsu
  2011-07-01  5:09   ` Masami Hiramatsu
  3 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2011-06-30 21:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Masami Hiramatsu, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
> 
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.

Damn this is ugly. Can we step back and see if we can make the
requirement for kprobe to disable preemption go away?

Why does it have to do that anyway? Isn't it keeping enough per-task
state to allow preemption over the single step?



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-06-30 13:23 [BUG] kprobes crashing because of preempt count Steven Rostedt
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
@ 2011-07-01  1:12 ` Masami Hiramatsu
  2011-07-01  1:33   ` Steven Rostedt
  2011-07-01 11:36   ` Ananth N Mavinakayanahalli
  1 sibling, 2 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01  1:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, yrl.pp-manager.tt

(2011/06/30 22:23), Steven Rostedt wrote:
> Hi Masami,
> 
> While testing some changes in -rt against kprobes, I hit a crash that
> exists in mainline. If we stick a probe in a location that reads
> preempt_count, we corrupt the kernel itself.
> 
> Reason is that the kprobe single step handler disables preemption, sets
> up the single step, returns to the code to execute that single step,
> takes the trap, enables preemption, and continues.
> 
> The issue, is because we disabled preemption in the trap, returned, then
> enabled it again in another trap, we just changed what the code sees
> that does that single step.
> 
> If we add a kprobe on a inc_preempt_count() call:
> 
> 	[ preempt_count = 0 ]
> 
> 	ld preempt_count, %eax	<<--- trap
> 
> 		<trap>
> 		preempt_disable();
> 		[ preempt_count = 1]
> 		setup_singlestep();
> 		<trap return>
> 
> 	[ preempt_count = 1 ]
> 
> 	ld preempt_count, %eax
> 
> 	[ %eax = 1 ]
> 
> 		<trap>
> 		post_kprobe_handler()
> 			preempt_enable_no_resched();
> 			[ preempt_count = 0 ]
> 		<trap return>
> 
> 	[ %eax = 1 ]
> 
> 	add %eax,1
> 
> 	[ %eax = 2 ]
> 
> 	st %eax, preempt_count
> 
> 	[ preempt_count = 2 ]
> 
> 
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.

Ah! right!

> Do we really need to have preemption disabled throughout this? Is it
> because we don't want to migrate or call schedule? Not sure what the
> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> is checked as well?

I think the best way to fix this is to just remove the preemption
disabling code, because
- the breakpoint exception itself disables interrupts (at least on x86)
- while single stepping, interrupts are also disabled.
(BTW, theoretically, boosted and optimized kprobes shouldn't have
this problem, because they don't execute single-stepping.)

So, I think there is no reason to disable preemption.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 21:56   ` Peter Zijlstra
@ 2011-07-01  1:22     ` Masami Hiramatsu
  2011-07-01  1:38       ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01  1:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, LKML, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

(2011/07/01 6:56), Peter Zijlstra wrote:
> On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
>>
>> To solve this, I've added a per_cpu variable called
>> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
>> the preempt_schedule() will not preempt the code.

Sorry for replying so late :(

> Damn this is ugly. Can we step back and see if we can make the
> requirement for kprobe to disable preemption go away?

As I just replied, I think we can simply eliminate that preemption
disabling code. At least we'd better try it. I agree with you:
introducing this kind of complexity just for kprobes is not what
I want. :(

> Why does it have to do that anyway? Isn't it keeping enough per-task
> state to allow preemption over the single step?

Preemption itself must not happen during single stepping, but it seems
impossible to do a heavy context switch with the TF bit set...

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01  1:12 ` [BUG] kprobes crashing because of preempt count Masami Hiramatsu
@ 2011-07-01  1:33   ` Steven Rostedt
  2011-07-01  2:23     ` Masami Hiramatsu
  2011-07-01 11:36   ` Ananth N Mavinakayanahalli
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01  1:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, yrl.pp-manager.tt

On Fri, 2011-07-01 at 10:12 +0900, Masami Hiramatsu wrote:

> > Do we really need to have preemption disabled throughout this? Is it
> > because we don't want to migrate or call schedule? Not sure what the
> > best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> > is checked as well?
> 
> I think the best way to do that is just removing preemption disabling
> code, because
> - breakpoint exception itself disables interrupt (at least on x86)
> - While single stepping, interrupts also be disabled.

I guess the above point is critical. If interrupts are disabled
throughout the entire walkthrough, then we are fine, as that guarantees
preemption is disabled anyway. But if interrupts do get enabled anywhere,
then we will have issues, as the two traps need to use the same state
data that is stored per cpu.

> (BTW, theoretically, boosted and optimized kprobes shouldn't have
> this problem, because those doesn't execute single-stepping)

Do optimized kprobes even disable preemption?

> 
> So, I think there is no reason of disabling preemption.

That would be the best solution.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01  1:22     ` Masami Hiramatsu
@ 2011-07-01  1:38       ` Steven Rostedt
  2011-07-01  1:52         ` Masami Hiramatsu
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01  1:38 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, LKML, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt,
	Benjamin Herrenschmidt, Hans-Christian Egtvedt, Russell King,
	Tony Luck, Ralf Baechle, MartinSchwidefsky, Paul Mundt

[ Added some of the affected maintainers, left off David Howells and
David Miller due to LKML Cc limit ]

On Fri, 2011-07-01 at 10:22 +0900, Masami Hiramatsu wrote:
> (2011/07/01 6:56), Peter Zijlstra wrote:
> > On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
> >>
> >> To solve this, I've added a per_cpu variable called
> >> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> >> the preempt_schedule() will not preempt the code.
> 
> Sorry for replying so late :(

Heh, who can blame you? Timezones make open source development a
wait-and-see affair.

> 
> > Damn this is ugly. Can we step back and see if we can make the
> > requirement for kprobe to disable preemption go away?
> 
> As I replied right now, I think we can just eliminate that
> disabling preemption code. At least we'd better try it.
> I agree with you, introducing this kind of complexity
> just for kprobes is not what I want. :(

Note, I did clean up this patch, so it is not as fugly.

> 
> > Why does it have to do that anyway? Isn't it keeping enough per-task
> > state to allow preemption over the single step?
> 
> preemption itself must not happen on single stepping, but it seems
> impossible to do heavy context switching with setting TF bit...

Yeah, if all archs single step with interrupts disabled, then we should
be fine with removing preemption.

-- Steve




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01  1:38       ` Steven Rostedt
@ 2011-07-01  1:52         ` Masami Hiramatsu
  0 siblings, 0 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01  1:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, LKML, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt,
	Benjamin Herrenschmidt, Hans-Christian Egtvedt, Russell King,
	Tony Luck, Ralf Baechle, MartinSchwidefsky, Paul Mundt

(2011/07/01 10:38), Steven Rostedt wrote:
> [ Added some of the affected maintainers, left off David Howells and
> David Miller due to LKML Cc limit ]
> 
> On Fri, 2011-07-01 at 10:22 +0900, Masami Hiramatsu wrote:
>> (2011/07/01 6:56), Peter Zijlstra wrote:
>>> On Thu, 2011-06-30 at 11:51 -0400, Steven Rostedt wrote:
>>>>
>>>> To solve this, I've added a per_cpu variable called
>>>> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
>>>> the preempt_schedule() will not preempt the code.
>>
>> Sorry for replying so late :(
> 
> Heh, who can blame you? Timezones make open source development a
> wait-and-see affair.

Yeah, we can't go over light speed.

>>> Damn this is ugly. Can we step back and see if we can make the
>>> requirement for kprobe to disable preemption go away?
>>
>> As I replied right now, I think we can just eliminate that
>> disabling preemption code. At least we'd better try it.
>> I agree with you, introducing this kind of complexity
>> just for kprobes is not what I want. :(
> 
> Note, I did clean up this patch, so it is not as fugly.

Hm, I think you don't need to introduce a new flag for that purpose;
there are already current_kprobe and the kprobe status flag:

if (kprobe_running() &&
    get_kprobe_ctlblk()->kprobe_status == KPROBE_HIT_SS)
	/* Running under kprobe's single stepping */

But I'm not sure whether there is any code that can run
under TF=1 (maybe NMI code? but that would not cause preemption).
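
As a rough sketch (untested, just to show the shape of it; the helper
name is only for illustration), that check could be wrapped in a small
helper for the scheduler to test instead of a new per-cpu flag:

	#include <linux/kprobes.h>

	/* Nonzero while this CPU is single stepping a kprobed instruction. */
	static inline int kprobe_single_stepping(void)
	{
	#ifdef CONFIG_KPROBES
		return kprobe_running() &&
		       get_kprobe_ctlblk()->kprobe_status == KPROBE_HIT_SS;
	#else
		return 0;
	#endif
	}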

>>> Why does it have to do that anyway? Isn't it keeping enough per-task
>>> state to allow preemption over the single step?
>>
>> preemption itself must not happen on single stepping, but it seems
>> impossible to do heavy context switching with setting TF bit...
> 
> Yeah, if all archs single step with interrupts disabled, then we should
> be fine with removing preemption.

OK, I'll check that.

Thank you ;)

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01  1:33   ` Steven Rostedt
@ 2011-07-01  2:23     ` Masami Hiramatsu
  0 siblings, 0 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01  2:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, yrl.pp-manager.tt

(2011/07/01 10:33), Steven Rostedt wrote:
> On Fri, 2011-07-01 at 10:12 +0900, Masami Hiramatsu wrote:
> 
>>> Do we really need to have preemption disabled throughout this? Is it
>>> because we don't want to migrate or call schedule? Not sure what the
>>> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
>>> is checked as well?
>>
>> I think the best way to do that is just removing preemption disabling
>> code, because
>> - breakpoint exception itself disables interrupt (at least on x86)
>> - While single stepping, interrupts also be disabled.
> 
> I guess the above point is critical. If interrupts are disabled through
> out the entire walk through, then we are fine, as that just guarantees
> preemption is disabled anyway. But! if it does get enabled anywhere,
> then we will have issues as the two traps require using the same state
> data that is stored per cpu.

That would be a bug, or kprobes' assumption is too fragile (and must
be rewritten).

Anyway, kprobe_handler() in arch/x86/kernel/kprobes.c expects that
it is executed in a critical section, and it can only ensure that if
there are no other kprobes running on that processor. (However, as you
can see in reenter_kprobe(), if the breakpoint hits under single
stepping, it calls BUG(), because kprobes assumes that someone put
another kprobe inside a kprobe's critical section.)

>> (BTW, theoretically, boosted and optimized kprobes shouldn't have
>> this problem, because those doesn't execute single-stepping)
> 
> Does the optimized kprobes even disable preemption?

Yeah, just while calling its handler, since something in the handler
might call might_sleep()... Anyway, nowadays it disables interrupts to
emulate the breakpoint behavior.

>>
>> So, I think there is no reason of disabling preemption.
> 
> That would be the best solution.

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
                     ` (2 preceding siblings ...)
  2011-06-30 21:56   ` Peter Zijlstra
@ 2011-07-01  5:09   ` Masami Hiramatsu
  2011-07-01 11:13     ` Masami Hiramatsu
  2011-07-01 12:19     ` Steven Rostedt
  3 siblings, 2 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01  5:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

(2011/07/01 0:51), Steven Rostedt wrote:
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
> 
> If we add a kprobe on a inc_preempt_count() call:

BTW, on my tip tree, add_preempt_count() (a.k.a. inc_preempt_count())
is marked __kprobes, so it cannot be probed. Has something changed?

Anyway, I'll send the patch that removes preempt_disable from kprobes.

Thank you,

> 
> 	[ preempt_count = 0 ]
> 
> 	ld preempt_count, %eax	<<--- trap
> 
> 		<trap>
> 		preempt_disable();
> 		[ preempt_count = 1]
> 		setup_singlestep();
> 		<trap return>
> 
> 	[ preempt_count = 1 ]
> 
> 	ld preempt_count, %eax
> 
> 	[ %eax = 1 ]
> 
> 		<trap>
> 		post_kprobe_handler()
> 			preempt_enable_no_resched();
> 			[ preempt_count = 0 ]
> 		<trap return>
> 
> 	[ %eax = 1 ]
> 
> 	add %eax,1
> 
> 	[ %eax = 2 ]
> 
> 	st %eax, preempt_count
> 
> 	[ preempt_count = 2 ]
> 
> 
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
> 
> To solve this, I've added a per_cpu variable called
> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
> the preempt_schedule() will not preempt the code.
> 

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01  5:09   ` Masami Hiramatsu
@ 2011-07-01 11:13     ` Masami Hiramatsu
  2011-07-01 12:54       ` Steven Rostedt
  2011-07-01 12:19     ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01 11:13 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

Hi Steve,

(2011/07/01 14:09), Masami Hiramatsu wrote:
> (2011/07/01 0:51), Steven Rostedt wrote:
>> Kprobes requires preemption to be disabled as it single steps the code
>> it replaced with a breakpoint. But because the code that is single
>> stepped could be reading the preempt count, the kprobe disabling of the
>> preempt count can cause the wrong value to end up as a result. Here's an
>> example:
>>
>> If we add a kprobe on a inc_preempt_count() call:
> 
> BTW, on my tip tree, add_preempt_count (a.k.a. inc_preempt_count())
> is marked as __kprobes, so it can not be probed. Is there any change?

Finally, I've gotten stuck on this point. It seems that
add_preempt_count() (or inc_preempt_count()) is called somewhere inside
do_int3, and that causes a double fault and a reboot.

I guess the following loop could happen:
inc_preempt_count->int3->do_int3->preempt_conditional_sti->inc_preempt_count..

I'm still investigating. Could you tell me which base tree you are
working on? I'm using the latest -tip tree.

Thank you,

> 
> Anyway, I'll send the removing preempt_disable from kprobe patch.
> 
> Thank you,
> 
>>
>> 	[ preempt_count = 0 ]
>>
>> 	ld preempt_count, %eax	<<--- trap
>>
>> 		<trap>
>> 		preempt_disable();
>> 		[ preempt_count = 1]
>> 		setup_singlestep();
>> 		<trap return>
>>
>> 	[ preempt_count = 1 ]
>>
>> 	ld preempt_count, %eax
>>
>> 	[ %eax = 1 ]
>>
>> 		<trap>
>> 		post_kprobe_handler()
>> 			preempt_enable_no_resched();
>> 			[ preempt_count = 0 ]
>> 		<trap return>
>>
>> 	[ %eax = 1 ]
>>
>> 	add %eax,1
>>
>> 	[ %eax = 2 ]
>>
>> 	st %eax, preempt_count
>>
>> 	[ preempt_count = 2 ]
>>
>>
>> We just caused preempt count to increment twice when it should have only
>> incremented once, and this screws everything else up.
>>
>> To solve this, I've added a per_cpu variable called
>> kprobe_preempt_disabled, that is set by the kprobe code. If it is set,
>> the preempt_schedule() will not preempt the code.
>>
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01  1:12 ` [BUG] kprobes crashing because of preempt count Masami Hiramatsu
  2011-07-01  1:33   ` Steven Rostedt
@ 2011-07-01 11:36   ` Ananth N Mavinakayanahalli
  2011-07-01 12:01     ` Masami Hiramatsu
  1 sibling, 1 reply; 27+ messages in thread
From: Ananth N Mavinakayanahalli @ 2011-07-01 11:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, yrl.pp-manager.tt

On Fri, Jul 01, 2011 at 10:12:03AM +0900, Masami Hiramatsu wrote:
> (2011/06/30 22:23), Steven Rostedt wrote:

...

> > Do we really need to have preemption disabled throughout this? Is it
> > because we don't want to migrate or call schedule? Not sure what the
> > best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> > is checked as well?
> 
> I think the best way to do that is just removing preemption disabling
> code, because
> - breakpoint exception itself disables interrupt (at least on x86)
> - While single stepping, interrupts also be disabled.

On 64-bit powerpc, kprobe handlers are run with interrupts enabled
(MSR_EE = 1), but most instructions (including loads/stores) are
emulated, so for the most part, we don't take the sstep exception.

Ananth

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01 11:36   ` Ananth N Mavinakayanahalli
@ 2011-07-01 12:01     ` Masami Hiramatsu
  2011-07-01 13:03       ` Ananth N Mavinakayanahalli
  0 siblings, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01 12:01 UTC (permalink / raw)
  To: ananth
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, yrl.pp-manager.tt

(2011/07/01 20:36), Ananth N Mavinakayanahalli wrote:
> On Fri, Jul 01, 2011 at 10:12:03AM +0900, Masami Hiramatsu wrote:
>> (2011/06/30 22:23), Steven Rostedt wrote:
> 
> ...
> 
>>> Do we really need to have preemption disabled throughout this? Is it
>>> because we don't want to migrate or call schedule? Not sure what the
>>> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
>>> is checked as well?
>>
>> I think the best way to do that is just removing preemption disabling
>> code, because
>> - breakpoint exception itself disables interrupt (at least on x86)
>> - While single stepping, interrupts also be disabled.
> 
> On 64-bit powerpc, kprobe handlers are run with interrupts enabled
> (MSR_EE = 1), but most instructions (including loads/stores) are
> emulated, so for the most part, we don't take the sstep exception.

Yeah, it seems the same thing is done on arm too. However, I'm still
not sure that the entire int3 execution path can run without calling
inc_preempt_count().
It seems that the function is very primitive, and I doubt we can allow
kprobes to be put on it...

Thank you,

> 
> Ananth


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01  5:09   ` Masami Hiramatsu
  2011-07-01 11:13     ` Masami Hiramatsu
@ 2011-07-01 12:19     ` Steven Rostedt
  2011-07-01 13:15       ` Masami Hiramatsu
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01 12:19 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

On Fri, 2011-07-01 at 14:09 +0900, Masami Hiramatsu wrote:

> BTW, on my tip tree, add_preempt_count (a.k.a. inc_preempt_count())
> is marked as __kprobes, so it can not be probed. Is there any change?
> 

That is only the case when CONFIG_DEBUG_PREEMPT or the preempt tracer is
enabled. Otherwise it's hardcoded into whatever calls it:

#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
  extern void add_preempt_count(int val);
  extern void sub_preempt_count(int val);
#else
# define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
# define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
#endif


Anyway, it still doesn't help the places that read preempt_count() and
then later use the result. I hit the crash in schedule_debug(), where
the scheduler increments the preempt count and then reads it to see if
it is only 1.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01 11:13     ` Masami Hiramatsu
@ 2011-07-01 12:54       ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01 12:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

On Fri, 2011-07-01 at 20:13 +0900, Masami Hiramatsu wrote:
> Hi Steve,
> 
> (2011/07/01 14:09), Masami Hiramatsu wrote:
> > (2011/07/01 0:51), Steven Rostedt wrote:
> >> Kprobes requires preemption to be disabled as it single steps the code
> >> it replaced with a breakpoint. But because the code that is single
> >> stepped could be reading the preempt count, the kprobe disabling of the
> >> preempt count can cause the wrong value to end up as a result. Here's an
> >> example:
> >>
> >> If we add a kprobe on a inc_preempt_count() call:
> > 
> > BTW, on my tip tree, add_preempt_count (a.k.a. inc_preempt_count())
> > is marked as __kprobes, so it can not be probed. Is there any change?
> 
> Finally, I've stacked on this point. It seems that
> the add_preempt_count() (or inc_preempt_count) is called somewhere
> inside the do_int3 and it causes double fault and reboot.
> 
> I guess following loop could be happen,
> inc_preempt_count->int3->do_int3->preempt_conditional_sti->inc_preempt_count..
> 
> I'm still investigating that. Could you tell me what the basic tree
> you are working on? I'm using the latest -tip tree.

I'm using the latest Linus tree; I think it was v3.0-rc5.

The bug I hit was when I added my trace point here:

Dump of assembler code for function schedule:
   0xffffffff814e042d <+0>:	push   %rbp
   0xffffffff814e042e <+1>:	mov    %rsp,%rbp
   0xffffffff814e0431 <+4>:	push   %r15
   0xffffffff814e0433 <+6>:	push   %r14
   0xffffffff814e0435 <+8>:	push   %r13
   0xffffffff814e0437 <+10>:	push   %r12
   0xffffffff814e0439 <+12>:	push   %rbx
   0xffffffff814e043a <+13>:	sub    $0x88,%rsp
   0xffffffff814e0441 <+20>:	callq  0xffffffff814e8b40
   0xffffffff814e0446 <+25>:	mov    %gs:0xb5c8,%rdx
   0xffffffff814e044f <+34>:	mov    $0x10f40,%rax
   0xffffffff814e0456 <+41>:	mov    %rdx,-0x60(%rbp)
   0xffffffff814e045a <+45>:	mov    %rdx,-0x88(%rbp)
   0xffffffff814e0461 <+52>:	mov    %gs:0xb5c0,%rcx
   0xffffffff814e046a <+61>:	mov    %rax,-0x58(%rbp)
   0xffffffff814e046e <+65>:	mov    %rax,-0x80(%rbp)
   0xffffffff814e0472 <+69>:	mov    %rax,-0x68(%rbp)
   0xffffffff814e0476 <+73>:	mov    %rcx,-0x90(%rbp)
   0xffffffff814e047d <+80>:	mov    %rax,-0x98(%rbp)
   0xffffffff814e0484 <+87>:	mov    %rax,-0x70(%rbp)
   0xffffffff814e0488 <+91>:	mov    %rax,-0x78(%rbp)
   0xffffffff814e048c <+95>:	mov    %rax,-0xa0(%rbp)
   0xffffffff814e0493 <+102>:	mov    $0x1,%edi
   0xffffffff814e0498 <+107>:	callq  0xffffffff814e5d53 <add_preempt_count>
   0xffffffff814e049d <+112>:	callq  0xffffffff81248b98 <debug_smp_processor_id>
   0xffffffff814e04a2 <+117>:	mov    %eax,%r14d
   0xffffffff814e04a5 <+120>:	cltq   
   0xffffffff814e04a7 <+122>:	mov    -0x58(%rbp),%rbx
   0xffffffff814e04ab <+126>:	mov    %r14d,%edi
   0xffffffff814e04ae <+129>:	add    -0x7e4adcf0(,%rax,8),%rbx
   0xffffffff814e04b6 <+137>:	callq  0xffffffff810a9fa0 <rcu_note_context_switch>
   0xffffffff814e04bb <+142>:	mov    -0x60(%rbp),%rdi
   0xffffffff814e04bf <+146>:	mov    0x8b8(%rbx),%rsi
   0xffffffff814e04c6 <+153>:	mov    -0x1fbc(%rdi),%eax
   0xffffffff814e04cc <+159>:	mov    %rsi,-0x48(%rbp)
   0xffffffff814e04d0 <+163>:	and    $0xefffffff,%eax
   0xffffffff814e04d5 <+168>:	dec    %eax
   0xffffffff814e04d7 <+170>:	je     0xffffffff814e04ea <schedule+189>

I added a probe at +153. That is where it reads preempt_count(). The
funny thing is, I only inserted it there as a random place to find the
"prev" pointer; I hit this bug purely by coincidence.

Here's the code:

asmlinkage void __sched schedule(void)
{
	struct task_struct *prev, *next;
	unsigned long *switch_count;
	struct rq *rq;
	int cpu;

need_resched:
	preempt_disable();
	cpu = smp_processor_id();
	rq = cpu_rq(cpu);
	rcu_note_context_switch(cpu);
	prev = rq->curr;

	schedule_debug(prev);

----

static inline void schedule_debug(struct task_struct *prev)
{
	/*
	 * Test if we are atomic. Since do_exit() needs to call into
	 * schedule() atomically, we ignore that path for now.
	 * Otherwise, whine if we are scheduling when we should not be.
	 */
	if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
		__schedule_bug(prev);

	profile_hit(SCHED_PROFILING, __builtin_return_address(0));

	schedstat_inc(this_rq(), sched_count);
}

# define PREEMPT_CHECK_OFFSET 1

#define in_atomic_preempt_off() \
		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)

The read of preempt_count() in in_atomic_preempt_off() was where I had
placed my probe. Because kprobes had preemption disabled as well, the load
put a 2 into %eax instead of a 1. That made the compare against
PREEMPT_CHECK_OFFSET fail.

What happened next was that every call to schedule() triggered
__schedule_bug() and caused the printk to fire. The problem wasn't that the
system crashed, but that it live-locked. Nothing would move forward,
because by the time printk finished, something else had been scheduled in.
By changing the code to:

static noinline void __schedule_bug(struct task_struct *prev)
{
	struct pt_regs *regs = get_irq_regs();
	static int once;

	if (once)
		return;
	once++;

	printk(KERN_ERR "BUG: scheduling while atomic: %s/%d/0x%08x\n",
		prev->comm, prev->pid, preempt_count());

It printed the error once, and everything continued to run normally.
That's because the corrupted preempt count value was only used for this
test and not for anything else more meaningful.

But this is when I realized that it is possible to corrupt the preempt
count through inc_preempt_count() even when preempt debugging and preempt
tracing are not set.
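
To make the interleaving concrete, here is a small user-space model of the
non-debug case. It is only a sketch that mimics the plain read-modify-write
of the count; none of this is kernel code:

#include <stdio.h>

static int preempt_count;	/* stands in for the per-thread count */

static void kprobe_trap(void)	{ preempt_count++; }	/* int3: preempt_disable()  */
static void debug_trap(void)	{ preempt_count--; }	/* post handler: enable     */

int main(void)
{
	int eax;

	kprobe_trap();			/* breakpoint fires before the load  */
	eax = preempt_count;		/* single-stepped load sees 1, not 0 */
	debug_trap();			/* preempt_enable_no_resched()       */
	eax += 1;			/* rest of the probed increment      */
	preempt_count = eax;		/* writes back 2 instead of 1        */

	printf("preempt_count = %d (should be 1)\n", preempt_count);
	return 0;
}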

-- Steve








^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01 12:01     ` Masami Hiramatsu
@ 2011-07-01 13:03       ` Ananth N Mavinakayanahalli
  2011-07-01 13:19         ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Ananth N Mavinakayanahalli @ 2011-07-01 13:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, LKML, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, yrl.pp-manager.tt

On Fri, Jul 01, 2011 at 09:01:00PM +0900, Masami Hiramatsu wrote:
> (2011/07/01 20:36), Ananth N Mavinakayanahalli wrote:
> > On Fri, Jul 01, 2011 at 10:12:03AM +0900, Masami Hiramatsu wrote:
> >> (2011/06/30 22:23), Steven Rostedt wrote:
> > 
> > ...
> > 
> >>> Do we really need to have preemption disabled throughout this? Is it
> >>> because we don't want to migrate or call schedule? Not sure what the
> >>> best way to fix this is. Perhaps we add a kprobe_preempt_disable() that
> >>> is checked as well?
> >>
> >> I think the best way to do that is just removing preemption disabling
> >> code, because
> >> - breakpoint exception itself disables interrupt (at least on x86)
> >> - While single stepping, interrupts also be disabled.
> > 
> > On 64-bit powerpc, kprobe handlers are run with interrupts enabled
> > (MSR_EE = 1), but most instructions (including loads/stores) are
> > emulated, so for the most part, we don't take the sstep exception.
> 
> Yeah, it seems that the same thing is done on arm too. However, I'm
> still not sure that the entire int3 exec path can run without
> calling inc_preempt_count.
> It seems that the function is very primitive, and I doubt we can
> allow putting kprobes on it...

Right. I think all preempt manipulation routines need to be __kprobes.

Also, Steve is testing the -rt tree, where artefacts related to
locking/preemption are possibly quite different from mainline; that
may also be at play here.

Ananth

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes
  2011-07-01 13:15       ` Masami Hiramatsu
@ 2011-07-01 13:14         ` Masami Hiramatsu
  2011-07-01 13:43           ` Steven Rostedt
  2011-07-02  6:09           ` Ananth N Mavinakayanahalli
  0 siblings, 2 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01 13:14 UTC (permalink / raw)
  To: Steven Rostedt, linux-kernel
  Cc: Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton, yrl.pp-manager.tt, Masami Hiramatsu

Steven Rostedt reported that putting kprobe on the instruction which
loads preempt_count causes the wrong result, as below.

Kprobes requires preemption to be disabled as it single steps the code
it replaced with a breakpoint. But because the code that is single
stepped could be reading the preempt count, the kprobe disabling of the
preempt count can cause the wrong value to end up as a result. Here's an
example:

If we add a kprobe on a inc_preempt_count() call:

	[ preempt_count = 0 ]

	ld preempt_count, %eax	<<--- trap

		<trap>
		preempt_disable();
		[ preempt_count = 1]
		setup_singlestep();
		<trap return>

	[ preempt_count = 1 ]

	ld preempt_count, %eax

	[ %eax = 1 ]

		<trap>
		post_kprobe_handler()
			preempt_enable_no_resched();
			[ preempt_count = 0 ]
		<trap return>

	[ %eax = 1 ]

	add %eax,1

	[ %eax = 2 ]

	st %eax, preempt_count

	[ preempt_count = 2 ]


We just caused preempt count to increment twice when it should have only
incremented once, and this screws everything else up.

To solve this, I've removed the preempt disabling code from kprobes.
Since the breakpoint exception and the kprobes single step routine
disable interrupts, kprobes no longer needs to disable preemption
while single-stepping.

This patch is for -tip tree, and it can be applied to linus tree too.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
---

 arch/x86/kernel/kprobes.c |   18 ++++++------------
 1 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
index f1a6244..5de9cfb 100644
--- a/arch/x86/kernel/kprobes.c
+++ b/arch/x86/kernel/kprobes.c
@@ -475,7 +475,6 @@ static void __kprobes setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		 * stepping.
 		 */
 		regs->ip = (unsigned long)p->ainsn.insn;
-		preempt_enable_no_resched();
 		return;
 	}
 #endif
@@ -542,12 +541,13 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 
 	addr = (kprobe_opcode_t *)(regs->ip - sizeof(kprobe_opcode_t));
 	/*
-	 * We don't want to be preempted for the entire
-	 * duration of kprobe processing. We conditionally
-	 * re-enable preemption at the end of this function,
-	 * and also in reenter_kprobe() and setup_singlestep().
+	 * We don't want to be preempted for the entire duration of kprobe
+	 * processing. This should be done by disabling interrupts instead
+	 * of incrementing preempt_count, because if the probed instruction
+	 * reads the count, it will get a wrong value.
+	 * Disabling irq is done by breakpoint exception.
 	 */
-	preempt_disable();
+	BUG_ON(!irqs_disabled());
 
 	kcb = get_kprobe_ctlblk();
 	p = get_kprobe(addr);
@@ -583,7 +583,6 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 		 * the original instruction.
 		 */
 		regs->ip = (unsigned long)addr;
-		preempt_enable_no_resched();
 		return 1;
 	} else if (kprobe_running()) {
 		p = __this_cpu_read(current_kprobe);
@@ -593,7 +592,6 @@ static int __kprobes kprobe_handler(struct pt_regs *regs)
 		}
 	} /* else: not a kprobe fault; let the kernel handle it */
 
-	preempt_enable_no_resched();
 	return 0;
 }
 
@@ -917,7 +915,6 @@ static int __kprobes post_kprobe_handler(struct pt_regs *regs)
 	}
 	reset_current_kprobe();
 out:
-	preempt_enable_no_resched();
 
 	/*
 	 * if somebody else is singlestepping across a probe point, flags
@@ -951,7 +948,6 @@ int __kprobes kprobe_fault_handler(struct pt_regs *regs, int trapnr)
 			restore_previous_kprobe(kcb);
 		else
 			reset_current_kprobe();
-		preempt_enable_no_resched();
 		break;
 	case KPROBE_HIT_ACTIVE:
 	case KPROBE_HIT_SSDONE:
@@ -1098,7 +1094,6 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
 		memcpy((kprobe_opcode_t *)(kcb->jprobe_saved_sp),
 		       kcb->jprobes_stack,
 		       MIN_STACK_SIZE(kcb->jprobe_saved_sp));
-		preempt_enable_no_resched();
 		return 1;
 	}
 	return 0;
@@ -1530,7 +1525,6 @@ static int  __kprobes setup_detour_execution(struct kprobe *p,
 		regs->ip = (unsigned long)op->optinsn.insn + TMPL_END_IDX;
 		if (!reenter)
 			reset_current_kprobe();
-		preempt_enable_no_resched();
 		return 1;
 	}
 	return 0;


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes
  2011-07-01 12:19     ` Steven Rostedt
@ 2011-07-01 13:15       ` Masami Hiramatsu
  2011-07-01 13:14         ` [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes Masami Hiramatsu
  0 siblings, 1 reply; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-01 13:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Ingo Molnar, Andrew Morton

(2011/07/01 21:19), Steven Rostedt wrote:
> On Fri, 2011-07-01 at 14:09 +0900, Masami Hiramatsu wrote:
> 
>> BTW, on my tip tree, add_preempt_count (a.k.a. inc_preempt_count())
>> is marked as __kprobes, so it can not be probed. Is there any change?
>>
> 
> That is when debug or preempt tracer is enabled. Otherwise it's
> hardcoded into whatever calls it:
> 
> #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
>   extern void add_preempt_count(int val);
>   extern void sub_preempt_count(int val);
> #else
> # define add_preempt_count(val)	do { preempt_count() += (val); } while (0)
> # define sub_preempt_count(val)	do { preempt_count() -= (val); } while (0)
> #endif
> 
> 
> Anyway, it still doesn't help the place that reads preempt_count() and
> then later uses the result. I hit the crash in schedule_bug() where it
> increments preempt count, then reads it to see if it is only 1.

Ah, I see, that's what I need...
I've made a fix patch for that, so could you test that?

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] kprobes crashing because of preempt count
  2011-07-01 13:03       ` Ananth N Mavinakayanahalli
@ 2011-07-01 13:19         ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01 13:19 UTC (permalink / raw)
  To: ananth
  Cc: Masami Hiramatsu, LKML, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, yrl.pp-manager.tt

On Fri, 2011-07-01 at 18:33 +0530, Ananth N Mavinakayanahalli wrote:
> On Fri, Jul 01, 2011 at 09:01:00PM +0900, Masami Hiramatsu wrote:
> > 
> > Yeah, it seems that the same thing is done on arm too. However, I'm
> > still not sure that the entire int3 exec path can run without
> > calling inc_preempt_count.
> > It seems that the function is very primitive, and I doubt we can
> > allow putting kprobes on it...
> 
> Right. I think all preempt manipulation routines need to be __kprobes.
> 

As I stated earlier, it can be triggered on just a read of preempt
count. I would also hate to force preempt_count() reads and updates to
be function calls (that's the only way to make it __kprobes). The
inc/dec preempt count routines are only function calls if debug preempt
or preempt tracing is enabled, which is not the case on most distros.
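
(For reference, in that case preempt_count() itself is just an inline read
of the thread_info field -- roughly the following, paraphrasing
include/linux/preempt.h -- so there is no single function that could be
marked __kprobes for the plain read:)

#define preempt_count()	(current_thread_info()->preempt_count)

/* and, e.g., in_atomic_preempt_off() expands to a plain load and compare */
#define in_atomic_preempt_off() \
		((preempt_count() & ~PREEMPT_ACTIVE) != PREEMPT_CHECK_OFFSET)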

Not to mention, I think reading the value of preempt count by kprobes is
a legit use case.

> Also, Steve is testing the -rt tree where artefacts related to
> locking/preemption are possibly quite different from the mainline that
> may also be at play here.

Yes, I noticed this first on an -rt variant. But I'm actually testing
patches to move from -rt to mainline. This happens to be more of a mainline
issue. Once I noticed this bug, all my tests afterward were based on
v3.0-rc5.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes
  2011-07-01 13:14         ` [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes Masami Hiramatsu
@ 2011-07-01 13:43           ` Steven Rostedt
  2011-07-01 13:53             ` Steven Rostedt
  2011-07-02  6:09           ` Ananth N Mavinakayanahalli
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01 13:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

On Fri, 2011-07-01 at 22:14 +0900, Masami Hiramatsu wrote:
> Steven Rostedt reported that putting kprobe on the instruction which
> loads preempt_count causes the wrong result, as below.
> 
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:
> 
> If we add a kprobe on a inc_preempt_count() call:
> 
> 	[ preempt_count = 0 ]
> 
> 	ld preempt_count, %eax	<<--- trap
> 
> 		<trap>
> 		preempt_disable();
> 		[ preempt_count = 1]
> 		setup_singlestep();
> 		<trap return>
> 
> 	[ preempt_count = 1 ]
> 
> 	ld preempt_count, %eax
> 
> 	[ %eax = 1 ]
> 
> 		<trap>
> 		post_kprobe_handler()
> 			preempt_enable_no_resched();
> 			[ preempt_count = 0 ]
> 		<trap return>
> 
> 	[ %eax = 1 ]
> 
> 	add %eax,1
> 
> 	[ %eax = 2 ]
> 
> 	st %eax, preempt_count
> 
> 	[ preempt_count = 2 ]
> 
> 
> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
> 
> To solve this, I've removed the preempt disabling code from kprobes.
> Since the breakpoint exception and the kprobes single step routine
> disable interrupts, kprobes no longer needs to disable preemption
> while single-stepping.
> 
> This patch is for -tip tree, and it can be applied to linus tree too.

I applied it to v3.0-rc5. And not surprisingly it works. But the
question I have is, when we return from the trap, and NEED_RESCHED is
set, will it schedule? My test placed the probe within the scheduler
where preemption is already disabled. Let me do this in places that have
preemption and interrupts enabled.

I'll also try a kernel module that adds a probe handler that does a
udelay() loop, forcing the timer interrupt to be set on return from the
trap, and see what happens there.
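
Something along these lines should do it -- a rough sketch only, assuming
the usual register_kprobe() interface; the probed symbol and the delay
value are arbitrary:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/delay.h>

static int delay_pre(struct kprobe *p, struct pt_regs *regs)
{
	udelay(500);	/* burn time so a timer interrupt is pending on return */
	return 0;
}

static struct kprobe kp = {
	.symbol_name	= "schedule",
	.pre_handler	= delay_pre,
};

static int __init delay_probe_init(void)
{
	return register_kprobe(&kp);
}

static void __exit delay_probe_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(delay_probe_init);
module_exit(delay_probe_exit);
MODULE_LICENSE("GPL");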

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes
  2011-07-01 13:43           ` Steven Rostedt
@ 2011-07-01 13:53             ` Steven Rostedt
  2011-07-03  2:05               ` Masami Hiramatsu
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2011-07-01 13:53 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

On Fri, 2011-07-01 at 09:43 -0400, Steven Rostedt wrote:

> I applied it to v3.0-rc5. And not surprisingly it works. But the
> question I have is, when we return from the trap, and NEED_RESCHED is
> set, will it schedule? My test placed the probe within the scheduler
> where preemption is already disabled. Let me do this in places that have
> preemption and interrupts enabled.
> 
> I'll also try a kernel module that adds a probe handler that does a
> udelay() loop, forcing the timer interrupt to be set on return from the
> trap, and see what happens there.

I didn't need to do the above. Just adding a probe at the beginning of
schedule() should do the trick. As I can see with the latency-format option
set, NEED_RESCHED is already set when we enter that path. If interrupts
are enabled during the single step, I would think that the code would go
into an infinite loop if it had issues there.

Thus I think we can say that removing preempt_disable() from kprobes in
x86_64 (and probably 32) is safe. I'll go and test this on my PPC64 and
mips (if I can get that compiled).

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes
  2011-07-01 13:14         ` [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes Masami Hiramatsu
  2011-07-01 13:43           ` Steven Rostedt
@ 2011-07-02  6:09           ` Ananth N Mavinakayanahalli
  1 sibling, 0 replies; 27+ messages in thread
From: Ananth N Mavinakayanahalli @ 2011-07-02  6:09 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, Peter Zijlstra,
	Frederic Weisbecker, Thomas Gleixner, Ingo Molnar, Andrew Morton,
	yrl.pp-manager.tt

On Fri, Jul 01, 2011 at 10:14:08PM +0900, Masami Hiramatsu wrote:
> Steven Rostedt reported that putting kprobe on the instruction which
> loads preempt_count causes the wrong result, as below.
> 
> Kprobes requires preemption to be disabled as it single steps the code
> it replaced with a breakpoint. But because the code that is single
> stepped could be reading the preempt count, the kprobe disabling of the
> preempt count can cause the wrong value to end up as a result. Here's an
> example:

...

> We just caused preempt count to increment twice when it should have only
> incremented once, and this screws everything else up.
> 
> To solve this, I've removed the preempt disabling code from kprobes.
> Since the breakpoint exception and the kprobes single step routine
> disable interrupts, kprobes no longer needs to disable preemption
> while single-stepping.
> 
> This patch is for -tip tree, and it can be applied to linus tree too.
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> ---

Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes
  2011-07-01 13:53             ` Steven Rostedt
@ 2011-07-03  2:05               ` Masami Hiramatsu
  0 siblings, 0 replies; 27+ messages in thread
From: Masami Hiramatsu @ 2011-07-03  2:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker,
	Thomas Gleixner, Ingo Molnar, Andrew Morton, yrl.pp-manager.tt

(2011/07/01 22:53), Steven Rostedt wrote:
> On Fri, 2011-07-01 at 09:43 -0400, Steven Rostedt wrote:
>
>> I applied it to v3.0-rc5. And not surprisingly it works. But the
>> question I have is, when we return from the trap, and NEED_RESCHED is
>> set, will it schedule?

No, AFAIK, when the breakpoint (a.k.a. int3 on x86) and single-step (a.k.a.
debug on x86) exception handlers are kicked in kernel space, the return
path doesn't schedule (when kicked from userspace, it does).
You can see that at paranoid_exit or ret_from_exception in entry_*.S.

>> My test placed the probe within the scheduler
>> where preemption is already disabled. Let me do this in places that have
>> preemption and interrupts enabled.
>>
>> I'll also try a kernel module that adds a probe handler that does a
>> udelay() loop, forcing the timer interrupt to be set on return from the
>> trap, and see what happens there.
>
> I didn't need to do the above. Just adding a probe at the beginning of
> schedule() should do the trick. As I can see with the latency-format option
> set, NEED_RESCHED is already set when we enter that path. If interrupts
> are enabled during the single step, I would think that the code would go
> into an infinite loop if it had issues there.
>
> Thus I think we can say that removing preempt_disable() from kprobes in
> x86_64 (and probably 32) is safe. I'll go and test this on my PPC64 and
> mips (if I can get that compiled).

Thank you for checking that.

And here is maybe another topic, but a related one.
I still have to tell you that kprobes can't probe add_preempt_count
when CONFIG_TRACE_PREEMPT/DEBUG_PREEMPT=y, even with this patch, because
another path disables preemption.
Here is a test log taken after removing __kprobes from add_preempt_count()
and adding "DEBUG_LOCKS_WARN_ON(kprobe_running());" at its entry.
(As you can see, this is on a kvm guest and lockdep is enabled.)

 Kprobe smoke test started
 ------------[ cut here ]------------
 WARNING: at /home/mhiramat/ksrc/linux-2.6/kernel/sched.c:4094
add_preempt_count+0x11a/0x120()
 Hardware name: Bochs
 Modules linked in:
 Pid: 1, comm: swapper Not tainted 3.0.0-rc4-tip+ #16
 Call Trace:
 <#DB>  [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8105007a>] add_preempt_count+0x11a/0x120
 [<ffffffff81033b43>] kvm_clock_read+0x13/0x70
 [<ffffffff810128b9>] sched_clock+0x9/0x10
 [<ffffffff81087f05>] sched_clock_local+0x25/0x90
 [<ffffffff810880a8>] sched_clock_cpu+0xb8/0x130
 [<ffffffff81088152>] local_clock+0x32/0x80
 [<ffffffff810951bc>] lock_release_holdtime.part.3+0x1c/0x160
 [<ffffffff815799d0>] ? notifier_call_chain+0x70/0x70
 [<ffffffff81097d00>] lock_release+0x1b0/0x270
 [<ffffffff815799ad>] ? notifier_call_chain+0x4d/0x70
 [<ffffffff81579a5b>] __atomic_notifier_call_chain+0x8b/0xb0
 [<ffffffff815799d0>] ? notifier_call_chain+0x70/0x70
 [<ffffffff81b50470>] ? audit_tree_init+0x58/0x58
 [<ffffffff81579a96>] atomic_notifier_call_chain+0x16/0x20
 [<ffffffff81579ace>] notify_die+0x2e/0x30
 [<ffffffff8157721f>] do_int3+0x5f/0xe0
 [<ffffffff815766fd>] int3+0x2d/0x40
 [<ffffffff81b50470>] ? audit_tree_init+0x58/0x58
 <<EOE>>  [<ffffffff810b79cd>] ? init_test_probes+0x6d/0x560
 [<ffffffff810a1d58>] ? register_module_notifier+0x18/0x20
 [<ffffffff81b505d7>] init_kprobes+0x167/0x189
 [<ffffffff812c5608>] ? __raw_spin_lock_init+0x38/0x70
 [<ffffffff81b50418>] ? audit_watch_init+0x3a/0x3a
 [<ffffffff811a48e9>] ? fsnotify_alloc_group+0xb9/0x100
 [<ffffffff81002042>] do_one_initcall+0x42/0x180
 [<ffffffff81b33d11>] kernel_init+0xda/0x154
 [<ffffffff8157e624>] kernel_thread_helper+0x4/0x10
 [<ffffffff81047565>] ? finish_task_switch+0x85/0x110
 [<ffffffff81576558>] ? retint_restore_args+0x13/0x13
 [<ffffffff81b33c37>] ? start_kernel+0x3f3/0x3f3
 [<ffffffff8157e620>] ? gs_change+0x13/0x13
 ---[ end trace fe4ebffb2b8ff187 ]---

So, even if some __kprobes functions really aren't called from
kprobes itself, another path in do_int3/do_debug can call them and
cause an infinite loop.

Perhaps we'd better not use this kind of "normal" notifier call
chain path, and instead call the kprobe handler directly, as kgdb does,
to make it clear which functions can be called on the kprobes path.
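
Something like the (purely hypothetical) direct-call sketch below is what
I have in mind; kprobe_int3_handler() is an assumed entry point exported
by the kprobes core, it does not exist in the current code:

dotraplinkage void do_int3(struct pt_regs *regs, long error_code)
{
#ifdef CONFIG_KPROBES
	/*
	 * Let kprobes handle its own breakpoints before the generic die
	 * notifier chain (and the lockdep/sched_clock code it drags in)
	 * gets a chance to run.
	 */
	if (kprobe_int3_handler(regs))
		return;
#endif
	if (notify_die(DIE_INT3, "int3", regs, error_code, 3, SIGTRAP)
			== NOTIFY_STOP)
		return;

	/* ... rest of do_int3() unchanged ... */
}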

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-07-03  2:05 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-30 13:23 [BUG] kprobes crashing because of preempt count Steven Rostedt
2011-06-30 15:51 ` [RFC][PATCH] kprobes: Add separate preempt_disabling for kprobes Steven Rostedt
2011-06-30 16:14   ` Frederic Weisbecker
2011-06-30 16:46     ` Steven Rostedt
2011-06-30 19:40   ` Jason Baron
2011-06-30 19:42     ` Steven Rostedt
2011-06-30 21:56   ` Peter Zijlstra
2011-07-01  1:22     ` Masami Hiramatsu
2011-07-01  1:38       ` Steven Rostedt
2011-07-01  1:52         ` Masami Hiramatsu
2011-07-01  5:09   ` Masami Hiramatsu
2011-07-01 11:13     ` Masami Hiramatsu
2011-07-01 12:54       ` Steven Rostedt
2011-07-01 12:19     ` Steven Rostedt
2011-07-01 13:15       ` Masami Hiramatsu
2011-07-01 13:14         ` [RFC PATCH -tip ] [BUGFIX] x86: Remove preempt disabling from kprobes Masami Hiramatsu
2011-07-01 13:43           ` Steven Rostedt
2011-07-01 13:53             ` Steven Rostedt
2011-07-03  2:05               ` Masami Hiramatsu
2011-07-02  6:09           ` Ananth N Mavinakayanahalli
2011-07-01  1:12 ` [BUG] kprobes crashing because of preempt count Masami Hiramatsu
2011-07-01  1:33   ` Steven Rostedt
2011-07-01  2:23     ` Masami Hiramatsu
2011-07-01 11:36   ` Ananth N Mavinakayanahalli
2011-07-01 12:01     ` Masami Hiramatsu
2011-07-01 13:03       ` Ananth N Mavinakayanahalli
2011-07-01 13:19         ` Steven Rostedt
