linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
@ 2012-11-27  5:15 Li Zhong
  2012-11-27  5:58 ` [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common Li Zhong
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Li Zhong @ 2012-11-27  5:15 UTC (permalink / raw)
  To: linux-next list, LKML; +Cc: paulmck, sasha.levin, gleb, avi, fweisbec

I noticed some warnings complaining about dynticks_nesting value, like 

[  267.545032] ------------[ cut here ]------------
[  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
[  267.545032] Hardware name: Bochs
[  267.545032] Modules linked in:
[  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
[  267.545032] Call Trace:
[  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
[  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
[  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
[  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
[  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
[  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
[  267.545032] ---[ end trace 924ae80da035028d ]---

After enabling rcu-dyntick tracing, I got following abnormal
dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
			...
 1	<idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000		rcu_idle_exit
 2	  sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001	rcu_irq_enter   - apf (not present)

 3	<idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0		rcu_idle_enter
 4	<idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1				rcu_irq_enter	- apf (page ready)
 5	<idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0				rcu_irq_exit	- apf (page ready)
 6	<idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000		rcu_idle_exit

 7	  sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff	rcu_irq_exit	- apf (not present)
 8	  sshd-696   [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0		rcu_user_enter
 9	  sshd-696   [002] d..1 18739.521330: rcu_dyntick: End 0 1				rcu_irq_enter   - apf (not present)

10	<idle>-0     [002] d..2 18739.521346: rcu_dyntick: Start 1 0				rcu_idle_enter - old value 1, warning
11	<idle>-0     [002] d..2 18739.530021: rcu_dyntick: ++= ff00000000000001 ff00000000000002
12	<idle>-0     [002] dN.2 18739.530029: rcu_dyntick: --= ff00000000000002 ff00000000000001
		...

I added the functions I guess which printed the tracing after each
line. 

Line #1, the idle-0 process calls rcu_idle_exit(), and finishes one
loop, to switch to sshd-696

Line #2, sshd-696 calls rcu_irq_enter() because of async page fault(page
not present), and puts itself to wait for page ready

Line #3, idle-0 is switched in, and clears the dynticks_nesting to 0

Line #4-5, I think the rcu_irq_enter/exit() is called because the page
for sshd-696 is ready

Line #6, idle-0 calls rcu_idle_exit(), to switch to sshd-696

Line #7, sshd-696 calls rcu_irq_exit() in the apf (page not present)
code path, decreasing dynticks_nesting to 13fffffffffffff. 

Line #8-9, sshd-696 calls rcu_user_enter() to start user eqs, and gets
async page fault again. It puts itself sleep again, with
dynticks_nesting value as 1.

Line #10, idle-0 switches in, as the dynticks_nesting value is 1, so
warning is reported in rcu_idle_enter(), then the value is decreased to
ff00000000000001. (In the tracing log, the new value is 0, that's
because rcu hard-code the value to be 0. I will send another patch for
this.)

This patch below tries to replace the rcu_irq_enter/exit() with
rcu_idle_exit/enter(), if it is in rcu idle, and it is idle process;
otherwise, rcu_user_exit() is called to exit user eqs if necessary. 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/kernel/kvm.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..f65648d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -247,10 +247,17 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		rcu_irq_enter();
+		if (is_idle_task(current) && rcu_is_cpu_idle())
+			rcu_idle_exit();
+		else
+			rcu_user_exit();
+
 		exit_idle();
 		kvm_async_pf_task_wait((u32)read_cr2());
-		rcu_irq_exit();
+
+		if (is_idle_task(current) && rcu_is_cpu_idle())
+			rcu_idle_enter();
+
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
@ 2012-11-27  5:58 ` Li Zhong
  2012-11-27 15:18   ` Paul E. McKenney
  2012-11-27 13:07 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Gleb Natapov
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-11-27  5:58 UTC (permalink / raw)
  To: linux-next list, LKML; +Cc: Paul E. McKenney, dipankar

This patch uses the real new value of dynticks_nesting instead of 0 in
rcu_eqs_enter_common(). 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 kernel/rcutree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 7733eb5..6b7e314 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -339,7 +339,7 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
 static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
 				bool user)
 {
-	trace_rcu_dyntick("Start", oldval, 0);
+	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
 	if (!user && !is_idle_task(current)) {
 		struct task_struct *idle = idle_task(smp_processor_id());
 
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
  2012-11-27  5:58 ` [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common Li Zhong
@ 2012-11-27 13:07 ` Gleb Natapov
  2012-11-27 14:01   ` Sasha Levin
  2012-11-27 15:34   ` Frederic Weisbecker
  2012-11-27 14:38 ` Frederic Weisbecker
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 13:07 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, avi, fweisbec

On Tue, Nov 27, 2012 at 01:15:25PM +0800, Li Zhong wrote:
> I noticed some warnings complaining about dynticks_nesting value, like 
> 
> [  267.545032] ------------[ cut here ]------------
> [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> [  267.545032] Hardware name: Bochs
> [  267.545032] Modules linked in:
> [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> [  267.545032] Call Trace:
> [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> [  267.545032] ---[ end trace 924ae80da035028d ]---
> 
> After enabling rcu-dyntick tracing, I got following abnormal
> dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
> 			...
>  1	<idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000		rcu_idle_exit
>  2	  sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001	rcu_irq_enter   - apf (not present)
> 
>  3	<idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0		rcu_idle_enter
>  4	<idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1				rcu_irq_enter	- apf (page ready)
>  5	<idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0				rcu_irq_exit	- apf (page ready)
>  6	<idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000		rcu_idle_exit
> 
>  7	  sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff	rcu_irq_exit	- apf (not present)
>  8	  sshd-696   [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0		rcu_user_enter
>  9	  sshd-696   [002] d..1 18739.521330: rcu_dyntick: End 0 1				rcu_irq_enter   - apf (not present)
> 
> 10	<idle>-0     [002] d..2 18739.521346: rcu_dyntick: Start 1 0				rcu_idle_enter - old value 1, warning
> 11	<idle>-0     [002] d..2 18739.530021: rcu_dyntick: ++= ff00000000000001 ff00000000000002
> 12	<idle>-0     [002] dN.2 18739.530029: rcu_dyntick: --= ff00000000000002 ff00000000000001
> 		...
> 
> I added the functions I guess which printed the tracing after each
> line. 
> 
> Line #1, the idle-0 process calls rcu_idle_exit(), and finishes one
> loop, to switch to sshd-696
> 
> Line #2, sshd-696 calls rcu_irq_enter() because of async page fault(page
> not present), and puts itself to wait for page ready
> 
> Line #3, idle-0 is switched in, and clears the dynticks_nesting to 0
> 
> Line #4-5, I think the rcu_irq_enter/exit() is called because the page
> for sshd-696 is ready
> 
> Line #6, idle-0 calls rcu_idle_exit(), to switch to sshd-696
> 
> Line #7, sshd-696 calls rcu_irq_exit() in the apf (page not present)
> code path, decreasing dynticks_nesting to 13fffffffffffff. 
> 
> Line #8-9, sshd-696 calls rcu_user_enter() to start user eqs, and gets
> async page fault again. It puts itself sleep again, with
> dynticks_nesting value as 1.
> 
> Line #10, idle-0 switches in, as the dynticks_nesting value is 1, so
> warning is reported in rcu_idle_enter(), then the value is decreased to
> ff00000000000001. (In the tracing log, the new value is 0, that's
> because rcu hard-code the value to be 0. I will send another patch for
> this.)
> 
> This patch below tries to replace the rcu_irq_enter/exit() with
> rcu_idle_exit/enter(), if it is in rcu idle, and it is idle process;
> otherwise, rcu_user_exit() is called to exit user eqs if necessary. 
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/kvm.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 4180a87..f65648d 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -247,10 +247,17 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		break;
>  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>  		/* page is swapped out by the host. */
> -		rcu_irq_enter();
> +		if (is_idle_task(current) && rcu_is_cpu_idle())
> +			rcu_idle_exit();
> +		else
> +			rcu_user_exit();
> +
>  		exit_idle();
>  		kvm_async_pf_task_wait((u32)read_cr2());
> -		rcu_irq_exit();
> +
> +		if (is_idle_task(current) && rcu_is_cpu_idle())
> +			rcu_idle_enter();
> +
>  		break;
>  	case KVM_PV_REASON_PAGE_READY:
>  		rcu_irq_enter();
Those rcu_irq_enter()/rcu_irq_exit() were introduced by commit
c5e015d4949aa665 "KVM guest: exit idleness when handling
KVM_PV_REASON_PAGE_NOT_PRESENT", but now I am starting to question this
commit. KVM_PV_REASON_PAGE_NOT_PRESENT should not kick cpu out of
idleness. kvm_async_pf_task_wait() checks that cpu is idle and calls
halt if it is. After that commit schedule() may be called between
rcu_irq_enter()/rcu_irq_exit() which is probably illegal. Paul?

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 13:07 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Gleb Natapov
@ 2012-11-27 14:01   ` Sasha Levin
  2012-11-27 15:25     ` Paul E. McKenney
  2012-11-27 15:34   ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Sasha Levin @ 2012-11-27 14:01 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, avi, fweisbec

On 11/27/2012 08:07 AM, Gleb Natapov wrote:
> Those rcu_irq_enter()/rcu_irq_exit() were introduced by commit
> c5e015d4949aa665 "KVM guest: exit idleness when handling
> KVM_PV_REASON_PAGE_NOT_PRESENT", but now I am starting to question this
> commit. KVM_PV_REASON_PAGE_NOT_PRESENT should not kick cpu out of
> idleness. kvm_async_pf_task_wait() checks that cpu is idle and calls
> halt if it is. After that commit schedule() may be called between
> rcu_irq_enter()/rcu_irq_exit() which is probably illegal. Paul?

otoh, calling schedule() apparently kicks cpu out of idleness now.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
  2012-11-27  5:58 ` [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common Li Zhong
  2012-11-27 13:07 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Gleb Natapov
@ 2012-11-27 14:38 ` Frederic Weisbecker
  2012-11-27 15:44   ` Gleb Natapov
  2012-11-27 15:39 ` Frederic Weisbecker
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 14:38 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> I noticed some warnings complaining about dynticks_nesting value, like
>
> [  267.545032] ------------[ cut here ]------------
> [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> [  267.545032] Hardware name: Bochs
> [  267.545032] Modules linked in:
> [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> [  267.545032] Call Trace:
> [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> [  267.545032] ---[ end trace 924ae80da035028d ]---
>
> After enabling rcu-dyntick tracing, I got following abnormal
> dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
>                         ...
>  1      <idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>  2        sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001  rcu_irq_enter   - apf (not present)

How did that happen? When I look at do_async_page_fault(),
KVM_PV_REASON_PAGE_NOT_PRESENT doesn't do rcu_irq_enter().

>
>  3      <idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0              rcu_idle_enter
>  4      <idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (page ready)
>  5      <idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0                            rcu_irq_exit    - apf (page ready)
>  6      <idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>
>  7        sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff  rcu_irq_exit    - apf (not present)

I'm confused for the same reason here.

>  8        sshd-696   [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0              rcu_user_enter
>  9        sshd-696   [002] d..1 18739.521330: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (not present)

Same. Now we certainly need to add some rcu_user_exit() on
do_async_page_fault(). Although I'm not quite sure when this function
is called. Is it an exception or an irq?

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common
  2012-11-27  5:58 ` [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common Li Zhong
@ 2012-11-27 15:18   ` Paul E. McKenney
  0 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2012-11-27 15:18 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, dipankar

On Tue, Nov 27, 2012 at 01:58:27PM +0800, Li Zhong wrote:
> This patch uses the real new value of dynticks_nesting instead of 0 in
> rcu_eqs_enter_common(). 
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>

Good catch!  Queued for 3.9.

							Thanx, Paul

> ---
>  kernel/rcutree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 7733eb5..6b7e314 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -339,7 +339,7 @@ static struct rcu_node *rcu_get_root(struct rcu_state *rsp)
>  static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
>  				bool user)
>  {
> -	trace_rcu_dyntick("Start", oldval, 0);
> +	trace_rcu_dyntick("Start", oldval, rdtp->dynticks_nesting);
>  	if (!user && !is_idle_task(current)) {
>  		struct task_struct *idle = idle_task(smp_processor_id());
> 
> -- 
> 1.7.11.4
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 14:01   ` Sasha Levin
@ 2012-11-27 15:25     ` Paul E. McKenney
  0 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2012-11-27 15:25 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Gleb Natapov, Li Zhong, linux-next list, LKML, avi, fweisbec

On Tue, Nov 27, 2012 at 09:01:33AM -0500, Sasha Levin wrote:
> On 11/27/2012 08:07 AM, Gleb Natapov wrote:
> > Those rcu_irq_enter()/rcu_irq_exit() were introduced by commit
> > c5e015d4949aa665 "KVM guest: exit idleness when handling
> > KVM_PV_REASON_PAGE_NOT_PRESENT", but now I am starting to question this
> > commit. KVM_PV_REASON_PAGE_NOT_PRESENT should not kick cpu out of
> > idleness. kvm_async_pf_task_wait() checks that cpu is idle and calls
> > halt if it is. After that commit schedule() may be called between
> > rcu_irq_enter()/rcu_irq_exit() which is probably illegal. Paul?

It is legal to call rcu_irq_enter() and then schedule().

In fact, it turns out that it -has- to be legal, due to some
architectures' quaint habit of entering interrupt/exception handlers
that they never leave, and possibly vice versa.

> otoh, calling schedule() apparently kicks cpu out of idleness now.

But if you call rcu_irq_enter() and then schedule(), and if schedule()
switches to the idle thread, and if execution proceeds to the point
where rcu_idle_enter() is called, then, RCU will quite naturally
decide that it is fully idle.  At that point, it is illegal to invoke
rcu_irq_exit() unless/until you have either: (1) exited the idle loop
(as in called rcu_idle_exit()) or (2) taken an interrupt, which will
call rcu_irq_enter().

And to think that when I started coding RCU's dyntick-idle funtionality,
I was thinking in terms of a simple nesting counter.  Silly me!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 13:07 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Gleb Natapov
  2012-11-27 14:01   ` Sasha Levin
@ 2012-11-27 15:34   ` Frederic Weisbecker
  1 sibling, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 15:34 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> On Tue, Nov 27, 2012 at 01:15:25PM +0800, Li Zhong wrote:
>> @@ -247,10 +247,17 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>>               break;
>>       case KVM_PV_REASON_PAGE_NOT_PRESENT:
>>               /* page is swapped out by the host. */
>> -             rcu_irq_enter();
>> +             if (is_idle_task(current) && rcu_is_cpu_idle())
>> +                     rcu_idle_exit();
>> +             else
>> +                     rcu_user_exit();
>> +
>>               exit_idle();
>>               kvm_async_pf_task_wait((u32)read_cr2());
>> -             rcu_irq_exit();
>> +
>> +             if (is_idle_task(current) && rcu_is_cpu_idle())
>> +                     rcu_idle_enter();
>> +
>>               break;
>>       case KVM_PV_REASON_PAGE_READY:
>>               rcu_irq_enter();
> Those rcu_irq_enter()/rcu_irq_exit() were introduced by commit
> c5e015d4949aa665 "KVM guest: exit idleness when handling
> KVM_PV_REASON_PAGE_NOT_PRESENT",

Ah!

    $ git name-rev  c5e015d4949aa665
    c5e015d4949aa665 remotes/tip/core/locking~3^2

Ok so that's not in -rcu. That's why I missed it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
                   ` (2 preceding siblings ...)
  2012-11-27 14:38 ` Frederic Weisbecker
@ 2012-11-27 15:39 ` Frederic Weisbecker
  2012-11-27 16:16   ` Paul E. McKenney
  2012-11-27 16:29 ` Frederic Weisbecker
  2012-11-27 16:43 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to " Frederic Weisbecker
  5 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 15:39 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> I noticed some warnings complaining about dynticks_nesting value, like
>
> [  267.545032] ------------[ cut here ]------------
> [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> [  267.545032] Hardware name: Bochs
> [  267.545032] Modules linked in:
> [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> [  267.545032] Call Trace:
> [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> [  267.545032] ---[ end trace 924ae80da035028d ]---
>
> After enabling rcu-dyntick tracing, I got following abnormal
> dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
>                         ...
>  1      <idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>  2        sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001  rcu_irq_enter   - apf (not present)
>
>  3      <idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0              rcu_idle_enter
>  4      <idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (page ready)
>  5      <idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0                            rcu_irq_exit    - apf (page ready)
>  6      <idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>
>  7        sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff  rcu_irq_exit    - apf (not present)

Calling rcu_irq_exit() without a matching rcu_irq_enter() after the
last rcu_idle_exit() is illegal, isn't it?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 14:38 ` Frederic Weisbecker
@ 2012-11-27 15:44   ` Gleb Natapov
  2012-11-27 15:56     ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 15:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Nov 27, 2012 at 03:38:14PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> > I noticed some warnings complaining about dynticks_nesting value, like
> >
> > [  267.545032] ------------[ cut here ]------------
> > [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> > [  267.545032] Hardware name: Bochs
> > [  267.545032] Modules linked in:
> > [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> > [  267.545032] Call Trace:
> > [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> > [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> > [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> > [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> > [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> > [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> > [  267.545032] ---[ end trace 924ae80da035028d ]---
> >
> > After enabling rcu-dyntick tracing, I got following abnormal
> > dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
> >                         ...
> >  1      <idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
> >  2        sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001  rcu_irq_enter   - apf (not present)
> 
> How did that happen? When I look at do_async_page_fault(),
> KVM_PV_REASON_PAGE_NOT_PRESENT doesn't do rcu_irq_enter().
> 
> >
> >  3      <idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0              rcu_idle_enter
> >  4      <idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (page ready)
> >  5      <idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0                            rcu_irq_exit    - apf (page ready)
> >  6      <idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
> >
> >  7        sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff  rcu_irq_exit    - apf (not present)
> 
> I'm confused for the same reason here.
> 
> >  8        sshd-696   [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0              rcu_user_enter
> >  9        sshd-696   [002] d..1 18739.521330: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (not present)
> 
> Same. Now we certainly need to add some rcu_user_exit() on
> do_async_page_fault(). Although I'm not quite sure when this function
> is called. Is it an exception or an irq?
> 
For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 15:44   ` Gleb Natapov
@ 2012-11-27 15:56     ` Frederic Weisbecker
  2012-11-27 16:19       ` Paul E. McKenney
  2012-11-27 16:39       ` Gleb Natapov
  0 siblings, 2 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 15:56 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.

Ok.
There seem to be a bug in kvm_async_pf_task_wait(). Using
idle_cpu(cpu) to find out if the current task is the idle task may not
work if there is pending wake up. Me may schedule another task but
when that other task sleeps later we can't schedule back to idle until
the fault is completed.

The right way is to use is_idle_task(current)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 15:39 ` Frederic Weisbecker
@ 2012-11-27 16:16   ` Paul E. McKenney
  2012-11-27 16:31     ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2012-11-27 16:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, sasha.levin, gleb, avi

On Tue, Nov 27, 2012 at 04:39:59PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> > I noticed some warnings complaining about dynticks_nesting value, like
> >
> > [  267.545032] ------------[ cut here ]------------
> > [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> > [  267.545032] Hardware name: Bochs
> > [  267.545032] Modules linked in:
> > [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> > [  267.545032] Call Trace:
> > [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> > [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> > [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> > [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> > [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> > [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> > [  267.545032] ---[ end trace 924ae80da035028d ]---
> >
> > After enabling rcu-dyntick tracing, I got following abnormal
> > dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
> >                         ...
> >  1      <idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
> >  2        sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001  rcu_irq_enter   - apf (not present)
> >
> >  3      <idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0              rcu_idle_enter
> >  4      <idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (page ready)
> >  5      <idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0                            rcu_irq_exit    - apf (page ready)
> >  6      <idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
> >
> >  7        sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff  rcu_irq_exit    - apf (not present)
> 
> Calling rcu_irq_exit() without a matching rcu_irq_enter() after the
> last rcu_idle_exit() is illegal, isn't it?

It is OK to call rcu_irq_exit() without a matching rcu_irq_enter() -only-
if you have also called rcu_idle_exit() since the last rcu_idle_enter().
There will be a similar rule for rcu_user_exit().

More generally, it is OK to call rcu_irq_exit() without a matching
rcu_irq_enter() only if RCU believes that the CPU you are running on is
non-idle.  On 32-bit systems, you are only allowed a few tens of million
such unmatched rcu_irq_enter() calls in a given RCU-non-idle region.

All courtesy of RCU's need to tolerate architectures that enter
interrupt handlers without ever leaving them and vice versa.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 15:56     ` Frederic Weisbecker
@ 2012-11-27 16:19       ` Paul E. McKenney
  2012-11-27 16:48         ` Frederic Weisbecker
  2012-11-27 16:39       ` Gleb Natapov
  1 sibling, 1 reply; 50+ messages in thread
From: Paul E. McKenney @ 2012-11-27 16:19 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Gleb Natapov, Li Zhong, linux-next list, LKML, sasha.levin, avi

On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
> 
> Ok.
> There seem to be a bug in kvm_async_pf_task_wait(). Using
> idle_cpu(cpu) to find out if the current task is the idle task may not
> work if there is pending wake up. Me may schedule another task but
> when that other task sleeps later we can't schedule back to idle until
> the fault is completed.
> 
> The right way is to use is_idle_task(current)

Agreed.  But if you instead need to figure out whether or not RCU believes
that the CPU is idle, use rcu_is_cpu_idle().  You need to figure out
whether or not RCU believes that the CPU is idle if you are trying to
play fast and loose with rcu_irq_enter() and rcu_irq_exit().

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
                   ` (3 preceding siblings ...)
  2012-11-27 15:39 ` Frederic Weisbecker
@ 2012-11-27 16:29 ` Frederic Weisbecker
  2012-11-28  8:18   ` [RFC PATCH v2] Add rcu user eqs exception hooks for " Li Zhong
  2012-11-27 16:43 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to " Frederic Weisbecker
  5 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 16:29 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> @@ -247,10 +247,17 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:
>                 /* page is swapped out by the host. */
> -               rcu_irq_enter();
> +               if (is_idle_task(current) && rcu_is_cpu_idle())

If the task is idle we can't schedule so I guess we don't need to call
rcu_idle_exit()

> +                       rcu_idle_exit();
> +               else
> +                       rcu_user_exit();

rcu_user_exit() must be called in any case yeah.

> +
>                 exit_idle();
>                 kvm_async_pf_task_wait((u32)read_cr2());
> -               rcu_irq_exit();
> +
> +               if (is_idle_task(current) && rcu_is_cpu_idle())
> +                       rcu_idle_enter();
> +

I think that only adding rcu_user_exit() in the beginning and let the
rest as is (rcu_irq_enter() / rcu_irq_exit()) is enough. If we are
idle we won't schedule out. If we are not idle then we exit rcu user
mode if necessary and we can call rcu_irq_enter() in any case. If we
schedule we can safely cal rcu_irq_exit() even if somebody called
rcu_idle_enter() / rcu_idle_exit() since our matching rcu_irq_enter()
called before we scheduled. This works because we are not in RCU idle
mode and Paul says this is legal to have irq exit without matching irq
entry (his famous "vice-versa" on his previous email ;-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 16:16   ` Paul E. McKenney
@ 2012-11-27 16:31     ` Frederic Weisbecker
  0 siblings, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 16:31 UTC (permalink / raw)
  To: paulmck; +Cc: Li Zhong, linux-next list, LKML, sasha.levin, gleb, avi

2012/11/27 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> It is OK to call rcu_irq_exit() without a matching rcu_irq_enter() -only-
> if you have also called rcu_idle_exit() since the last rcu_idle_enter().
> There will be a similar rule for rcu_user_exit().
>
> More generally, it is OK to call rcu_irq_exit() without a matching
> rcu_irq_enter() only if RCU believes that the CPU you are running on is
> non-idle.

Perfect!

>  On 32-bit systems, you are only allowed a few tens of million
> such unmatched rcu_irq_enter() calls in a given RCU-non-idle region.
>
> All courtesy of RCU's need to tolerate architectures that enter
> interrupt handlers without ever leaving them and vice versa.  ;-)

RCU idle mode in a hostile environment ;-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 15:56     ` Frederic Weisbecker
  2012-11-27 16:19       ` Paul E. McKenney
@ 2012-11-27 16:39       ` Gleb Natapov
  2012-11-27 16:51         ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 16:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
> 
> Ok.
> There seem to be a bug in kvm_async_pf_task_wait(). Using
> idle_cpu(cpu) to find out if the current task is the idle task may not
> work if there is pending wake up. Me may schedule another task but
> when that other task sleeps later we can't schedule back to idle until
> the fault is completed.
> 
> The right way is to use is_idle_task(current)
But if there is pending wake up then scheduling to the waked up task is
exactly what we want.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
                   ` (4 preceding siblings ...)
  2012-11-27 16:29 ` Frederic Weisbecker
@ 2012-11-27 16:43 ` Frederic Weisbecker
  5 siblings, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 16:43 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/27 Li Zhong <zhong@linux.vnet.ibm.com>:
> I noticed some warnings complaining about dynticks_nesting value, like
>
> [  267.545032] ------------[ cut here ]------------
> [  267.545032] WARNING: at kernel/rcutree.c:382 rcu_eqs_enter+0xab/0xc0()
> [  267.545032] Hardware name: Bochs
> [  267.545032] Modules linked in:
> [  267.545032] Pid: 0, comm: swapper/2 Not tainted 3.7.0-rc5-next-20121115 #8
> [  267.545032] Call Trace:
> [  267.545032]  [<ffffffff8104714f>] warn_slowpath_common+0x7f/0xc0
> [  267.545032]  [<ffffffff810471aa>] warn_slowpath_null+0x1a/0x20
> [  267.545032]  [<ffffffff810e607b>] rcu_eqs_enter+0xab/0xc0
> [  267.545032]  [<ffffffff810e60bb>] rcu_idle_enter+0x2b/0x70
> [  267.545032]  [<ffffffff8100d44f>] cpu_idle+0x6f/0x100
> [  267.545032]  [<ffffffff814bf055>] start_secondary+0x205/0x20c
> [  267.545032] ---[ end trace 924ae80da035028d ]---
>
> After enabling rcu-dyntick tracing, I got following abnormal
> dynticks_nesting values (13fffffffffffff, ff00000000000001,etc):
>                         ...
>  1      <idle>-0     [002] dN.2 18739.518567: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>  2        sshd-696   [002] d..1 18739.518675: rcu_dyntick: ++= 140000000000000 140000000000001  rcu_irq_enter   - apf (not present)
>
>  3      <idle>-0     [002] d..2 18739.518705: rcu_dyntick: Start 140000000000001 0              rcu_idle_enter
>  4      <idle>-0     [002] d..2 18739.521252: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (page ready)
>  5      <idle>-0     [002] dN.2 18739.521261: rcu_dyntick: Start 1 0                            rcu_irq_exit    - apf (page ready)
>  6      <idle>-0     [002] dN.2 18739.521263: rcu_dyntick: End 0 140000000000000                rcu_idle_exit
>
>  7        sshd-696   [002] d..1 18739.521299: rcu_dyntick: --= 140000000000000 13fffffffffffff  rcu_irq_exit    - apf (not present)
>  8        sshd-696   [002] d..1 18739.521302: rcu_dyntick: Start 13fffffffffffff 0              rcu_user_enter
>  9        sshd-696   [002] d..1 18739.521330: rcu_dyntick: End 0 1                              rcu_irq_enter   - apf (not present)
>
> 10      <idle>-0     [002] d..2 18739.521346: rcu_dyntick: Start 1 0                            rcu_idle_enter - old value 1, warning

To be clear, what is illegal in the above? The fact we are calling
rcu_irq_enter() without rcu_irq_exit() while we are in idle mode? So
non matching rcu_irq_exit() is also only permitted in non-idle mode,
right?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 16:19       ` Paul E. McKenney
@ 2012-11-27 16:48         ` Frederic Weisbecker
  2012-11-27 16:59           ` Paul E. McKenney
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 16:48 UTC (permalink / raw)
  To: paulmck; +Cc: Gleb Natapov, Li Zhong, linux-next list, LKML, sasha.levin, avi

2012/11/27 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
>> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
>> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
>>
>> Ok.
>> There seem to be a bug in kvm_async_pf_task_wait(). Using
>> idle_cpu(cpu) to find out if the current task is the idle task may not
>> work if there is pending wake up. Me may schedule another task but
>> when that other task sleeps later we can't schedule back to idle until
>> the fault is completed.
>>
>> The right way is to use is_idle_task(current)
>
> Agreed.  But if you instead need to figure out whether or not RCU believes
> that the CPU is idle, use rcu_is_cpu_idle().  You need to figure out
> whether or not RCU believes that the CPU is idle if you are trying to
> play fast and loose with rcu_irq_enter() and rcu_irq_exit().

Right. I was referring to another problem though. They don't want to
schedule on the following cases:

1) preemption disabled, of course.
3) we are in the idle task. Because if we want to sleep in order to
wait for the page fault completion, we need to be able to schedule the
idle task. But we are already the idle task.

Now idle_cpu(cpu) is the wrong check for that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 16:39       ` Gleb Natapov
@ 2012-11-27 16:51         ` Frederic Weisbecker
  2012-11-27 17:00           ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 16:51 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
>> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
>> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
>>
>> Ok.
>> There seem to be a bug in kvm_async_pf_task_wait(). Using
>> idle_cpu(cpu) to find out if the current task is the idle task may not
>> work if there is pending wake up. Me may schedule another task but
>> when that other task sleeps later we can't schedule back to idle until
>> the fault is completed.
>>
>> The right way is to use is_idle_task(current)
> But if there is pending wake up then scheduling to the waked up task is
> exactly what we want.

Ok, but what if that task goes to sleep soon after beeing scheduled
and there is no other task on the runqueue and the page fault has not
been handled yet? The only thing you can do is to schedule the idle
task. But the idle task is waiting for the fault completion so you
can't do that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 16:48         ` Frederic Weisbecker
@ 2012-11-27 16:59           ` Paul E. McKenney
  0 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2012-11-27 16:59 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Gleb Natapov, Li Zhong, linux-next list, LKML, sasha.levin, avi

On Tue, Nov 27, 2012 at 05:48:18PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
> >> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> >> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
> >>
> >> Ok.
> >> There seem to be a bug in kvm_async_pf_task_wait(). Using
> >> idle_cpu(cpu) to find out if the current task is the idle task may not
> >> work if there is pending wake up. Me may schedule another task but
> >> when that other task sleeps later we can't schedule back to idle until
> >> the fault is completed.
> >>
> >> The right way is to use is_idle_task(current)
> >
> > Agreed.  But if you instead need to figure out whether or not RCU believes
> > that the CPU is idle, use rcu_is_cpu_idle().  You need to figure out
> > whether or not RCU believes that the CPU is idle if you are trying to
> > play fast and loose with rcu_irq_enter() and rcu_irq_exit().
> 
> Right. I was referring to another problem though. They don't want to
> schedule on the following cases:
> 
> 1) preemption disabled, of course.
> 3) we are in the idle task. Because if we want to sleep in order to
> wait for the page fault completion, we need to be able to schedule the
> idle task. But we are already the idle task.
> 
> Now idle_cpu(cpu) is the wrong check for that.

Fair enough!  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 16:51         ` Frederic Weisbecker
@ 2012-11-27 17:00           ` Gleb Natapov
  2012-11-27 17:30             ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 17:00 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Nov 27, 2012 at 05:51:12PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> > On Tue, Nov 27, 2012 at 04:56:30PM +0100, Frederic Weisbecker wrote:
> >> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> >> > For KVM_PV_REASON_PAGE_NOT_PRESENT it behaves like an exception.
> >>
> >> Ok.
> >> There seem to be a bug in kvm_async_pf_task_wait(). Using
> >> idle_cpu(cpu) to find out if the current task is the idle task may not
> >> work if there is pending wake up. Me may schedule another task but
> >> when that other task sleeps later we can't schedule back to idle until
> >> the fault is completed.
> >>
> >> The right way is to use is_idle_task(current)
> > But if there is pending wake up then scheduling to the waked up task is
> > exactly what we want.
> 
> Ok, but what if that task goes to sleep soon after beeing scheduled
> and there is no other task on the runqueue and the page fault has not
> been handled yet? The only thing you can do is to schedule the idle
> task. But the idle task is waiting for the fault completion so you
> can't do that.
Yes, I see now. So even though we have runnable task we can't schedule
away from idle task. Wouldn't the patch below solve Sasha's and Li's
RCU problems then (not even compiled):


diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..636800d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -113,7 +113,7 @@ void kvm_async_pf_task_wait(u32 token)
 	int cpu, idle;
 
 	cpu = get_cpu();
-	idle = idle_cpu(cpu);
+	idle = is_idle_task(current);
 	put_cpu();
 
 	spin_lock(&b->lock);
@@ -247,10 +247,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		rcu_irq_enter();
-		exit_idle();
 		kvm_async_pf_task_wait((u32)read_cr2());
-		rcu_irq_exit();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();

--
			Gleb.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 17:00           ` Gleb Natapov
@ 2012-11-27 17:30             ` Frederic Weisbecker
  2012-11-27 17:47               ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 17:30 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 4180a87..636800d 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -113,7 +113,7 @@ void kvm_async_pf_task_wait(u32 token)
>         int cpu, idle;
>
>         cpu = get_cpu();
> -       idle = idle_cpu(cpu);
> +       idle = is_idle_task(current);

I suggest this part goes to a standalone patch.

>         put_cpu();
>
>         spin_lock(&b->lock);
> @@ -247,10 +247,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:
>                 /* page is swapped out by the host. */
> -               rcu_irq_enter();
> -               exit_idle();
>                 kvm_async_pf_task_wait((u32)read_cr2());
> -               rcu_irq_exit();

Hmm, we still need those above around. I believe we just need to add
rcu_user_exit() in the beginning of that case.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 17:30             ` Frederic Weisbecker
@ 2012-11-27 17:47               ` Gleb Natapov
  2012-11-27 18:12                 ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 17:47 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Nov 27, 2012 at 06:30:32PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 4180a87..636800d 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -113,7 +113,7 @@ void kvm_async_pf_task_wait(u32 token)
> >         int cpu, idle;
> >
> >         cpu = get_cpu();
> > -       idle = idle_cpu(cpu);
> > +       idle = is_idle_task(current);
> 
> I suggest this part goes to a standalone patch.
> 
> >         put_cpu();
> >
> >         spin_lock(&b->lock);
> > @@ -247,10 +247,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> >                 break;
> >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >                 /* page is swapped out by the host. */
> > -               rcu_irq_enter();
> > -               exit_idle();
> >                 kvm_async_pf_task_wait((u32)read_cr2());
> > -               rcu_irq_exit();
> 
> Hmm, we still need those above around. I believe we just need to add
> rcu_user_exit() in the beginning of that case.
The exception may happen in kernel space too. Is calling rcu_user_exit()
still OK? Also why calling exit_idle() if we are not exiting idle?

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 17:47               ` Gleb Natapov
@ 2012-11-27 18:12                 ` Frederic Weisbecker
  2012-11-27 19:27                   ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 18:12 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> On Tue, Nov 27, 2012 at 06:30:32PM +0100, Frederic Weisbecker wrote:
>> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
>> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> > index 4180a87..636800d 100644
>> > --- a/arch/x86/kernel/kvm.c
>> > +++ b/arch/x86/kernel/kvm.c
>> > @@ -113,7 +113,7 @@ void kvm_async_pf_task_wait(u32 token)
>> >         int cpu, idle;
>> >
>> >         cpu = get_cpu();
>> > -       idle = idle_cpu(cpu);
>> > +       idle = is_idle_task(current);
>>
>> I suggest this part goes to a standalone patch.
>>
>> >         put_cpu();
>> >
>> >         spin_lock(&b->lock);
>> > @@ -247,10 +247,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>> >                 break;
>> >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
>> >                 /* page is swapped out by the host. */
>> > -               rcu_irq_enter();
>> > -               exit_idle();
>> >                 kvm_async_pf_task_wait((u32)read_cr2());
>> > -               rcu_irq_exit();
>>
>> Hmm, we still need those above around. I believe we just need to add
>> rcu_user_exit() in the beginning of that case.
> The exception may happen in kernel space too. Is calling rcu_user_exit()
> still OK? Also why calling exit_idle() if we are not exiting idle?

Yeah, rcu_user_exit() takes care of that. And exit_idle() also checks
we are really idle before firing the notifier.

Now we should probably call back enter_idle() before resuming idle if
needed. We disable irqs before calling enter_idle(). And exit_idle()
is called from irqs. This way we ensure it's either called before we
called local_irq_disable() or while the CPU is halt(). This provides
the guarantee that enter_idle() is always called before the CPU goes
to sleep. The fact we call exit_idle()  from an exception in idle
breaks this guarantee. But that's another issue.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 18:12                 ` Frederic Weisbecker
@ 2012-11-27 19:27                   ` Gleb Natapov
  2012-11-27 22:53                     ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-27 19:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Nov 27, 2012 at 07:12:40PM +0100, Frederic Weisbecker wrote:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> > On Tue, Nov 27, 2012 at 06:30:32PM +0100, Frederic Weisbecker wrote:
> >> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
> >> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> >> > index 4180a87..636800d 100644
> >> > --- a/arch/x86/kernel/kvm.c
> >> > +++ b/arch/x86/kernel/kvm.c
> >> > @@ -113,7 +113,7 @@ void kvm_async_pf_task_wait(u32 token)
> >> >         int cpu, idle;
> >> >
> >> >         cpu = get_cpu();
> >> > -       idle = idle_cpu(cpu);
> >> > +       idle = is_idle_task(current);
> >>
> >> I suggest this part goes to a standalone patch.
> >>
> >> >         put_cpu();
> >> >
> >> >         spin_lock(&b->lock);
> >> > @@ -247,10 +247,7 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> >> >                 break;
> >> >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >> >                 /* page is swapped out by the host. */
> >> > -               rcu_irq_enter();
> >> > -               exit_idle();
> >> >                 kvm_async_pf_task_wait((u32)read_cr2());
> >> > -               rcu_irq_exit();
> >>
> >> Hmm, we still need those above around. I believe we just need to add
> >> rcu_user_exit() in the beginning of that case.
> > The exception may happen in kernel space too. Is calling rcu_user_exit()
> > still OK? Also why calling exit_idle() if we are not exiting idle?
> 
> Yeah, rcu_user_exit() takes care of that. And exit_idle() also checks
> we are really idle before firing the notifier.
> 
> Now we should probably call back enter_idle() before resuming idle if
> needed. We disable irqs before calling enter_idle(). And exit_idle()
> is called from irqs. This way we ensure it's either called before we
> called local_irq_disable() or while the CPU is halt(). This provides
> the guarantee that enter_idle() is always called before the CPU goes
> to sleep. The fact we call exit_idle()  from an exception in idle
> breaks this guarantee. But that's another issue.

What is the semantics of enter_idle()/exit_idle(), what are they used for?
Not present fault happening in idle task does not mean we exit idle
task. If this happens exception handler will execute sti; hlt waiting
for missing page to be ready. Any interrupt happening during this hlt
will do exit_idle() by itself.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 19:27                   ` Gleb Natapov
@ 2012-11-27 22:53                     ` Frederic Weisbecker
  2012-11-27 22:54                       ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 22:53 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Gleb Natapov <gleb@redhat.com>:
> What is the semantics of enter_idle()/exit_idle(), what are they used for?

It's used by drivers/idle/i7300_idle.c for some tracking. I don't know much
the details.

enter_idle() is called right before the CPU is set to lower power mode: hlt()

exit_idle() is called anytime we exit this low power mode: irq,
polling on need_resched(), etc...

> Not present fault happening in idle task does not mean we exit idle
> task. If this happens exception handler will execute sti; hlt waiting
> for missing page to be ready. Any interrupt happening during this hlt
> will do exit_idle() by itself.

Well, the code executed by this fault should be considered as an exit
from idle mode. Now it's a small piece of code and we hlt() quickly so
I guess getting of idle_exit() there wouldn't hurt much.

It's buggy for now anyway due to that race where we may not have an
opportunity to call back idle_enter() if the fault happens between the
idle_enter() in the idle loop and the hlt() in the idle loop.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault
  2012-11-27 22:53                     ` Frederic Weisbecker
@ 2012-11-27 22:54                       ` Frederic Weisbecker
  0 siblings, 0 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-27 22:54 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/27 Frederic Weisbecker <fweisbec@gmail.com>:
> 2012/11/27 Gleb Natapov <gleb@redhat.com>:
>> What is the semantics of enter_idle()/exit_idle(), what are they used for?
>
> It's used by drivers/idle/i7300_idle.c for some tracking. I don't know much
> the details.
>
> enter_idle() is called right before the CPU is set to lower power mode: hlt()
>
> exit_idle() is called anytime we exit this low power mode: irq,
> polling on need_resched(), etc...
>
>> Not present fault happening in idle task does not mean we exit idle
>> task. If this happens exception handler will execute sti; hlt waiting
>> for missing page to be ready. Any interrupt happening during this hlt
>> will do exit_idle() by itself.
>
> Well, the code executed by this fault should be considered as an exit
> from idle mode. Now it's a small piece of code and we hlt() quickly so
> I guess getting of idle_exit() there wouldn't hurt much.
                        ^
                 getting "rid"

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-27 16:29 ` Frederic Weisbecker
@ 2012-11-28  8:18   ` Li Zhong
  2012-11-28 12:55     ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-11-28  8:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

Thank you all for the review and education. 

Below are my current understandings and an update version. Would you
please help to review it again and give your comments? 

Thanks, Zhong

Now it seems to me that it is legal to call rcu_irq_exit/enter() without
a matching rcu_irq_enter/exit() if the cpu is in non rcu idle state.

As opposite, it is illegal to call rcu_irq_exit() without a matching
rcu_irq_enter() if the cpu is in rcu idle state.

But it seems legal to call rcu_irq_enter() without a matching
rcu_irq_exit() if the cpu is in rcu idle state, regarding the
dynticks_nesting value.   However, it seems not good to exit the rcu
idle state, if we are actually entering into idle mode, so maybe it's
better to call a matching rcu_irq_exit() before actually idle?

As Frederic pointed out, we need a rcu_user_exit() to exit the user eqs
(if we are in this state) in the beginning. But after some more
thinking, I guess we might also need to call rcu_user_enter() after the
waiting, if we get this page fault from user space. So maybe it's better
to use rcu user eqs exception hooks here? 

With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
halt and the page ready. 

So I still want to remove it. And later if it shows that we really needs
rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
protect it. ( The suspicious RCU usage reported in commit
c5e015d4949aa665 seems related to schedule(), which is not in the code
path if we are in cpu idle eqs )

I think we still need Gleb's patch about the idle check in
kvm_async_pf_task_wait(), and maybe another patch for the
exit_idle()/enter_idle() issue. 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/kernel/kvm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..e3e7752 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -42,6 +42,7 @@
 #include <asm/apic.h>
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
+#include <asm/rcu.h>
 
 static int kvmapf = 1;
 
@@ -247,10 +248,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		rcu_irq_enter();
+		exception_enter(regs);
 		exit_idle();
 		kvm_async_pf_task_wait((u32)read_cr2());
-		rcu_irq_exit();
+		exception_exit(regs);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-28  8:18   ` [RFC PATCH v2] Add rcu user eqs exception hooks for " Li Zhong
@ 2012-11-28 12:55     ` Frederic Weisbecker
  2012-11-28 13:53       ` Gleb Natapov
  2012-11-29  1:49       ` [RFC PATCH v2] " Li Zhong
  0 siblings, 2 replies; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-28 12:55 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/28 Li Zhong <zhong@linux.vnet.ibm.com>:
> Thank you all for the review and education.
>
> Below are my current understandings and an update version. Would you
> please help to review it again and give your comments?
>
> Thanks, Zhong
>
> Now it seems to me that it is legal to call rcu_irq_exit/enter() without
> a matching rcu_irq_enter/exit() if the cpu is in non rcu idle state.
>
> As opposite, it is illegal to call rcu_irq_exit() without a matching
> rcu_irq_enter() if the cpu is in rcu idle state.
>
> But it seems legal to call rcu_irq_enter() without a matching
> rcu_irq_exit() if the cpu is in rcu idle state, regarding the
> dynticks_nesting value.   However, it seems not good to exit the rcu
> idle state, if we are actually entering into idle mode, so maybe it's
> better to call a matching rcu_irq_exit() before actually idle?
>
> As Frederic pointed out, we need a rcu_user_exit() to exit the user eqs
> (if we are in this state) in the beginning. But after some more
> thinking, I guess we might also need to call rcu_user_enter() after the
> waiting, if we get this page fault from user space. So maybe it's better
> to use rcu user eqs exception hooks here?

Makes sense.

>
> With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
> the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
> halt and the page ready.

Hmm, why is it not good?

>
> So I still want to remove it. And later if it shows that we really needs
> rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> protect it. ( The suspicious RCU usage reported in commit
> c5e015d4949aa665 seems related to schedule(), which is not in the code
> path if we are in cpu idle eqs )

Yes but if rcu_irq_*() calls are fine to be called there, and I
believe they are because exception_enter() exits the user mode, we
should start to protect there right now instead of waiting for a
potential future warning of illegal RCU use.

>
> I think we still need Gleb's patch about the idle check in
> kvm_async_pf_task_wait(), and maybe another patch for the
> exit_idle()/enter_idle() issue.

Right.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-28 12:55     ` Frederic Weisbecker
@ 2012-11-28 13:53       ` Gleb Natapov
  2012-11-28 14:25         ` Frederic Weisbecker
  2012-11-29  1:49       ` [RFC PATCH v2] " Li Zhong
  1 sibling, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-28 13:53 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Wed, Nov 28, 2012 at 01:55:42PM +0100, Frederic Weisbecker wrote:
> > So I still want to remove it. And later if it shows that we really needs
> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> > protect it. ( The suspicious RCU usage reported in commit
> > c5e015d4949aa665 seems related to schedule(), which is not in the code
> > path if we are in cpu idle eqs )
> 
> Yes but if rcu_irq_*() calls are fine to be called there, and I
> believe they are because exception_enter() exits the user mode, we
> should start to protect there right now instead of waiting for a
> potential future warning of illegal RCU use.
> 
Async page not present is not much different from regular page fault
exception when it happens not on idle task (regular #PF cannot happen
on idle task), but code have a special handling for idle task. So why
do you think rcu_irq_*() is required here, but not in page fault
handler?

> >
> > I think we still need Gleb's patch about the idle check in
> > kvm_async_pf_task_wait(), and maybe another patch for the
> > exit_idle()/enter_idle() issue.
> 
> Right.
> 
Done.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-28 13:53       ` Gleb Natapov
@ 2012-11-28 14:25         ` Frederic Weisbecker
  2012-11-29 11:07           ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-28 14:25 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/28 Gleb Natapov <gleb@redhat.com>:
> On Wed, Nov 28, 2012 at 01:55:42PM +0100, Frederic Weisbecker wrote:
>> Yes but if rcu_irq_*() calls are fine to be called there, and I
>> believe they are because exception_enter() exits the user mode, we
>> should start to protect there right now instead of waiting for a
>> potential future warning of illegal RCU use.
>>
> Async page not present is not much different from regular page fault
> exception when it happens not on idle task (regular #PF cannot happen
> on idle task), but code have a special handling for idle task. So why
> do you think rcu_irq_*() is required here, but not in page fault
> handler?

Because we are not supposed to fault in idle, at least I hope there
are no case around. Except on cases like here with KVM I guess but we
have that special async handler for that.

Note exception_enter() only takes cares of RCU user mode, not idle. If
the need arise in the future we can extend it to handle idle.

>> >
>> > I think we still need Gleb's patch about the idle check in
>> > kvm_async_pf_task_wait(), and maybe another patch for the
>> > exit_idle()/enter_idle() issue.
>>
>> Right.
>>
> Done.

 Thanks!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-28 12:55     ` Frederic Weisbecker
  2012-11-28 13:53       ` Gleb Natapov
@ 2012-11-29  1:49       ` Li Zhong
  2012-11-29 14:40         ` Frederic Weisbecker
  1 sibling, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-11-29  1:49 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

On Wed, 2012-11-28 at 13:55 +0100, Frederic Weisbecker wrote:
> 2012/11/28 Li Zhong <zhong@linux.vnet.ibm.com>:
> > Thank you all for the review and education.
> >
> > Below are my current understandings and an update version. Would you
> > please help to review it again and give your comments?
> >
> > Thanks, Zhong
> >
> > Now it seems to me that it is legal to call rcu_irq_exit/enter() without
> > a matching rcu_irq_enter/exit() if the cpu is in non rcu idle state.
> >
> > As opposite, it is illegal to call rcu_irq_exit() without a matching
> > rcu_irq_enter() if the cpu is in rcu idle state.
> >
> > But it seems legal to call rcu_irq_enter() without a matching
> > rcu_irq_exit() if the cpu is in rcu idle state, regarding the
> > dynticks_nesting value.   However, it seems not good to exit the rcu
> > idle state, if we are actually entering into idle mode, so maybe it's
> > better to call a matching rcu_irq_exit() before actually idle?
> >
> > As Frederic pointed out, we need a rcu_user_exit() to exit the user eqs
> > (if we are in this state) in the beginning. But after some more
> > thinking, I guess we might also need to call rcu_user_enter() after the
> > waiting, if we get this page fault from user space. So maybe it's better
> > to use rcu user eqs exception hooks here?
> 
> Makes sense.
> 
> >
> > With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
> > the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
> > halt and the page ready.
> 
> Hmm, why is it not good?

I think in this case, as cpu halts and waits for page ready, it is
really in idle, and it's better for rcu to think it as rcu idle. But
with rcu_irq_enter(), the rcu would not think this cpu as idle. And the
rcu_irq_exit() is only called after this idle period to mark this cpu as
rcu idle again. 
> 
> >
> > So I still want to remove it. And later if it shows that we really needs
> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> > protect it. ( The suspicious RCU usage reported in commit
> > c5e015d4949aa665 seems related to schedule(), which is not in the code
> > path if we are in cpu idle eqs )
> 
> Yes but if rcu_irq_*() calls are fine to be called there, and I
> believe they are because exception_enter() exits the user mode, we
> should start to protect there right now instead of waiting for a
> potential future warning of illegal RCU use.

I agree, but I think by only protecting the necessary code avoids
marking the entire waiting period as rcu non idle.

Thanks, Zhong

> 
> >
> > I think we still need Gleb's patch about the idle check in
> > kvm_async_pf_task_wait(), and maybe another patch for the
> > exit_idle()/enter_idle() issue.
> 
> Right.
> 
> Thanks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-28 14:25         ` Frederic Weisbecker
@ 2012-11-29 11:07           ` Gleb Natapov
  2012-11-29 14:47             ` Frederic Weisbecker
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-29 11:07 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Wed, Nov 28, 2012 at 03:25:07PM +0100, Frederic Weisbecker wrote:
> 2012/11/28 Gleb Natapov <gleb@redhat.com>:
> > On Wed, Nov 28, 2012 at 01:55:42PM +0100, Frederic Weisbecker wrote:
> >> Yes but if rcu_irq_*() calls are fine to be called there, and I
> >> believe they are because exception_enter() exits the user mode, we
> >> should start to protect there right now instead of waiting for a
> >> potential future warning of illegal RCU use.
> >>
> > Async page not present is not much different from regular page fault
> > exception when it happens not on idle task (regular #PF cannot happen
> > on idle task), but code have a special handling for idle task. So why
> > do you think rcu_irq_*() is required here, but not in page fault
> > handler?
> 
> Because we are not supposed to fault in idle, at least I hope there
> are no case around. Except on cases like here with KVM I guess but we
> have that special async handler for that.
> 
As far as I understand rcu_irq_enter() should be called before entering
the mode in which read-side critical sections can occur, but async page
fault does not uses RCU in case of faulting in idle. Actually, looking
closer, it may call kfree() if page ready notification arrives ahead
of not present notification (with todays KVM it cannot happen) and we
have to assume that kfree() uses rcu read even if it currently does not.
So may be it is possible to move rcu_irq_*() calls to be around unlikely
kfree() call only.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-29  1:49       ` [RFC PATCH v2] " Li Zhong
@ 2012-11-29 14:40         ` Frederic Weisbecker
  2012-11-29 17:25           ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-29 14:40 UTC (permalink / raw)
  To: Li Zhong; +Cc: linux-next list, LKML, paulmck, sasha.levin, gleb, avi

2012/11/29 Li Zhong <zhong@linux.vnet.ibm.com>:
> On Wed, 2012-11-28 at 13:55 +0100, Frederic Weisbecker wrote:
>> > With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
>> > the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
>> > halt and the page ready.
>>
>> Hmm, why is it not good?
>
> I think in this case, as cpu halts and waits for page ready, it is
> really in idle, and it's better for rcu to think it as rcu idle. But
> with rcu_irq_enter(), the rcu would not think this cpu as idle. And the
> rcu_irq_exit() is only called after this idle period to mark this cpu as
> rcu idle again.


Indeed. How about calling rcu_irq_exit() before native_safe_halt() and
rcu_irq_enter() right after?

>> > So I still want to remove it. And later if it shows that we really needs
>> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
>> > protect it. ( The suspicious RCU usage reported in commit
>> > c5e015d4949aa665 seems related to schedule(), which is not in the code
>> > path if we are in cpu idle eqs )
>>
>> Yes but if rcu_irq_*() calls are fine to be called there, and I
>> believe they are because exception_enter() exits the user mode, we
>> should start to protect there right now instead of waiting for a
>> potential future warning of illegal RCU use.
>
> I agree, but I think by only protecting the necessary code avoids
> marking the entire waiting period as rcu non idle.

If we use RCU_NONIDLE(), this assume we need to check all the code
there deeply for potential RCU uses and ensure it will never be
extended later to use RCU. We really don't scale enough for that.
There are too much subsystems involved there: waitqueue, spinlocks,
slab, per cpu, etc...

So I strongly suggest we use rcu_irq_*() APIs, and relax them around
native_safe_halt() like I suggested above. This seems to me the safest
solution.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-29 11:07           ` Gleb Natapov
@ 2012-11-29 14:47             ` Frederic Weisbecker
  2012-11-30  9:18               ` [RFC PATCH v3] " Li Zhong
  0 siblings, 1 reply; 50+ messages in thread
From: Frederic Weisbecker @ 2012-11-29 14:47 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

2012/11/29 Gleb Natapov <gleb@redhat.com>:
> On Wed, Nov 28, 2012 at 03:25:07PM +0100, Frederic Weisbecker wrote:
>> 2012/11/28 Gleb Natapov <gleb@redhat.com>:
>> > On Wed, Nov 28, 2012 at 01:55:42PM +0100, Frederic Weisbecker wrote:
>> >> Yes but if rcu_irq_*() calls are fine to be called there, and I
>> >> believe they are because exception_enter() exits the user mode, we
>> >> should start to protect there right now instead of waiting for a
>> >> potential future warning of illegal RCU use.
>> >>
>> > Async page not present is not much different from regular page fault
>> > exception when it happens not on idle task (regular #PF cannot happen
>> > on idle task), but code have a special handling for idle task. So why
>> > do you think rcu_irq_*() is required here, but not in page fault
>> > handler?
>>
>> Because we are not supposed to fault in idle, at least I hope there
>> are no case around. Except on cases like here with KVM I guess but we
>> have that special async handler for that.
>>
> As far as I understand rcu_irq_enter() should be called before entering
> the mode in which read-side critical sections can occur, but async page
> fault does not uses RCU in case of faulting in idle. Actually, looking
> closer, it may call kfree() if page ready notification arrives ahead
> of not present notification (with todays KVM it cannot happen) and we
> have to assume that kfree() uses rcu read even if it currently does not.
> So may be it is possible to move rcu_irq_*() calls to be around unlikely
> kfree() call only.

Please, let's protect kvm_async_pf_task_wait() entirely. This is less
complicated and we don't need to audit the code and its evolution. And
that is a complicated code enough already. We just need to do:

exception_enter();
rcu_irq_enter()
kvm_async_pf_task_wait() {
     rcu_irq_exit();
     native_safe_halt();
     rcu_irq_enter();
}
rcu_irq_exit()
exception_exit();

To me it looks like the safest and simplest solution.
rcu_irq_exit()

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-29 14:40         ` Frederic Weisbecker
@ 2012-11-29 17:25           ` Gleb Natapov
  2012-11-30  8:36             ` Li Zhong
  0 siblings, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-29 17:25 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Li Zhong, linux-next list, LKML, paulmck, sasha.levin, avi

On Thu, Nov 29, 2012 at 03:40:05PM +0100, Frederic Weisbecker wrote:
> 2012/11/29 Li Zhong <zhong@linux.vnet.ibm.com>:
> > On Wed, 2012-11-28 at 13:55 +0100, Frederic Weisbecker wrote:
> >> > With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
> >> > the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
> >> > halt and the page ready.
> >>
> >> Hmm, why is it not good?
> >
> > I think in this case, as cpu halts and waits for page ready, it is
> > really in idle, and it's better for rcu to think it as rcu idle. But
> > with rcu_irq_enter(), the rcu would not think this cpu as idle. And the
> > rcu_irq_exit() is only called after this idle period to mark this cpu as
> > rcu idle again.
> 
> 
> Indeed. How about calling rcu_irq_exit() before native_safe_halt() and
> rcu_irq_enter() right after?
> 
We can't. If exception happens in the middle of rcu read section while
preemption is disabled then calling rcu_irq_exit() before halt is
incorrect. We can do that only if exception happen during idle and this
should be rare enough for us to not care.

> >> > So I still want to remove it. And later if it shows that we really needs
> >> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> >> > protect it. ( The suspicious RCU usage reported in commit
> >> > c5e015d4949aa665 seems related to schedule(), which is not in the code
> >> > path if we are in cpu idle eqs )
> >>
> >> Yes but if rcu_irq_*() calls are fine to be called there, and I
> >> believe they are because exception_enter() exits the user mode, we
> >> should start to protect there right now instead of waiting for a
> >> potential future warning of illegal RCU use.
> >
> > I agree, but I think by only protecting the necessary code avoids
> > marking the entire waiting period as rcu non idle.
> 
> If we use RCU_NONIDLE(), this assume we need to check all the code
> there deeply for potential RCU uses and ensure it will never be
> extended later to use RCU. We really don't scale enough for that.
> There are too much subsystems involved there: waitqueue, spinlocks,
> slab, per cpu, etc...
> 
> So I strongly suggest we use rcu_irq_*() APIs, and relax them around
> native_safe_halt() like I suggested above. This seems to me the safest
> solution.

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-29 17:25           ` Gleb Natapov
@ 2012-11-30  8:36             ` Li Zhong
  2012-11-30 10:08               ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-11-30  8:36 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Thu, 2012-11-29 at 19:25 +0200, Gleb Natapov wrote:
> On Thu, Nov 29, 2012 at 03:40:05PM +0100, Frederic Weisbecker wrote:
> > 2012/11/29 Li Zhong <zhong@linux.vnet.ibm.com>:
> > > On Wed, 2012-11-28 at 13:55 +0100, Frederic Weisbecker wrote:
> > >> > With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
> > >> > the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
> > >> > halt and the page ready.
> > >>
> > >> Hmm, why is it not good?
> > >
> > > I think in this case, as cpu halts and waits for page ready, it is
> > > really in idle, and it's better for rcu to think it as rcu idle. But
> > > with rcu_irq_enter(), the rcu would not think this cpu as idle. And the
> > > rcu_irq_exit() is only called after this idle period to mark this cpu as
> > > rcu idle again.
> > 
> > 
> > Indeed. How about calling rcu_irq_exit() before native_safe_halt() and
> > rcu_irq_enter() right after?
> > 
> We can't. If exception happens in the middle of rcu read section while
> preemption is disabled then calling rcu_irq_exit() before halt is
> incorrect. We can do that only if exception happen during idle and this
> should be rare enough for us to not care.

If I understand correctly, maybe it doesn't cause problems.

I guess you mean in other(not idle) task's rcu read critical section, we
get an async page fault? In this case the rcu_idle_exit() won't make
this cpu to enter rcu idle state. As the task environment is rcu non
idle. This is the case we use rcu_irq_enter()/rcu_irq_exit() in code
path where it might be in rcu idle or rcu non idle, and they don't
change the rcu non-idle state if used in rcu non-idle state.

And it is also safe if it is rcu read critical section in idle, as it is
most probably wrapped in an outer rcu_irq_enter/exit(), so we have 

rcu idle
	rcu_irq_enter() - exit rcu idle, to protect the read section
	  exception
	  rcu_irq_enter() in the page not present path
		rcu_irq_exit() before halt 

so the halt is also safe here, as we have two rcu_irq_enter() and one
rcu_irq_exit().

I agree with Frederic that this is the safest and simplest way now, and
will try to update the code according to it.

Thanks, Zhong

> > >> > So I still want to remove it. And later if it shows that we really needs
> > >> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> > >> > protect it. ( The suspicious RCU usage reported in commit
> > >> > c5e015d4949aa665 seems related to schedule(), which is not in the code
> > >> > path if we are in cpu idle eqs )
> > >>
> > >> Yes but if rcu_irq_*() calls are fine to be called there, and I
> > >> believe they are because exception_enter() exits the user mode, we
> > >> should start to protect there right now instead of waiting for a
> > >> potential future warning of illegal RCU use.
> > >
> > > I agree, but I think by only protecting the necessary code avoids
> > > marking the entire waiting period as rcu non idle.
> > 
> > If we use RCU_NONIDLE(), this assume we need to check all the code
> > there deeply for potential RCU uses and ensure it will never be
> > extended later to use RCU. We really don't scale enough for that.
> > There are too much subsystems involved there: waitqueue, spinlocks,
> > slab, per cpu, etc...
> > 
> > So I strongly suggest we use rcu_irq_*() APIs, and relax them around
> > native_safe_halt() like I suggested above. This seems to me the safest
> > solution.
> 
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-11-29 14:47             ` Frederic Weisbecker
@ 2012-11-30  9:18               ` Li Zhong
  2012-11-30 10:26                 ` Gleb Natapov
  2012-12-03  9:57                 ` Gleb Natapov
  0 siblings, 2 replies; 50+ messages in thread
From: Li Zhong @ 2012-11-30  9:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Gleb Natapov, linux-next list, LKML, paulmck, sasha.levin, avi

This patch adds user eqs exception hooks for async page fault page not
present code path, to exit the user eqs and re-enter it as necessary. 

Async page fault is different from other exceptions that it may be
triggered from idle process, so we still need rcu_irq_enter() and
rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
that needs use rcu.

As Frederic pointed out it would be safest and simplest to protect the
whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
code there deeply for potential RCU uses and ensure it will never be
extended later to use RCU.".

However, We'd better re-enter the cpu idle eqs if we get the exception
in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 

So the patch does what Frederic suggested for rcu_irq_*() API usage
here, except that I moved the rcu_irq_*() pair originally in
do_async_page_fault() into kvm_async_pf_task_wait(). 

That's because, I think it's better to have rcu_irq_*() pairs to be in
one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
kvm_async_pf_task_wait() has other callers, which might cause
rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
which is illegal if the cpu happens to be in rcu idle state. 

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/kernel/kvm.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 4180a87..342b00b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -42,6 +42,7 @@
 #include <asm/apic.h>
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
+#include <asm/rcu.h>
 
 static int kvmapf = 1;
 
@@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
 	DEFINE_WAIT(wait);
 	int cpu, idle;
 
+	rcu_irq_enter();
+
 	cpu = get_cpu();
 	idle = idle_cpu(cpu);
 	put_cpu();
@@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
 		hlist_del(&e->link);
 		kfree(e);
 		spin_unlock(&b->lock);
+
+		rcu_irq_exit();
 		return;
 	}
 
@@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
 			/*
 			 * We cannot reschedule. So halt.
 			 */
+			rcu_irq_exit();
 			native_safe_halt();
+			rcu_irq_enter();
 			local_irq_disable();
 		}
 	}
 	if (!n.halted)
 		finish_wait(&n.wq, &wait);
 
+	rcu_irq_exit();
 	return;
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
@@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		rcu_irq_enter();
+		exception_enter(regs);
 		exit_idle();
 		kvm_async_pf_task_wait((u32)read_cr2());
-		rcu_irq_exit();
+		exception_exit(regs);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v2] Add rcu user eqs exception hooks for async page fault
  2012-11-30  8:36             ` Li Zhong
@ 2012-11-30 10:08               ` Gleb Natapov
  0 siblings, 0 replies; 50+ messages in thread
From: Gleb Natapov @ 2012-11-30 10:08 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Fri, Nov 30, 2012 at 04:36:46PM +0800, Li Zhong wrote:
> On Thu, 2012-11-29 at 19:25 +0200, Gleb Natapov wrote:
> > On Thu, Nov 29, 2012 at 03:40:05PM +0100, Frederic Weisbecker wrote:
> > > 2012/11/29 Li Zhong <zhong@linux.vnet.ibm.com>:
> > > > On Wed, 2012-11-28 at 13:55 +0100, Frederic Weisbecker wrote:
> > > >> > With rcu_user_exit() at the beginning, now rcu_irq_enter() only protects
> > > >> > the cpu idle eqs, but it's not good to call rcu_irq_exit() after the cpu
> > > >> > halt and the page ready.
> > > >>
> > > >> Hmm, why is it not good?
> > > >
> > > > I think in this case, as cpu halts and waits for page ready, it is
> > > > really in idle, and it's better for rcu to think it as rcu idle. But
> > > > with rcu_irq_enter(), the rcu would not think this cpu as idle. And the
> > > > rcu_irq_exit() is only called after this idle period to mark this cpu as
> > > > rcu idle again.
> > > 
> > > 
> > > Indeed. How about calling rcu_irq_exit() before native_safe_halt() and
> > > rcu_irq_enter() right after?
> > > 
> > We can't. If exception happens in the middle of rcu read section while
> > preemption is disabled then calling rcu_irq_exit() before halt is
> > incorrect. We can do that only if exception happen during idle and this
> > should be rare enough for us to not care.
> 
> If I understand correctly, maybe it doesn't cause problems.
> 
> I guess you mean in other(not idle) task's rcu read critical section, we
> get an async page fault? In this case the rcu_idle_exit() won't make
> this cpu to enter rcu idle state. As the task environment is rcu non
> idle. This is the case we use rcu_irq_enter()/rcu_irq_exit() in code
> path where it might be in rcu idle or rcu non idle, and they don't
> change the rcu non-idle state if used in rcu non-idle state.
> 
> And it is also safe if it is rcu read critical section in idle, as it is
> most probably wrapped in an outer rcu_irq_enter/exit(), so we have 
> 
> rcu idle
> 	rcu_irq_enter() - exit rcu idle, to protect the read section
> 	  exception
> 	  rcu_irq_enter() in the page not present path
> 		rcu_irq_exit() before halt 
> 
Yes, I was thinking about this scenario and you are right, since
rcu_irq_enter() will be nested it should be safe to call rcu_irq_exit
before halt.

> so the halt is also safe here, as we have two rcu_irq_enter() and one
> rcu_irq_exit().
> 
> I agree with Frederic that this is the safest and simplest way now, and
> will try to update the code according to it.
> 
> Thanks, Zhong
> 
> > > >> > So I still want to remove it. And later if it shows that we really needs
> > > >> > rcu somewhere in this code path, maybe we could use RCU_NONIDLE() to
> > > >> > protect it. ( The suspicious RCU usage reported in commit
> > > >> > c5e015d4949aa665 seems related to schedule(), which is not in the code
> > > >> > path if we are in cpu idle eqs )
> > > >>
> > > >> Yes but if rcu_irq_*() calls are fine to be called there, and I
> > > >> believe they are because exception_enter() exits the user mode, we
> > > >> should start to protect there right now instead of waiting for a
> > > >> potential future warning of illegal RCU use.
> > > >
> > > > I agree, but I think by only protecting the necessary code avoids
> > > > marking the entire waiting period as rcu non idle.
> > > 
> > > If we use RCU_NONIDLE(), this assume we need to check all the code
> > > there deeply for potential RCU uses and ensure it will never be
> > > extended later to use RCU. We really don't scale enough for that.
> > > There are too much subsystems involved there: waitqueue, spinlocks,
> > > slab, per cpu, etc...
> > > 
> > > So I strongly suggest we use rcu_irq_*() APIs, and relax them around
> > > native_safe_halt() like I suggested above. This seems to me the safest
> > > solution.
> > 
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-11-30  9:18               ` [RFC PATCH v3] " Li Zhong
@ 2012-11-30 10:26                 ` Gleb Natapov
  2012-12-03  2:08                   ` Li Zhong
  2012-12-03  9:57                 ` Gleb Natapov
  1 sibling, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-11-30 10:26 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> This patch adds user eqs exception hooks for async page fault page not
> present code path, to exit the user eqs and re-enter it as necessary. 
> 
> Async page fault is different from other exceptions that it may be
> triggered from idle process, so we still need rcu_irq_enter() and
> rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> that needs use rcu.
> 
> As Frederic pointed out it would be safest and simplest to protect the
> whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> code there deeply for potential RCU uses and ensure it will never be
> extended later to use RCU.".
> 
> However, We'd better re-enter the cpu idle eqs if we get the exception
> in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> 
> So the patch does what Frederic suggested for rcu_irq_*() API usage
> here, except that I moved the rcu_irq_*() pair originally in
> do_async_page_fault() into kvm_async_pf_task_wait(). 
> 
> That's because, I think it's better to have rcu_irq_*() pairs to be in
> one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> kvm_async_pf_task_wait() has other callers, which might cause
> rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> which is illegal if the cpu happens to be in rcu idle state. 
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/kvm.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 4180a87..342b00b 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -42,6 +42,7 @@
>  #include <asm/apic.h>
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
> +#include <asm/rcu.h>
>  
>  static int kvmapf = 1;
>  
> @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
>  	DEFINE_WAIT(wait);
>  	int cpu, idle;
>  
> +	rcu_irq_enter();
> +
Why move rcu_irq_*() calls into kvm_async_pf_task_wait()?

>  	cpu = get_cpu();
>  	idle = idle_cpu(cpu);
>  	put_cpu();
> @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
>  		hlist_del(&e->link);
>  		kfree(e);
>  		spin_unlock(&b->lock);
> +
> +		rcu_irq_exit();
We can skip that if  rcu_irq_*() will stay outside.

>  		return;
>  	}
>  
> @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
>  			/*
>  			 * We cannot reschedule. So halt.
>  			 */
> +			rcu_irq_exit();
>  			native_safe_halt();
> +			rcu_irq_enter();
>  			local_irq_disable();
>  		}
>  	}
>  	if (!n.halted)
>  		finish_wait(&n.wq, &wait);
>  
> +	rcu_irq_exit();
>  	return;
>  }
>  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		break;
>  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>  		/* page is swapped out by the host. */
> -		rcu_irq_enter();
> +		exception_enter(regs);
>  		exit_idle();
>  		kvm_async_pf_task_wait((u32)read_cr2());
> -		rcu_irq_exit();
> +		exception_exit(regs);
>  		break;
>  	case KVM_PV_REASON_PAGE_READY:
>  		rcu_irq_enter();
> -- 
> 1.7.11.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-11-30 10:26                 ` Gleb Natapov
@ 2012-12-03  2:08                   ` Li Zhong
  2012-12-03  8:30                     ` Gleb Natapov
  0 siblings, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-12-03  2:08 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Fri, 2012-11-30 at 12:26 +0200, Gleb Natapov wrote:
> On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > This patch adds user eqs exception hooks for async page fault page not
> > present code path, to exit the user eqs and re-enter it as necessary. 
> > 
> > Async page fault is different from other exceptions that it may be
> > triggered from idle process, so we still need rcu_irq_enter() and
> > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > that needs use rcu.
> > 
> > As Frederic pointed out it would be safest and simplest to protect the
> > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > code there deeply for potential RCU uses and ensure it will never be
> > extended later to use RCU.".
> > 
> > However, We'd better re-enter the cpu idle eqs if we get the exception
> > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > 
> > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > here, except that I moved the rcu_irq_*() pair originally in
> > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > 
> > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > kvm_async_pf_task_wait() has other callers, which might cause
> > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > which is illegal if the cpu happens to be in rcu idle state. 
> > 
> > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > ---
> >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 4180a87..342b00b 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -42,6 +42,7 @@
> >  #include <asm/apic.h>
> >  #include <asm/apicdef.h>
> >  #include <asm/hypervisor.h>
> > +#include <asm/rcu.h>
> >  
> >  static int kvmapf = 1;
> >  
> > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> >  	DEFINE_WAIT(wait);
> >  	int cpu, idle;
> >  
> > +	rcu_irq_enter();
> > +
> Why move rcu_irq_*() calls into kvm_async_pf_task_wait()?

I think it is not good for a function to have a rcu_irq_exit(), which
needs a matching rcu_irq_enter() in its caller. 

Here, if not move rcu_irq_* in, then the rcu_irq_exit() before
native_safe_halt() in kvm_async_pf_task_wait() is the one that needs the
matching rcu_irq_enter() in do_async_page_fault(). And, for this case,
kvm_async_pf_task_wait() even has other caller - pf_interception().
Maybe it will always be rcu non-idle for pf_interception (so a matching
rcu_irq_enter() is not needed), or maybe we could (or need) add
rcu_irq_*() in pf_interception().  But I still think it's good to have
those function calls that need to be matched be contained in one
function.

Thanks, Zhong

> >  	cpu = get_cpu();
> >  	idle = idle_cpu(cpu);
> >  	put_cpu();
> > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> >  		hlist_del(&e->link);
> >  		kfree(e);
> >  		spin_unlock(&b->lock);
> > +
> > +		rcu_irq_exit();
> We can skip that if  rcu_irq_*() will stay outside.
> 
> >  		return;
> >  	}
> >  
> > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> >  			/*
> >  			 * We cannot reschedule. So halt.
> >  			 */
> > +			rcu_irq_exit();
> >  			native_safe_halt();
> > +			rcu_irq_enter();
> >  			local_irq_disable();
> >  		}
> >  	}
> >  	if (!n.halted)
> >  		finish_wait(&n.wq, &wait);
> >  
> > +	rcu_irq_exit();
> >  	return;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  		break;
> >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >  		/* page is swapped out by the host. */
> > -		rcu_irq_enter();
> > +		exception_enter(regs);
> >  		exit_idle();
> >  		kvm_async_pf_task_wait((u32)read_cr2());
> > -		rcu_irq_exit();
> > +		exception_exit(regs);
> >  		break;
> >  	case KVM_PV_REASON_PAGE_READY:
> >  		rcu_irq_enter();
> > -- 
> > 1.7.11.4
> 
> --
> 			Gleb.
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-03  2:08                   ` Li Zhong
@ 2012-12-03  8:30                     ` Gleb Natapov
  0 siblings, 0 replies; 50+ messages in thread
From: Gleb Natapov @ 2012-12-03  8:30 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Mon, Dec 03, 2012 at 10:08:32AM +0800, Li Zhong wrote:
> On Fri, 2012-11-30 at 12:26 +0200, Gleb Natapov wrote:
> > On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > > This patch adds user eqs exception hooks for async page fault page not
> > > present code path, to exit the user eqs and re-enter it as necessary. 
> > > 
> > > Async page fault is different from other exceptions that it may be
> > > triggered from idle process, so we still need rcu_irq_enter() and
> > > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > > that needs use rcu.
> > > 
> > > As Frederic pointed out it would be safest and simplest to protect the
> > > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > > code there deeply for potential RCU uses and ensure it will never be
> > > extended later to use RCU.".
> > > 
> > > However, We'd better re-enter the cpu idle eqs if we get the exception
> > > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > > 
> > > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > > here, except that I moved the rcu_irq_*() pair originally in
> > > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > > 
> > > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > > kvm_async_pf_task_wait() has other callers, which might cause
> > > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > > which is illegal if the cpu happens to be in rcu idle state. 
> > > 
> > > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > > ---
> > >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 4180a87..342b00b 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -42,6 +42,7 @@
> > >  #include <asm/apic.h>
> > >  #include <asm/apicdef.h>
> > >  #include <asm/hypervisor.h>
> > > +#include <asm/rcu.h>
> > >  
> > >  static int kvmapf = 1;
> > >  
> > > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  	DEFINE_WAIT(wait);
> > >  	int cpu, idle;
> > >  
> > > +	rcu_irq_enter();
> > > +
> > Why move rcu_irq_*() calls into kvm_async_pf_task_wait()?
> 
> I think it is not good for a function to have a rcu_irq_exit(), which
> needs a matching rcu_irq_enter() in its caller. 
> 
> Here, if not move rcu_irq_* in, then the rcu_irq_exit() before
> native_safe_halt() in kvm_async_pf_task_wait() is the one that needs the
> matching rcu_irq_enter() in do_async_page_fault(). And, for this case,
> kvm_async_pf_task_wait() even has other caller - pf_interception().
> Maybe it will always be rcu non-idle for pf_interception (so a matching
> rcu_irq_enter() is not needed), or maybe we could (or need) add
> rcu_irq_*() in pf_interception().  But I still think it's good to have
> those function calls that need to be matched be contained in one
> function.
> 
kvm_async_pf_task_wait() call from pf_interception() will always go to
schedule() path. I get your point and am fine with the patch as is.

> Thanks, Zhong
> 
> > >  	cpu = get_cpu();
> > >  	idle = idle_cpu(cpu);
> > >  	put_cpu();
> > > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  		hlist_del(&e->link);
> > >  		kfree(e);
> > >  		spin_unlock(&b->lock);
> > > +
> > > +		rcu_irq_exit();
> > We can skip that if  rcu_irq_*() will stay outside.
> > 
> > >  		return;
> > >  	}
> > >  
> > > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> > >  			/*
> > >  			 * We cannot reschedule. So halt.
> > >  			 */
> > > +			rcu_irq_exit();
> > >  			native_safe_halt();
> > > +			rcu_irq_enter();
> > >  			local_irq_disable();
> > >  		}
> > >  	}
> > >  	if (!n.halted)
> > >  		finish_wait(&n.wq, &wait);
> > >  
> > > +	rcu_irq_exit();
> > >  	return;
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > >  		/* page is swapped out by the host. */
> > > -		rcu_irq_enter();
> > > +		exception_enter(regs);
> > >  		exit_idle();
> > >  		kvm_async_pf_task_wait((u32)read_cr2());
> > > -		rcu_irq_exit();
> > > +		exception_exit(regs);
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_READY:
> > >  		rcu_irq_enter();
> > > -- 
> > > 1.7.11.4
> > 
> > --
> > 			Gleb.
> > 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-11-30  9:18               ` [RFC PATCH v3] " Li Zhong
  2012-11-30 10:26                 ` Gleb Natapov
@ 2012-12-03  9:57                 ` Gleb Natapov
  2012-12-04  2:35                   ` [ PATCH] " Li Zhong
  2012-12-04  2:36                   ` [RFC PATCH v3] " Li Zhong
  1 sibling, 2 replies; 50+ messages in thread
From: Gleb Natapov @ 2012-12-03  9:57 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

Please regenerate the patch against
git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.

On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> This patch adds user eqs exception hooks for async page fault page not
> present code path, to exit the user eqs and re-enter it as necessary. 
> 
> Async page fault is different from other exceptions that it may be
> triggered from idle process, so we still need rcu_irq_enter() and
> rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> that needs use rcu.
> 
> As Frederic pointed out it would be safest and simplest to protect the
> whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> code there deeply for potential RCU uses and ensure it will never be
> extended later to use RCU.".
> 
> However, We'd better re-enter the cpu idle eqs if we get the exception
> in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> 
> So the patch does what Frederic suggested for rcu_irq_*() API usage
> here, except that I moved the rcu_irq_*() pair originally in
> do_async_page_fault() into kvm_async_pf_task_wait(). 
> 
> That's because, I think it's better to have rcu_irq_*() pairs to be in
> one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> kvm_async_pf_task_wait() has other callers, which might cause
> rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> which is illegal if the cpu happens to be in rcu idle state. 
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/x86/kernel/kvm.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 4180a87..342b00b 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -42,6 +42,7 @@
>  #include <asm/apic.h>
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
> +#include <asm/rcu.h>
>  
>  static int kvmapf = 1;
>  
> @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
>  	DEFINE_WAIT(wait);
>  	int cpu, idle;
>  
> +	rcu_irq_enter();
> +
>  	cpu = get_cpu();
>  	idle = idle_cpu(cpu);
>  	put_cpu();
> @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
>  		hlist_del(&e->link);
>  		kfree(e);
>  		spin_unlock(&b->lock);
> +
> +		rcu_irq_exit();
>  		return;
>  	}
>  
> @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
>  			/*
>  			 * We cannot reschedule. So halt.
>  			 */
> +			rcu_irq_exit();
>  			native_safe_halt();
> +			rcu_irq_enter();
>  			local_irq_disable();
>  		}
>  	}
>  	if (!n.halted)
>  		finish_wait(&n.wq, &wait);
>  
> +	rcu_irq_exit();
>  	return;
>  }
>  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		break;
>  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>  		/* page is swapped out by the host. */
> -		rcu_irq_enter();
> +		exception_enter(regs);
>  		exit_idle();
>  		kvm_async_pf_task_wait((u32)read_cr2());
> -		rcu_irq_exit();
> +		exception_exit(regs);
>  		break;
>  	case KVM_PV_REASON_PAGE_READY:
>  		rcu_irq_enter();
> -- 
> 1.7.11.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [ PATCH] Add rcu user eqs exception hooks for async page fault
  2012-12-03  9:57                 ` Gleb Natapov
@ 2012-12-04  2:35                   ` Li Zhong
  2012-12-18 13:25                     ` Gleb Natapov
  2012-12-04  2:36                   ` [RFC PATCH v3] " Li Zhong
  1 sibling, 1 reply; 50+ messages in thread
From: Li Zhong @ 2012-12-04  2:35 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

This patch adds user eqs exception hooks for async page fault page not
present code path, to exit the user eqs and re-enter it as necessary.

Async page fault is different from other exceptions that it may be
triggered from idle process, so we still need rcu_irq_enter() and
rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
that needs use rcu.

As Frederic pointed out it would be safest and simplest to protect the
whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
code there deeply for potential RCU uses and ensure it will never be
extended later to use RCU.".

However, We'd better re-enter the cpu idle eqs if we get the exception
in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt().

So the patch does what Frederic suggested for rcu_irq_*() API usage
here, except that I moved the rcu_irq_*() pair originally in
do_async_page_fault() into kvm_async_pf_task_wait().

That's because, I think it's better to have rcu_irq_*() pairs to be in
one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
kvm_async_pf_task_wait() has other callers, which might cause
rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
which is illegal if the cpu happens to be in rcu idle state.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/x86/kernel/kvm.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 08b973f..e99af60 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -43,6 +43,7 @@
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
 #include <asm/kvm_guest.h>
+#include <asm/rcu.h>
 
 static int kvmapf = 1;
 
@@ -121,6 +122,8 @@ void kvm_async_pf_task_wait(u32 token)
 	struct kvm_task_sleep_node n, *e;
 	DEFINE_WAIT(wait);
 
+	rcu_irq_enter();
+
 	spin_lock(&b->lock);
 	e = _find_apf_task(b, token);
 	if (e) {
@@ -128,6 +131,8 @@ void kvm_async_pf_task_wait(u32 token)
 		hlist_del(&e->link);
 		kfree(e);
 		spin_unlock(&b->lock);
+
+		rcu_irq_exit();
 		return;
 	}
 
@@ -152,13 +157,16 @@ void kvm_async_pf_task_wait(u32 token)
 			/*
 			 * We cannot reschedule. So halt.
 			 */
+			rcu_irq_exit();
 			native_safe_halt();
+			rcu_irq_enter();
 			local_irq_disable();
 		}
 	}
 	if (!n.halted)
 		finish_wait(&n.wq, &wait);
 
+	rcu_irq_exit();
 	return;
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
@@ -252,10 +260,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		rcu_irq_enter();
+		exception_enter(regs);
 		exit_idle();
 		kvm_async_pf_task_wait((u32)read_cr2());
-		rcu_irq_exit();
+		exception_exit(regs);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-- 
1.7.11.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-03  9:57                 ` Gleb Natapov
  2012-12-04  2:35                   ` [ PATCH] " Li Zhong
@ 2012-12-04  2:36                   ` Li Zhong
  2012-12-04  5:11                     ` Gleb Natapov
  2012-12-04 13:02                     ` Gleb Natapov
  1 sibling, 2 replies; 50+ messages in thread
From: Li Zhong @ 2012-12-04  2:36 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Mon, 2012-12-03 at 11:57 +0200, Gleb Natapov wrote:
> Please regenerate the patch against
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.

Done.

By the way, the included file <asm/rcu.h> is replaced with
<asm/context_tracking.h> in latest next tree(91d1aa43 from rcu tree). 

Seems if they are merged, there won't be conflicts, but we need change
the including file name after that. I don't know how to handle this kind
of thing... 

Thanks, Zhong

> 
> On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > This patch adds user eqs exception hooks for async page fault page not
> > present code path, to exit the user eqs and re-enter it as necessary. 
> > 
> > Async page fault is different from other exceptions that it may be
> > triggered from idle process, so we still need rcu_irq_enter() and
> > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > that needs use rcu.
> > 
> > As Frederic pointed out it would be safest and simplest to protect the
> > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > code there deeply for potential RCU uses and ensure it will never be
> > extended later to use RCU.".
> > 
> > However, We'd better re-enter the cpu idle eqs if we get the exception
> > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > 
> > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > here, except that I moved the rcu_irq_*() pair originally in
> > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > 
> > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > kvm_async_pf_task_wait() has other callers, which might cause
> > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > which is illegal if the cpu happens to be in rcu idle state. 
> > 
> > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > ---
> >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 4180a87..342b00b 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -42,6 +42,7 @@
> >  #include <asm/apic.h>
> >  #include <asm/apicdef.h>
> >  #include <asm/hypervisor.h>
> > +#include <asm/rcu.h>
> >  
> >  static int kvmapf = 1;
> >  
> > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> >  	DEFINE_WAIT(wait);
> >  	int cpu, idle;
> >  
> > +	rcu_irq_enter();
> > +
> >  	cpu = get_cpu();
> >  	idle = idle_cpu(cpu);
> >  	put_cpu();
> > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> >  		hlist_del(&e->link);
> >  		kfree(e);
> >  		spin_unlock(&b->lock);
> > +
> > +		rcu_irq_exit();
> >  		return;
> >  	}
> >  
> > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> >  			/*
> >  			 * We cannot reschedule. So halt.
> >  			 */
> > +			rcu_irq_exit();
> >  			native_safe_halt();
> > +			rcu_irq_enter();
> >  			local_irq_disable();
> >  		}
> >  	}
> >  	if (!n.halted)
> >  		finish_wait(&n.wq, &wait);
> >  
> > +	rcu_irq_exit();
> >  	return;
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> >  		break;
> >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> >  		/* page is swapped out by the host. */
> > -		rcu_irq_enter();
> > +		exception_enter(regs);
> >  		exit_idle();
> >  		kvm_async_pf_task_wait((u32)read_cr2());
> > -		rcu_irq_exit();
> > +		exception_exit(regs);
> >  		break;
> >  	case KVM_PV_REASON_PAGE_READY:
> >  		rcu_irq_enter();
> > -- 
> > 1.7.11.4
> 
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-04  2:36                   ` [RFC PATCH v3] " Li Zhong
@ 2012-12-04  5:11                     ` Gleb Natapov
  2012-12-04  5:40                       ` Li Zhong
  2012-12-04 13:02                     ` Gleb Natapov
  1 sibling, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-12-04  5:11 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Dec 04, 2012 at 10:36:02AM +0800, Li Zhong wrote:
> On Mon, 2012-12-03 at 11:57 +0200, Gleb Natapov wrote:
> > Please regenerate the patch against
> > git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.
> 
> Done.
> 
> By the way, the included file <asm/rcu.h> is replaced with
> <asm/context_tracking.h> in latest next tree(91d1aa43 from rcu tree). 
> 
What is the url of the rcu tree?

> Seems if they are merged, there won't be conflicts, but we need change
> the including file name after that. I don't know how to handle this kind
> of thing... 
> 
> Thanks, Zhong
> 
> > 
> > On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > > This patch adds user eqs exception hooks for async page fault page not
> > > present code path, to exit the user eqs and re-enter it as necessary. 
> > > 
> > > Async page fault is different from other exceptions that it may be
> > > triggered from idle process, so we still need rcu_irq_enter() and
> > > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > > that needs use rcu.
> > > 
> > > As Frederic pointed out it would be safest and simplest to protect the
> > > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > > code there deeply for potential RCU uses and ensure it will never be
> > > extended later to use RCU.".
> > > 
> > > However, We'd better re-enter the cpu idle eqs if we get the exception
> > > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > > 
> > > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > > here, except that I moved the rcu_irq_*() pair originally in
> > > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > > 
> > > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > > kvm_async_pf_task_wait() has other callers, which might cause
> > > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > > which is illegal if the cpu happens to be in rcu idle state. 
> > > 
> > > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > > ---
> > >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 4180a87..342b00b 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -42,6 +42,7 @@
> > >  #include <asm/apic.h>
> > >  #include <asm/apicdef.h>
> > >  #include <asm/hypervisor.h>
> > > +#include <asm/rcu.h>
> > >  
> > >  static int kvmapf = 1;
> > >  
> > > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  	DEFINE_WAIT(wait);
> > >  	int cpu, idle;
> > >  
> > > +	rcu_irq_enter();
> > > +
> > >  	cpu = get_cpu();
> > >  	idle = idle_cpu(cpu);
> > >  	put_cpu();
> > > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  		hlist_del(&e->link);
> > >  		kfree(e);
> > >  		spin_unlock(&b->lock);
> > > +
> > > +		rcu_irq_exit();
> > >  		return;
> > >  	}
> > >  
> > > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> > >  			/*
> > >  			 * We cannot reschedule. So halt.
> > >  			 */
> > > +			rcu_irq_exit();
> > >  			native_safe_halt();
> > > +			rcu_irq_enter();
> > >  			local_irq_disable();
> > >  		}
> > >  	}
> > >  	if (!n.halted)
> > >  		finish_wait(&n.wq, &wait);
> > >  
> > > +	rcu_irq_exit();
> > >  	return;
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > >  		/* page is swapped out by the host. */
> > > -		rcu_irq_enter();
> > > +		exception_enter(regs);
> > >  		exit_idle();
> > >  		kvm_async_pf_task_wait((u32)read_cr2());
> > > -		rcu_irq_exit();
> > > +		exception_exit(regs);
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_READY:
> > >  		rcu_irq_enter();
> > > -- 
> > > 1.7.11.4
> > 
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-04  5:11                     ` Gleb Natapov
@ 2012-12-04  5:40                       ` Li Zhong
  0 siblings, 0 replies; 50+ messages in thread
From: Li Zhong @ 2012-12-04  5:40 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, 2012-12-04 at 07:11 +0200, Gleb Natapov wrote:
> On Tue, Dec 04, 2012 at 10:36:02AM +0800, Li Zhong wrote:
> > On Mon, 2012-12-03 at 11:57 +0200, Gleb Natapov wrote:
> > > Please regenerate the patch against
> > > git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.
> > 
> > Done.
> > 
> > By the way, the included file <asm/rcu.h> is replaced with
> > <asm/context_tracking.h> in latest next tree(91d1aa43 from rcu tree). 
> > 
> What is the url of the rcu tree?

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

and the url of the commit:

git.kernel.org/?p=linux/kernel/git/paulmck/linux-rcu.git;a=commit;h=91d1aa43d30505b0b825db8898ffc80a8eca96c7

Thanks, Zhong

> 
> > Seems if they are merged, there won't be conflicts, but we need change
> > the including file name after that. I don't know how to handle this kind
> > of thing... 
> > 
> > Thanks, Zhong
> > 
> > > 
> > > On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > > > This patch adds user eqs exception hooks for async page fault page not
> > > > present code path, to exit the user eqs and re-enter it as necessary. 
> > > > 
> > > > Async page fault is different from other exceptions that it may be
> > > > triggered from idle process, so we still need rcu_irq_enter() and
> > > > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > > > that needs use rcu.
> > > > 
> > > > As Frederic pointed out it would be safest and simplest to protect the
> > > > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > > > code there deeply for potential RCU uses and ensure it will never be
> > > > extended later to use RCU.".
> > > > 
> > > > However, We'd better re-enter the cpu idle eqs if we get the exception
> > > > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > > > 
> > > > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > > > here, except that I moved the rcu_irq_*() pair originally in
> > > > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > > > 
> > > > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > > > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > > > kvm_async_pf_task_wait() has other callers, which might cause
> > > > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > > > which is illegal if the cpu happens to be in rcu idle state. 
> > > > 
> > > > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > > > ---
> > > >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> > > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > index 4180a87..342b00b 100644
> > > > --- a/arch/x86/kernel/kvm.c
> > > > +++ b/arch/x86/kernel/kvm.c
> > > > @@ -42,6 +42,7 @@
> > > >  #include <asm/apic.h>
> > > >  #include <asm/apicdef.h>
> > > >  #include <asm/hypervisor.h>
> > > > +#include <asm/rcu.h>
> > > >  
> > > >  static int kvmapf = 1;
> > > >  
> > > > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  	DEFINE_WAIT(wait);
> > > >  	int cpu, idle;
> > > >  
> > > > +	rcu_irq_enter();
> > > > +
> > > >  	cpu = get_cpu();
> > > >  	idle = idle_cpu(cpu);
> > > >  	put_cpu();
> > > > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  		hlist_del(&e->link);
> > > >  		kfree(e);
> > > >  		spin_unlock(&b->lock);
> > > > +
> > > > +		rcu_irq_exit();
> > > >  		return;
> > > >  	}
> > > >  
> > > > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  			/*
> > > >  			 * We cannot reschedule. So halt.
> > > >  			 */
> > > > +			rcu_irq_exit();
> > > >  			native_safe_halt();
> > > > +			rcu_irq_enter();
> > > >  			local_irq_disable();
> > > >  		}
> > > >  	}
> > > >  	if (!n.halted)
> > > >  		finish_wait(&n.wq, &wait);
> > > >  
> > > > +	rcu_irq_exit();
> > > >  	return;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > > > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > >  		break;
> > > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > > >  		/* page is swapped out by the host. */
> > > > -		rcu_irq_enter();
> > > > +		exception_enter(regs);
> > > >  		exit_idle();
> > > >  		kvm_async_pf_task_wait((u32)read_cr2());
> > > > -		rcu_irq_exit();
> > > > +		exception_exit(regs);
> > > >  		break;
> > > >  	case KVM_PV_REASON_PAGE_READY:
> > > >  		rcu_irq_enter();
> > > > -- 
> > > > 1.7.11.4
> > > 
> > > --
> > > 			Gleb.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 
> > 
> 
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-04  2:36                   ` [RFC PATCH v3] " Li Zhong
  2012-12-04  5:11                     ` Gleb Natapov
@ 2012-12-04 13:02                     ` Gleb Natapov
  2012-12-04 14:28                       ` Paul E. McKenney
  1 sibling, 1 reply; 50+ messages in thread
From: Gleb Natapov @ 2012-12-04 13:02 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Dec 04, 2012 at 10:36:02AM +0800, Li Zhong wrote:
> On Mon, 2012-12-03 at 11:57 +0200, Gleb Natapov wrote:
> > Please regenerate the patch against
> > git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.
> 
> Done.
> 
> By the way, the included file <asm/rcu.h> is replaced with
> <asm/context_tracking.h> in latest next tree(91d1aa43 from rcu tree). 
> 
> Seems if they are merged, there won't be conflicts, but we need change
> the including file name after that. I don't know how to handle this kind
> of thing... 
> 
Either merge rcu/next into kvm/next before sending to Linus, or waiting
for -rc1 and ask Linus to pull this one patch separately.

> Thanks, Zhong
> 
> > 
> > On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > > This patch adds user eqs exception hooks for async page fault page not
> > > present code path, to exit the user eqs and re-enter it as necessary. 
> > > 
> > > Async page fault is different from other exceptions that it may be
> > > triggered from idle process, so we still need rcu_irq_enter() and
> > > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > > that needs use rcu.
> > > 
> > > As Frederic pointed out it would be safest and simplest to protect the
> > > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > > code there deeply for potential RCU uses and ensure it will never be
> > > extended later to use RCU.".
> > > 
> > > However, We'd better re-enter the cpu idle eqs if we get the exception
> > > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > > 
> > > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > > here, except that I moved the rcu_irq_*() pair originally in
> > > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > > 
> > > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > > kvm_async_pf_task_wait() has other callers, which might cause
> > > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > > which is illegal if the cpu happens to be in rcu idle state. 
> > > 
> > > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > > ---
> > >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 4180a87..342b00b 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > > @@ -42,6 +42,7 @@
> > >  #include <asm/apic.h>
> > >  #include <asm/apicdef.h>
> > >  #include <asm/hypervisor.h>
> > > +#include <asm/rcu.h>
> > >  
> > >  static int kvmapf = 1;
> > >  
> > > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  	DEFINE_WAIT(wait);
> > >  	int cpu, idle;
> > >  
> > > +	rcu_irq_enter();
> > > +
> > >  	cpu = get_cpu();
> > >  	idle = idle_cpu(cpu);
> > >  	put_cpu();
> > > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> > >  		hlist_del(&e->link);
> > >  		kfree(e);
> > >  		spin_unlock(&b->lock);
> > > +
> > > +		rcu_irq_exit();
> > >  		return;
> > >  	}
> > >  
> > > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> > >  			/*
> > >  			 * We cannot reschedule. So halt.
> > >  			 */
> > > +			rcu_irq_exit();
> > >  			native_safe_halt();
> > > +			rcu_irq_enter();
> > >  			local_irq_disable();
> > >  		}
> > >  	}
> > >  	if (!n.halted)
> > >  		finish_wait(&n.wq, &wait);
> > >  
> > > +	rcu_irq_exit();
> > >  	return;
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > >  		/* page is swapped out by the host. */
> > > -		rcu_irq_enter();
> > > +		exception_enter(regs);
> > >  		exit_idle();
> > >  		kvm_async_pf_task_wait((u32)read_cr2());
> > > -		rcu_irq_exit();
> > > +		exception_exit(regs);
> > >  		break;
> > >  	case KVM_PV_REASON_PAGE_READY:
> > >  		rcu_irq_enter();
> > > -- 
> > > 1.7.11.4
> > 
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [RFC PATCH v3] Add rcu user eqs exception hooks for async page fault
  2012-12-04 13:02                     ` Gleb Natapov
@ 2012-12-04 14:28                       ` Paul E. McKenney
  0 siblings, 0 replies; 50+ messages in thread
From: Paul E. McKenney @ 2012-12-04 14:28 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Li Zhong, Frederic Weisbecker, linux-next list, LKML, sasha.levin, avi

On Tue, Dec 04, 2012 at 03:02:51PM +0200, Gleb Natapov wrote:
> On Tue, Dec 04, 2012 at 10:36:02AM +0800, Li Zhong wrote:
> > On Mon, 2012-12-03 at 11:57 +0200, Gleb Natapov wrote:
> > > Please regenerate the patch against
> > > git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue.
> > 
> > Done.
> > 
> > By the way, the included file <asm/rcu.h> is replaced with
> > <asm/context_tracking.h> in latest next tree(91d1aa43 from rcu tree). 
> > 
> > Seems if they are merged, there won't be conflicts, but we need change
> > the including file name after that. I don't know how to handle this kind
> > of thing... 
> > 
> Either merge rcu/next into kvm/next before sending to Linus, or waiting
> for -rc1 and ask Linus to pull this one patch separately.

I just sent a pull request for 91d1aa43 to -tip, so hopefully things will
resolve reasonably.

							Thanx, Paul

> > Thanks, Zhong
> > 
> > > 
> > > On Fri, Nov 30, 2012 at 05:18:41PM +0800, Li Zhong wrote:
> > > > This patch adds user eqs exception hooks for async page fault page not
> > > > present code path, to exit the user eqs and re-enter it as necessary. 
> > > > 
> > > > Async page fault is different from other exceptions that it may be
> > > > triggered from idle process, so we still need rcu_irq_enter() and
> > > > rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> > > > that needs use rcu.
> > > > 
> > > > As Frederic pointed out it would be safest and simplest to protect the
> > > > whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> > > > code there deeply for potential RCU uses and ensure it will never be
> > > > extended later to use RCU.".
> > > > 
> > > > However, We'd better re-enter the cpu idle eqs if we get the exception
> > > > in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). 
> > > > 
> > > > So the patch does what Frederic suggested for rcu_irq_*() API usage
> > > > here, except that I moved the rcu_irq_*() pair originally in
> > > > do_async_page_fault() into kvm_async_pf_task_wait(). 
> > > > 
> > > > That's because, I think it's better to have rcu_irq_*() pairs to be in
> > > > one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> > > > kvm_async_pf_task_wait() has other callers, which might cause
> > > > rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> > > > which is illegal if the cpu happens to be in rcu idle state. 
> > > > 
> > > > Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> > > > ---
> > > >  arch/x86/kernel/kvm.c | 12 ++++++++++--
> > > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > > index 4180a87..342b00b 100644
> > > > --- a/arch/x86/kernel/kvm.c
> > > > +++ b/arch/x86/kernel/kvm.c
> > > > @@ -42,6 +42,7 @@
> > > >  #include <asm/apic.h>
> > > >  #include <asm/apicdef.h>
> > > >  #include <asm/hypervisor.h>
> > > > +#include <asm/rcu.h>
> > > >  
> > > >  static int kvmapf = 1;
> > > >  
> > > > @@ -112,6 +113,8 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  	DEFINE_WAIT(wait);
> > > >  	int cpu, idle;
> > > >  
> > > > +	rcu_irq_enter();
> > > > +
> > > >  	cpu = get_cpu();
> > > >  	idle = idle_cpu(cpu);
> > > >  	put_cpu();
> > > > @@ -123,6 +126,8 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  		hlist_del(&e->link);
> > > >  		kfree(e);
> > > >  		spin_unlock(&b->lock);
> > > > +
> > > > +		rcu_irq_exit();
> > > >  		return;
> > > >  	}
> > > >  
> > > > @@ -147,13 +152,16 @@ void kvm_async_pf_task_wait(u32 token)
> > > >  			/*
> > > >  			 * We cannot reschedule. So halt.
> > > >  			 */
> > > > +			rcu_irq_exit();
> > > >  			native_safe_halt();
> > > > +			rcu_irq_enter();
> > > >  			local_irq_disable();
> > > >  		}
> > > >  	}
> > > >  	if (!n.halted)
> > > >  		finish_wait(&n.wq, &wait);
> > > >  
> > > > +	rcu_irq_exit();
> > > >  	return;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> > > > @@ -247,10 +255,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
> > > >  		break;
> > > >  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> > > >  		/* page is swapped out by the host. */
> > > > -		rcu_irq_enter();
> > > > +		exception_enter(regs);
> > > >  		exit_idle();
> > > >  		kvm_async_pf_task_wait((u32)read_cr2());
> > > > -		rcu_irq_exit();
> > > > +		exception_exit(regs);
> > > >  		break;
> > > >  	case KVM_PV_REASON_PAGE_READY:
> > > >  		rcu_irq_enter();
> > > > -- 
> > > > 1.7.11.4
> > > 
> > > --
> > > 			Gleb.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > > 
> > 
> 
> --
> 			Gleb.
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [ PATCH] Add rcu user eqs exception hooks for async page fault
  2012-12-04  2:35                   ` [ PATCH] " Li Zhong
@ 2012-12-18 13:25                     ` Gleb Natapov
  0 siblings, 0 replies; 50+ messages in thread
From: Gleb Natapov @ 2012-12-18 13:25 UTC (permalink / raw)
  To: Li Zhong
  Cc: Frederic Weisbecker, linux-next list, LKML, paulmck, sasha.levin, avi

On Tue, Dec 04, 2012 at 10:35:13AM +0800, Li Zhong wrote:
> This patch adds user eqs exception hooks for async page fault page not
> present code path, to exit the user eqs and re-enter it as necessary.
> 
> Async page fault is different from other exceptions that it may be
> triggered from idle process, so we still need rcu_irq_enter() and
> rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
> that needs use rcu.
> 
> As Frederic pointed out it would be safest and simplest to protect the
> whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
> code there deeply for potential RCU uses and ensure it will never be
> extended later to use RCU.".
> 
> However, We'd better re-enter the cpu idle eqs if we get the exception
> in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt().
> 
> So the patch does what Frederic suggested for rcu_irq_*() API usage
> here, except that I moved the rcu_irq_*() pair originally in
> do_async_page_fault() into kvm_async_pf_task_wait().
> 
> That's because, I think it's better to have rcu_irq_*() pairs to be in
> one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here,
> kvm_async_pf_task_wait() has other callers, which might cause
> rcu_irq_exit() be called without a matching rcu_irq_enter() before it,
> which is illegal if the cpu happens to be in rcu idle state.
> 
> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Applied, thanks. Will land in -rc hopefully.

> ---
>  arch/x86/kernel/kvm.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 08b973f..e99af60 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -43,6 +43,7 @@
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
>  #include <asm/kvm_guest.h>
> +#include <asm/rcu.h>
>  
>  static int kvmapf = 1;
>  
> @@ -121,6 +122,8 @@ void kvm_async_pf_task_wait(u32 token)
>  	struct kvm_task_sleep_node n, *e;
>  	DEFINE_WAIT(wait);
>  
> +	rcu_irq_enter();
> +
>  	spin_lock(&b->lock);
>  	e = _find_apf_task(b, token);
>  	if (e) {
> @@ -128,6 +131,8 @@ void kvm_async_pf_task_wait(u32 token)
>  		hlist_del(&e->link);
>  		kfree(e);
>  		spin_unlock(&b->lock);
> +
> +		rcu_irq_exit();
>  		return;
>  	}
>  
> @@ -152,13 +157,16 @@ void kvm_async_pf_task_wait(u32 token)
>  			/*
>  			 * We cannot reschedule. So halt.
>  			 */
> +			rcu_irq_exit();
>  			native_safe_halt();
> +			rcu_irq_enter();
>  			local_irq_disable();
>  		}
>  	}
>  	if (!n.halted)
>  		finish_wait(&n.wq, &wait);
>  
> +	rcu_irq_exit();
>  	return;
>  }
>  EXPORT_SYMBOL_GPL(kvm_async_pf_task_wait);
> @@ -252,10 +260,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
>  		break;
>  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
>  		/* page is swapped out by the host. */
> -		rcu_irq_enter();
> +		exception_enter(regs);
>  		exit_idle();
>  		kvm_async_pf_task_wait((u32)read_cr2());
> -		rcu_irq_exit();
> +		exception_exit(regs);
>  		break;
>  	case KVM_PV_REASON_PAGE_READY:
>  		rcu_irq_enter();
> -- 
> 1.7.11.4

--
			Gleb.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2012-12-18 13:25 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-27  5:15 [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Li Zhong
2012-11-27  5:58 ` [PATCH rcu] use new nesting value for rcu_dyntick trace in rcu_eqs_enter_common Li Zhong
2012-11-27 15:18   ` Paul E. McKenney
2012-11-27 13:07 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to async page fault Gleb Natapov
2012-11-27 14:01   ` Sasha Levin
2012-11-27 15:25     ` Paul E. McKenney
2012-11-27 15:34   ` Frederic Weisbecker
2012-11-27 14:38 ` Frederic Weisbecker
2012-11-27 15:44   ` Gleb Natapov
2012-11-27 15:56     ` Frederic Weisbecker
2012-11-27 16:19       ` Paul E. McKenney
2012-11-27 16:48         ` Frederic Weisbecker
2012-11-27 16:59           ` Paul E. McKenney
2012-11-27 16:39       ` Gleb Natapov
2012-11-27 16:51         ` Frederic Weisbecker
2012-11-27 17:00           ` Gleb Natapov
2012-11-27 17:30             ` Frederic Weisbecker
2012-11-27 17:47               ` Gleb Natapov
2012-11-27 18:12                 ` Frederic Weisbecker
2012-11-27 19:27                   ` Gleb Natapov
2012-11-27 22:53                     ` Frederic Weisbecker
2012-11-27 22:54                       ` Frederic Weisbecker
2012-11-27 15:39 ` Frederic Weisbecker
2012-11-27 16:16   ` Paul E. McKenney
2012-11-27 16:31     ` Frederic Weisbecker
2012-11-27 16:29 ` Frederic Weisbecker
2012-11-28  8:18   ` [RFC PATCH v2] Add rcu user eqs exception hooks for " Li Zhong
2012-11-28 12:55     ` Frederic Weisbecker
2012-11-28 13:53       ` Gleb Natapov
2012-11-28 14:25         ` Frederic Weisbecker
2012-11-29 11:07           ` Gleb Natapov
2012-11-29 14:47             ` Frederic Weisbecker
2012-11-30  9:18               ` [RFC PATCH v3] " Li Zhong
2012-11-30 10:26                 ` Gleb Natapov
2012-12-03  2:08                   ` Li Zhong
2012-12-03  8:30                     ` Gleb Natapov
2012-12-03  9:57                 ` Gleb Natapov
2012-12-04  2:35                   ` [ PATCH] " Li Zhong
2012-12-18 13:25                     ` Gleb Natapov
2012-12-04  2:36                   ` [RFC PATCH v3] " Li Zhong
2012-12-04  5:11                     ` Gleb Natapov
2012-12-04  5:40                       ` Li Zhong
2012-12-04 13:02                     ` Gleb Natapov
2012-12-04 14:28                       ` Paul E. McKenney
2012-11-29  1:49       ` [RFC PATCH v2] " Li Zhong
2012-11-29 14:40         ` Frederic Weisbecker
2012-11-29 17:25           ` Gleb Natapov
2012-11-30  8:36             ` Li Zhong
2012-11-30 10:08               ` Gleb Natapov
2012-11-27 16:43 ` [RFC PATCH] Fix abnormal rcu dynticks_nesting values related to " Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).