All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest
@ 2016-10-21  9:03 Paul Mackerras
  2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Paul Mackerras @ 2016-10-21  9:03 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Shreyas B. Prabhu

Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering
nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
thread entering a KVM guest could switch the MMU context to the guest
while another thread was still in host kernel context with the MMU on.
That commit moved the point where a thread entering a power-saving
mode set its kvm_hstate.hwthread_state field in its PACA to
KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
MMU had been switched off.  That commit also added a comment
explaining that we have to switch to real mode before setting
hwthread_state to avoid this race.

Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make
pnv_powersave_common more generic", 2016-07-08) subsequently moved
the setting of hwthread_state back to a point where the MMU is on,
thus reintroducing the race, despite the comment saying that this
should not be done being included in full in the context lines of
the patch that did it.

This fixes the race again and adds a bigger and shoutier comment
explaining the potential race condition.

Cc: stable@vger.kernel.org # v4.8
Fixes: 4eae2c9ae54a
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kernel/idle_book3s.S | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index bd739fe..0d8712a 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -163,12 +163,6 @@ _GLOBAL(pnv_powersave_common)
 	std	r9,_MSR(r1)
 	std	r1,PACAR1(r13)
 
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-	/* Tell KVM we're entering idle */
-	li	r4,KVM_HWTHREAD_IN_IDLE
-	stb	r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
-
 	/*
 	 * Go to real mode to do the nap, as required by the architecture.
 	 * Also, we need to be in real mode before setting hwthread_state,
@@ -185,6 +179,26 @@ _GLOBAL(pnv_powersave_common)
 
 	.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/* Tell KVM we're entering idle */
+	li	r4,KVM_HWTHREAD_IN_IDLE
+	/******************************************************/
+	/*  N O T E   W E L L    ! ! !    N O T E   W E L L   */
+	/* The following store to HSTATE_HWTHREAD_STATE(r13)  */
+	/* MUST occur in real mode, i.e. with the MMU off,    */
+	/* and the MMU must stay off until we clear this flag */
+	/* and test HSTATE_HWTHREAD_REQ(r13) in the system    */
+	/* reset interrupt vector in exceptions-64s.S.        */
+	/* The reason is that another thread can switch the   */
+	/* MMU to a guest context whenever this flag is set   */
+	/* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on,    */
+	/* that would potentially cause this thread to start  */
+	/* executing instructions from guest memory in        */
+	/* hypervisor mode, leading to a host crash or data   */
+	/* corruption, or worse.                              */
+	/******************************************************/
+	stb	r4,HSTATE_HWTHREAD_STATE(r13)
+#endif
 	stb	r3,PACA_THREAD_IDLE_STATE(r13)
 	cmpwi	cr3,r3,PNV_THREAD_SLEEP
 	bge	cr3,2f
@@ -250,6 +264,12 @@ enter_winkle:
  * r3 - requested stop state
  */
 power_enter_stop:
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	/* Tell KVM we're entering idle */
+	li	r4,KVM_HWTHREAD_IN_IDLE
+	/* DO THIS IN REAL MODE!  See comment above. */
+	stb	r4,HSTATE_HWTHREAD_STATE(r13)
+#endif
 /*
  * Check if the requested state is a deep idle state.
  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
  2016-10-21  9:03 [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Paul Mackerras
@ 2016-10-21  9:04 ` Paul Mackerras
  2016-10-25 11:46   ` Gautham R Shenoy
  2016-10-26 10:21   ` [2/2] " Michael Ellerman
  2016-10-21 12:32 ` [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Shreyas B. Prabhu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 7+ messages in thread
From: Paul Mackerras @ 2016-10-21  9:04 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Shreyas B. Prabhu

This fixes a race condition where one thread that is entering or
leaving a power-saving state can inadvertently ignore the lock bit
that was set by another thread, and potentially also clear it.
The core_idle_lock_held function is called when the lock bit is
seen to be set.  It polls the lock bit until it is clear, then
does a lwarx to load the word containing the lock bit and thread
idle bits so it can be updated.  However, it is possible that the
value loaded with the lwarx has the lock bit set, even though an
immediately preceding lwz loaded a value with the lock bit clear.
If this happens then we go ahead and update the word despite the
lock bit being set, and when called from pnv_enter_arch207_idle_mode,
we will subsequently clear the lock bit.

No identifiable misbehaviour has been attributed to this race.

This fixes it by checking the lock bit in the value loaded by the
lwarx.  If it is set then we just go back and keep on polling.

Fixes: b32aadc1a8ed
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kernel/idle_book3s.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
index 0d8712a..72dac0b 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -90,6 +90,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
  * Threads will spin in HMT_LOW until the lock bit is cleared.
  * r14 - pointer to core_idle_state
  * r15 - used to load contents of core_idle_state
+ * r9  - used as a temporary variable
  */
 
 core_idle_lock_held:
@@ -99,6 +100,8 @@ core_idle_lock_held:
 	bne	3b
 	HMT_MEDIUM
 	lwarx	r15,0,r14
+	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
+	bne	core_idle_lock_held
 	blr
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest
  2016-10-21  9:03 [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Paul Mackerras
  2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
@ 2016-10-21 12:32 ` Shreyas B. Prabhu
  2016-10-25 10:24 ` Gautham R Shenoy
  2016-10-26 10:21 ` [1/2] " Michael Ellerman
  3 siblings, 0 replies; 7+ messages in thread
From: Shreyas B. Prabhu @ 2016-10-21 12:32 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

On Fri, Oct 21, 2016 at 5:03 AM, Paul Mackerras <paulus@ozlabs.org> wrote:

> Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering
> nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
> thread entering a KVM guest could switch the MMU context to the guest
> while another thread was still in host kernel context with the MMU on.
> That commit moved the point where a thread entering a power-saving
> mode set its kvm_hstate.hwthread_state field in its PACA to
> KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
> MMU had been switched off.  That commit also added a comment
> explaining that we have to switch to real mode before setting
> hwthread_state to avoid this race.
>
> Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make
> pnv_powersave_common more generic", 2016-07-08) subsequently moved
> the setting of hwthread_state back to a point where the MMU is on,
> thus reintroducing the race, despite the comment saying that this
> should not be done being included in full in the context lines of
> the patch that did it.
>
> This fixes the race again and adds a bigger and shoutier comment
> explaining the potential race condition.
>
> Cc: stable@vger.kernel.org # v4.8
> Fixes: 4eae2c9ae54a
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>

Serious oversight on my part. Thanks for fixing this.

Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>

Thanks,
Shreyas

[-- Attachment #2: Type: text/html, Size: 2059 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest
  2016-10-21  9:03 [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Paul Mackerras
  2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
  2016-10-21 12:32 ` [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Shreyas B. Prabhu
@ 2016-10-25 10:24 ` Gautham R Shenoy
  2016-10-26 10:21 ` [1/2] " Michael Ellerman
  3 siblings, 0 replies; 7+ messages in thread
From: Gautham R Shenoy @ 2016-10-25 10:24 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Shreyas B. Prabhu

Hi Paul,

[Added Shreyas's current e-mail address ]

On Fri, Oct 21, 2016 at 08:03:05PM +1100, Paul Mackerras wrote:
> Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering
> nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
> thread entering a KVM guest could switch the MMU context to the guest
> while another thread was still in host kernel context with the MMU on.
> That commit moved the point where a thread entering a power-saving
> mode set its kvm_hstate.hwthread_state field in its PACA to
> KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
> MMU had been switched off.  That commit also added a comment
> explaining that we have to switch to real mode before setting
> hwthread_state to avoid this race.
> 
> Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make
> pnv_powersave_common more generic", 2016-07-08) subsequently moved
> the setting of hwthread_state back to a point where the MMU is on,
> thus reintroducing the race, despite the comment saying that this
> should not be done being included in full in the context lines of
> the patch that did it.
>

Sorry about missing that part. I am at fault, since I reviewed
4eae2c9ae54a patch. Will keep this in mind in the future.

> This fixes the race again and adds a bigger and shoutier comment
> explaining the potential race condition.
> 
> Cc: stable@vger.kernel.org # v4.8
> Fixes: 4eae2c9ae54a
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index bd739fe..0d8712a 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -163,12 +163,6 @@ _GLOBAL(pnv_powersave_common)
>  	std	r9,_MSR(r1)
>  	std	r1,PACAR1(r13)
> 
> -#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> -	/* Tell KVM we're entering idle */
> -	li	r4,KVM_HWTHREAD_IN_IDLE
> -	stb	r4,HSTATE_HWTHREAD_STATE(r13)
> -#endif
> -
>  	/*
>  	 * Go to real mode to do the nap, as required by the architecture.
>  	 * Also, we need to be in real mode before setting hwthread_state,
> @@ -185,6 +179,26 @@ _GLOBAL(pnv_powersave_common)
> 
>  	.globl pnv_enter_arch207_idle_mode
>  pnv_enter_arch207_idle_mode:
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	/* Tell KVM we're entering idle */
> +	li	r4,KVM_HWTHREAD_IN_IDLE
> +	/******************************************************/
> +	/*  N O T E   W E L L    ! ! !    N O T E   W E L L   */
> +	/* The following store to HSTATE_HWTHREAD_STATE(r13)  */
> +	/* MUST occur in real mode, i.e. with the MMU off,    */
> +	/* and the MMU must stay off until we clear this flag */
> +	/* and test HSTATE_HWTHREAD_REQ(r13) in the system    */
> +	/* reset interrupt vector in exceptions-64s.S.        */
> +	/* The reason is that another thread can switch the   */
> +	/* MMU to a guest context whenever this flag is set   */
> +	/* to KVM_HWTHREAD_IN_IDLE, and if the MMU was on,    */
> +	/* that would potentially cause this thread to start  */
> +	/* executing instructions from guest memory in        */
> +	/* hypervisor mode, leading to a host crash or data   */
> +	/* corruption, or worse.                              */
> +	/******************************************************/
> +	stb	r4,HSTATE_HWTHREAD_STATE(r13)
> +#endif
>  	stb	r3,PACA_THREAD_IDLE_STATE(r13)
>  	cmpwi	cr3,r3,PNV_THREAD_SLEEP
>  	bge	cr3,2f
> @@ -250,6 +264,12 @@ enter_winkle:
>   * r3 - requested stop state
>   */
>  power_enter_stop:
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +	/* Tell KVM we're entering idle */
> +	li	r4,KVM_HWTHREAD_IN_IDLE
> +	/* DO THIS IN REAL MODE!  See comment above. */
> +	stb	r4,HSTATE_HWTHREAD_STATE(r13)
> +#endif
>  /*
>   * Check if the requested state is a deep idle state.
>   */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
  2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
@ 2016-10-25 11:46   ` Gautham R Shenoy
  2016-10-26 10:21   ` [2/2] " Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Gautham R Shenoy @ 2016-10-25 11:46 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Shreyas B. Prabhu, Li Zhong

Hi Paul,

On Fri, Oct 21, 2016 at 08:04:17PM +1100, Paul Mackerras wrote:
> This fixes a race condition where one thread that is entering or
> leaving a power-saving state can inadvertently ignore the lock bit
> that was set by another thread, and potentially also clear it.
> The core_idle_lock_held function is called when the lock bit is
> seen to be set.  It polls the lock bit until it is clear, then
> does a lwarx to load the word containing the lock bit and thread
> idle bits so it can be updated.  However, it is possible that the
> value loaded with the lwarx has the lock bit set, even though an
> immediately preceding lwz loaded a value with the lock bit clear.
> If this happens then we go ahead and update the word despite the
> lock bit being set, and when called from pnv_enter_arch207_idle_mode,
> we will subsequently clear the lock bit.
> 
> No identifiable misbehaviour has been attributed to this race.
> 
> This fixes it by checking the lock bit in the value loaded by the
> lwarx.  If it is set then we just go back and keep on polling.
> 
> Fixes: b32aadc1a8ed

This fixes the code which has been around since 4.2 kernel. Should
this be marked to stable as well ?

> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/kernel/idle_book3s.S | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/idle_book3s.S b/arch/powerpc/kernel/idle_book3s.S
> index 0d8712a..72dac0b 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -90,6 +90,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>   * Threads will spin in HMT_LOW until the lock bit is cleared.
>   * r14 - pointer to core_idle_state
>   * r15 - used to load contents of core_idle_state
> + * r9  - used as a temporary variable
>   */
> 
>  core_idle_lock_held:
> @@ -99,6 +100,8 @@ core_idle_lock_held:
>  	bne	3b
>  	HMT_MEDIUM
>  	lwarx	r15,0,r14
> +	andi.	r9,r15,PNV_CORE_IDLE_LOCK_BIT
> +	bne	core_idle_lock_held
>  	blr
> 
>  /*
> -- 
> 2.7.4
> 

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [1/2] powerpc/64: Re-fix race condition between going idle and entering guest
  2016-10-21  9:03 [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Paul Mackerras
                   ` (2 preceding siblings ...)
  2016-10-25 10:24 ` Gautham R Shenoy
@ 2016-10-26 10:21 ` Michael Ellerman
  3 siblings, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2016-10-26 10:21 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev; +Cc: Shreyas B. Prabhu

On Fri, 2016-21-10 at 09:03:05 UTC, Paul Mackerras wrote:
> Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering
> nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one
> thread entering a KVM guest could switch the MMU context to the guest
> while another thread was still in host kernel context with the MMU on.
> That commit moved the point where a thread entering a power-saving
> mode set its kvm_hstate.hwthread_state field in its PACA to
> KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the
> MMU had been switched off.  That commit also added a comment
> explaining that we have to switch to real mode before setting
> hwthread_state to avoid this race.
> 
> Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make
> pnv_powersave_common more generic", 2016-07-08) subsequently moved
> the setting of hwthread_state back to a point where the MMU is on,
> thus reintroducing the race, despite the comment saying that this
> should not be done being included in full in the context lines of
> the patch that did it.
> 
> This fixes the race again and adds a bigger and shoutier comment
> explaining the potential race condition.
> 
> Cc: stable@vger.kernel.org # v4.8
> Fixes: 4eae2c9ae54a
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/56c46222af0d09149fadec2a3ce9d4

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code
  2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
  2016-10-25 11:46   ` Gautham R Shenoy
@ 2016-10-26 10:21   ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2016-10-26 10:21 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev; +Cc: Shreyas B. Prabhu

On Fri, 2016-21-10 at 09:04:17 UTC, Paul Mackerras wrote:
> This fixes a race condition where one thread that is entering or
> leaving a power-saving state can inadvertently ignore the lock bit
> that was set by another thread, and potentially also clear it.
> The core_idle_lock_held function is called when the lock bit is
> seen to be set.  It polls the lock bit until it is clear, then
> does a lwarx to load the word containing the lock bit and thread
> idle bits so it can be updated.  However, it is possible that the
> value loaded with the lwarx has the lock bit set, even though an
> immediately preceding lwz loaded a value with the lock bit clear.
> If this happens then we go ahead and update the word despite the
> lock bit being set, and when called from pnv_enter_arch207_idle_mode,
> we will subsequently clear the lock bit.
> 
> No identifiable misbehaviour has been attributed to this race.
> 
> This fixes it by checking the lock bit in the value loaded by the
> lwarx.  If it is set then we just go back and keep on polling.
> 
> Fixes: b32aadc1a8ed
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Applied to powerpc fixes, thanks.

I added: Cc: stable@vger.kernel.org # v4.2+

https://git.kernel.org/powerpc/c/09b7e37b18eecc1e347f4b1a3bc863

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-10-26 10:21 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-21  9:03 [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Paul Mackerras
2016-10-21  9:04 ` [PATCH 2/2] powerpc/64: Fix race condition in setting lock bit in idle/wakeup code Paul Mackerras
2016-10-25 11:46   ` Gautham R Shenoy
2016-10-26 10:21   ` [2/2] " Michael Ellerman
2016-10-21 12:32 ` [PATCH 1/2] powerpc/64: Re-fix race condition between going idle and entering guest Shreyas B. Prabhu
2016-10-25 10:24 ` Gautham R Shenoy
2016-10-26 10:21 ` [1/2] " Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.