linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 1/1] powerpc: Fix partition migration hang under load
From: Brian King @ 2009-01-29 23:23 UTC
  To: benh; +Cc: brking, linuxppc-dev


While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system. As a result, the migration hangs or, in some scenarios, the
kernel crashes in the complete() call that wakes the caller of
rtas_percpu_suspend_me. Fix this by calling H_JOIN again, as many
times as necessary, until the CPU that performed the suspend marks
the operation done.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/rtas.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN arch/powerpc/kernel/rtas.c~powerpc_migration_hang_fix arch/powerpc/kernel/rtas.c
--- linux-2.6/arch/powerpc/kernel/rtas.c~powerpc_migration_hang_fix	2009-01-29 17:19:58.000000000 -0600
+++ linux-2.6-bjking1/arch/powerpc/kernel/rtas.c	2009-01-29 17:19:58.000000000 -0600
@@ -46,6 +46,7 @@ EXPORT_SYMBOL(rtas);
 
 struct rtas_suspend_me_data {
 	atomic_t working; /* number of cpus accessing this struct */
+	atomic_t done;
 	int token; /* ibm,suspend-me */
 	int error;
 	struct completion *complete; /* wait on this until working == 0 */
@@ -689,7 +690,7 @@ static int ibm_suspend_me_token = RTAS_U
 #ifdef CONFIG_PPC_PSERIES
 static void rtas_percpu_suspend_me(void *info)
 {
-	long rc;
+	long rc = H_SUCCESS;
 	unsigned long msr_save;
 	int cpu;
 	struct rtas_suspend_me_data *data =
@@ -701,7 +702,8 @@ static void rtas_percpu_suspend_me(void 
 	msr_save = mfmsr();
 	mtmsr(msr_save & ~(MSR_EE));
 
-	rc = plpar_hcall_norets(H_JOIN);
+	while (rc == H_SUCCESS && !atomic_read(&data->done))
+		rc = plpar_hcall_norets(H_JOIN);
 
 	mtmsr(msr_save);
 
@@ -724,6 +726,9 @@ static void rtas_percpu_suspend_me(void 
 		       smp_processor_id(), rc);
 		data->error = rc;
 	}
+
+	atomic_set(&data->done, 1);
+
 	/* This cpu did the suspend or got an error; in either case,
 	 * we need to prod all other other cpus out of join state.
 	 * Extra prods are harmless.
@@ -766,6 +771,7 @@ static int rtas_ibm_suspend_me(struct rt
 	}
 
 	atomic_set(&data.working, 0);
+	atomic_set(&data.done, 0);
 	data.token = rtas_token("ibm,suspend-me");
 	data.error = 0;
 	data.complete = &done;
_
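A minimal userspace model of the race this patch addresses. It is not kernel code. `struct join_model`, `model_h_join()`, `join_once()` and `join_until_done()` are hypothetical names. The H_JOIN behaviour is an assumption based on the description above: a prod that is already latched on the CPU makes H_JOIN return H_SUCCESS immediately. Only a CPU that is not the last to join is modelled, so the H_CONTINUE path that issues ibm,suspend-me is left out.

```c
#include <stdbool.h>

/* Model-only return code standing in for the hypervisor's H_SUCCESS. */
enum { MODEL_H_SUCCESS = 0 };

struct join_model {
	bool prod_pending; /* stale prod latched on this CPU before it joins */
	bool migrated;     /* true once the partition has really resumed */
	bool done;         /* stands in for data->done (atomic_t in the patch) */
	int joins;         /* number of H_JOIN calls issued */
};

/*
 * Assumed H_JOIN behaviour for a CPU that is not the last to join.
 * A latched prod is consumed and the call returns H_SUCCESS at once,
 * although nothing has been suspended yet. Otherwise the CPU would sleep
 * until the suspending CPU resumes on the target, sets done and prods it;
 * the model collapses that wait into a single step.
 */
static long model_h_join(struct join_model *m)
{
	m->joins++;
	if (m->prod_pending) {
		m->prod_pending = false;
		return MODEL_H_SUCCESS;
	}
	m->migrated = true;     /* suspend + resume done by another CPU */
	m->done = true;         /* the suspending CPU sets done first ... */
	return MODEL_H_SUCCESS; /* ... and then prods this CPU awake */
}

/* Pre-patch logic: a single H_JOIN, with H_SUCCESS taken to mean "resumed". */
static void join_once(struct join_model *m)
{
	(void)model_h_join(m);
}

/* Patched logic: keep rejoining until the suspending CPU sets done. */
static void join_until_done(struct join_model *m)
{
	long rc = MODEL_H_SUCCESS;

	while (rc == MODEL_H_SUCCESS && !m->done)
		rc = model_h_join(m);
}
```

Starting from a CPU with a latched prod, `join_once()` returns after one H_JOIN with `migrated` still false. That is the premature return described in the commit message. `join_until_done()` absorbs the stale prod and issues a second H_JOIN. The patch sets `data->done` before the loop that prods the other CPUs. If the order were reversed, a CPU woken by its prod could still read `done == 0` and rejoin with no further prod coming.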


* Re: [PATCH 1/1] powerpc: Fix partition migration hang under load
From: Nathan Lynch @ 2009-01-30  0:38 UTC
  To: Brian King; +Cc: linuxppc-dev

Brian King wrote:
> 
> While testing partition migration with heavy CPU load using
> shared processors, it was observed that sometimes the migration
> would never complete and would appear to hang. Currently, the
> migration code assumes that if H_SUCCESS is returned from the H_JOIN
> then the migration is complete and the processor is waking up on
> the target system. If there was an outstanding PROD to the processor
> when the H_JOIN is called, however, it will return H_SUCCESS on the source
> system

Hmm, did you determine where that outstanding H_PROD is coming from?
AFAICT this is the only code which uses that hcall, and all processors
should have "consumed" their prods from one migration before another
migration can commence.

Regardless, ACK -- if we were to add another H_PROD call site (or if
there's one I missed) this would be necessary anyway.


* Re: [PATCH 1/1] powerpc: Fix partition migration hang under load
From: Brian King @ 2009-01-30 14:08 UTC
  To: Nathan Lynch; +Cc: linuxppc-dev

Nathan Lynch wrote:
> Brian King wrote:
>> While testing partition migration with heavy CPU load using
>> shared processors, it was observed that sometimes the migration
>> would never complete and would appear to hang. Currently, the
>> migration code assumes that if H_SUCCESS is returned from the H_JOIN
>> then the migration is complete and the processor is waking up on
>> the target system. If there was an outstanding PROD to the processor
>> when the H_JOIN is called, however, it will return H_SUCCESS on the source
>> system
> 
> Hmm, did you determine where that outstanding H_PROD is coming from?
> AFAICT this is the only code which uses that hcall, and all processors
> should have "consumed" their prods from one migration before another
> migration can commence.

Not for certain. After a successful migration we PROD all the processors,
including the one doing all the PRODs. I'm not sure whether that is where
the PROD that caused the migration hang came from. The failing
testcase involved keeping the CPUs extremely busy and migrating back and
forth between two systems.

-Brian

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center
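The thread leaves the source of the stale PROD unconfirmed. One possible reading can be sketched as a small C model. It assumes three things:

* An H_PROD aimed at a running CPU stays latched until that CPU next calls H_CEDE or H_JOIN.
* An idle CPU's H_CEDE consumes a latched prod.
* A CPU kept fully busy never cedes.

Under those assumptions, the suspending CPU's prod to itself survives until the next migration, where its first H_JOIN returns early. All names below are hypothetical, and this is a hypothesis, not a diagnosis.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct latch_model {
	bool prod_pending[MODEL_NR_CPUS]; /* prods latched but not yet consumed */
	bool busy;                        /* CPUs never go idle between migrations */
};

/*
 * After the suspend, CPU 'self' prods every CPU, itself included.
 * Assumption: the other CPUs are asleep in H_JOIN, so the prod wakes them
 * and is consumed; 'self' is running, so its own prod stays latched.
 */
static void model_prod_all(struct latch_model *m, int self)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->prod_pending[cpu] = (cpu == self);
}

/*
 * Time between two migrations. Assumption: idle CPUs call H_CEDE, which
 * consumes any latched prod; fully busy CPUs never cede.
 */
static void model_run_until_next_migration(struct latch_model *m)
{
	if (m->busy)
		return;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		m->prod_pending[cpu] = false;
}

/* Number of CPUs whose first H_JOIN of the next migration returns early. */
static int model_early_joins(const struct latch_model *m)
{
	int n = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		n += m->prod_pending[cpu];
	return n;
}
```

With `busy` set, exactly one CPU (the previous suspender) enters the next migration with a latched prod, and the patched H_JOIN loop absorbs it. With `busy` cleared, the latch is gone before the next migration starts. This would fit the observation that the hang appeared only with the CPUs kept extremely busy.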


* [PATCH 1/1] powerpc: Fix partition migration hang under load
From: Brian King @ 2009-02-17 16:49 UTC
  To: benh; +Cc: brking, linuxppc-dev


While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the source
system. As a result, the migration hangs or, in some scenarios, the
kernel crashes in the complete() call that wakes the caller of
rtas_percpu_suspend_me. Fix this by calling H_JOIN again, as many
times as necessary, until the CPU that performed the suspend marks
the operation done.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---

 arch/powerpc/kernel/rtas.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff -puN arch/powerpc/kernel/rtas.c~powerpc_migration_hang_fix arch/powerpc/kernel/rtas.c
--- linux-2.6/arch/powerpc/kernel/rtas.c~powerpc_migration_hang_fix	2009-01-29 17:19:58.000000000 -0600
+++ linux-2.6-bjking1/arch/powerpc/kernel/rtas.c	2009-01-29 17:19:58.000000000 -0600
@@ -46,6 +46,7 @@ EXPORT_SYMBOL(rtas);
 
 struct rtas_suspend_me_data {
 	atomic_t working; /* number of cpus accessing this struct */
+	atomic_t done;
 	int token; /* ibm,suspend-me */
 	int error;
 	struct completion *complete; /* wait on this until working == 0 */
@@ -689,7 +690,7 @@ static int ibm_suspend_me_token = RTAS_U
 #ifdef CONFIG_PPC_PSERIES
 static void rtas_percpu_suspend_me(void *info)
 {
-	long rc;
+	long rc = H_SUCCESS;
 	unsigned long msr_save;
 	int cpu;
 	struct rtas_suspend_me_data *data =
@@ -701,7 +702,8 @@ static void rtas_percpu_suspend_me(void 
 	msr_save = mfmsr();
 	mtmsr(msr_save & ~(MSR_EE));
 
-	rc = plpar_hcall_norets(H_JOIN);
+	while (rc == H_SUCCESS && !atomic_read(&data->done))
+		rc = plpar_hcall_norets(H_JOIN);
 
 	mtmsr(msr_save);
 
@@ -724,6 +726,9 @@ static void rtas_percpu_suspend_me(void 
 		       smp_processor_id(), rc);
 		data->error = rc;
 	}
+
+	atomic_set(&data->done, 1);
+
 	/* This cpu did the suspend or got an error; in either case,
 	 * we need to prod all other other cpus out of join state.
 	 * Extra prods are harmless.
@@ -766,6 +771,7 @@ static int rtas_ibm_suspend_me(struct rt
 	}
 
 	atomic_set(&data.working, 0);
+	atomic_set(&data.done, 0);
 	data.token = rtas_token("ibm,suspend-me");
 	data.error = 0;
 	data.complete = &done;
_

