* [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code
@ 2015-02-28  2:24 Tyrel Datwyler
  2015-02-28  2:24 ` [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration Tyrel Datwyler
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Tyrel Datwyler @ 2015-02-28  2:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Tyrel Datwyler, cyrilbur, nfont

This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an
extraneous function parameter, fixes device tree updating on little endian
platforms, and adds a mechanism for informing drmgr that the kernel is capable of
performing the whole migration including device tree update itself.

Tyrel Datwyler (3):
  powerpc/pseries: Simplify check for suspendability during
    suspend/migration
  powerpc/pseries: Little endian fixes for post mobility device tree
    update
  powerpc/pseries: Expose post-migration in kernel device tree update
    to drmgr

 arch/powerpc/include/asm/rtas.h           |  2 +-
 arch/powerpc/kernel/rtas.c                | 15 ++++-----
 arch/powerpc/platforms/pseries/mobility.c | 55 ++++++++++++++++++-------------
 3 files changed, 40 insertions(+), 32 deletions(-)

-- 
1.7.12.2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-02-28  2:24 [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Tyrel Datwyler
@ 2015-02-28  2:24 ` Tyrel Datwyler
  2015-03-02  4:19   ` Cyril Bur
  2015-02-28  2:24 ` [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update Tyrel Datwyler
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-02-28  2:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Tyrel Datwyler, cyrilbur, nfont

During a suspend/migration operation we must wait for the VASI state reported
by the hypervisor to become Suspending before making the ibm,suspend-me
RTAS call. Callers of rtas_ibm_suspend_me() pass a vasi_return pointer
that exposes the VASI state to the caller. This is unnecessary, as the caller
only cares about three conditions: an error, in which case we should bail
out; success, meaning we have suspended and woken back up, so we proceed to
the device tree update; or not suspendable yet, in which case we try calling
rtas_ibm_suspend_me() again shortly.

This patch removes the extraneous vasi_return parameter and simply uses the
return code to communicate how to proceed. We either succeed, fail, or get
-EAGAIN in which case we sleep for a second before trying to call
rtas_ibm_suspend_me again.
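The caller-side protocol described above (succeed, fail, or retry on -EAGAIN after a short sleep) can be modelled in plain userspace C. This is a minimal sketch, not kernel code: fake_suspend_me() is a hypothetical stand-in for rtas_ibm_suspend_me() that simply reports "not suspendable yet" a fixed number of times, and the one-second ssleep(1) is replaced by an empty hook so the model runs instantly.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for rtas_ibm_suspend_me(): reports -EAGAIN while
 * the simulated VASI state is still "Enabled", then 0 once the partition
 * has (notionally) suspended and resumed.
 */
static int polls_until_suspending = 3;
static int suspend_calls;

static int fake_suspend_me(unsigned long long handle)
{
	(void)handle;			/* unused in this model */
	suspend_calls++;
	if (polls_until_suspending > 0) {
		polls_until_suspending--;
		return -EAGAIN;		/* not suspendable yet */
	}
	return 0;			/* suspended and woken back up */
}

/* Placeholder for the one-second ssleep(1) the kernel loop performs. */
static void wait_before_retry(void)
{
}

/*
 * Retry only on -EAGAIN; otherwise hand back the result
 * (0 on success, negative errno on failure).
 */
static int migrate(unsigned long long streamid)
{
	int rc;

	do {
		rc = fake_suspend_me(streamid);
		if (rc == -EAGAIN)
			wait_before_retry();
	} while (rc == -EAGAIN);

	return rc;
}
```

With three simulated "not yet" answers, migrate() makes four calls and returns 0. Any error other than -EAGAIN would leave the loop immediately, which is the point of folding the VASI state into the return code.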

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/rtas.h           |  2 +-
 arch/powerpc/kernel/rtas.c                | 15 +++++++--------
 arch/powerpc/platforms/pseries/mobility.c |  8 +++-----
 3 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 2e23e92..fc85eb0 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -327,7 +327,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
 extern int rtas_online_cpus_mask(cpumask_var_t cpus);
 extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
-extern int rtas_ibm_suspend_me(u64 handle, int *vasi_return);
+extern int rtas_ibm_suspend_me(u64 handle);
 
 struct rtc_time;
 extern unsigned long rtas_get_boot_time(void);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 21c45a2..603b928 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -897,7 +897,7 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus)
 }
 EXPORT_SYMBOL(rtas_offline_cpus_mask);
 
-int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
+int rtas_ibm_suspend_me(u64 handle)
 {
 	long state;
 	long rc;
@@ -919,13 +919,11 @@ int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
 		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
 		return rc;
 	} else if (state == H_VASI_ENABLED) {
-		*vasi_return = RTAS_NOT_SUSPENDABLE;
-		return 0;
+		return -EAGAIN;
 	} else if (state != H_VASI_SUSPENDING) {
 		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
 		       state);
-		*vasi_return = -1;
-		return 0;
+		return -EIO;
 	}
 
 	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
@@ -1060,9 +1058,10 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
 		int vasi_rc = 0;
 		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
 		              | be32_to_cpu(args.args[1]);
-		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
-		args.rets[0] = cpu_to_be32(vasi_rc);
-		if (rc)
+		rc = rtas_ibm_suspend_me(handle);
+		if (rc == -EAGAIN)
+			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
+		else if (rc)
 			return rc;
 		goto copy_return;
 	}
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 90cf3dc..29e4f04 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -325,15 +325,13 @@ static ssize_t migrate_store(struct class *class, struct class_attribute *attr,
 		return rc;
 
 	do {
-		rc = rtas_ibm_suspend_me(streamid, &vasi_rc);
-		if (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE)
+		rc = rtas_ibm_suspend_me(streamid);
+		if (rc == -EAGAIN)
 			ssleep(1);
-	} while (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE);
+	} while (rc == -EAGAIN);
 
 	if (rc)
 		return rc;
-	if (vasi_rc)
-		return vasi_rc;
 
 	post_mobility_fixup();
 	return count;
-- 
1.7.12.2

* [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-02-28  2:24 [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Tyrel Datwyler
  2015-02-28  2:24 ` [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration Tyrel Datwyler
@ 2015-02-28  2:24 ` Tyrel Datwyler
  2015-03-02  5:20   ` Cyril Bur
  2015-02-28  2:24 ` [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr Tyrel Datwyler
  2015-03-03  6:10 ` [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Michael Ellerman
  3 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-02-28  2:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Tyrel Datwyler, cyrilbur, nfont

We currently use the device tree update code in the kernel after resuming
from a suspend operation to re-sync the kernel's view of the device tree with
that of the hypervisor. The code as it stands is not endian safe, as it relies
on parsing buffers returned by RTAS calls, which therefore contain data in
big-endian format.

This patch annotates variables and structure members with __be types and
performs the necessary byte swaps to CPU endianness for data that needs to be
parsed.
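The key rule in this patch is "swap to CPU order first, then mask". A minimal userspace sketch of decoding one ibm,update-nodes header word follows; load_be32() plays the role of be32_to_cpu() on a raw buffer, and the NODE_COUNT_MASK value of 0x00ffffff is an assumption (the diff only shows NODE_ACTION_MASK).

```c
#include <stdint.h>

/* Portable big-endian 32-bit load: same result on BE and LE hosts. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

#define NODE_ACTION_MASK 0xff000000u
#define NODE_COUNT_MASK  0x00ffffffu	/* assumed value */

struct node_hdr {
	uint32_t action;	/* top byte: what to do with the nodes */
	uint32_t count;		/* low 24 bits: how many phandles follow */
};

/* Decode the header word of an update-nodes work area entry. */
static struct node_hdr decode_node_hdr(const unsigned char *buf)
{
	uint32_t word = load_be32(buf);	/* convert BEFORE masking */
	struct node_hdr h = {
		.action = word & NODE_ACTION_MASK,
		.count  = word & NODE_COUNT_MASK,
	};
	return h;
}
```

Masking the raw word instead (the pre-patch `*data & NODE_ACTION_MASK`) gives the right answer only on a big-endian CPU; on little endian the action byte and the low count byte trade places.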

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 29e4f04..0b1f70e 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -25,10 +25,10 @@
 static struct kobject *mobility_kobj;
 
 struct update_props_workarea {
-	u32 phandle;
-	u32 state;
-	u64 reserved;
-	u32 nprops;
+	__be32 phandle;
+	__be32 state;
+	__be64 reserved;
+	__be32 nprops;
 } __packed;
 
 #define NODE_ACTION_MASK	0xff000000
@@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
 	return 0;
 }
 
-static int update_dt_node(u32 phandle, s32 scope)
+static int update_dt_node(__be32 phandle, s32 scope)
 {
 	struct update_props_workarea *upwa;
 	struct device_node *dn;
@@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
 	char *prop_data;
 	char *rtas_buf;
 	int update_properties_token;
+	u32 nprops;
 	u32 vd;
 
 	update_properties_token = rtas_token("ibm,update-properties");
@@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
 			break;
 
 		prop_data = rtas_buf + sizeof(*upwa);
+		nprops = be32_to_cpu(upwa->nprops);
 
 		/* On the first call to ibm,update-properties for a node the
 		 * the first property value descriptor contains an empty
@@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
 		 */
 		if (*prop_data == 0) {
 			prop_data++;
-			vd = *(u32 *)prop_data;
+			vd = be32_to_cpu(*(__be32 *)prop_data);
 			prop_data += vd + sizeof(vd);
-			upwa->nprops--;
+			nprops--;
 		}
 
-		for (i = 0; i < upwa->nprops; i++) {
+		for (i = 0; i < nprops; i++) {
 			char *prop_name;
 
 			prop_name = prop_data;
 			prop_data += strlen(prop_name) + 1;
-			vd = *(u32 *)prop_data;
+			vd = be32_to_cpu(*(__be32 *)prop_data);
 			prop_data += sizeof(vd);
 
 			switch (vd) {
@@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
 	return 0;
 }
 
-static int add_dt_node(u32 parent_phandle, u32 drc_index)
+static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
 {
 	struct device_node *dn;
 	struct device_node *parent_dn;
@@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
 int pseries_devicetree_update(s32 scope)
 {
 	char *rtas_buf;
-	u32 *data;
+	__be32 *data;
 	int update_nodes_token;
 	int rc;
 
@@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
 		if (rc && rc != 1)
 			break;
 
-		data = (u32 *)rtas_buf + 4;
-		while (*data & NODE_ACTION_MASK) {
+		data = (__be32 *)rtas_buf + 4;
+		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
 			int i;
-			u32 action = *data & NODE_ACTION_MASK;
-			int node_count = *data & NODE_COUNT_MASK;
+			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
+			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
 
 			data++;
 
 			for (i = 0; i < node_count; i++) {
-				u32 phandle = *data++;
-				u32 drc_index;
+				__be32 phandle = *data++;
+				__be32 drc_index;
 
 				switch (action) {
 				case DELETE_DT_NODE:
-- 
1.7.12.2

* [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr
  2015-02-28  2:24 [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Tyrel Datwyler
  2015-02-28  2:24 ` [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration Tyrel Datwyler
  2015-02-28  2:24 ` [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update Tyrel Datwyler
@ 2015-02-28  2:24 ` Tyrel Datwyler
  2015-03-03  6:24   ` Michael Ellerman
  2015-03-03  6:10 ` [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Michael Ellerman
  3 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-02-28  2:24 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Tyrel Datwyler, cyrilbur, nfont

Traditionally, after a migration operation, drmgr has coordinated the device tree
update with the kernel from userspace via the ugly /proc/ppc64/ofdt interface. This
is better done fully in the kernel, where support already exists. Currently,
drmgr makes a faux ibm,suspend-me RTAS call, which we intercept in the kernel so
that we can check the VASI state for suspendability. After the LPAR resumes and
returns to drmgr, this is followed by the necessary update-nodes and
update-properties RTAS calls, whose results are parsed and communicated back to
the kernel through /proc/ppc64/ofdt for the device tree update. The drmgr tool
should instead initiate the migration using the existing
/sys/kernel/mobility/migration entry, which performs all this work in the kernel.

This patch adds a show function to the sysfs "migration" attribute that returns
1 to indicate that the kernel will perform the device tree update after a
migration operation, and that drmgr should initiate the migration through the
sysfs "migration" attribute.
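On the userspace side, a tool could probe the attribute before choosing a migration path. This is a hypothetical sketch, not drmgr's actual code: the function names are illustrative, and the path is assumed to be /sys/kernel/mobility/migration. On kernels without this patch the attribute has no show method, so the open or the read is expected to fail, and the probe falls back to the legacy flow.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed location of the attribute. */
#define MIGRATION_ATTR "/sys/kernel/mobility/migration"

/*
 * Interpret the attribute contents: 1 means the kernel performs the
 * post-migration device tree update itself (KERN_DT_UPDATE).
 */
static int parse_migration_attr(const char *contents)
{
	char *end;
	long v = strtol(contents, &end, 10);

	if (end == contents)
		return 0;		/* no number: treat as legacy kernel */
	return v == 1;
}

/* Hypothetical probe: any failure to read means "use the legacy path". */
static int kernel_handles_dt_update(const char *path)
{
	char buf[16] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';		/* read failed, e.g. no show method */
	fclose(f);
	return parse_migration_attr(buf);
}
```

A caller would use kernel_handles_dt_update(MIGRATION_ATTR) to decide between writing the stream ID to the sysfs attribute and the older RTAS + /proc/ppc64/ofdt sequence.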

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/mobility.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 0b1f70e..a689f74 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -40,6 +40,9 @@ struct update_props_workarea {
 
 #define MIGRATION_SCOPE	(1)
 
+#define USER_DT_UPDATE  0
+#define KERN_DT_UPDATE  1
+
 static int mobility_rtas_call(int token, char *buf, s32 scope)
 {
 	int rc;
@@ -339,7 +342,13 @@ static ssize_t migrate_store(struct class *class, struct class_attribute *attr,
 	return count;
 }
 
-static CLASS_ATTR(migration, S_IWUSR, NULL, migrate_store);
+static ssize_t migrate_show(struct class *class, struct class_attribute *attr,
+			    char *buf)
+{
+	return sprintf(buf, "%d\n", KERN_DT_UPDATE);
+}
+
+static CLASS_ATTR(migration, S_IWUSR | S_IRUGO, migrate_show, migrate_store);
 
 static int __init mobility_sysfs_init(void)
 {
-- 
1.7.12.2

* Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-02-28  2:24 ` [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration Tyrel Datwyler
@ 2015-03-02  4:19   ` Cyril Bur
  2015-03-02 21:30     ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Cyril Bur @ 2015-03-02  4:19 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, nfont

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> During suspend/migration operation we must wait for the VASI state reported
> by the hypervisor to become Suspending prior to making the ibm,suspend-me
> RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
> that exposes the VASI state to the caller. This is unnecessary as the caller
> only really cares about the following three conditions; if there is an error
> we should bailout, success indicating we have suspended and woken back up so
> proceed to device tree updated, or we are not suspendable yet so try calling
> rtas_ibm_suspend_me again shortly.
> 
> This patch removes the extraneous vasi_state variable and simply uses the
> return code to communicate how to proceed. We either succeed, fail, or get
> -EAGAIN in which case we sleep for a second before trying to call
> rtas_ibm_suspend_me again.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/rtas.h           |  2 +-
>  arch/powerpc/kernel/rtas.c                | 15 +++++++--------
>  arch/powerpc/platforms/pseries/mobility.c |  8 +++-----
>  3 files changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
> index 2e23e92..fc85eb0 100644
> --- a/arch/powerpc/include/asm/rtas.h
> +++ b/arch/powerpc/include/asm/rtas.h
> @@ -327,7 +327,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
>  extern int rtas_online_cpus_mask(cpumask_var_t cpus);
>  extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
> -extern int rtas_ibm_suspend_me(u64 handle, int *vasi_return);
> +extern int rtas_ibm_suspend_me(u64 handle);
>  
I like ditching vasi_return, I was never happy with myself for doing
that!

>  struct rtc_time;
>  extern unsigned long rtas_get_boot_time(void);
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 21c45a2..603b928 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -897,7 +897,7 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus)
>  }
>  EXPORT_SYMBOL(rtas_offline_cpus_mask);
>  
> -int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
> +int rtas_ibm_suspend_me(u64 handle)

That definition is actually in an #ifdef CONFIG_PPC_PSERIES; you'll need
to change the definition for !CONFIG_PPC_PSERIES too.
>  {
>  	long state;
>  	long rc;
> @@ -919,13 +919,11 @@ int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
>  		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
>  		return rc;
>  	} else if (state == H_VASI_ENABLED) {
> -		*vasi_return = RTAS_NOT_SUSPENDABLE;
> -		return 0;
> +		return -EAGAIN;
>  	} else if (state != H_VASI_SUSPENDING) {
>  		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
>  		       state);
> -		*vasi_return = -1;
> -		return 0;
> +		return -EIO;

I've had a look at how these return values get passed back up the
stack and admittedly we're dealing with a confusing mess; I've compared
back to before my patch (which wasn't perfect either, it seems).
Both the state == H_VASI_ENABLED and state != H_VASI_SUSPENDING cases cause
ppc_rtas to go to copy_return and return 0 (albeit with an error
code in args.rets[0]). Because ppc_rtas goes back out to userland, I
hesitate to change any of that.
>  	}
>  
>  	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
> @@ -1060,9 +1058,10 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
>  		int vasi_rc = 0;

This generates an unused variable warning.

>  		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
>  		              | be32_to_cpu(args.args[1]);
> -		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
> -		args.rets[0] = cpu_to_be32(vasi_rc);
> -		if (rc)
> +		rc = rtas_ibm_suspend_me(handle);
> +		if (rc == -EAGAIN)
> +			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);

(continuing on...) so perhaps here have
	if (rc == -EAGAIN) {
		args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
		rc = 0;
	} else if (rc == -EIO) {
		args.rets[0] = cpu_to_be32(-1);
		rc = 0;
	}
which should keep the original behaviour; the last thing we want to do
is break BE.

Might be worth checking that rc from rtas_ibm_suspend_me will only be
-EAGAIN and -EIO when they are explicitly set in rtas_ibm_suspend_me and
can't come back out from the hcall.
From reading PAPR we're OK there, but just as a thought, it might be worth
returning errno as positive because hcall errors are going to be
negative, to make life easier at some point... but then we'll have to
remember to make them negative when going back to userland (and there
are two places...), so there's no perfect win here.
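The suggestion above for the ppc_rtas() suspend branch can be modelled as a small pure function. This is a userspace sketch, not the kernel code: suspend_rc_to_rtas() is a hypothetical name, the RTAS status word is kept in host order (the kernel stores it with cpu_to_be32()), and RTAS_NOT_SUSPENDABLE is assumed to be -9004 as in asm/rtas.h.

```c
#include <errno.h>
#include <stdint.h>

#define RTAS_NOT_SUSPENDABLE (-9004)	/* assumed value */

/*
 * Map rtas_ibm_suspend_me()'s return to (syscall return, RTAS status).
 * "Not yet suspendable" and "unexpected VASI state" are reported through
 * the status word with a 0 syscall return, as before the patch; any other
 * error still fails the syscall.
 */
static int suspend_rc_to_rtas(int rc, int32_t *rtas_status)
{
	if (rc == -EAGAIN) {
		*rtas_status = RTAS_NOT_SUSPENDABLE;
		return 0;
	} else if (rc == -EIO) {
		*rtas_status = -1;	/* RTAS "hardware error" */
		return 0;
	} else if (rc) {
		return rc;		/* status word left untouched */
	}
	*rtas_status = 0;		/* pre-patch code stored vasi_rc (0) */
	return 0;
}
```

The design question in the thread is only about the -EIO row: this variant keeps userspace seeing a successful syscall with status -1, while the patch as posted fails the syscall with -EIO instead.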

> +		else if (rc)
>  			return rc;
>  		goto copy_return;
>  	}
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 90cf3dc..29e4f04 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -325,15 +325,13 @@ static ssize_t migrate_store(struct class *class, struct class_attribute *attr,
>  		return rc;
>  
>  	do {
> -		rc = rtas_ibm_suspend_me(streamid, &vasi_rc);
> -		if (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE)
> +		rc = rtas_ibm_suspend_me(streamid);
> +		if (rc == -EAGAIN)
>  			ssleep(1);
> -	} while (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE);
> +	} while (rc == -EAGAIN);

This is going to change the value of the error code.
>  
>  	if (rc)
>  		return rc;
> -	if (vasi_rc)
> -		return vasi_rc;
>  
>  	post_mobility_fixup();
>  	return count;

Thanks for taking it, it looks nicer now.

Cyril

* Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-02-28  2:24 ` [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update Tyrel Datwyler
@ 2015-03-02  5:20   ` Cyril Bur
  2015-03-02 21:49     ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Cyril Bur @ 2015-03-02  5:20 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, nfont

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> We currently use the device tree update code in the kernel after resuming
> from a suspend operation to re-sync the kernels view of the device tree with
> that of the hypervisor. The code as it stands is not endian safe as it relies
> on parsing buffers returned by RTAS calls that thusly contains data in big
> endian format.
> 
> This patch annotates variables and structure members with __be types as well
> as performing necessary byte swaps to cpu endian for data that needs to be
> parsed.
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index 29e4f04..0b1f70e 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -25,10 +25,10 @@
>  static struct kobject *mobility_kobj;
>  
>  struct update_props_workarea {
> -	u32 phandle;
> -	u32 state;
> -	u64 reserved;
> -	u32 nprops;
> +	__be32 phandle;
> +	__be32 state;
> +	__be64 reserved;
> +	__be32 nprops;
>  } __packed;
>  
>  #define NODE_ACTION_MASK	0xff000000
> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
>  	return 0;
>  }
>  
> -static int update_dt_node(u32 phandle, s32 scope)
> +static int update_dt_node(__be32 phandle, s32 scope)
>  {

On line 153 of this function:
   dn = of_find_node_by_phandle(phandle);

You're passing a __be32 to device tree code. If we can treat the phandle
as an opaque value returned to us from the RTAS call and pass it around
like that, then all good.
It's also hard to be sure whether these need to be BE and have always been
that way, because we've always run BE, so they've never actually wanted
CPU endian; it's just that CPU endian has always been BE (I think I
started rambling...)

Just want to check that *not* converting them is done on purpose.

And having read on, I'm assuming the answer is yes, since this
observation holds for your changes, which affect:
	delete_dt_node()
	update_dt_node()
	add_dt_node()
Worth noting that you didn't change the definition of delete_dt_node().
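The concern about handing an unconverted phandle to a lookup keyed by CPU-endian values can be illustrated with a small simulation. This is a hypothetical userspace model, not the kernel's of_find_node_by_phandle(): on a little-endian CPU, loading a big-endian word without be32_to_cpu() yields the byte-swapped value, which the sketch reproduces explicitly with swap32() so it behaves the same on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap (what be32_to_cpu() does on LE). */
static uint32_t swap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical phandle table, stored in CPU order like the live tree. */
static const uint32_t known_phandles[] = { 0x10000001u, 0x10000002u };

/* Hypothetical lookup: index of the matching node, or -1 if none. */
static int find_by_phandle(uint32_t phandle)
{
	size_t n = sizeof(known_phandles) / sizeof(known_phandles[0]);

	for (size_t i = 0; i < n; i++)
		if (known_phandles[i] == phandle)
			return (int)i;
	return -1;
}
```

Passing swap32(0x10000001) (the raw value an LE CPU would see) finds nothing, while converting it back first finds node 0, which is why it matters whether the phandle is treated as opaque or converted before the lookup.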

I'll have a look once you address the compile failure in patch 1/3 (I'm
getting blocked by the unused var because somehow -Werror is on; odd that it
didn't trip you up), but I also suspect this will make sparse go a bit
nuts.
I wonder if there is a nice way of shutting sparse up.

>  	struct update_props_workarea *upwa;
>  	struct device_node *dn;
> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>  	char *prop_data;
>  	char *rtas_buf;
>  	int update_properties_token;
> +	u32 nprops;
>  	u32 vd;
>  
>  	update_properties_token = rtas_token("ibm,update-properties");
> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>  			break;
>  
>  		prop_data = rtas_buf + sizeof(*upwa);
> +		nprops = be32_to_cpu(upwa->nprops);
>  
>  		/* On the first call to ibm,update-properties for a node the
>  		 * the first property value descriptor contains an empty
> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>  		 */
>  		if (*prop_data == 0) {
>  			prop_data++;
> -			vd = *(u32 *)prop_data;
> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>  			prop_data += vd + sizeof(vd);
> -			upwa->nprops--;
> +			nprops--;
>  		}
>  
> -		for (i = 0; i < upwa->nprops; i++) {
> +		for (i = 0; i < nprops; i++) {
>  			char *prop_name;
>  
>  			prop_name = prop_data;
>  			prop_data += strlen(prop_name) + 1;
> -			vd = *(u32 *)prop_data;
> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>  			prop_data += sizeof(vd);
>  
>  			switch (vd) {
> @@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>  	return 0;
>  }
>  
> -static int add_dt_node(u32 parent_phandle, u32 drc_index)
> +static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>  {
>  	struct device_node *dn;
>  	struct device_node *parent_dn;
> @@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
>  int pseries_devicetree_update(s32 scope)
>  {
>  	char *rtas_buf;
> -	u32 *data;
> +	__be32 *data;
>  	int update_nodes_token;
>  	int rc;
>  
> @@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
>  		if (rc && rc != 1)
>  			break;
>  
> -		data = (u32 *)rtas_buf + 4;
> -		while (*data & NODE_ACTION_MASK) {
> +		data = (__be32 *)rtas_buf + 4;
> +		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
>  			int i;
> -			u32 action = *data & NODE_ACTION_MASK;
> -			int node_count = *data & NODE_COUNT_MASK;
> +			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
> +			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
>  
>  			data++;
>  
>  			for (i = 0; i < node_count; i++) {
> -				u32 phandle = *data++;
> -				u32 drc_index;
> +				__be32 phandle = *data++;
> +				__be32 drc_index;
>  
>  				switch (action) {
>  				case DELETE_DT_NODE:

The patch looks good, no-nonsense endian fixing.
Worth noting that it leaves existing bugs in place, which is fine; I'll
rebase my patches, which address endian issues and bugs, on top of these
so as to address the bugs.

* Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-03-02  4:19   ` Cyril Bur
@ 2015-03-02 21:30     ` Tyrel Datwyler
  2015-03-03  6:15       ` Michael Ellerman
  0 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-02 21:30 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, nfont

On 03/01/2015 08:19 PM, Cyril Bur wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> During suspend/migration operation we must wait for the VASI state reported
>> by the hypervisor to become Suspending prior to making the ibm,suspend-me
>> RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
>> that exposes the VASI state to the caller. This is unnecessary as the caller
>> only really cares about the following three conditions; if there is an error
>> we should bailout, success indicating we have suspended and woken back up so
>> proceed to device tree updated, or we are not suspendable yet so try calling
>> rtas_ibm_suspend_me again shortly.
>>
>> This patch removes the extraneous vasi_state variable and simply uses the
>> return code to communicate how to proceed. We either succeed, fail, or get
>> -EAGAIN in which case we sleep for a second before trying to call
>> rtas_ibm_suspend_me again.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/rtas.h           |  2 +-
>>  arch/powerpc/kernel/rtas.c                | 15 +++++++--------
>>  arch/powerpc/platforms/pseries/mobility.c |  8 +++-----
>>  3 files changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 2e23e92..fc85eb0 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -327,7 +327,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
>>  extern int rtas_online_cpus_mask(cpumask_var_t cpus);
>>  extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
>> -extern int rtas_ibm_suspend_me(u64 handle, int *vasi_return);
>> +extern int rtas_ibm_suspend_me(u64 handle);
>>  
> I like ditching vasi_return, I was never happy with myself for doing
> that!
> 
>>  struct rtc_time;
>>  extern unsigned long rtas_get_boot_time(void);
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 21c45a2..603b928 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -897,7 +897,7 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus)
>>  }
>>  EXPORT_SYMBOL(rtas_offline_cpus_mask);
>>  
>> -int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
>> +int rtas_ibm_suspend_me(u64 handle)
> 
> That definition is actually in an #ifdef CONFIG_PPC_PSERIES, you'll need
> to change the definition for !CONFIG_PPC_PSERIES

Good catch. I'll fix it there too.

>>  {
>>  	long state;
>>  	long rc;
>> @@ -919,13 +919,11 @@ int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
>>  		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
>>  		return rc;
>>  	} else if (state == H_VASI_ENABLED) {
>> -		*vasi_return = RTAS_NOT_SUSPENDABLE;
>> -		return 0;
>> +		return -EAGAIN;
>>  	} else if (state != H_VASI_SUSPENDING) {
>>  		printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
>>  		       state);
>> -		*vasi_return = -1;
>> -		return 0;
>> +		return -EIO;
> 
> I've had a look as to how these return values get passed back up the
> stack and admittedly were dealing with a confusing mess, I've compared
> back to before my patch (which wasn't perfect either it seems).
> Both the state == H_VASI_ENABLED and state == H_VASI_SUSPENDING cause
> ppc_rtas to go to the copy_return and return 0 (albeit with an error
> code in args.rets[0]), because rtas_ppc goes back to out userland, I
> hesitate to change any of that.

Agreed that this is a bit of a mess. The problem is we have two call
paths into rtas_ibm_suspend_me(): one from migrate_store() and one
from ppc_rtas(). I'll address each with your other comments below.

>>  	}
>>  
>>  	if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
>> @@ -1060,9 +1058,10 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
>>  		int vasi_rc = 0;
> 
> This generates unused variable warning.

Sloppy on my part. Will remove.

> 
>>  		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
>>  		              | be32_to_cpu(args.args[1]);
>> -		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
>> -		args.rets[0] = cpu_to_be32(vasi_rc);
>> -		if (rc)
>> +		rc = rtas_ibm_suspend_me(handle);
>> +		if (rc == -EAGAIN)
>> +			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
> 
> (continuing on...) so perhaps here have
> 	rc = 0;
> else if (rc == -EIO)
> 	args.rets[0] = cpu_to_be32(-1);
> 	rc = 0;
> Which should keep the original behaviour, the last thing we want to do
> is break BE.

The biggest problem here is we are making what basically equates to a
fake rtas call from drmgr which we intercept in ppc_rtas(). From there
we make this special call to rtas_ibm_suspend_me() to check VASI state
and do a bunch of other specialized work that needs to be setup prior to
making the actual ibm,suspend-me rtas call. Since we are cheating PAPR
here, I guess we can really handle it however we want. I chose to simply
fail the rtas call in the case where rtas_ibm_suspend_me() fails with
something other than -EAGAIN. In user space librtas will log errno for
the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
that error and fail.

Going forward we want to move drmgr to initiating migration through
sysfs and not this clunky highway robbery of the rtas interface. So, for
legacy purposes, does it matter how we fail the call here? I'm open to
either solution. If we choose to pass the error back through args.rets[0],
what value do we choose? The following are all pretty standardized, but
I don't think they make sense here:

-1: Hardware error
-2: Busy
-3: Parameter error
9000: Suspension Aborted

The 9000 code maybe makes sense, but doesn't really convey that
something bad has happened. In the end, whatever value is passed in
args.rets[0], drmgr will simply log it.

While I agree about not breaking BE, I'm not sure how this would. All I've
done is add the -EIO case as an explicit failure.

> 
> Might be worth checking that rc from rtas_ibm_suspend_me will only be
> -EAGAIN and -EIO when they are explicitly set in rtas_ibm_suspend_me and
> can't come back out from the hcall.
> From reading PAPR we're ok there but just as a thought it might be worth
> returning errno as positive because hcall errors are going to be
> negative, to make life easier at some point... but then we'll have to
> remember to make them negative when going back to userland (and there
> are two places...) so there's no perfect win here.
> 

There are a variety of things that could go wrong that aren't directly
related to rtas. This is why I chose to explicitly fail the rtas call if
we get anything other than 0 or -EAGAIN.

>> +		else if (rc)
>>  			return rc;
>>  		goto copy_return;
>>  	}
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 90cf3dc..29e4f04 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -325,15 +325,13 @@ static ssize_t migrate_store(struct class *class, struct class_attribute *attr,
>>  		return rc;
>>  
>>  	do {
>> -		rc = rtas_ibm_suspend_me(streamid, &vasi_rc);
>> -		if (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE)
>> +		rc = rtas_ibm_suspend_me(streamid);
>> +		if (rc == -EAGAIN)
>>  			ssleep(1);
>> -	} while (!rc && vasi_rc == RTAS_NOT_SUSPENDABLE);
>> +	} while (rc == -EAGAIN);
> 
> This is going to change the value of the error code.

Here drmgr assumes a zero or greater value to mean success, and anything
negative failure. It logs errno in failure case.
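A minimal userspace sketch of the caller side described here, assuming a drmgr-like tool writes the stream id as text to the sysfs attribute (conventionally /sys/kernel/mobility/migration) and treats a negative result as failure. The helper `request_migration()` is hypothetical. Because the kernel loop in the diff retries -EAGAIN internally, the caller only sees success (the full byte count written) or a negative errno.

```c
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a stream id to a migration attribute.
 * Returns 0 on success or -errno on failure. The id is written as a
 * hex string, assuming the store routine parses it with base 0.
 */
static int request_migration(const char *path, uint64_t streamid)
{
	char buf[32];
	int len, fd, err = 0;
	ssize_t n;

	len = snprintf(buf, sizeof(buf), "0x%" PRIx64 "\n", streamid);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n < 0)
		err = -errno;   /* negative rc from the kernel surfaces as errno */
	else if (n != len)
		err = -EIO;     /* short write: treat as failure */

	close(fd);
	return err;
}
```

Capturing errno before close() matters, since close() may overwrite it.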

-Tyrel

>>  
>>  	if (rc)
>>  		return rc;
>> -	if (vasi_rc)
>> -		return vasi_rc;
>>  
>>  	post_mobility_fixup();
>>  	return count;
> 
> Thanks for taking it, it looks nicer now.
> 
> Cyril
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-03-02  5:20   ` Cyril Bur
@ 2015-03-02 21:49     ` Tyrel Datwyler
  2015-03-03 23:15       ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-02 21:49 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev, nfont

On 03/01/2015 09:20 PM, Cyril Bur wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> We currently use the device tree update code in the kernel after resuming
>> from a suspend operation to re-sync the kernels view of the device tree with
>> that of the hypervisor. The code as it stands is not endian safe as it relies
>> on parsing buffers returned by RTAS calls that thusly contains data in big
>> endian format.
>>
>> This patch annotates variables and structure members with __be types as well
>> as performing necessary byte swaps to cpu endian for data that needs to be
>> parsed.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 29e4f04..0b1f70e 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -25,10 +25,10 @@
>>  static struct kobject *mobility_kobj;
>>  
>>  struct update_props_workarea {
>> -	u32 phandle;
>> -	u32 state;
>> -	u64 reserved;
>> -	u32 nprops;
>> +	__be32 phandle;
>> +	__be32 state;
>> +	__be64 reserved;
>> +	__be32 nprops;
>>  } __packed;
>>  
>>  #define NODE_ACTION_MASK	0xff000000
>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
>>  	return 0;
>>  }
>>  
>> -static int update_dt_node(u32 phandle, s32 scope)
>> +static int update_dt_node(__be32 phandle, s32 scope)
>>  {
> 
> On line 153 of this function:
>    dn = of_find_node_by_phandle(phandle);
> 
> You're passing a __be32 to device tree code, if we can treat the phandle
> as a opaque value returned to us from the rtas call and pass it around
> like that then all good.

Yes, of_find_node_by_phandle directly compares phandle passed in against
the handle stored in each device_node when searching for a matching
node. Since the device tree is big endian, it follows that the big
endian phandle received in the rtas buffer needs no conversion.

Further, we need to pass the phandle to ibm,update-properties in the
work area which is also required to be big endian. So, again it seemed
that converting to cpu endian was a waste of effort just to convert it
back to big endian.

> Its also hard to be sure if these need to be BE and have always been
> that way because we've always run BE so they've never actually wanted
> CPU endian its just that CPU endian has always been BE (I think I
> started rambling...)
> 
> Just want to check that *not* converting them is done on purpose.

Yes, I explicitly did not convert them on purpose. As mentioned above we
need phandle in BE for the ibm,update-properties rtas work area.
Similarly, drc_index needs to be in BE for the ibm,configure-connector
rtas work area. Outside of that, we do no other manipulation of those
values.
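To illustrate why a big-endian phandle can go straight into the RTAS work area, here is a userspace sketch of a packed header with the same field layout as update_props_workarea in the diff, using htonl() in place of the kernel's cpu_to_be32(). The struct name `props_hdr` and both fill helpers are hypothetical stand-ins; only the layout comes from the patch.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the work-area header; every field is big endian. */
struct props_hdr {
	uint32_t phandle;   /* __be32 in the kernel version */
	uint32_t state;
	uint64_t reserved;
	uint32_t nprops;
} __attribute__((packed));

/* Phandle already big endian (as received from the RTAS buffer):
 * it can be stored without any conversion. */
static void fill_hdr_from_be(struct props_hdr *h, uint32_t be_phandle)
{
	memset(h, 0, sizeof(*h));
	h->phandle = be_phandle;
}

/* Phandle in CPU byte order: an explicit swap is needed first. */
static void fill_hdr_from_cpu(struct props_hdr *h, uint32_t cpu_phandle)
{
	memset(h, 0, sizeof(*h));
	h->phandle = htonl(cpu_phandle);
}
```

Both helpers produce byte-identical headers; the first simply skips a round trip through CPU byte order.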

> 
> And having read on, I'm assuming the answer is yes since this
> observation is true for your changes which affect:
> 	delete_dt_node()
> 	update_dt_node()
>         add_dt_node()
> Worth noting that you didn't change the definition of delete_dt_node()

You are correct. Oversight. I will fix that as it should generate a
sparse complaint.

-Tyrel

> 
> I'll have a look once you address the non compiling in patch 1/3 (I'm
> getting blocked the unused var because somehow Werror is on, odd it
> didn't trip you up) but I also suspect this will have sparse go a bit
> nuts. 
> I wonder if there is a nice way of shutting sparse up.
> 
>>  	struct update_props_workarea *upwa;
>>  	struct device_node *dn;
>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  	char *prop_data;
>>  	char *rtas_buf;
>>  	int update_properties_token;
>> +	u32 nprops;
>>  	u32 vd;
>>  
>>  	update_properties_token = rtas_token("ibm,update-properties");
>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  			break;
>>  
>>  		prop_data = rtas_buf + sizeof(*upwa);
>> +		nprops = be32_to_cpu(upwa->nprops);
>>  
>>  		/* On the first call to ibm,update-properties for a node the
>>  		 * the first property value descriptor contains an empty
>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  		 */
>>  		if (*prop_data == 0) {
>>  			prop_data++;
>> -			vd = *(u32 *)prop_data;
>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>  			prop_data += vd + sizeof(vd);
>> -			upwa->nprops--;
>> +			nprops--;
>>  		}
>>  
>> -		for (i = 0; i < upwa->nprops; i++) {
>> +		for (i = 0; i < nprops; i++) {
>>  			char *prop_name;
>>  
>>  			prop_name = prop_data;
>>  			prop_data += strlen(prop_name) + 1;
>> -			vd = *(u32 *)prop_data;
>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>  			prop_data += sizeof(vd);
>>  
>>  			switch (vd) {
>> @@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  	return 0;
>>  }
>>  
>> -static int add_dt_node(u32 parent_phandle, u32 drc_index)
>> +static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>>  {
>>  	struct device_node *dn;
>>  	struct device_node *parent_dn;
>> @@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
>>  int pseries_devicetree_update(s32 scope)
>>  {
>>  	char *rtas_buf;
>> -	u32 *data;
>> +	__be32 *data;
>>  	int update_nodes_token;
>>  	int rc;
>>  
>> @@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
>>  		if (rc && rc != 1)
>>  			break;
>>  
>> -		data = (u32 *)rtas_buf + 4;
>> -		while (*data & NODE_ACTION_MASK) {
>> +		data = (__be32 *)rtas_buf + 4;
>> +		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
>>  			int i;
>> -			u32 action = *data & NODE_ACTION_MASK;
>> -			int node_count = *data & NODE_COUNT_MASK;
>> +			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
>> +			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
>>  
>>  			data++;
>>  
>>  			for (i = 0; i < node_count; i++) {
>> -				u32 phandle = *data++;
>> -				u32 drc_index;
>> +				__be32 phandle = *data++;
>> +				__be32 drc_index;
>>  
>>  				switch (action) {
>>  				case DELETE_DT_NODE:
> 
> The patch looks good, no nonsense endian fixing. 
> Worth noting that it leaves existing bugs in place, which is fine, I'll
> rebase my patches which address endian and bugs on top of these so as to
> address the bugs.
> 
> 
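The descriptor walk in update_dt_node() reads a NUL-terminated property name followed by a big-endian 32-bit value descriptor. A userspace sketch of that walk follows, under the simplifying assumption that every descriptor is a plain data length; the special descriptor values handled by the kernel's `switch (vd)` are deliberately ignored. `read_be32()` and `walk_props()` are hypothetical; memcpy() plus ntohl() stands in for `be32_to_cpu(*(__be32 *)p)` and also avoids unaligned loads.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian u32 from a possibly unaligned position. */
static uint32_t read_be32(const char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);
}

/*
 * Walk nprops "name\0 <be32 vd> <vd bytes>" records and return the total
 * number of data bytes seen. Special vd values are NOT handled here.
 */
static uint32_t walk_props(const char *buf, uint32_t nprops)
{
	const char *p = buf;
	uint32_t i, vd, total = 0;

	/* First call for a node: empty name plus one descriptor to skip. */
	if (*p == 0) {
		p++;
		vd = read_be32(p);
		p += vd + sizeof(vd);
		nprops--;
	}

	for (i = 0; i < nprops; i++) {
		p += strlen(p) + 1;   /* skip the property name */
		vd = read_be32(p);    /* convert before using it as a length */
		p += sizeof(vd);
		total += vd;
		p += vd;              /* skip the property value */
	}
	return total;
}
```

Without the conversion, a little-endian host would read a length of 3 as 0x03000000 and step far past the end of the buffer.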

* Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code
  2015-02-28  2:24 [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Tyrel Datwyler
                   ` (2 preceding siblings ...)
  2015-02-28  2:24 ` [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr Tyrel Datwyler
@ 2015-03-03  6:10 ` Michael Ellerman
  2015-03-03 20:37   ` Tyrel Datwyler
  3 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2015-03-03  6:10 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, cyrilbur, nfont

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an
> extraneous function parameter, fixes device tree updating on little endian
> platforms, and adds a mechanism for informing drmgr that the kernel is cabable of
> performing the whole migration including device tree update itself.
> 
> Tyrel Datwyler (3):
>   powerpc/pseries: Simplify check for suspendability during
>     suspend/migration
>   powerpc/pseries: Little endian fixes for post mobility device tree
>     update
>   powerpc/pseries: Expose post-migration in kernel device tree update
>     to drmgr

Hi Tyrel,

Firstly let me say how much I hate this code, so thanks for working on it :)

But I need you to split this series, into 1) fixes for 4.0 (and stable?), and
2) the rest.

I *think* that would be patch 2, and then patches 1 & 3, but I don't want to
guess. So please resend.

cheers

* Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-03-02 21:30     ` Tyrel Datwyler
@ 2015-03-03  6:15       ` Michael Ellerman
  2015-03-03 20:16         ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2015-03-03  6:15 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, Cyril Bur, nfont

On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote:
> On 03/01/2015 08:19 PM, Cyril Bur wrote:
> > On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> >> During suspend/migration operation we must wait for the VASI state reported
> >> by the hypervisor to become Suspending prior to making the ibm,suspend-me
> >> RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
> >> that exposes the VASI state to the caller. This is unnecessary as the caller
> >> only really cares about the following three conditions; if there is an error
> >> we should bailout, success indicating we have suspended and woken back up so
> >> proceed to device tree updated, or we are not suspendable yet so try calling
> >> rtas_ibm_suspend_me again shortly.
> >>
> >> This patch removes the extraneous vasi_state variable and simply uses the
> >> return code to communicate how to proceed. We either succeed, fail, or get
> >> -EAGAIN in which case we sleep for a second before trying to call
> >> rtas_ibm_suspend_me again.
> >>
> >>  		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
> >>  		              | be32_to_cpu(args.args[1]);
> >> -		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
> >> -		args.rets[0] = cpu_to_be32(vasi_rc);
> >> -		if (rc)
> >> +		rc = rtas_ibm_suspend_me(handle);
> >> +		if (rc == -EAGAIN)
> >> +			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
> > 
> > (continuing on...) so perhaps here have
> > 	rc = 0;
> > else if (rc == -EIO)
> > 	args.rets[0] = cpu_to_be32(-1);
> > 	rc = 0;
> > Which should keep the original behaviour, the last thing we want to do
> > is break BE.
> 
> The biggest problem here is we are making what basically equates to a
> fake rtas call from drmgr which we intercept in ppc_rtas(). From there
> we make this special call to rtas_ibm_suspend_me() to check VASI state
> and do a bunch of other specialized work that needs to be setup prior to
> making the actual ibm,suspend-me rtas call. Since, we are cheating PAPR
> here I guess we can really handle it however we want. I chose to simply
> fail the rtas call in the case where rtas_ibm_suspend_me() fails with
> something other than -EAGAIN. In user space librtas will log errno for
> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
> that error and fail.

We don't want to change the return values of the syscall unless we absolutely
have to. And I don't think that's the case here.

Sure we think drmgr is the only thing that uses this crap, but we don't know
for sure.
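A userspace model of the two behaviours under discussion, assuming RTAS_NOT_SUSPENDABLE is -9004 as defined in asm/rtas.h (treat the value as an assumption) and using htonl()/ntohl() for the kernel's byte-order helpers. `map_patch()` follows the posted patch, where only -EAGAIN is folded into rets[0]. `map_compat()` is the suggestion quoted above, written out with the braces its two-statement branches need, so -EIO keeps the pre-patch result of rets[0] = -1 with a successful call. All helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

#define RTAS_NOT_SUSPENDABLE (-9004)   /* assumed value, see asm/rtas.h */

/* The 64-bit stream id arrives as two big-endian 32-bit RTAS args. */
static uint64_t make_handle(uint32_t be_hi, uint32_t be_lo)
{
	return ((uint64_t)ntohl(be_hi) << 32) | ntohl(be_lo);
}

/* Posted patch: only -EAGAIN goes through rets[0]; any other error
 * fails the syscall. Returns the syscall result. */
static int map_patch(int rc, uint32_t *ret0)
{
	if (rc == -EAGAIN) {
		*ret0 = htonl((uint32_t)RTAS_NOT_SUSPENDABLE);
		return 0;
	}
	return rc;
}

/* Compatible variant: -EIO (VASI in an unexpected state) is reported as
 * rets[0] = -1 with a successful syscall, as before the patch. */
static int map_compat(int rc, uint32_t *ret0)
{
	if (rc == -EAGAIN) {
		*ret0 = htonl((uint32_t)RTAS_NOT_SUSPENDABLE);
		rc = 0;
	} else if (rc == -EIO) {
		*ret0 = htonl((uint32_t)-1);
		rc = 0;
	}
	return rc;   /* any remaining error still fails the call */
}
```

The two variants differ only for -EIO, which is exactly the syscall-visible change being objected to.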

cheers

* Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr
  2015-02-28  2:24 ` [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr Tyrel Datwyler
@ 2015-03-03  6:24   ` Michael Ellerman
  2015-03-03 21:18     ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Ellerman @ 2015-03-03  6:24 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: linuxppc-dev, cyrilbur, nfont

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> Traditionally after a migration operation drmgr has coordinated the device tree
> update with the kernel in userspace via the ugly /proc/ppc64/ofdt interface. This
> can be better done fully in the kernel where support already exists. Currently,
> drmgr makes a faux ibm,suspend-me RTAS call which we intercept in the kernel so
> that we can check VASI state for suspendability. After the LPAR resumes and
> returns to drmgr that is followed by the necessary update-nodes and
> update-properties RTAS calls which are parsed and communitated back to the kernel
> through /proc/ppc64/ofdt for the device tree update. The drmgr tool should
> instead initiate the migration using the already existing
> /sysfs/kernel/mobility/migration entry that performs all this work in the kernel.
> 
> This patch adds a show function to the sysfs "migration" attribute that returns
> 1 to indicate the kernel will perform the device tree update after a migration
> operation and that drmgr should initiated the migration through the sysfs
> "migration" attribute.

I don't understand why we need this?

Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so it
knows it should use it and that the kernel will handle the whole procedure?

cheers

* Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-03-03  6:15       ` Michael Ellerman
@ 2015-03-03 20:16         ` Tyrel Datwyler
  2015-03-04 15:58           ` Nathan Fontenot
  0 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-03 20:16 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Cyril Bur, nfont

On 03/02/2015 10:15 PM, Michael Ellerman wrote:
> On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote:
>> On 03/01/2015 08:19 PM, Cyril Bur wrote:
>>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>>>> During suspend/migration operation we must wait for the VASI state reported
>>>> by the hypervisor to become Suspending prior to making the ibm,suspend-me
>>>> RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable
>>>> that exposes the VASI state to the caller. This is unnecessary as the caller
>>>> only really cares about the following three conditions; if there is an error
>>>> we should bailout, success indicating we have suspended and woken back up so
>>>> proceed to device tree updated, or we are not suspendable yet so try calling
>>>> rtas_ibm_suspend_me again shortly.
>>>>
>>>> This patch removes the extraneous vasi_state variable and simply uses the
>>>> return code to communicate how to proceed. We either succeed, fail, or get
>>>> -EAGAIN in which case we sleep for a second before trying to call
>>>> rtas_ibm_suspend_me again.
>>>>
>>>>  		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
>>>>  		              | be32_to_cpu(args.args[1]);
>>>> -		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
>>>> -		args.rets[0] = cpu_to_be32(vasi_rc);
>>>> -		if (rc)
>>>> +		rc = rtas_ibm_suspend_me(handle);
>>>> +		if (rc == -EAGAIN)
>>>> +			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
>>>
>>> (continuing on...) so perhaps here have
>>> 	rc = 0;
>>> else if (rc == -EIO)
>>> 	args.rets[0] = cpu_to_be32(-1);
>>> 	rc = 0;
>>> Which should keep the original behaviour, the last thing we want to do
>>> is break BE.
>>
>> The biggest problem here is we are making what basically equates to a
>> fake rtas call from drmgr which we intercept in ppc_rtas(). From there
>> we make this special call to rtas_ibm_suspend_me() to check VASI state
>> and do a bunch of other specialized work that needs to be setup prior to
>> making the actual ibm,suspend-me rtas call. Since, we are cheating PAPR
>> here I guess we can really handle it however we want. I chose to simply
>> fail the rtas call in the case where rtas_ibm_suspend_me() fails with
>> something other than -EAGAIN. In user space librtas will log errno for
>> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
>> that error and fail.
> 
> We don't want to change the return values of the syscall unless we absolutely
> have to. And I don't think that's the case here.

I'd like to argue that the one case I changed makes sense, but it's just
as easy to keep the original behavior.

> 
> Sure we think drmgr is the only thing that uses this crap, but we don't know
> for sure.

I can't imagine how anybody else could possibly use this hack without a
streamid from the hmc/hypervisor, but I've been wrong in the past more
times than I can count. :)

-Tyrel

> 
> cheers
> 
> 

* Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code
  2015-03-03  6:10 ` [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Michael Ellerman
@ 2015-03-03 20:37   ` Tyrel Datwyler
  0 siblings, 0 replies; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-03 20:37 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, cyrilbur, nfont

On 03/02/2015 10:10 PM, Michael Ellerman wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an
>> extraneous function parameter, fixes device tree updating on little endian
>> platforms, and adds a mechanism for informing drmgr that the kernel is cabable of
>> performing the whole migration including device tree update itself.
>>
>> Tyrel Datwyler (3):
>>   powerpc/pseries: Simplify check for suspendability during
>>     suspend/migration
>>   powerpc/pseries: Little endian fixes for post mobility device tree
>>     update
>>   powerpc/pseries: Expose post-migration in kernel device tree update
>>     to drmgr
> 
> Hi Tyrel,
> 
> Firstly let me say how much I hate this code, so thanks for working on it :)

I did it once. Might as well sacrifice my sanity a second time. :)

> 
> But I need you to split this series, into 1) fixes for 4.0 (and stable?), and
> 2) the rest.
> 
> I *think* that would be patch 2, and then patches 1 & 3, but I don't want to
> guess. So please resend.

Sure. Your split seems correct as patch 2 is fixes while 1 and 3 are
cosmetic/new feature. Seeing as patch 2 is the endian fixes, I'll Cc -stable
as well.

-Tyrel

> 
> cheers
> 
> 
> 
> 

* Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr
  2015-03-03  6:24   ` Michael Ellerman
@ 2015-03-03 21:18     ` Tyrel Datwyler
  0 siblings, 0 replies; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-03 21:18 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, cyrilbur, nfont

On 03/02/2015 10:24 PM, Michael Ellerman wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> Traditionally after a migration operation drmgr has coordinated the device tree
>> update with the kernel in userspace via the ugly /proc/ppc64/ofdt interface. This
>> can be better done fully in the kernel where support already exists. Currently,
>> drmgr makes a faux ibm,suspend-me RTAS call which we intercept in the kernel so
>> that we can check VASI state for suspendability. After the LPAR resumes and
>> returns to drmgr that is followed by the necessary update-nodes and
>> update-properties RTAS calls which are parsed and communitated back to the kernel
>> through /proc/ppc64/ofdt for the device tree update. The drmgr tool should
>> instead initiate the migration using the already existing
>> /sysfs/kernel/mobility/migration entry that performs all this work in the kernel.
>>
>> This patch adds a show function to the sysfs "migration" attribute that returns
>> 1 to indicate the kernel will perform the device tree update after a migration
>> operation and that drmgr should initiated the migration through the sysfs
>> "migration" attribute.
> 
> I don't understand why we need this?
> 
> Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so it
> knows it should use it and that the kernel will handle the whole procedure?

The problem is that this sysfs entry was originally added with the
remainder of the in kernel device tree update code in 2.6.37, but drmgr
was never modified to use it. By the time I started looking at the
in-kernel device tree code, I found it very broken. I had a bunch of fixes
to get it working that went into 3.12.

So, if somebody were to use a newer version of drmgr that simply checks
for the existence of the migration sysfs entry on a pre-3.12 kernel
their device-tree update experience is going to be sub-par.

The approach taken here is identical to what was done in 9da3489 when we
hooked the device tree update code into the suspend code. However, in
that case we were already using the sysfs entry to trigger the suspend
and legitimately needed a way to tell drmgr the kernel was now taking
care of updating the device tree. Here we are really just trying to
inform drmgr that it is running on a new enough kernel that the kernel
device tree code actually works properly.

Now, I don't really care for this approach, but the only other thought I
had was to change the sysfs entry from "migration" to "migrate".
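A sketch of how a drmgr-like tool might tell these cases apart, assuming the attribute lives at /sys/kernel/mobility/migration and that, with this patch applied, reading it returns "1". The tri-state helper `migration_support()` is hypothetical: -1 means the attribute is absent (legacy path), 0 means it exists but does not advertise in-kernel device tree updates (the older-kernel case), and 1 means the kernel handles the whole procedure.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical capability probe.
 *  -1: attribute missing
 *   0: present but unreadable or not "1" (e.g. write-only on older kernels)
 *   1: kernel advertises post-migration device tree update
 */
static int migration_support(const char *path)
{
	FILE *f;
	int val = 0;

	if (access(path, F_OK) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return 0;

	if (fscanf(f, "%d", &val) != 1)
		val = 0;
	fclose(f);

	return val == 1 ? 1 : 0;
}
```

An existence check alone collapses the 0 and 1 cases, which is the distinction the show function is meant to expose.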

-Tyrel

> 
> cheers
> 
> 

* Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-03-02 21:49     ` Tyrel Datwyler
@ 2015-03-03 23:15       ` Tyrel Datwyler
  2015-03-04  1:20         ` Cyril Bur
  0 siblings, 1 reply; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-03 23:15 UTC (permalink / raw)
  To: Cyril Bur; +Cc: nfont, linuxppc-dev

On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
> On 03/01/2015 09:20 PM, Cyril Bur wrote:
>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>>> We currently use the device tree update code in the kernel after resuming
>>> from a suspend operation to re-sync the kernels view of the device tree with
>>> that of the hypervisor. The code as it stands is not endian safe as it relies
>>> on parsing buffers returned by RTAS calls that thusly contains data in big
>>> endian format.
>>>
>>> This patch annotates variables and structure members with __be types as well
>>> as performing necessary byte swaps to cpu endian for data that needs to be
>>> parsed.
>>>
>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
>>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index 29e4f04..0b1f70e 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -25,10 +25,10 @@
>>>  static struct kobject *mobility_kobj;
>>>  
>>>  struct update_props_workarea {
>>> -	u32 phandle;
>>> -	u32 state;
>>> -	u64 reserved;
>>> -	u32 nprops;
>>> +	__be32 phandle;
>>> +	__be32 state;
>>> +	__be64 reserved;
>>> +	__be32 nprops;
>>>  } __packed;
>>>  
>>>  #define NODE_ACTION_MASK	0xff000000
>>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
>>>  	return 0;
>>>  }
>>>  
>>> -static int update_dt_node(u32 phandle, s32 scope)
>>> +static int update_dt_node(__be32 phandle, s32 scope)
>>>  {
>>
>> On line 153 of this function:
>>    dn = of_find_node_by_phandle(phandle);
>>
>> You're passing a __be32 to device tree code, if we can treat the phandle
>> as a opaque value returned to us from the rtas call and pass it around
>> like that then all good.

After digging deeper, device_node->phandle is stored in CPU endian
under the covers. So, for the of_find_node_by_phandle() we do need to
convert the phandle to cpu endian first. It appears I got lucky with the
update fixing the observed RMC issue because the phandle for the root
node seems to always be 0xffffffff.
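Why the root node could mask the bug: reversing the bytes of 0xffffffff yields the same value, so a missing conversion is invisible for that phandle on little endian, while ordinary phandles change. A minimal check using a hand-written swap (`swap32()` and `swap_invariant()` are illustrative, not kernel helpers):

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* A phandle survives a missing conversion only if it is swap invariant. */
static int swap_invariant(uint32_t phandle)
{
	return swap32(phandle) == phandle;
}
```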

-Tyrel

> 
> Yes, of_find_node_by_phandle directly compares phandle passed in against
> the handle stored in each device_node when searching for a matching
> node. Since, the device tree is big endian it follows that the big
> endian phandle received in the rtas buffer needs no conversion.
> 
> Further, we need to pass the phandle to ibm,update-properties in the
> work area which is also required to be big endian. So, again it seemed
> that converting to cpu endian was a waste of effort just to convert it
> back to big endian.
> 
>> Its also hard to be sure if these need to be BE and have always been
>> that way because we've always run BE so they've never actually wanted
>> CPU endian its just that CPU endian has always been BE (I think I
>> started rambling...)
>>
>> Just want to check that *not* converting them is done on purpose.
> 
> Yes, I explicitly did not convert them on purpose. As mentioned above we
> need phandle in BE for the ibm,update-properties rtas work area.
> Similarly, drc_index needs to be in BE for the ibm,configure-connector
> rtas work area. Outside, of that we do no other manipulation of those
> values.
> 
>>
>> And having read on, I'm assuming the answer is yes since this
>> observation is true for your changes which affect:
>> 	delete_dt_node()
>> 	update_dt_node()
>>         add_dt_node()
>> Worth noting that you didn't change the definition of delete_dt_node()
> 
> You are correct. Oversight. I will fix that as it should generate a
> sparse complaint.
> 
> -Tyrel
> 
>>
>> I'll have a look once you address the non compiling in patch 1/3 (I'm
>> getting blocked the unused var because somehow Werror is on, odd it
>> didn't trip you up) but I also suspect this will have sparse go a bit
>> nuts. 
>> I wonder if there is a nice way of shutting sparse up.
>>
>>>  	struct update_props_workarea *upwa;
>>>  	struct device_node *dn;
>>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>  	char *prop_data;
>>>  	char *rtas_buf;
>>>  	int update_properties_token;
>>> +	u32 nprops;
>>>  	u32 vd;
>>>  
>>>  	update_properties_token = rtas_token("ibm,update-properties");
>>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>  			break;
>>>  
>>>  		prop_data = rtas_buf + sizeof(*upwa);
>>> +		nprops = be32_to_cpu(upwa->nprops);
>>>  
>>>  		/* On the first call to ibm,update-properties for a node the
>>>  		 * the first property value descriptor contains an empty
>>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>  		 */
>>>  		if (*prop_data == 0) {
>>>  			prop_data++;
>>> -			vd = *(u32 *)prop_data;
>>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>>  			prop_data += vd + sizeof(vd);
>>> -			upwa->nprops--;
>>> +			nprops--;
>>>  		}
>>>  
>>> -		for (i = 0; i < upwa->nprops; i++) {
>>> +		for (i = 0; i < nprops; i++) {
>>>  			char *prop_name;
>>>  
>>>  			prop_name = prop_data;
>>>  			prop_data += strlen(prop_name) + 1;
>>> -			vd = *(u32 *)prop_data;
>>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>>  			prop_data += sizeof(vd);
>>>  
>>>  			switch (vd) {
>>> @@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>  	return 0;
>>>  }
>>>  
>>> -static int add_dt_node(u32 parent_phandle, u32 drc_index)
>>> +static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>>>  {
>>>  	struct device_node *dn;
>>>  	struct device_node *parent_dn;
>>> @@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
>>>  int pseries_devicetree_update(s32 scope)
>>>  {
>>>  	char *rtas_buf;
>>> -	u32 *data;
>>> +	__be32 *data;
>>>  	int update_nodes_token;
>>>  	int rc;
>>>  
>>> @@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
>>>  		if (rc && rc != 1)
>>>  			break;
>>>  
>>> -		data = (u32 *)rtas_buf + 4;
>>> -		while (*data & NODE_ACTION_MASK) {
>>> +		data = (__be32 *)rtas_buf + 4;
>>> +		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
>>>  			int i;
>>> -			u32 action = *data & NODE_ACTION_MASK;
>>> -			int node_count = *data & NODE_COUNT_MASK;
>>> +			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
>>> +			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
>>>  
>>>  			data++;
>>>  
>>>  			for (i = 0; i < node_count; i++) {
>>> -				u32 phandle = *data++;
>>> -				u32 drc_index;
>>> +				__be32 phandle = *data++;
>>> +				__be32 drc_index;
>>>  
>>>  				switch (action) {
>>>  				case DELETE_DT_NODE:
>>
>> The patch looks good, no nonsense endian fixing. 
>> Worth noting that it leaves existing bugs in place, which is fine, I'll
>> rebase my patches which address endian and bugs on top of these so as to
>> address the bugs.
>>
>>
> 
> 
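The update-nodes walk in pseries_devicetree_update() converts each big-endian header word before masking it. A sketch of that decode, assuming NODE_COUNT_MASK is 0x00ffffff (only NODE_ACTION_MASK appears in the diff) and using ntohl() in place of be32_to_cpu(); `decode_hdr()` is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

#define NODE_ACTION_MASK 0xff000000u   /* from the diff */
#define NODE_COUNT_MASK  0x00ffffffu   /* assumed complement of the action mask */

/* Split a big-endian header word into its action and node count. */
static void decode_hdr(uint32_t be_word, uint32_t *action, uint32_t *count)
{
	uint32_t w = ntohl(be_word);   /* convert once, then mask */

	*action = w & NODE_ACTION_MASK;
	*count = w & NODE_COUNT_MASK;
}
```

Masking the unconverted word on a little-endian host would read the bytes of 0x02000003 as 0x03000002, turning an action 0x02 record with three nodes into an action 0x03 record with two.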

* Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-03-03 23:15       ` Tyrel Datwyler
@ 2015-03-04  1:20         ` Cyril Bur
  2015-03-04  1:41           ` Tyrel Datwyler
  0 siblings, 1 reply; 18+ messages in thread
From: Cyril Bur @ 2015-03-04  1:20 UTC (permalink / raw)
  To: Tyrel Datwyler; +Cc: nfont, linuxppc-dev

On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote:
> On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
> > On 03/01/2015 09:20 PM, Cyril Bur wrote:
> >> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> >>> We currently use the device tree update code in the kernel after resuming
> >>> from a suspend operation to re-sync the kernels view of the device tree with
> >>> that of the hypervisor. The code as it stands is not endian safe as it relies
> >>> on parsing buffers returned by RTAS calls that thusly contains data in big
> >>> endian format.
> >>>
> >>> This patch annotates variables and structure members with __be types as well
> >>> as performing necessary byte swaps to cpu endian for data that needs to be
> >>> parsed.
> >>>
> >>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> >>> ---
> >>>  arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
> >>>  1 file changed, 19 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> >>> index 29e4f04..0b1f70e 100644
> >>> --- a/arch/powerpc/platforms/pseries/mobility.c
> >>> +++ b/arch/powerpc/platforms/pseries/mobility.c
> >>> @@ -25,10 +25,10 @@
> >>>  static struct kobject *mobility_kobj;
> >>>  
> >>>  struct update_props_workarea {
> >>> -	u32 phandle;
> >>> -	u32 state;
> >>> -	u64 reserved;
> >>> -	u32 nprops;
> >>> +	__be32 phandle;
> >>> +	__be32 state;
> >>> +	__be64 reserved;
> >>> +	__be32 nprops;
> >>>  } __packed;
> >>>  
> >>>  #define NODE_ACTION_MASK	0xff000000
> >>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
> >>>  	return 0;
> >>>  }
> >>>  
> >>> -static int update_dt_node(u32 phandle, s32 scope)
> >>> +static int update_dt_node(__be32 phandle, s32 scope)
> >>>  {
> >>
> >> On line 153 of this function:
> >>    dn = of_find_node_by_phandle(phandle);
> >>
> >> You're passing a __be32 to device tree code. If we can treat the phandle
> >> as an opaque value returned to us from the RTAS call and pass it around
> >> like that, then all is good.
> 
> After digging deeper, I found that device_node->phandle is stored in CPU
> endian under the covers. So, for of_find_node_by_phandle() we do need to
> convert the phandle to CPU endian first. It appears I got lucky with the
> update fixing the observed RMC issue, because the phandle for the root
> node seems to always be 0xffffffff.
> 
I think we've both switched opinions here: initially I thought an endian
conversion was necessary, but it turns out that all of_find_node_by_phandle()
really does is:
   for_each_of_allnodes(np)
      if (np->phandle == handle)
         break;
   of_node_get(np);

The == is safe either way, and I think the OF code might be trying to
imply that it doesn't matter by having a typedef'd type 'phandle'.

I'm still digging around, we want to get this right!


Cyril
> -Tyrel
> 
> > 
> > Yes, of_find_node_by_phandle() directly compares the phandle passed in
> > against the handle stored in each device_node when searching for a
> > matching node. Since the device tree is big endian, it follows that the
> > big endian phandle received in the RTAS buffer needs no conversion.
> > 
> > Further, we need to pass the phandle to ibm,update-properties in the
> > work area which is also required to be big endian. So, again it seemed
> > that converting to cpu endian was a waste of effort just to convert it
> > back to big endian.
> > 
> >> It's also hard to be sure whether these need to be BE and have always
> >> been that way: we've always run BE, so they've never actually wanted
> >> CPU endian; it's just that CPU endian has always been BE (I think I
> >> started rambling...)
> >>
> >> Just want to check that *not* converting them is done on purpose.
> > 
> > Yes, I explicitly did not convert them on purpose. As mentioned above, we
> > need phandle in BE for the ibm,update-properties RTAS work area.
> > Similarly, drc_index needs to be in BE for the ibm,configure-connector
> > RTAS work area. Outside of that, we do no other manipulation of those
> > values.
> > 
> >>
> >> And having read on, I'm assuming the answer is yes since this
> >> observation is true for your changes which affect:
> >> 	delete_dt_node()
> >> 	update_dt_node()
> >>         add_dt_node()
> >> Worth noting that you didn't change the definition of delete_dt_node()
> > 
> > You are correct. Oversight. I will fix that as it should generate a
> > sparse complaint.
> > 
> > -Tyrel
> > 
> >>
> >> I'll have a look once you address the build failure in patch 1/3 (I'm
> >> getting blocked by the unused var because somehow -Werror is on; odd it
> >> didn't trip you up), but I also suspect this will make sparse go a bit
> >> nuts.
> >> I wonder if there is a nice way of shutting sparse up.
> >>
> >>>  	struct update_props_workarea *upwa;
> >>>  	struct device_node *dn;
> >>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
> >>>  	char *prop_data;
> >>>  	char *rtas_buf;
> >>>  	int update_properties_token;
> >>> +	u32 nprops;
> >>>  	u32 vd;
> >>>  
> >>>  	update_properties_token = rtas_token("ibm,update-properties");
> >>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
> >>>  			break;
> >>>  
> >>>  		prop_data = rtas_buf + sizeof(*upwa);
> >>> +		nprops = be32_to_cpu(upwa->nprops);
> >>>  
> >>>  		/* On the first call to ibm,update-properties for a node the
> >>>  		 * the first property value descriptor contains an empty
> >>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
> >>>  		 */
> >>>  		if (*prop_data == 0) {
> >>>  			prop_data++;
> >>> -			vd = *(u32 *)prop_data;
> >>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
> >>>  			prop_data += vd + sizeof(vd);
> >>> -			upwa->nprops--;
> >>> +			nprops--;
> >>>  		}
> >>>  
> >>> -		for (i = 0; i < upwa->nprops; i++) {
> >>> +		for (i = 0; i < nprops; i++) {
> >>>  			char *prop_name;
> >>>  
> >>>  			prop_name = prop_data;
> >>>  			prop_data += strlen(prop_name) + 1;
> >>> -			vd = *(u32 *)prop_data;
> >>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
> >>>  			prop_data += sizeof(vd);
> >>>  
> >>>  			switch (vd) {
> >>> @@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
> >>>  	return 0;
> >>>  }
> >>>  
> >>> -static int add_dt_node(u32 parent_phandle, u32 drc_index)
> >>> +static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
> >>>  {
> >>>  	struct device_node *dn;
> >>>  	struct device_node *parent_dn;
> >>> @@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
> >>>  int pseries_devicetree_update(s32 scope)
> >>>  {
> >>>  	char *rtas_buf;
> >>> -	u32 *data;
> >>> +	__be32 *data;
> >>>  	int update_nodes_token;
> >>>  	int rc;
> >>>  
> >>> @@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
> >>>  		if (rc && rc != 1)
> >>>  			break;
> >>>  
> >>> -		data = (u32 *)rtas_buf + 4;
> >>> -		while (*data & NODE_ACTION_MASK) {
> >>> +		data = (__be32 *)rtas_buf + 4;
> >>> +		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
> >>>  			int i;
> >>> -			u32 action = *data & NODE_ACTION_MASK;
> >>> -			int node_count = *data & NODE_COUNT_MASK;
> >>> +			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
> >>> +			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
> >>>  
> >>>  			data++;
> >>>  
> >>>  			for (i = 0; i < node_count; i++) {
> >>> -				u32 phandle = *data++;
> >>> -				u32 drc_index;
> >>> +				__be32 phandle = *data++;
> >>> +				__be32 drc_index;
> >>>  
> >>>  				switch (action) {
> >>>  				case DELETE_DT_NODE:
> >>
> >> The patch looks good: no-nonsense endian fixing.
> >> Worth noting that it leaves existing bugs in place, which is fine, I'll
> >> rebase my patches which address endian and bugs on top of these so as to
> >> address the bugs.
> >>
> >>
> > 
> 

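The property parsing that update_dt_node() performs on the ibm,update-properties work area can be sketched in user space as below. This is a simplified, hypothetical parser, not the kernel function: it assumes the packed header layout of struct update_props_workarea (20 bytes, nprops at offset 16) and treats every value descriptor as a plain byte length, ignoring the special vd values the real switch (vd) handles. What it illustrates is the point of the patch: nprops is converted once into a CPU-endian local rather than decremented in place inside the big-endian buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packed header: phandle(4) + state(4) + reserved(8) + nprops(4). */
#define UPWA_HDR_LEN    20
#define UPWA_NPROPS_OFF 16

/* Read one big-endian 32-bit word, independent of host byte order. */
static uint32_t be32_at(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Simplified, hypothetical parser for one ibm,update-properties buffer.
 * Records each property name and value length (vd treated as a plain
 * byte count). Returns the number of properties recorded, or -1 if the
 * buffer is shorter than its descriptors claim. */
static int parse_update_props(const unsigned char *buf, size_t len,
			      const char **names, uint32_t *lens, int max)
{
	const unsigned char *p, *end;
	uint32_t nprops, vd;
	int n = 0;

	assert(buf != NULL);
	if (len < UPWA_HDR_LEN)
		return -1;
	end = buf + len;
	/* Convert once into a CPU-endian local; the buffer stays untouched. */
	nprops = be32_at(buf + UPWA_NPROPS_OFF);
	p = buf + UPWA_HDR_LEN;

	/* An empty leading name marks a descriptor the real code skips. */
	if (p < end && *p == 0) {
		p++;
		if (end - p < 4)
			return -1;
		vd = be32_at(p);
		p += 4;
		if ((size_t)(end - p) < vd)
			return -1;
		p += vd;
		if (nprops)
			nprops--;
	}

	for (uint32_t i = 0; i < nprops && n < max; i++) {
		const unsigned char *nul = memchr(p, 0, (size_t)(end - p));

		if (!nul || end - (nul + 1) < 4)
			return -1;
		names[n] = (const char *)p;	/* NUL-terminated in buf */
		p = nul + 1;
		vd = be32_at(p);
		p += 4;
		if ((size_t)(end - p) < vd)
			return -1;
		lens[n++] = vd;
		p += vd;
	}
	return n;
}
```

Decrementing upwa->nprops in place, as the pre-patch code did, would on a little-endian CPU subtract one from the byte-swapped value (0x03000000 becomes 0x02ffffff) and write that corrupted count back into the buffer; working on a converted local copy avoids touching the buffer at all.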

* Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
  2015-03-04  1:20         ` Cyril Bur
@ 2015-03-04  1:41           ` Tyrel Datwyler
  0 siblings, 0 replies; 18+ messages in thread
From: Tyrel Datwyler @ 2015-03-04  1:41 UTC (permalink / raw)
  To: Cyril Bur; +Cc: nfont, linuxppc-dev

On 03/03/2015 05:20 PM, Cyril Bur wrote:
> On Tue, 2015-03-03 at 15:15 -0800, Tyrel Datwyler wrote:
>> On 03/02/2015 01:49 PM, Tyrel Datwyler wrote:
>>> On 03/01/2015 09:20 PM, Cyril Bur wrote:
>>>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>>>>> We currently use the device tree update code in the kernel after resuming
>>>>> from a suspend operation to re-sync the kernel's view of the device tree with
>>>>> that of the hypervisor. The code as it stands is not endian safe, as it relies
>>>>> on parsing buffers returned by RTAS calls that thus contain data in big
>>>>> endian format.
>>>>>
>>>>> This patch annotates variables and structure members with __be types as well
>>>>> as performing necessary byte swaps to cpu endian for data that needs to be
>>>>> parsed.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>>>>> ---
>>>>>  arch/powerpc/platforms/pseries/mobility.c | 36 ++++++++++++++++---------------
>>>>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>>>> index 29e4f04..0b1f70e 100644
>>>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>>>> @@ -25,10 +25,10 @@
>>>>>  static struct kobject *mobility_kobj;
>>>>>  
>>>>>  struct update_props_workarea {
>>>>> -	u32 phandle;
>>>>> -	u32 state;
>>>>> -	u64 reserved;
>>>>> -	u32 nprops;
>>>>> +	__be32 phandle;
>>>>> +	__be32 state;
>>>>> +	__be64 reserved;
>>>>> +	__be32 nprops;
>>>>>  } __packed;
>>>>>  
>>>>>  #define NODE_ACTION_MASK	0xff000000
>>>>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> -static int update_dt_node(u32 phandle, s32 scope)
>>>>> +static int update_dt_node(__be32 phandle, s32 scope)
>>>>>  {
>>>>
>>>> On line 153 of this function:
>>>>    dn = of_find_node_by_phandle(phandle);
>>>>
>>>> You're passing a __be32 to device tree code. If we can treat the phandle
>>>> as an opaque value returned to us from the RTAS call and pass it around
>>>> like that, then all is good.
>>
>> After digging deeper, I found that device_node->phandle is stored in CPU
>> endian under the covers. So, for of_find_node_by_phandle() we do need to
>> convert the phandle to CPU endian first. It appears I got lucky with the
>> update fixing the observed RMC issue, because the phandle for the root
>> node seems to always be 0xffffffff.
>>
> I think we've both switched opinions here: initially I thought an endian
> conversion was necessary, but it turns out that all of_find_node_by_phandle()
> really does is:
>    for_each_of_allnodes(np)
>       if (np->phandle == handle)
>          break;
>    of_node_get(np);
> 
> The == is safe either way, and I think the OF code might be trying to
> imply that it doesn't matter by having a typedef'd type 'phandle'.
> 
> I'm still digging around, we want to get this right!

When the device tree is unflattened, the phandle is byte-swapped to CPU
endian. The following code is from unflatten_dt_node().

    if (strcmp(pname, "ibm,phandle") == 0)
        np->phandle = be32_to_cpup(p);

I added some debug output to of_find_node_by_phandle() and verified that
if the phandle isn't swapped to CPU endian, we fail to find a matching
node, except in the case where the phandle value is identical in both big
and little endian byte order (such as 0xffffffff).
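
The failure mode described above can be reproduced outside the kernel with a small user-space sketch. It assumes a simplified, hypothetical node table whose phandles are stored in CPU byte order (as unflatten_dt_node() leaves them) and a portable stand-in for be32_to_cpup(); find_by_phandle() mirrors only the == loop quoted earlier, not the real of_find_node_by_phandle().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for be32_to_cpup(): build the value from big-endian
 * bytes, whatever the host byte order is. */
static uint32_t be32_to_host(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical node table; phandles are kept in CPU byte order, as they
 * are after unflattening. */
struct fake_node {
	const char *name;
	uint32_t phandle;
};

static const struct fake_node nodes[] = {
	{ "/", 0xffffffffu },		/* same value in either byte order */
	{ "/cpus", 0x00000012u },	/* hypothetical non-symmetric phandle */
};

/* Same shape as the quoted loop: a plain == on the stored phandle. */
static const struct fake_node *find_by_phandle(uint32_t handle)
{
	for (size_t i = 0; i < sizeof(nodes) / sizeof(nodes[0]); i++)
		if (nodes[i].phandle == handle)
			return &nodes[i];
	return NULL;
}

/* Look up a phandle taken from a big-endian RTAS buffer, either passed
 * through unconverted (convert == 0) or swapped to CPU order first. */
static const struct fake_node *lookup(const unsigned char *be, int convert)
{
	uint32_t raw;

	assert(be != NULL);
	memcpy(&raw, be, sizeof(raw));	/* bytes reinterpreted in host order */
	return find_by_phandle(convert ? be32_to_host(be) : raw);
}
```

On a little-endian host, lookup() with convert == 0 finds "/" (0xffffffff) but misses "/cpus", because the bytes 00 00 00 12 read back as 0x12000000; with convert == 1 both are found. On a big-endian host both variants behave identically, which is how the missing conversion stayed hidden.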

-Tyrel

> 
> 
> Cyril
>> -Tyrel
>>
>>>
>>> Yes, of_find_node_by_phandle() directly compares the phandle passed in
>>> against the handle stored in each device_node when searching for a
>>> matching node. Since the device tree is big endian, it follows that the
>>> big endian phandle received in the RTAS buffer needs no conversion.
>>>
>>> Further, we need to pass the phandle to ibm,update-properties in the
>>> work area which is also required to be big endian. So, again it seemed
>>> that converting to cpu endian was a waste of effort just to convert it
>>> back to big endian.
>>>
>>>> It's also hard to be sure whether these need to be BE and have always
>>>> been that way: we've always run BE, so they've never actually wanted
>>>> CPU endian; it's just that CPU endian has always been BE (I think I
>>>> started rambling...)
>>>>
>>>> Just want to check that *not* converting them is done on purpose.
>>>
>>> Yes, I explicitly did not convert them on purpose. As mentioned above, we
>>> need phandle in BE for the ibm,update-properties RTAS work area.
>>> Similarly, drc_index needs to be in BE for the ibm,configure-connector
>>> RTAS work area. Outside of that, we do no other manipulation of those
>>> values.
>>>
>>>>
>>>> And having read on, I'm assuming the answer is yes since this
>>>> observation is true for your changes which affect:
>>>> 	delete_dt_node()
>>>> 	update_dt_node()
>>>>         add_dt_node()
>>>> Worth noting that you didn't change the definition of delete_dt_node()
>>>
>>> You are correct. Oversight. I will fix that as it should generate a
>>> sparse complaint.
>>>
>>> -Tyrel
>>>
>>>>
>>>> I'll have a look once you address the build failure in patch 1/3 (I'm
>>>> getting blocked by the unused var because somehow -Werror is on; odd it
>>>> didn't trip you up), but I also suspect this will make sparse go a bit
>>>> nuts.
>>>> I wonder if there is a nice way of shutting sparse up.
>>>>
>>>>>  	struct update_props_workarea *upwa;
>>>>>  	struct device_node *dn;
>>>>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>>>  	char *prop_data;
>>>>>  	char *rtas_buf;
>>>>>  	int update_properties_token;
>>>>> +	u32 nprops;
>>>>>  	u32 vd;
>>>>>  
>>>>>  	update_properties_token = rtas_token("ibm,update-properties");
>>>>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>>>  			break;
>>>>>  
>>>>>  		prop_data = rtas_buf + sizeof(*upwa);
>>>>> +		nprops = be32_to_cpu(upwa->nprops);
>>>>>  
>>>>>  		/* On the first call to ibm,update-properties for a node the
>>>>>  		 * the first property value descriptor contains an empty
>>>>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>>>  		 */
>>>>>  		if (*prop_data == 0) {
>>>>>  			prop_data++;
>>>>> -			vd = *(u32 *)prop_data;
>>>>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>>>>  			prop_data += vd + sizeof(vd);
>>>>> -			upwa->nprops--;
>>>>> +			nprops--;
>>>>>  		}
>>>>>  
>>>>> -		for (i = 0; i < upwa->nprops; i++) {
>>>>> +		for (i = 0; i < nprops; i++) {
>>>>>  			char *prop_name;
>>>>>  
>>>>>  			prop_name = prop_data;
>>>>>  			prop_data += strlen(prop_name) + 1;
>>>>> -			vd = *(u32 *)prop_data;
>>>>> +			vd = be32_to_cpu(*(__be32 *)prop_data);
>>>>>  			prop_data += sizeof(vd);
>>>>>  
>>>>>  			switch (vd) {
>>>>> @@ -212,7 +214,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> -static int add_dt_node(u32 parent_phandle, u32 drc_index)
>>>>> +static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
>>>>>  {
>>>>>  	struct device_node *dn;
>>>>>  	struct device_node *parent_dn;
>>>>> @@ -237,7 +239,7 @@ static int add_dt_node(u32 parent_phandle, u32 drc_index)
>>>>>  int pseries_devicetree_update(s32 scope)
>>>>>  {
>>>>>  	char *rtas_buf;
>>>>> -	u32 *data;
>>>>> +	__be32 *data;
>>>>>  	int update_nodes_token;
>>>>>  	int rc;
>>>>>  
>>>>> @@ -254,17 +256,17 @@ int pseries_devicetree_update(s32 scope)
>>>>>  		if (rc && rc != 1)
>>>>>  			break;
>>>>>  
>>>>> -		data = (u32 *)rtas_buf + 4;
>>>>> -		while (*data & NODE_ACTION_MASK) {
>>>>> +		data = (__be32 *)rtas_buf + 4;
>>>>> +		while (be32_to_cpu(*data) & NODE_ACTION_MASK) {
>>>>>  			int i;
>>>>> -			u32 action = *data & NODE_ACTION_MASK;
>>>>> -			int node_count = *data & NODE_COUNT_MASK;
>>>>> +			u32 action = be32_to_cpu(*data) & NODE_ACTION_MASK;
>>>>> +			u32 node_count = be32_to_cpu(*data) & NODE_COUNT_MASK;
>>>>>  
>>>>>  			data++;
>>>>>  
>>>>>  			for (i = 0; i < node_count; i++) {
>>>>> -				u32 phandle = *data++;
>>>>> -				u32 drc_index;
>>>>> +				__be32 phandle = *data++;
>>>>> +				__be32 drc_index;
>>>>>  
>>>>>  				switch (action) {
>>>>>  				case DELETE_DT_NODE:
>>>>
>>>> The patch looks good: no-nonsense endian fixing.
>>>> Worth noting that it leaves existing bugs in place, which is fine, I'll
>>>> rebase my patches which address endian and bugs on top of these so as to
>>>> address the bugs.
>>>>
>>>>
>>>
>>
> 
> 

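The node-list walk in pseries_devicetree_update() can be sketched the same way. NODE_ACTION_MASK is taken from the patch; NODE_COUNT_MASK, the ADD_DT_NODE action code, and the assumption that each ADD phandle is followed by a drc_index word are illustrative values for this sketch and should be checked against mobility.c. The caller is expected to pass a pointer that already skips the four leading words, mirroring (__be32 *)rtas_buf + 4. Each header word is converted before masking, which is the core of the fix; unlike the patch, which keeps phandle and drc_index as __be32, this sketch converts them to CPU order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_ACTION_MASK 0xff000000u	/* from mobility.c */
#define NODE_COUNT_MASK  0x00ffffffu	/* assumed for this sketch */
#define ADD_DT_NODE      0x03000000u	/* assumed action code */

/* Read one big-endian 32-bit word, independent of host byte order. */
static uint32_t be32_at(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* One decoded entry; all fields are in CPU byte order. */
struct node_ref {
	uint32_t action;
	uint32_t phandle;
	uint32_t drc_index;	/* only set for ADD_DT_NODE (assumed layout) */
};

/* Walk action groups in a buffer of nwords big-endian words, stopping at
 * the first word whose action byte is zero or when buf/out run out.
 * Returns the number of entries written to out. */
static size_t walk_update_nodes(const unsigned char *buf, size_t nwords,
				struct node_ref *out, size_t max_out)
{
	size_t w = 0, n = 0;

	assert(buf != NULL || nwords == 0);
	while (w < nwords) {
		uint32_t hdr = be32_at(buf + 4 * w);	/* swap before masking */
		uint32_t action = hdr & NODE_ACTION_MASK;
		uint32_t count = hdr & NODE_COUNT_MASK;

		if (!action)
			break;
		w++;
		for (uint32_t i = 0; i < count; i++) {
			if (w >= nwords || n >= max_out)
				return n;
			out[n].action = action;
			out[n].phandle = be32_at(buf + 4 * w++);
			out[n].drc_index = 0;
			if (action == ADD_DT_NODE) {
				if (w >= nwords)
					return n;
				out[n].drc_index = be32_at(buf + 4 * w++);
			}
			n++;
		}
	}
	return n;
}
```

Masking the raw word without the swap, as the pre-patch code did, would on a little-endian CPU pick the action out of the least significant byte of the big-endian value instead of the most significant one.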

* Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
  2015-03-03 20:16         ` Tyrel Datwyler
@ 2015-03-04 15:58           ` Nathan Fontenot
  0 siblings, 0 replies; 18+ messages in thread
From: Nathan Fontenot @ 2015-03-04 15:58 UTC (permalink / raw)
  To: Tyrel Datwyler, Michael Ellerman; +Cc: linuxppc-dev, Cyril Bur

On 03/03/2015 02:16 PM, Tyrel Datwyler wrote:
> On 03/02/2015 10:15 PM, Michael Ellerman wrote:
>> On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote:
>>> On 03/01/2015 08:19 PM, Cyril Bur wrote:
>>>> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>>>>> During a suspend/migration operation we must wait for the VASI state reported
>>>>> by the hypervisor to become Suspending prior to making the ibm,suspend-me
>>>>> RTAS call. Callers of rtas_ibm_suspend_me() pass a vasi_state variable
>>>>> that exposes the VASI state to the caller. This is unnecessary, as the caller
>>>>> only really cares about the following three conditions: if there is an error,
>>>>> we should bail out; on success, we have suspended and woken back up, so we
>>>>> proceed to the device tree update; or we are not suspendable yet, so we try
>>>>> calling rtas_ibm_suspend_me() again shortly.
>>>>>
>>>>> This patch removes the extraneous vasi_state variable and simply uses the
>>>>> return code to communicate how to proceed. We either succeed, fail, or get
>>>>> -EAGAIN in which case we sleep for a second before trying to call
>>>>> rtas_ibm_suspend_me again.
>>>>>
>>>>>  		u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
>>>>>  		              | be32_to_cpu(args.args[1]);
>>>>> -		rc = rtas_ibm_suspend_me(handle, &vasi_rc);
>>>>> -		args.rets[0] = cpu_to_be32(vasi_rc);
>>>>> -		if (rc)
>>>>> +		rc = rtas_ibm_suspend_me(handle);
>>>>> +		if (rc == -EAGAIN)
>>>>> +			args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
>>>>
>>>> (continuing on...) so perhaps here have
>>>> 	if (rc == -EAGAIN) {
>>>> 		args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
>>>> 		rc = 0;
>>>> 	} else if (rc == -EIO) {
>>>> 		args.rets[0] = cpu_to_be32(-1);
>>>> 		rc = 0;
>>>> 	}
>>>> which should keep the original behaviour; the last thing we want to do
>>>> is break BE.
>>>
>>> The biggest problem here is we are making what basically equates to a
>>> fake rtas call from drmgr which we intercept in ppc_rtas(). From there
>>> we make this special call to rtas_ibm_suspend_me() to check VASI state
>>> and do a bunch of other specialized work that needs to be setup prior to
>>> making the actual ibm,suspend-me RTAS call. Since we are cheating PAPR
>>> here, I guess we can really handle it however we want. I chose to simply
>>> fail the RTAS call in the case where rtas_ibm_suspend_me() fails with
>>> something other than -EAGAIN. In user space, librtas will log errno for
>>> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
>>> that error and fail.
>>
>> We don't want to change the return values of the syscall unless we absolutely
>> have to. And I don't think that's the case here.
> 
> I'd like to argue that the one case I changed makes sense, but it's just
> as easy to keep the original behavior.
> 
>>
>> Sure we think drmgr is the only thing that uses this crap, but we don't know
>> for sure.
> 
> I can't imagine how anybody else could possibly use this hack without a
> streamid from the HMC/hypervisor, but I've been wrong in the past more
> times than I can count. :)

Correct, this will fail if called with a random streamid. The streamid has
to match what is handed to us from the HMC when a migration request is
initiated.

-Nathan
 
> 
> -Tyrel
> 
>>
>> cheers
>>
>>
> 



Thread overview: 18+ messages
2015-02-28  2:24 [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Tyrel Datwyler
2015-02-28  2:24 ` [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration Tyrel Datwyler
2015-03-02  4:19   ` Cyril Bur
2015-03-02 21:30     ` Tyrel Datwyler
2015-03-03  6:15       ` Michael Ellerman
2015-03-03 20:16         ` Tyrel Datwyler
2015-03-04 15:58           ` Nathan Fontenot
2015-02-28  2:24 ` [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update Tyrel Datwyler
2015-03-02  5:20   ` Cyril Bur
2015-03-02 21:49     ` Tyrel Datwyler
2015-03-03 23:15       ` Tyrel Datwyler
2015-03-04  1:20         ` Cyril Bur
2015-03-04  1:41           ` Tyrel Datwyler
2015-02-28  2:24 ` [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr Tyrel Datwyler
2015-03-03  6:24   ` Michael Ellerman
2015-03-03 21:18     ` Tyrel Datwyler
2015-03-03  6:10 ` [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code Michael Ellerman
2015-03-03 20:37   ` Tyrel Datwyler
