* [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec
@ 2019-07-23 16:13 Vaibhav Jain
  2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-23 16:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K . V, Oliver O'Halloran, Vaibhav Jain,
	Laurent Dufour, David Gibson

Presently an error is returned in response to hcall H_SCM_BIND_MEM when a
new kernel boots on lpar via kexec. This prevents papr_scm from registering
drc memory regions with nvdimm. The error reported is of the form below:

"papr_scm ibm,persistent-memory:ibm,pmemory@44100002: bind err: -68"

Investigation showed that phyp returns this error because the previous
kernel did not completely release its bindings for the drc scm-memory blocks,
so phyp rejects the request to re-bind these blocks to the lpar with error
H_OVERLAP. Support for a new hcall, H_SCM_UNBIND_ALL, was also added recently;
it is better suited for releasing all the bound scm-memory blocks from an lpar.

So by leveraging the new hcall H_SCM_UNBIND_ALL we can work around the H_OVERLAP
issue during kexec by forcing an unbind of all drc scm-memory blocks and then
issuing H_SCM_BIND_MEM to re-bind the drc scm-memory blocks to the lpar. This
sequence is also needed when a new kernel boots on the lpar after the previous
kernel panicked and never got an opportunity to call H_SCM_UNBIND_MEM/ALL.

Hence this patch-set implements the following changes to the papr_scm module:

* Update hvcall.h to include opcodes for new hcall H_SCM_UNBIND_ALL.

* Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL instead of H_SCM_UNBIND_MEM.

* In case hcall H_SCM_BIND_MEM fails with error H_OVERLAP, force
  H_SCM_UNBIND_ALL and retry the bind operation again.
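
The bind, forced-unbind and retry sequence listed above can be modelled in
user space. This is only a sketch under stated assumptions: struct mock_phyp
and the mock_*() helpers are hypothetical stand-ins for the hypervisor and are
not part of the patch-set, and -68 is taken from the "bind err: -68" message
quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define H_SUCCESS   0
#define H_OVERLAP (-68)	/* value seen in the "bind err: -68" log line */

/* Hypothetical stand-in for hypervisor state that survives a kexec. */
struct mock_phyp {
	bool bound;	/* true if scm blocks are still bound to the lpar */
};

/* Mock H_SCM_BIND_MEM: refuses to bind blocks that are already bound. */
int64_t mock_bind_mem(struct mock_phyp *hv)
{
	if (hv->bound)
		return H_OVERLAP;
	hv->bound = true;
	return H_SUCCESS;
}

/* Mock H_SCM_UNBIND_ALL: drops every binding held by the lpar. */
int64_t mock_unbind_all(struct mock_phyp *hv)
{
	hv->bound = false;
	return H_SUCCESS;
}

/* Bind; on H_OVERLAP force an unbind-all and retry exactly once. */
int64_t bind_with_recovery(struct mock_phyp *hv, int *unbinds)
{
	int64_t rc;

	assert(hv && unbinds);
	rc = mock_bind_mem(hv);
	if (rc == H_OVERLAP) {
		mock_unbind_all(hv);
		(*unbinds)++;
		rc = mock_bind_mem(hv);
	}
	return rc;
}
```

In this model a binding left behind by the previous kernel costs exactly one
forced unbind, while a clean boot takes the normal path with no unbind at all.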

With the patch-set applied re-bind of drc scm-memory to lpar succeeds after
a kexec to new kernel as illustrated below:

# Old kernel
$ sudo ndctl list -R
[
  {
    "dev":"region0",
    <snip>
    ....
  }
]
# kexec to new kernel
$ sudo kexec --initrd=... vmlinux
...
...
I'm in purgatory
...
papr_scm ibm,persistent-memory:ibm,pmemory@44100002: Un-binding and retrying
...
# New kernel
$ sudo ndctl list -R
[
  {
    "dev":"region0",
    <snip>
    ....
  }
]

---
Change-log:
v5:
* Added a new doc-patch describing the HCALL interface between a guest kernel
  and PAPR compliant hyper-visor like PowerVM/KVM.

v4:
* Updated the patch description of first patch in the series as suggested
  by Mpe.

v3:
* Fixed a build warning reported by kbuild test robot.
* Updated the hcall opcode from latest papr-scm specification.
* Fixed a minor code comment & patch description as pointed out by Oliver.

v2:
* Addressed review comments from Oliver on v1 patchset.

Vaibhav Jain (4):
  powerpc: Document some HCalls for Storage Class Memory
  powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
  powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails

 Documentation/powerpc/hcalls.txt          | 140 ++++++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h         |  11 +-
 arch/powerpc/platforms/pseries/papr_scm.c |  44 +++++--
 3 files changed, 184 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/powerpc/hcalls.txt

-- 
2.21.0



* [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory
  2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
@ 2019-07-23 16:13 ` Vaibhav Jain
  2019-07-24  9:08   ` Laurent Dufour
  2019-07-30 12:06   ` Michael Ellerman
  2019-07-23 16:13 ` [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h Vaibhav Jain
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-23 16:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K . V, Oliver O'Halloran, Vaibhav Jain,
	Laurent Dufour, David Gibson

This doc patch provides an initial description of the HCall op-codes
that are used by Linux kernel running as a guest operating
system (LPAR) on top of PowerVM or any other sPAPR compliant
hyper-visor (e.g qemu).

Apart from documenting the HCalls the doc-patch also provides a
rudimentary overview of how Hcalls are implemented inside the Linux
kernel and how information flows between kernel and PowerVM/KVM.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

v5
* First patch in this patchset.
---
 Documentation/powerpc/hcalls.txt | 140 +++++++++++++++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 Documentation/powerpc/hcalls.txt

diff --git a/Documentation/powerpc/hcalls.txt b/Documentation/powerpc/hcalls.txt
new file mode 100644
index 000000000000..cc9dd872cecd
--- /dev/null
+++ b/Documentation/powerpc/hcalls.txt
@@ -0,0 +1,140 @@
+Hyper-visor Call Op-codes (HCALLS)
+====================================
+
+Overview
+=========
+
+Virtualization on PPC64 arch is based on the PAPR specification[1] which
+describes run-time environment for a guest operating system and how it should
+interact with the hyper-visor for privileged operations. Currently there are two
+PAPR compliant hypervisors (PHYP):
+
+IBM PowerVM: IBM's proprietary hyper-visor that supports AIX, IBM-i and Linux as
+	     supported guests (termed as Logical Partitions or LPARS).
+
+Qemu/KVM:    Supports PPC64 linux guests running on a PPC64 linux host.
+
+On PPC64 arch a virtualized guest kernel runs in a non-privileged mode (HV=0).
+Hence to perform a privileged operation the guest issues a Hyper-visor
+Call (HCALL) with necessary input operands. PHYP after performing the privileged
+operation returns a status code and output operands back to the guest.
+
+HCALL ABI
+=========
+The ABI specification for a HCall between guest os kernel and PHYP is
+described in [1]. The Opcode for Hcall is set in R3 and subsequent in-arguments
+for the Hcall are provided in registers R4-R12. On return from 'HVCS'
+instruction the status code of HCall is available in R3 and the output parameters
+are returned in registers R4-R12.
+
+Powerpc arch code provides convenient wrappers named plpar_hcall_xxx defined in
+header 'hvcall.h' to issue HCalls from the linux kernel running as guest.
+
+
+DRC & DRC Indexes
+=================
+
+		 PAPR		     Guest
+  DR1          Hypervisor             OS
+  +--+        +----------+         +---------+
+  |  |<------>|          |         |  User   |
+  +--+  DRC1  |          |   DRC   |  Space  |
+	      |          |  Index  +---------+
+  DR2         |          |         |         |
+  +--+        |          |<------->|  Kernel |
+  |  |<------>|          |  HCall  |         |
+  +--+  DRC2  +----------+         +---------+
+
+PHYP terms shared hardware resources like PCI devices, NVDimms etc available for
+use by LPARs as Dynamic Resource (DR). When a DR is allocated to an LPAR, PHYP
+creates a data-structure called Dynamic Resource Connector (DRC) to manage LPAR
+access. An LPAR refers to a DRC via an opaque 32-bit number called DRC-Index.
+The DRC-index value is provided to the LPAR via device-tree where it is present
+as an attribute in the device tree node associated with the DR.
+
+HCALL Op-codes
+==============
+
+Below is a partial list of HCALLs that are supported by PHYP. For the
+corresponding opcode values please look into the header
+'arch/powerpc/include/asm/hvcall.h' :
+
+* H_SCM_READ_METADATA:
+  Input: drcIndex, offset, buffer-address, numBytesToRead
+  Out: None
+  Description:
+  Given a DRC Index of an NVDimm, read N-bytes from the metadata area
+  associated with it, at a specified offset and copy it to the provided buffer.
+  The metadata area stores configuration information such as label information,
+  bad-blocks etc. The metadata area is located out-of-band of NVDimm storage
+  area hence separate access semantics are provided.
+
+* H_SCM_WRITE_METADATA:
+  Input: drcIndex, offset, data, numBytesToWrite
+  Out: None
+  Description:
+  Given a DRC Index of an NVDimm, write N-bytes from provided buffer at the
+  given offset to the metadata area associated with the NVDimm.
+
+
+* H_SCM_BIND_MEM:
+  Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetAddress
+  Out: guestMappedAddress, numScmBlockBound
+  Description:
+  Given a DRC-Index of an NVDimm, maps the SCM (Storage Class Memory) blocks to
+  contiguous logical addresses in guest physical address space. The HCALL
+  arguments can be used to map a partial range of SCM blocks instead of the
+  entire NVDimm range to the LPAR.
+
+* H_SCM_UNBIND_MEM:
+  Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
+  Out: numScmBlocksUnbound
+  Description:
+  Given a DRC-Index of an NVDimm, unmap one or more of the SCM blocks from guest
+  physical address space. The HCALL can fail if the Guest has an active PTE
+  entry to the SCM block being unbound.
+
+* H_SCM_QUERY_BLOCK_MEM_BINDING:
+  Input: drcIndex, scmBlockIndex
+  Out: Guest-Physical-Address
+  Description:
+  Given a DRC-Index and an SCM Block index return the guest physical address to
+  which the SCM block is mapped.
+
+* H_SCM_QUERY_LOGICAL_MEM_BINDING:
+  Input: Guest-Physical-Address
+  Out: drcIndex, scmBlockIndex
+  Description:
+  Given a guest physical address return which DRC Index and SCM block is mapped
+  to that address.
+
+* H_SCM_UNBIND_ALL:
+  Input: scmTargetScope, drcIndex
+  Out: None
+  Description:
+  Depending on the Target scope unmap all scm blocks belonging to all NVDimms
+  or all scm blocks belonging to a single NVDimm identified by its drcIndex
+  from the LPAR memory.
+
+* H_SCM_HEALTH:
+  Input: drcIndex
+  Out: health-bitmap, health-bit-valid-bitmap
+  Description:
+  Given a DRC Index return the info on predictive failure and overall health of
+  the NVDimm. The asserted bits in the health-bitmap indicate a single predictive
+  failure and health-bit-valid-bitmap indicates which bits in health-bitmap are
+  valid.
+
+
+* H_SCM_PERFORMANCE_STATS:
+  Input: drcIndex, resultBuffer Addr
+  Out: None
+  Description:
+  Given a DRC Index collect the performance statistics for NVDimm and copy them
+  to the resultBuffer.
+
+
+References
+==========
+[1]: "Linux on Power Architecture Platform Reference"
+     https://members.openpowerfoundation.org/document/dl/469
-- 
2.21.0



* [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
  2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
@ 2019-07-23 16:13 ` Vaibhav Jain
  2019-07-26  8:53   ` David Gibson
  2019-07-23 16:13 ` [PATCH v5 3/4] powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL Vaibhav Jain
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-23 16:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K . V, Oliver O'Halloran, Vaibhav Jain,
	Laurent Dufour, David Gibson

Update hvcall.h to include op-codes for the new hcalls introduced to
manage SCM memory. Also update existing hcall definitions to reflect the
current papr specification for SCM.

The removed hcall op-codes H_SCM_MEM_QUERY and H_SCM_BLOCK_CLEAR were
transient proposals; support for them was never implemented by
Power-VM, nor were they used anywhere in the Linux kernel. Hence we don't
expect anyone to be impacted by this change.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

v5:
* None. Re-spinning the patchset.

v4:
* Updated the patch description to mention the current status of the removed
  hcall opcodes. [Mpe]

v3:
* Added updated opcode for H_SCM_HEALTH [Oliver]

v2:
* None. New patch in this series.
---
 arch/powerpc/include/asm/hvcall.h | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 463c63a9fcf1..11112023e327 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -302,9 +302,14 @@
 #define H_SCM_UNBIND_MEM        0x3F0
 #define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
 #define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
-#define H_SCM_MEM_QUERY	        0x3FC
-#define H_SCM_BLOCK_CLEAR       0x400
-#define MAX_HCALL_OPCODE	H_SCM_BLOCK_CLEAR
+#define H_SCM_UNBIND_ALL        0x3FC
+#define H_SCM_HEALTH            0x400
+#define H_SCM_PERFORMANCE_STATS 0x418
+#define MAX_HCALL_OPCODE	H_SCM_PERFORMANCE_STATS
+
+/* Scope args for H_SCM_UNBIND_ALL */
+#define H_UNBIND_SCOPE_ALL (0x1)
+#define H_UNBIND_SCOPE_DRC (0x2)
 
 /* H_VIOCTL functions */
 #define H_GET_VIOA_DUMP_SIZE	0x01
-- 
2.21.0



* [PATCH v5 3/4] powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL
  2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
  2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
  2019-07-23 16:13 ` [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h Vaibhav Jain
@ 2019-07-23 16:13 ` Vaibhav Jain
  2019-07-23 16:13 ` [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails Vaibhav Jain
  2019-07-24 10:04 ` [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Aneesh Kumar K.V
  4 siblings, 0 replies; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-23 16:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K . V, Oliver O'Halloran, Vaibhav Jain,
	Laurent Dufour, David Gibson

A new hcall named H_SCM_UNBIND_ALL has been introduced that can
unbind all or specific scm memory assigned to an lpar. This is
more efficient than using H_SCM_UNBIND_MEM as currently we don't
support partial unbind of scm memory.

Hence this patch proposes the following changes to drc_pmem_unbind():

    * Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM with
      H_SCM_UNBIND_ALL.

    * Update drc_pmem_unbind() to handle cases when PHYP asks the guest
      kernel to wait for a specific amount of time before retrying the
      hcall via the 'LONG_BUSY' return value.

    * Ensure appropriate error code is returned back from the function
      in case of an error.
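
The retry behaviour listed above can be sketched in user space as follows.
This is an illustration rather than the patch itself: the hcall is replaced by
a scripted list of return codes (struct scripted_hcall is hypothetical),
sleeping and yielding are replaced by counters, and the long-busy range
(9900..9905, meaning 1 ms up to 100 s in powers of ten) is an assumption based
on the definitions in hvcall.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define H_SUCCESS		0
#define H_BUSY			1
#define H_LONG_BUSY_START	9900	/* assumed: wait 1 ms */
#define H_LONG_BUSY_END		9905	/* assumed: wait 100 s */

int is_long_busy(int64_t rc)
{
	return rc >= H_LONG_BUSY_START && rc <= H_LONG_BUSY_END;
}

/* Each step up in the long-busy range suggests a 10x longer wait. */
unsigned int longbusy_msecs(int64_t rc)
{
	unsigned int ms = 1;

	assert(is_long_busy(rc));
	for (int64_t i = H_LONG_BUSY_START; i < rc; i++)
		ms *= 10;
	return ms;
}

/* Hypothetical replacement for the hcall: replays canned return codes. */
struct scripted_hcall {
	const int64_t *rcs;
	size_t len;
	size_t pos;
};

int64_t next_rc(struct scripted_hcall *h)
{
	return h->pos < h->len ? h->rcs[h->pos++] : H_SUCCESS;
}

/* Same control flow as the unbind loop: sleep on long-busy, yield on busy. */
int unbind_loop(struct scripted_hcall *h, unsigned int *slept_ms,
		unsigned int *yields)
{
	int64_t rc;

	do {
		rc = next_rc(h);
		if (is_long_busy(rc)) {
			*slept_ms += longbusy_msecs(rc);  /* msleep() in the kernel */
			rc = H_BUSY;
		} else if (rc == H_BUSY) {
			(*yields)++;			  /* cond_resched() in the kernel */
		}
	} while (rc == H_BUSY);

	return rc == H_SUCCESS ? 0 : -ENXIO;
}
```

Folding every long-busy code back into H_BUSY keeps a single loop condition
while still honouring the wait hint from the hypervisor.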

Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

v5:
* None. Re-spinning the patchset.

v4:
* None. Re-spinning the patchset.

v3:
* Fixed a build warning reported by kbuild-robot.
* Updated patch description to put emphasis on 'scm memory' instead of
  'scm drc memory blocks' as for phyp there is a stark difference
  between how drc are managed for scm memory v/s regular memory. [Oliver]

v2:
* Added a dev_dbg when unbind operation succeeds [Oliver]
* Changed type of variable 'rc' to int64_t [Oliver]
* Removed the code that was logging a warning in case bind operation
  takes >1-seconds [Oliver]
* Spun off changes to hvcall.h as a separate patch. [Oliver]
---
 arch/powerpc/platforms/pseries/papr_scm.c | 29 +++++++++++++++++------
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index c8ec670ee924..82568a7e0a7c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <linux/libnvdimm.h>
 #include <linux/platform_device.h>
+#include <linux/delay.h>
 
 #include <asm/plpar_wrappers.h>
 
@@ -78,22 +79,36 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
 static int drc_pmem_unbind(struct papr_scm_priv *p)
 {
 	unsigned long ret[PLPAR_HCALL_BUFSIZE];
-	uint64_t rc, token;
+	uint64_t token = 0;
+	int64_t rc;
 
-	token = 0;
+	dev_dbg(&p->pdev->dev, "unbind drc %x\n", p->drc_index);
 
-	/* NB: unbind has the same retry requirements mentioned above */
+	/* NB: unbind has the same retry requirements as drc_pmem_bind() */
 	do {
-		rc = plpar_hcall(H_SCM_UNBIND_MEM, ret, p->drc_index,
-				p->bound_addr, p->blocks, token);
+
+		/* Unbind of all SCM resources associated with drcIndex */
+		rc = plpar_hcall(H_SCM_UNBIND_ALL, ret, H_UNBIND_SCOPE_DRC,
+				 p->drc_index, token);
 		token = ret[0];
-		cond_resched();
+
+		/* Check if we are stalled for some time */
+		if (H_IS_LONG_BUSY(rc)) {
+			msleep(get_longbusy_msecs(rc));
+			rc = H_BUSY;
+		} else if (rc == H_BUSY) {
+			cond_resched();
+		}
+
 	} while (rc == H_BUSY);
 
 	if (rc)
 		dev_err(&p->pdev->dev, "unbind error: %lld\n", rc);
+	else
+		dev_dbg(&p->pdev->dev, "unbind drc %x complete\n",
+			p->drc_index);
 
-	return !!rc;
+	return rc == H_SUCCESS ? 0 : -ENXIO;
 }
 
 static int papr_scm_meta_get(struct papr_scm_priv *p,
-- 
2.21.0



* [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
                   ` (2 preceding siblings ...)
  2019-07-23 16:13 ` [PATCH v5 3/4] powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL Vaibhav Jain
@ 2019-07-23 16:13 ` Vaibhav Jain
  2019-07-24  9:17   ` Laurent Dufour
  2019-07-24 10:04 ` [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Aneesh Kumar K.V
  4 siblings, 1 reply; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-23 16:13 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Aneesh Kumar K . V, Oliver O'Halloran, Vaibhav Jain,
	Laurent Dufour, David Gibson

In some cases the initial bind of scm memory for an lpar can fail if
it wasn't previously released using a scm-unbind hcall. This situation
can arise due to a panic of the previous kernel or a forced lpar
fadump. In such cases H_SCM_BIND_MEM returns an H_OVERLAP error.

To mitigate such cases the patch updates papr_scm_probe() to force a
call to drc_pmem_unbind() in case the initial bind of scm memory fails
with an EBUSY error. In case the scm-bind operation fails again after the
forced scm-unbind we follow the existing error path. We also
update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
and report it as an EBUSY error back to the caller.
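
The error translation and the single retry described above can be sketched as
below. This is a user-space model under stated assumptions, not the kernel
code: bind_rc_to_errno() and probe_bind() are hypothetical names, the two
hcall statuses are passed in directly instead of being obtained from phyp, and
the forced unbind is represented by a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define H_SUCCESS   0
#define H_OVERLAP (-68)

/* Translate a bind hcall status into the errno reported to the caller. */
int bind_rc_to_errno(int64_t rc)
{
	if (rc == H_SUCCESS)
		return 0;
	if (rc == H_OVERLAP)
		return -EBUSY;	/* memory still bound: caller may unbind and retry */
	return -ENXIO;		/* any other failure */
}

/*
 * Model of the probe path: first_rc and retry_rc are the statuses that the
 * first and second bind attempts would see.
 */
int probe_bind(int64_t first_rc, int64_t retry_rc, int *forced_unbinds)
{
	int rc;

	assert(forced_unbinds);
	rc = bind_rc_to_errno(first_rc);
	if (rc == -EBUSY) {
		(*forced_unbinds)++;	/* stands in for drc_pmem_unbind(p) */
		rc = bind_rc_to_errno(retry_rc);
	}
	return rc;	/* non-zero takes the existing error path */
}
```

Only one retry is attempted, so an overlap that persists after the forced
unbind still reaches the error path, as -EBUSY.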

Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
---
Change-log:

v5:
* None. Re-spinning the patchset.

v4:
* None. Re-spinning the patchset.

v3:
* Minor update to a code comment. [Oliver]

v2:
* Moved the retry code from drc_pmem_bind() to papr_scm_probe()
  [Oliver]
* Changed the type of variable 'rc' in drc_pmem_bind() to
  int64_t. [Oliver]
---
 arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 82568a7e0a7c..2c07908359b2 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -44,8 +44,9 @@ struct papr_scm_priv {
 static int drc_pmem_bind(struct papr_scm_priv *p)
 {
 	unsigned long ret[PLPAR_HCALL_BUFSIZE];
-	uint64_t rc, token;
 	uint64_t saved = 0;
+	uint64_t token;
+	int64_t rc;
 
 	/*
 	 * When the hypervisor cannot map all the requested memory in a single
@@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
 	} while (rc == H_BUSY);
 
 	if (rc) {
+		/* H_OVERLAP needs a separate error path */
+		if (rc == H_OVERLAP)
+			return -EBUSY;
+
 		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
 		return -ENXIO;
 	}
@@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
 
 	/* request the hypervisor to bind this region to somewhere in memory */
 	rc = drc_pmem_bind(p);
+
+	/* If phyp says drc memory still bound then force unbound and retry */
+	if (rc == -EBUSY) {
+		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
+		drc_pmem_unbind(p);
+		rc = drc_pmem_bind(p);
+	}
+
 	if (rc)
 		goto err;
 
-- 
2.21.0



* Re: [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory
  2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
@ 2019-07-24  9:08   ` Laurent Dufour
  2019-07-24 12:59     ` Michal Suchánek
  2019-07-30 12:06   ` Michael Ellerman
  1 sibling, 1 reply; 16+ messages in thread
From: Laurent Dufour @ 2019-07-24  9:08 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Oliver O'Halloran, Aneesh Kumar K . V, David Gibson

On 23/07/2019 at 18:13, Vaibhav Jain wrote:
> This doc patch provides an initial description of the HCall op-codes
> that are used by Linux kernel running as a guest operating
> system (LPAR) on top of PowerVM or any other sPAPR compliant
> hyper-visor (e.g qemu).
> 
> Apart from documenting the HCalls the doc-patch also provides a
> rudimentary overview of how Hcalls are implemented inside the Linux
> kernel and how information flows between kernel and PowerVM/KVM.

Hi Vaibhav,

It's a good idea to introduce such documentation.

> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Change-log:
> 
> v5
> * First patch in this patchset.
> ---
>   Documentation/powerpc/hcalls.txt | 140 +++++++++++++++++++++++++++++++
>   1 file changed, 140 insertions(+)
>   create mode 100644 Documentation/powerpc/hcalls.txt
> 
> diff --git a/Documentation/powerpc/hcalls.txt b/Documentation/powerpc/hcalls.txt
> new file mode 100644
> index 000000000000..cc9dd872cecd
> --- /dev/null
> +++ b/Documentation/powerpc/hcalls.txt
> @@ -0,0 +1,140 @@
> +Hyper-visor Call Op-codes (HCALLS)
> +====================================
> +
> +Overview
> +=========
> +
> +Virtualization on PPC64 arch is based on the PAPR specification[1] which
> +describes run-time environment for a guest operating system and how it should
> +interact with the hyper-visor for privileged operations. Currently there are two
> +PAPR compliant hypervisors (PHYP):
> +
> +IBM PowerVM: IBM's proprietary hyper-visor that supports AIX, IBM-i and Linux as
> +	     supported guests (termed as Logical Partitions or LPARS).
> +
> +Qemu/KVM:    Supports PPC64 linux guests running on a PPC64 linux host.
> +
> +On PPC64 arch a virtualized guest kernel runs in a non-privileged mode (HV=0).
> +Hence to perform a privileged operation the guest issues a Hyper-visor
> +Call (HCALL) with necessary input operands. PHYP after performing the privileged
> +operation returns a status code and output operands back to the guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a HCall between guest os kernel and PHYP is
> +described in [1]. The Opcode for Hcall is set in R3 and subsequent in-arguments
> +for the Hcall are provided in registers R4-R12. On return from 'HVCS'
> +instruction the status code of HCall is available in R3 and the output parameters
> +are returned in registers R4-R12.

Would it be good to mention that values passed through memory must be
stored in Big Endian format?
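
A minimal sketch of what such a note could illustrate, assuming a 64-bit value
handed over through a memory buffer. The helper names store_be64() and
load_be64() are hypothetical; kernel code would normally use cpu_to_be64() and
be64_to_cpu() instead of open-coding the byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Write val into buf most-significant byte first, whatever the CPU order. */
void store_be64(uint8_t *buf, uint64_t val)
{
	assert(buf);
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Read a big-endian 64-bit value back out of buf. */
uint64_t load_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	assert(buf);
	for (int i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```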

> +Powerpc arch code provides convenient wrappers named plpar_hcall_xxx defined in
> +header 'hvcall.h' to issue HCalls from the linux kernel running as guest.
> +
> +
> +DRC & DRC Indexes
> +=================
> +
> +		 PAPR		     Guest
> +  DR1          Hypervisor             OS
> +  +--+        +----------+         +---------+
> +  |  |<------>|          |         |  User   |
> +  +--+  DRC1  |          |   DRC   |  Space  |
> +	      |          |  Index  +---------+
> +  DR2         |          |         |         |
> +  +--+        |          |<------->|  Kernel |
> +  |  |<------>|          |  HCall  |         |
> +  +--+  DRC2  +----------+         +---------+
> +
> +PHYP terms shared hardware resources like PCI devices, NVDimms etc available for
> +use by LPARs as Dynamic Resource (DR). When a DR is allocated to an LPAR, PHYP
> +creates a data-structure called Dynamic Resource Connector (DRC) to manage LPAR
> +access. An LPAR refers to a DRC via an opaque 32-bit number called DRC-Index.
> +The DRC-index value is provided to the LPAR via device-tree where it is present
> +as an attribute in the device tree node associated with the DR.

Shouldn't you use the term 'Hypervisor' instead of 'PHYP', which usually
designates only the proprietary one?

Thanks,
Laurent.

> +
> +HCALL Op-codes
> +==============
> +
> +Below is a partial list of HCALLs that are supported by PHYP. For the
> +corresponding opcode values please look into the header
> +'arch/powerpc/include/asm/hvcall.h' :
> +
> +* H_SCM_READ_METADATA:
> +  Input: drcIndex, offset, buffer-address, numBytesToRead
> +  Out: None
> +  Description:
> +  Given a DRC Index of an NVDimm, read N-bytes from the metadata area
> +  associated with it, at a specified offset and copy it to the provided buffer.
> +  The metadata area stores configuration information such as label information,
> +  bad-blocks etc. The metadata area is located out-of-band of NVDimm storage
> +  area hence separate access semantics are provided.
> +
> +* H_SCM_WRITE_METADATA:
> +  Input: drcIndex, offset, data, numBytesToWrite
> +  Out: None
> +  Description:
> +  Given a DRC Index of an NVDimm, write N-bytes from provided buffer at the
> +  given offset to the metadata area associated with the NVDimm.
> +
> +
> +* H_SCM_BIND_MEM:
> +  Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetAddress
> +  Out: guestMappedAddress, numScmBlockBound
> +  Description:
> +  Given a DRC-Index of an NVDimm, maps the SCM (Storage Class Memory) blocks to
> +  contiguous logical addresses in guest physical address space. The HCALL
> +  arguments can be used to map a partial range of SCM blocks instead of the
> +  entire NVDimm range to the LPAR.
> +
> +* H_SCM_UNBIND_MEM:
> +  Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
> +  Out: numScmBlocksUnbound
> +  Description:
> +  Given a DRC-Index of an NVDimm, unmap one or more of the SCM blocks from guest
> +  physical address space. The HCALL can fail if the Guest has an active PTE
> +  entry to the SCM block being unbound.
> +
> +* H_SCM_QUERY_BLOCK_MEM_BINDING:
> +  Input: drcIndex, scmBlockIndex
> +  Out: Guest-Physical-Address
> +  Description:
> +  Given a DRC-Index and an SCM Block index return the guest physical address to
> +  which the SCM block is mapped.
> +
> +* H_SCM_QUERY_LOGICAL_MEM_BINDING:
> +  Input: Guest-Physical-Address
> +  Out: drcIndex, scmBlockIndex
> +  Description:
> +  Given a guest physical address return which DRC Index and SCM block is mapped
> +  to that address.
> +
> +* H_SCM_UNBIND_ALL:
> +  Input: scmTargetScope, drcIndex
> +  Out: None
> +  Description:
> +  Depending on the Target scope unmap all scm blocks belonging to all NVDimms
> +  or all scm blocks belonging to a single NVDimm identified by its drcIndex
> +  from the LPAR memory.
> +
> +* H_SCM_HEALTH:
> +  Input: drcIndex
> +  Out: health-bitmap, health-bit-valid-bitmap
> +  Description:
> +  Given a DRC Index return the info on predictive failure and overall health of
> +  the NVDimm. The asserted bits in the health-bitmap indicate a single predictive
> +  failure and health-bit-valid-bitmap indicates which bits in health-bitmap are
> +  valid.
> +
> +
> +* H_SCM_PERFORMANCE_STATS:
> +  Input: drcIndex, resultBuffer Addr
> +  Out: None
> +  Description:
> +  Given a DRC Index collect the performance statistics for NVDimm and copy them
> +  to the resultBuffer.
> +
> +
> +References
> +==========
> +[1]: "Linux on Power Architecture Platform Reference"
> +     https://members.openpowerfoundation.org/document/dl/469
> 



* Re: [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  2019-07-23 16:13 ` [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails Vaibhav Jain
@ 2019-07-24  9:17   ` Laurent Dufour
  2019-07-24  9:24     ` Oliver O'Halloran
  0 siblings, 1 reply; 16+ messages in thread
From: Laurent Dufour @ 2019-07-24  9:17 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Oliver O'Halloran, Aneesh Kumar K . V, David Gibson

On 23/07/2019 at 18:13, Vaibhav Jain wrote:
> In some cases the initial bind of scm memory for an lpar can fail if
> it wasn't previously released using a scm-unbind hcall. This situation
> can arise due to a panic of the previous kernel or a forced lpar
> fadump. In such cases H_SCM_BIND_MEM returns an H_OVERLAP error.
> 
> To mitigate such cases the patch updates papr_scm_probe() to force a
> call to drc_pmem_unbind() in case the initial bind of scm memory fails
> with an EBUSY error. In case the scm-bind operation fails again after the
> forced scm-unbind we follow the existing error path. We also
> update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
> and report it as an EBUSY error back to the caller.
> 
> Suggested-by: "Oliver O'Halloran" <oohall@gmail.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
> ---
> Change-log:
> 
> v5:
> * None. Re-spinning the patchset.
> 
> v4:
> * None. Re-spinning the patchset.
> 
> v3:
> * Minor update to a code comment. [Oliver]
> 
> v2:
> * Moved the retry code from drc_pmem_bind() to papr_scm_probe()
>    [Oliver]
> * Changed the type of variable 'rc' in drc_pmem_bind() to
>    int64_t. [Oliver]
> ---
>   arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 82568a7e0a7c..2c07908359b2 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -44,8 +44,9 @@ struct papr_scm_priv {
>   static int drc_pmem_bind(struct papr_scm_priv *p)
>   {
>   	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> -	uint64_t rc, token;
>   	uint64_t saved = 0;
> +	uint64_t token;
> +	int64_t rc;
> 
>   	/*
>   	 * When the hypervisor cannot map all the requested memory in a single
> @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>   	} while (rc == H_BUSY);
> 
>   	if (rc) {
> +		/* H_OVERLAP needs a separate error path */
> +		if (rc == H_OVERLAP)
> +			return -EBUSY;
> +
>   		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>   		return -ENXIO;
>   	}
> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> 
>   	/* request the hypervisor to bind this region to somewhere in memory */
>   	rc = drc_pmem_bind(p);
> +
> +	/* If phyp says drc memory still bound then force unbound and retry */
> +	if (rc == -EBUSY) {
> +		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> +		drc_pmem_unbind(p);
> +		rc = drc_pmem_bind(p);

In the unlikely case where H_SCM_BIND_MEM still returns H_OVERLAP once the
unbinding has been done, the error would be silently processed. That sounds
really unlikely, but should an error message be displayed in this
particular case?

> +	}
> +
>   	if (rc)
>   		goto err;
> 



* Re: [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  2019-07-24  9:17   ` Laurent Dufour
@ 2019-07-24  9:24     ` Oliver O'Halloran
  2019-07-24  9:27       ` Laurent Dufour
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver O'Halloran @ 2019-07-24  9:24 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Vaibhav Jain, David Gibson, linuxppc-dev, Aneesh Kumar K . V

On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>
> On 23/07/2019 at 18:13, Vaibhav Jain wrote:
> > *snip*
> > @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> >
> >       /* request the hypervisor to bind this region to somewhere in memory */
> >       rc = drc_pmem_bind(p);
> > +
> > +     /* If phyp says drc memory still bound then force unbound and retry */
> > +     if (rc == -EBUSY) {
> > +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> > +             drc_pmem_unbind(p);
> > +             rc = drc_pmem_bind(p);
>
> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
> unbinding has been done, the error would be silently processed. That sounds
> really unlikely, but should an error message be displayed in this
> particular case ?

drc_pmem_bind() prints the h-call error code if we get one, so it's not silent


* Re: [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  2019-07-24  9:24     ` Oliver O'Halloran
@ 2019-07-24  9:27       ` Laurent Dufour
  2019-07-24  9:45         ` Oliver O'Halloran
  0 siblings, 1 reply; 16+ messages in thread
From: Laurent Dufour @ 2019-07-24  9:27 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Vaibhav Jain, David Gibson, linuxppc-dev, Aneesh Kumar K . V

On 24/07/2019 at 11:24, Oliver O'Halloran wrote:
> On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
>>
>> On 23/07/2019 at 18:13, Vaibhav Jain wrote:
>>> *snip*
>>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
>>>
>>>        /* request the hypervisor to bind this region to somewhere in memory */
>>>        rc = drc_pmem_bind(p);
>>> +
>>> +     /* If phyp says drc memory still bound then force unbound and retry */
>>> +     if (rc == -EBUSY) {
>>> +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
>>> +             drc_pmem_unbind(p);
>>> +             rc = drc_pmem_bind(p);
>>
>> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
>> unbinding has been done, the error would be silently processed. That sounds
>> really unlikely, but should an error message be displayed in this
>> particular case ?
> 
> drc_pmem_bind() prints the h-call error code if we get one, so it's not silent

That's no longer the case with this patch: H_OVERLAP is handled before the
error message is written, which makes sense for the first try.

For the record, the patch introduces:

@@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
  	} while (rc == H_BUSY);

  	if (rc) {
+		/* H_OVERLAP needs a separate error path */
+		if (rc == H_OVERLAP)
+			return -EBUSY;
+
  		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
  		return -ENXIO;
  	}



* Re: [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails
  2019-07-24  9:27       ` Laurent Dufour
@ 2019-07-24  9:45         ` Oliver O'Halloran
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver O'Halloran @ 2019-07-24  9:45 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Vaibhav Jain, David Gibson, linuxppc-dev, Aneesh Kumar K . V

On Wed, Jul 24, 2019 at 7:27 PM Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>
> On 24/07/2019 at 11:24, Oliver O'Halloran wrote:
> > On Wed, Jul 24, 2019 at 7:17 PM Laurent Dufour
> > <ldufour@linux.vnet.ibm.com> wrote:
> >>
> >> On 23/07/2019 at 18:13, Vaibhav Jain wrote:
> >>> *snip*
> >>> @@ -404,6 +409,14 @@ static int papr_scm_probe(struct platform_device *pdev)
> >>>
> >>>        /* request the hypervisor to bind this region to somewhere in memory */
> >>>        rc = drc_pmem_bind(p);
> >>> +
> >>> +     /* If phyp says drc memory still bound then force unbound and retry */
> >>> +     if (rc == -EBUSY) {
> >>> +             dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> >>> +             drc_pmem_unbind(p);
> >>> +             rc = drc_pmem_bind(p);
> >>
> >> In the unlikely case where H_SCM_BIND_MEM is returning H_OVERLAP once the
> >> unbinding has been done, the error would be silently processed. That sounds
> >> really unlikely, but should an error message be displayed in this
> >> particular case ?
> >
> > drc_pmem_bind() prints the h-call error code if we get one, so it's not silent
>
> That's no more the case whith this patch, H_OVERLAP is handled before
> writing the error message, which would make sense for the first try.
>
> For the record, the patch introduces:
>
> @@ -65,6 +66,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>         } while (rc == H_BUSY);
>
>         if (rc) {
> +               /* H_OVERLAP needs a separate error path */
> +               if (rc == H_OVERLAP)
> +                       return -EBUSY;
> +
>                 dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>                 return -ENXIO;
>         }

Ah, good point. Getting H_OVERLAP is still an error case so I think
it's reasonable to still print the message in that case.
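
To make that concrete, here is a rough user-space sketch of one way to read
the suggestion: the bind helper logs every failure, H_OVERLAP included, so a
failed retry is never silent. It is not the patch itself. fake_bind_hcall(),
fake_unbind(), sketch_bind(), sketch_probe() and the two counters are
hypothetical stand-ins for the real hcall path, and H_OVERLAP is taken as -68
per the "bind err: -68" message in the cover letter.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins so the sketch builds outside the kernel (assumed values). */
#define H_SUCCESS	0
#define H_OVERLAP	(-68)

/* Test knobs: how many binds still report H_OVERLAP, and how many errors were logged. */
static int overlap_remaining;
static int errors_logged;

/* Hypothetical replacement for the H_SCM_BIND_MEM hcall. */
static int64_t fake_bind_hcall(void)
{
	if (overlap_remaining > 0) {
		overlap_remaining--;
		return H_OVERLAP;
	}
	return H_SUCCESS;
}

/* Hypothetical replacement for drc_pmem_unbind(); does nothing here. */
static void fake_unbind(void)
{
}

/* Log every failure, then map H_OVERLAP to -EBUSY so the caller can retry. */
static int sketch_bind(void)
{
	int64_t rc = fake_bind_hcall();

	if (rc) {
		fprintf(stderr, "bind err: %lld\n", (long long)rc);
		errors_logged++;
		return rc == H_OVERLAP ? -EBUSY : -ENXIO;
	}
	return 0;
}

/* Bind once; on -EBUSY force an unbind and try exactly one more time. */
static int sketch_probe(void)
{
	int rc = sketch_bind();

	if (rc == -EBUSY) {
		fprintf(stderr, "Retrying bind after unbinding\n");
		fake_unbind();
		rc = sketch_bind();
	}
	return rc;
}
```

The cost of this variant is one extra "bind err" line on the first attempt
after a kexec; the alternative would be to log only when the retry fails.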


* Re: [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec
  2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
                   ` (3 preceding siblings ...)
  2019-07-23 16:13 ` [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails Vaibhav Jain
@ 2019-07-24 10:04 ` Aneesh Kumar K.V
  4 siblings, 0 replies; 16+ messages in thread
From: Aneesh Kumar K.V @ 2019-07-24 10:04 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Laurent Dufour, Vaibhav Jain, David Gibson, Oliver O'Halloran

Vaibhav Jain <vaibhav@linux.ibm.com> writes:

> Presently an error is returned in response to hcall H_SCM_BIND_MEM when a
> new kernel boots on lpar via kexec. This prevents papr_scm from registering
> drc memory regions with nvdimm. The error reported is of the form below:
>
> "papr_scm ibm,persistent-memory:ibm,pmemory@44100002: bind err: -68"
>
> On investigation it was revealed that phyp returns this error as previous
> kernel did not completely release bindings for drc scm-memory blocks and
> hence phyp rejected request for re-binding these block to lpar with error
> H_OVERLAP. Also support for a new H_SCM_UNBIND_ALL is recently added which
> is better suited for releasing all the bound scm-memory block from an lpar.
>
> So leveraging new hcall H_SCM_UNBIND_ALL, we can workaround H_OVERLAP issue
> during kexec by forcing an unbind of all drm scm-memory blocks and issuing
> H_SCM_BIND_MEM to re-bind the drc scm-memory blocks to lpar. This sequence
> will also be needed when a new kernel boot on lpar after previous kernel
> panicked and it never got an opportunity to call H_SCM_UNBIND_MEM/ALL.
>
> Hence this patch-set implements following changes to papr_scm module:
>
> * Update hvcall.h to include opcodes for new hcall H_SCM_UNBIND_ALL.
>
> * Update it to use H_SCM_UNBIND_ALL instead of H_SCM_UNBIND_MEM
>
> * In case hcall H_SCM_BIND_MEM fails with error H_OVERLAP, force
>   H_SCM_UNBIND_ALL and retry the bind operation again.
>

You can add this for the series:

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>



* Re: [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory
  2019-07-24  9:08   ` Laurent Dufour
@ 2019-07-24 12:59     ` Michal Suchánek
  2019-07-30  5:28       ` Vaibhav Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Suchánek @ 2019-07-24 12:59 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Vaibhav Jain, Oliver O'Halloran, David Gibson, linuxppc-dev,
	Aneesh Kumar K . V

On Wed, 24 Jul 2019 11:08:58 +0200
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> On 23/07/2019 at 18:13, Vaibhav Jain wrote:
> > This doc patch provides an initial description of the HCall op-codes
> > that are used by Linux kernel running as a guest operating
> > system (LPAR) on top of PowerVM or any other sPAPR compliant
> > hyper-visor (e.g qemu).
> > 
> > Apart from documenting the HCalls the doc-patch also provides a
> > rudimentary overview of how Hcalls are implemented inside the Linux
> > kernel and how information flows between kernel and PowerVM/KVM.  
> 
> Hi Vaibhav,
> 
> That's a good idea to introduce such a documentation.
> 
> > Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> > ---
> > Change-log:
> > 
> > v5
> > * First patch in this patchset.
> > ---
> >   Documentation/powerpc/hcalls.txt | 140 +++++++++++++++++++++++++++++++
> >   1 file changed, 140 insertions(+)
> >   create mode 100644 Documentation/powerpc/hcalls.txt
> > 
> > diff --git a/Documentation/powerpc/hcalls.txt b/Documentation/powerpc/hcalls.txt
> > new file mode 100644
> > index 000000000000..cc9dd872cecd
> > --- /dev/null
> > +++ b/Documentation/powerpc/hcalls.txt
> > @@ -0,0 +1,140 @@
> > +Hyper-visor Call Op-codes (HCALLS)
> > +====================================
> > +
> > +Overview
> > +=========
> > +
> > +Virtualization on PPC64 arch is based on the PAPR specification[1] which
> > +describes run-time environment for a guest operating system and how it should
> > +interact with the hyper-visor for privileged operations. Currently there are two
> > +PAPR compliant hypervisors (PHYP):
> > +
> > +IBM PowerVM: IBM's proprietary hyper-visor that supports AIX, IBM-i and Linux as
> > +	     supported guests (termed as Logical Partitions or LPARS).
> > +
> > +Qemu/KVM:    Supports PPC64 linux guests running on a PPC64 linux host.
> > +
> > +On PPC64 arch a virtualized guest kernel runs in a non-privileged mode (HV=0).
> > +Hence to perform a privileged operations the guest issues a Hyper-visor
> > +Call (HCALL) with necessary input operands. PHYP after performing the privilege
> > +operation returns a status code and output operands back to the guest.
> > +
> > +HCALL ABI
> > +=========
> > +The ABI specification for a HCall between guest os kernel and PHYP is
> > +described in [1]. The Opcode for Hcall is set in R3 and subsequent in-arguments
> > +for the Hcall are provided in registers R4-R12. On return from 'HVCS'
> > +instruction the status code of HCall is available in R3 an the output parameters
> > +are returned in registers R4-R12.  
> 
> Would it be good to mention that values passed through the memory must be 
> stored in Big Endian format ?
> 
> > +Powerpc arch code provides convenient wrappers named plpar_hcall_xxx defined in
> > +header 'hvcall.h' to issue HCalls from the linux kernel running as guest.
> > +
> > +
> > +DRC & DRC Indexes
> > +=================
> > +
> > +		 PAPR		     Guest
> > +  DR1          Hypervisor             OS
> > +  +--+        +----------+         +---------+
> > +  |  |<------>|          |         |  User   |
> > +  +--+  DRC1  |          |   DRC   |  Space  |
> > +	      |          |  Index  +---------+
> > +  DR2         |          |         |         |
> > +  +--+        |          |<------->|  Kernel |
> > +  |  |<----- >|          |  HCall  |         |
> > +  +--+  DRC2  +----------+         +---------+
> > +
> > +PHYP terms shared hardware resources like PCI devices, NVDimms etc available for
> > +use by LPARs as Dynamic Resource (DR). When a DR is allocated to an LPAR, PHYP
> > +creates a data-structure called Dynamic Resource Connector (DRC) to manage LPAR
> > +access. An LPAR refers to a DRC via an opaque 32-bit number called DRC-Index.
> > +The DRC-index value is provided to the LPAR via device-tree where its present
> > +as an attribute in the device tree node associated with the DR.  
> 
> Should you use the term 'Hypervisor' instead of 'PHYP', which usually
> designates only the proprietary one?
> 
> Thanks,
> Laurent.
> 
> > +
> > +HCALL Op-codes
> > +==============
> > +
> > +Below is a partial of of HCALLs that are supported by PHYP. For the
                        ^^ list?

Thanks

Michal
> > +corresponding opcode values please look into the header
> > +'arch/powerpc/include/asm/hvcall.h' :
> > +
> > +* H_SCM_READ_METADATA:
> > +  Input: drcIndex, offset, buffer-address, numBytesToRead
> > +  Out: None
> > +  Description:
> > +  Given a DRC Index of an NVDimm, read N-bytes from the the meta data area
> > +  associated with it, at a specified offset and copy it to provided buffer.
> > +  The metadata area stores configuration information such as label information,
> > +  bad-blocks etc. The metadata area is located out-of-band of NVDimm storage
> > +  area hence a separate access semantics is provided.
> > +
> > +* H_SCM_WRITE_METADATA:
> > +  Input: drcIndex, offset, data, numBytesToWrite
> > +  Out: None
> > +  Description:
> > +  Given a DRC Index of an NVDimm, write N-bytes from provided buffer at the
> > +  given offset to the the meta data area associated with the NVDimm.
> > +
> > +
> > +* H_SCM_BIND_MEM:
> > +  Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetAddress
> > +  Out: guestMappedAddress, numScmBlockBound
> > +  Description:
> > +  Given a DRC-Index of an NVDimm, maps the SCM (Storage Class Memory) blocks to
> > +  continuous logical addresses in guest physical address space. The HCALL
> > +  arguments can be used to map partial range of SCM blocks instead of entire
> > +  NVDimm range to the LPAR.
> > +
> > +* H_SCM_UNBIND_MEM:
> > +  Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
> > +  Out: numScmBlocksUnbound
> > +  Description:
> > +  Given a DRC-Index of an NVDimm, unmap one or more the SCM blocks from guest
> > +  physical address space. The HCALL can fail if the Guest has an active PTE
> > +  entry to the SCM block being unbinded.
> > +
> > +* H_SCM_QUERY_BLOCK_MEM_BINDING:
> > +  Input: drcIndex, scmBlockIndex
> > +  Out: Guest-Physical-Address
> > +  Description:
> > +  Given a DRC-Index and an SCM Block index return the guest physical address to
> > +  which the SCM block is mapped to.
> > +
> > +* H_SCM_QUERY_LOGICAL_MEM_BINDING:
> > +  Input: Guest-Physical-Address
> > +  Out: drcIndex, scmBlockIndex
> > +  Description:
> > +  Given a guest physical address return which DRC Index and SCM block is mapped
> > +  to that address.
> > +
> > +* H_SCM_UNBIND_ALL:
> > +  Input: scmTargetScope, drcIndex
> > +  Out: None
> > +  Description:
> > +  Depending on the Target scope unmap all scm blocks belonging to all NVDimms
> > +  or all scm blocks belonging to a single NVDimm identified by its drcIndex
> > +  from the LPAR memory.
> > +
> > +* H_SCM_HEALTH:
> > +  Input: drcIndex
> > +  Output: health-bitmap, health-bit-valid-bitmap
> > +  Description:
> > +  Given a DRC Index return the info on predictive failure and over all health of
> > +  the NVDimm. The asserted bits in the health-bitmap indicate a single predictive
> > +  failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> > +  valid.
> > +
> > +
> > +* H_SCM_PERFORMANCE_STATS:
> > +  Input: drcIndex, resultBuffer Addr
> > +  Out: None
> > +  Description:
> > +  Given a DRC Index collect the performance statistics for NVDimm and copy them
> > +  to the resultBuffer.
> > +
> > +
> > +References
> > +==========
> > +[1]: "Linux on Power Architecture Platform Reference"
> > +     https://members.openpowerfoundation.org/document/dl/469
> >   
> 



* Re: [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  2019-07-23 16:13 ` [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h Vaibhav Jain
@ 2019-07-26  8:53   ` David Gibson
  2019-07-30 11:26     ` Michael Ellerman
  0 siblings, 1 reply; 16+ messages in thread
From: David Gibson @ 2019-07-26  8:53 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Oliver O'Halloran, Laurent Dufour, linuxppc-dev, Aneesh Kumar K . V


On Tue, Jul 23, 2019 at 09:43:55PM +0530, Vaibhav Jain wrote:
> Update the hvcalls.h to include op-codes for new hcalls introduce to
> manage SCM memory. Also update existing hcall definitions to reflect
> current papr specification for SCM.
> 
> The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
> transient proposals and there support was never implemented by
> Power-VM nor they were used anywhere in Linux kernel. Hence we don't
> expect anyone to be impacted by this change.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

They really should not have been merged while only interim proposals.
But since they have changed, better to update them than not, obviously.

> ---
> Change-log:
> 
> v5:
> * None. Re-spinning the patchset.
> 
> v4:
> * Updated the patch description mentioned current status of removed
>   hcall opcodes. [Mpe]
> 
> v3:
> * Added updated opcode for H_SCM_HEALTH [Oliver]
> 
> v2:
> * None new patch in this series.
> ---
>  arch/powerpc/include/asm/hvcall.h | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index 463c63a9fcf1..11112023e327 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -302,9 +302,14 @@
>  #define H_SCM_UNBIND_MEM        0x3F0
>  #define H_SCM_QUERY_BLOCK_MEM_BINDING 0x3F4
>  #define H_SCM_QUERY_LOGICAL_MEM_BINDING 0x3F8
> -#define H_SCM_MEM_QUERY	        0x3FC
> -#define H_SCM_BLOCK_CLEAR       0x400
> -#define MAX_HCALL_OPCODE	H_SCM_BLOCK_CLEAR
> +#define H_SCM_UNBIND_ALL        0x3FC
> +#define H_SCM_HEALTH            0x400
> +#define H_SCM_PERFORMANCE_STATS 0x418
> +#define MAX_HCALL_OPCODE	H_SCM_PERFORMANCE_STATS
> +
> +/* Scope args for H_SCM_UNBIND_ALL */
> +#define H_UNBIND_SCOPE_ALL (0x1)
> +#define H_UNBIND_SCOPE_DRC (0x2)
>  
>  /* H_VIOCTL functions */
>  #define H_GET_VIOA_DUMP_SIZE	0x01

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory
  2019-07-24 12:59     ` Michal Suchánek
@ 2019-07-30  5:28       ` Vaibhav Jain
  0 siblings, 0 replies; 16+ messages in thread
From: Vaibhav Jain @ 2019-07-30  5:28 UTC (permalink / raw)
  To: Michal Suchánek, Laurent Dufour
  Cc: Oliver O'Halloran, David Gibson, linuxppc-dev, Aneesh Kumar K . V


Thanks, everyone, for reviewing this patch-set. v4, which did not include this
doc patch, has been merged upstream, so I will re-spin a separate, independent
doc patch incorporating your review comments.

Cheers,
-- 
Vaibhav Jain <vaibhav@linux.ibm.com>
Linux Technology Center, IBM India Pvt. Ltd.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h
  2019-07-26  8:53   ` David Gibson
@ 2019-07-30 11:26     ` Michael Ellerman
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2019-07-30 11:26 UTC (permalink / raw)
  To: David Gibson, Vaibhav Jain
  Cc: Oliver O'Halloran, Laurent Dufour, linuxppc-dev, Aneesh Kumar K . V

David Gibson <david@gibson.dropbear.id.au> writes:
> On Tue, Jul 23, 2019 at 09:43:55PM +0530, Vaibhav Jain wrote:
>> Update the hvcalls.h to include op-codes for new hcalls introduce to
>> manage SCM memory. Also update existing hcall definitions to reflect
>> current papr specification for SCM.
>> 
>> The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were
>> transient proposals and there support was never implemented by
>> Power-VM nor they were used anywhere in Linux kernel. Hence we don't
>> expect anyone to be impacted by this change.
>> 
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>
> They really should not have been merged while only interim proposals.
> But since they have changed, better to update them than not, obviously.

Yes, absolutely agree. It wasn't clear when I merged them that they were
*unpublished* and *unstable*.

Unfortunately we can't realistically wait for these APIs to be in a
published PAPR spec because it seems to take about 5 years for a PAPR
spec to escape the IBM firewall.

cheers


* Re: [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory
  2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
  2019-07-24  9:08   ` Laurent Dufour
@ 2019-07-30 12:06   ` Michael Ellerman
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2019-07-30 12:06 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev
  Cc: Vaibhav Jain, Laurent Dufour, Oliver O'Halloran,
	Aneesh Kumar K . V, David Gibson

Hi Vaibhav,

Thanks for writing this documentation.

Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> This doc patch provides an initial description of the HCall op-codes
> that are used by Linux kernel running as a guest operating
> system (LPAR) on top of PowerVM or any other sPAPR compliant
> hyper-visor (e.g qemu).
>
> Apart from documenting the HCalls the doc-patch also provides a
> rudimentary overview of how Hcalls are implemented inside the Linux

I prefer "hcall" rather than "HCall", "Hcall" or "HCALL".

> kernel and how information flows between kernel and PowerVM/KVM.
>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Change-log:
>
> v5
> * First patch in this patchset.
> ---
>  Documentation/powerpc/hcalls.txt | 140 +++++++++++++++++++++++++++++++
>  1 file changed, 140 insertions(+)
>  create mode 100644 Documentation/powerpc/hcalls.txt

During this merge window all the existing documentation in there has
been converted to rst format. I suspect Jon will be annoyed at us if we
immediately start adding new plain text docs.

So we need to make this an rst file and do at least a minimal job of
making it valid ReST.

A while back I worked out how to actually build the rst docs, I'll try
and find my notes on that.

> diff --git a/Documentation/powerpc/hcalls.txt b/Documentation/powerpc/hcalls.txt
> new file mode 100644
> index 000000000000..cc9dd872cecd
> --- /dev/null
> +++ b/Documentation/powerpc/hcalls.txt

Can we call it papr_hcalls please.

> @@ -0,0 +1,140 @@
> +Hyper-visor Call Op-codes (HCALLS)
> +====================================

I'm not sure if "hyper-visor" is correct, but regardless that's not what
we commonly use, so please just spell it "hypervisor". And similarly for
"opcode".

> +Overview
> +=========
> +
> +Virtualization on PPC64 arch is based on the PAPR specification[1] which

We should probably say on "64-bit Power Book3S platforms".

Also the link you provide is to the "LoPAPR" specification. Which is not
quite the same as "PAPR", which used to be released via power.org, or
"PAPR+" which is the IBM internal version.


> +describes run-time environment for a guest operating system and how it should
> +interact with the hyper-visor for privileged operations. Currently there are two
> +PAPR compliant hypervisors (PHYP):

"PHYP" is only used as another name for PowerVM (or part of PowerVM),
not as a generic term for a PAPR hypervisor.

I also don't think it's accurate to say Qemu/KVM is a "PAPR compliant"
hypervisor, it just implements (some of) the relevant parts of LoPAPR to
support Linux guests.

> +IBM PowerVM: IBM's proprietary hyper-visor that supports AIX, IBM-i and Linux as
> +	     supported guests (termed as Logical Partitions or LPARS).
> +
> +Qemu/KVM:    Supports PPC64 linux guests running on a PPC64 linux host.

And actually other hosts via Qemu TCG but that's a bit of a tangent.

> +On PPC64 arch a virtualized guest kernel runs in a non-privileged mode (HV=0).

"virtualized guest" is redundant, just "guest" is sufficient IMHO.

I know what you're trying to say there, but it actually contradicts the
language that's used in the ISA.

If you look at Chapter 1 of Book III it defines these terms:

  * hypervisor privileged
    A term used to describe an instruction or facility that is available
    only when the thread is in hypervisor state.
  * privileged state and supervisor mode
    Used interchangeably to refer to a state in which privileged
    facilities are available.
  * problem state and user mode
    Used interchangeably to refer to a state in which privileged
    facilities are not available.

So you might want to write it more like:

When running under a PAPR hypervisor the guest kernel runs in
supervisor mode (HV=0), and must issue hypercalls to the hypervisor
whenever it needs to perform an action that is hypervisor privileged or
for other services managed by the hypervisor.

> +Hence to perform a privileged operations the guest issues a Hyper-visor
> +Call (HCALL) with necessary input operands. PHYP after performing the privilege
> +operation returns a status code and output operands back to the guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a HCall between guest os kernel and PHYP is
                                                   ^
                                                   "OS", but probably just
                                                   "guest kernel" is fine.

> +described in [1]. The Opcode for Hcall is set in R3 and subsequent in-arguments

Where in [1] ?

It's more common to spell the GPRs with lower case 'r', eg. 'r3'.

> +for the Hcall are provided in registers R4-R12. On return from      'HVCS'
                                                                 ^the  'HVSC'

> +instruction the status code of HCall is available in R3 an the output parameters
                   ^             ^                         ^
                   return value  the                       and

> +are returned in registers R4-R12.
> +
> +Powerpc arch code provides convenient wrappers named plpar_hcall_xxx defined in
> +header 'hvcall.h' to issue HCalls from the linux kernel running as guest.
> +
> +
> +DRC & DRC Indexes
> +=================
> +
> +		 PAPR		     Guest
> +  DR1          Hypervisor             OS
> +  +--+        +----------+         +---------+
> +  |  |<------>|          |         |  User   |
> +  +--+  DRC1  |          |   DRC   |  Space  |
> +              |          |  Index  +---------+
> +  DR2         |          |         |         |
> +  +--+        |          |<------->|  Kernel |
> +  |  |<----- >|          |  HCall  |         |
> +  +--+  DRC2  +----------+         +---------+
> +
> +PHYP terms shared hardware resources like PCI devices, NVDimms etc available for

DIMM is an acronym too, so it should be either nvdimm or NVDIMM IMHO.

> +use by LPARs as Dynamic Resource (DR). When a DR is allocated to an LPAR, PHYP
> +creates a data-structure called Dynamic Resource Connector (DRC) to manage LPAR
> +access. An LPAR refers to a DRC via an opaque 32-bit number called DRC-Index.
> +The DRC-index value is provided to the LPAR via device-tree where its present
> +as an attribute in the device tree node associated with the DR.
> +
> +HCALL Op-codes
> +==============
> +
> +Below is a partial of of HCALLs that are supported by PHYP. For the
> +corresponding opcode values please look into the header
> +'arch/powerpc/include/asm/hvcall.h' :
> +
> +* H_SCM_READ_METADATA:
> +  Input: drcIndex, offset, buffer-address, numBytesToRead
> +  Out: None
> +  Description:
> +  Given a DRC Index of an NVDimm, read N-bytes from the the meta data area
> +  associated with it, at a specified offset and copy it to provided buffer.
> +  The metadata area stores configuration information such as label information,
> +  bad-blocks etc. The metadata area is located out-of-band of NVDimm storage
> +  area hence a separate access semantics is provided.

Can you also document the possible return values? (If they are defined).

> +* H_SCM_WRITE_METADATA:
> +  Input: drcIndex, offset, data, numBytesToWrite
> +  Out: None
> +  Description:
> +  Given a DRC Index of an NVDimm, write N-bytes from provided buffer at the
> +  given offset to the the meta data area associated with the NVDimm.
> +
> +
> +* H_SCM_BIND_MEM:
> +  Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetAddress
> +  Out: guestMappedAddress, numScmBlockBound
> +  Description:
> +  Given a DRC-Index of an NVDimm, maps the SCM (Storage Class Memory) blocks to
> +  continuous logical addresses in guest physical address space. The HCALL
> +  arguments can be used to map partial range of SCM blocks instead of entire
> +  NVDimm range to the LPAR.

What address space do targetAddress and guestMappedAddress exist in?

What are these blocks and how do we find out about them? ie. how do I
know what a valid block index is.

> +* H_SCM_UNBIND_MEM:
> +  Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind

Similar question for startingScmLogicalMemoryAddress.

> +  Out: numScmBlocksUnbound
> +  Description:
> +  Given a DRC-Index of an NVDimm, unmap one or more the SCM blocks from guest
                                                      ^
                                                      of
> +  physical address space. The HCALL can fail if the Guest has an active PTE
> +  entry to the SCM block being unbinded.
                                  unbound

> +* H_SCM_QUERY_BLOCK_MEM_BINDING:
> +  Input: drcIndex, scmBlockIndex
> +  Out: Guest-Physical-Address
> +  Description:
> +  Given a DRC-Index and an SCM Block index return the guest physical address to
> +  which the SCM block is mapped to.
> +
> +* H_SCM_QUERY_LOGICAL_MEM_BINDING:
> +  Input: Guest-Physical-Address
> +  Out: drcIndex, scmBlockIndex
> +  Description:
> +  Given a guest physical address return which DRC Index and SCM block is mapped
> +  to that address.
> +
> +* H_SCM_UNBIND_ALL:
> +  Input: scmTargetScope, drcIndex
> +  Out: None
> +  Description:
> +  Depending on the Target scope unmap all scm blocks belonging to all NVDimms
                                             ^
                                             SCM for consistency.
> +  or all scm blocks belonging to a single NVDimm identified by its drcIndex
> +  from the LPAR memory.
> +
> +* H_SCM_HEALTH:
> +  Input: drcIndex
> +  Output: health-bitmap, health-bit-valid-bitmap
> +  Description:
> +  Given a DRC Index return the info on predictive failure and over all health of
                                                                 overall
> +  the NVDimm. The asserted bits in the health-bitmap indicate a single predictive
> +  failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> +  valid.

Presumably using big endian bit ordering?
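
For reference, a small sketch of what that numbering would mean if it does
apply here, which is exactly the open question: bit 0 is the most significant
bit of the 64-bit word. be_bit() and health_bit_asserted() are hypothetical
helpers written for this mail, not existing kernel functions, and the bitmap
layout is an assumption rather than something taken from the spec.

```c
#include <stdint.h>

/* Mask for bit n under big endian (IBM) numbering: bit 0 is the MSB of a 64-bit word. */
static inline uint64_t be_bit(unsigned int n)
{
	return UINT64_C(1) << (63 - n);
}

/* A health bit only counts if the same bit is also set in the validity bitmap. */
static inline int health_bit_asserted(uint64_t health, uint64_t valid,
				      unsigned int n)
{
	return (health & valid & be_bit(n)) != 0;
}
```

Either way, the doc should state the bit numbering explicitly so readers do
not have to guess.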


> +* H_SCM_PERFORMANCE_STATS:
> +  Input: drcIndex, resultBuffer Addr
> +  Out: None
> +  Description:
> +  Given a DRC Index collect the performance statistics for NVDimm and copy them
> +  to the resultBuffer.
> +
> +
> +References
> +==========
> +[1]: "Linux on Power Architecture Platform Reference"
> +     https://members.openpowerfoundation.org/document/dl/469


cheers


Thread overview: 16+ messages
2019-07-23 16:13 [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Vaibhav Jain
2019-07-23 16:13 ` [DOC][PATCH v5 1/4] powerpc: Document some HCalls for Storage Class Memory Vaibhav Jain
2019-07-24  9:08   ` Laurent Dufour
2019-07-24 12:59     ` Michal Suchánek
2019-07-30  5:28       ` Vaibhav Jain
2019-07-30 12:06   ` Michael Ellerman
2019-07-23 16:13 ` [PATCH v5 2/4] powerpc/pseries: Update SCM hcall op-codes in hvcall.h Vaibhav Jain
2019-07-26  8:53   ` David Gibson
2019-07-30 11:26     ` Michael Ellerman
2019-07-23 16:13 ` [PATCH v5 3/4] powerpc/papr_scm: Update drc_pmem_unbind() to use H_SCM_UNBIND_ALL Vaibhav Jain
2019-07-23 16:13 ` [PATCH v5 4/4] powerpc/papr_scm: Force a scm-unbind if initial scm-bind fails Vaibhav Jain
2019-07-24  9:17   ` Laurent Dufour
2019-07-24  9:24     ` Oliver O'Halloran
2019-07-24  9:27       ` Laurent Dufour
2019-07-24  9:45         ` Oliver O'Halloran
2019-07-24 10:04 ` [PATCH v5 0/4] powerpc/papr_scm: Workaround for failure of drc bind after kexec Aneesh Kumar K.V
