linuxppc-dev.lists.ozlabs.org archive mirror
* [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls
@ 2019-08-27 15:23 Vaibhav Jain
  2019-08-27 15:52 ` Laurent Dufour
  2019-08-28  1:09 ` Nicholas Piggin
  0 siblings, 2 replies; 4+ messages in thread
From: Vaibhav Jain @ 2019-08-27 15:23 UTC
  To: linuxppc-dev
  Cc: msuchanek, Oliver O'Halloran, Aneesh Kumar K . V,
	Vaibhav Jain, Laurent Dufour, David Gibson

This doc patch provides an initial description of the hcall op-codes
that are used by the Linux kernel running as a guest (LPAR) on top of
PowerVM or any other sPAPR-compliant hypervisor (e.g. QEMU).

Apart from documenting the hcalls, the doc patch also provides a
rudimentary overview of the hcall ABI, how hcalls are issued by the
Linux kernel, and how information/control flows between the guest and
hypervisor.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
---
Change-log:

Initial version of this doc-patch was posted and reviewed as part of
the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
failure of drc bind after kexec"
https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
of the original patch:

* Replaced the term PHYP with Hypervisor to indicate both
PowerVM/Qemu [Laurent]
* Emphasized that In/Out arguments to hcalls are in Big-endian format
[Laurent]
* Fixed minor word repetition, spelling issues and grammatical errors
[Michal, Mpe]
* Replaced various variants of the term 'hcall' with a single
variant. [Mpe]
* Changed the documentation format from txt to ReST. [Mpe]
* Changed the name of documentation file to papr_hcalls.rst. [Mpe]
* Updated the section describing privileged operation by hypervisor
to be more accurate [Mpe].
* Fixed up mention of register notation used for describing
hcalls. [Mpe]
* s/NVDimm/NVDIMM [Mpe]
* Added section on return values from hcall [Mpe]
* Described H_CONTINUE return-value for long running hcalls.
---
 Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 Documentation/powerpc/papr_hcalls.rst

diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
new file mode 100644
index 000000000000..7afc0310de29
--- /dev/null
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -0,0 +1,200 @@
+===========================
+Hypercall Op-codes (hcalls)
+===========================
+
+Overview
+=========
+
+Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
+specification [1]_ which describes the run-time environment for a guest
+operating system and how it should interact with the hypervisor for
+privileged operations. Currently there are two PAPR compliant hypervisors:
+
+- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
+  IBM-i and Linux as guests (termed Logical Partitions
+  or LPARs). It supports the full PAPR specification.
+
+- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
+  though it implements only a subset of the PAPR specification, called LoPAPR [2]_.
+
+On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called
+a *pSeries guest*. A pSeries guest runs in supervisor mode (HV=0) and must
+issue hypercalls to the hypervisor whenever it needs to perform an action
+that is hypervisor privileged [3]_ or for other services managed by the
+hypervisor.
+
+Hence a Hypercall (hcall) is essentially a request by the pSeries guest
+asking the hypervisor to perform a privileged operation on behalf of the guest. The
+guest issues a hcall with the necessary input operands. The hypervisor, after performing
+the privileged operation, returns a status code and output operands back to the
+guest.
+
+HCALL ABI
+=========
+The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
+is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context is
+done via the instruction **HVSC**, which expects that the opcode for the hcall is set in *r3*
+and any in-arguments for the hcall are provided in registers *r4-r12* in
+Big-endian byte order.
+
+Once control returns to the guest after the hypervisor has serviced the
+'HVSC' instruction, the return value of the hcall is available in *r3* and any
+out values are returned in registers *r4-r12*. Again like in-arguments, all the
+out value are in Big-endian byte order.
+
+Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**, defined
+in an arch-specific header [4]_, to issue hcalls from the Linux kernel
+running as a pSeries guest.
+
+DRC & DRC Indexes
+=================
+::
+
+     DR1                                  Guest
+     +--+        +------------+         +---------+
+     |  | <----> |            |         |  User   |
+     +--+  DRC1  |            |   DRC   |  Space  |
+                 |    PAPR    |  Index  +---------+
+     DR2         | Hypervisor |         |         |
+     +--+        |            | <-----> |  Kernel |
+     |  | <----> |            |  Hcall  |         |
+     +--+  DRC2  +------------+         +---------+
+
+The PAPR hypervisor terms shared hardware resources such as PCI devices and NVDIMMs
+that are available for use by LPARs as Dynamic Resources (DRs). When a DR is allocated to
+an LPAR, the hypervisor creates a data structure called a Dynamic Resource Connector (DRC)
+to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
+called the DRC-Index. The DRC-Index value is provided to the LPAR via the device tree,
+where it is present as an attribute in the device tree node associated with the
+DR.
+
+HCALL Return-values
+===================
+
+After servicing the hcall, the hypervisor sets the return value in *r3*, indicating
+success or failure of the hcall. In case of a failure, an error code indicates
+the cause of the error. These codes are defined and documented in the arch-specific
+header [4]_.
+
+In some cases a hcall can potentially take a long time and needs to be issued
+multiple times in order to be completely serviced. These hcalls will usually
+accept an opaque value *continue-token* within their argument list, and a
+return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
+servicing the hcall.
+
+To make such hcalls the guest needs to set *continue-token == 0* for the
+initial call and use the hypervisor-returned value of *continue-token*
+for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
+return value.
+
+HCALL Op-codes
+==============
+
+Below is a partial list of HCALLs that are supported by PHYP. For the
+corresponding opcode values please look into the arch specific header [4]_:
+
+**H_SCM_READ_METADATA**
+
+| Input: *drcIndex, offset, buffer-address, numBytesToRead*
+| Out: *numBytesRead*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
+
+Given a DRC Index of an NVDIMM, read N bytes from the metadata area
+associated with it, at a specified offset, and copy them to the provided buffer.
+The metadata area stores configuration information such as label information,
+bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage
+area, hence separate access semantics are provided.
+
+**H_SCM_WRITE_METADATA**
+
+| Input: *drcIndex, offset, data, numBytesToWrite*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
+
+Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
+associated with it, at the specified offset and from the provided buffer.
+
+**H_SCM_BIND_MEM**
+
+| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
+| *targetLogicalMemoryAddress, continue-token*
+| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
+| *H_Too_Big, H_P5, H_Busy*
+
+Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
+*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
+at *targetLogicalMemoryAddress* within the guest physical address space. If
+*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
+assigns a target address to the guest. The hcall can fail if the guest has
+an active PTE entry to the SCM block being bound.
+
+**H_SCM_UNBIND_MEM**
+
+| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
+| Out: *numScmBlocksUnbound*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
+| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting at
+*startingScmLogicalMemoryAddress* from the guest physical address space. The hcall
+can fail if the guest has an active PTE entry to the SCM block being unbound.
+
+**H_SCM_QUERY_BLOCK_MEM_BINDING**
+
+| Input: *drcIndex, scmBlockIndex*
+| Out: *Guest-Physical-Address*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a DRC-Index and an SCM block index, return the guest physical address to
+which the SCM block is mapped.
+
+**H_SCM_QUERY_LOGICAL_MEM_BINDING**
+
+| Input: *Guest-Physical-Address*
+| Out: *drcIndex, scmBlockIndex*
+| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
+
+Given a guest physical address, return the DRC Index and SCM block that are mapped
+to that address.
+
+**H_SCM_UNBIND_ALL**
+
+| Input: *scmTargetScope, drcIndex*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
+| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
+
+Depending on the target scope, unmap all SCM blocks belonging to all NVDIMMs
+or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
+from the LPAR memory.
+
+**H_SCM_HEALTH**
+
+| Input: *drcIndex*
+| Out: *health-bitmap, health-bit-valid-bitmap*
+| Return Value: *H_Success, H_Parameter, H_Hardware*
+
+Given a DRC Index, return information on predictive failure and overall health of
+the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
+failure, and health-bit-valid-bitmap indicates which bits in health-bitmap are
+valid.
+
+**H_SCM_PERFORMANCE_STATS**
+
+| Input: *drcIndex, resultBuffer Addr*
+| Out: *None*
+| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
+
+Given a DRC Index, collect the performance statistics for the NVDIMM and copy them
+to the resultBuffer.
+
+References
+==========
+.. [1] "Power Architecture Platform Reference"
+       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
+.. [2] "Linux on Power Architecture Platform Reference"
+       https://members.openpowerfoundation.org/document/dl/469
+.. [3] "Definitions and Notation" Book III-Section 14.5.3
+       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
+.. [4] arch/powerpc/include/asm/hvcall.h
-- 
2.21.0



* Re: [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls
  2019-08-27 15:23 [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls Vaibhav Jain
@ 2019-08-27 15:52 ` Laurent Dufour
  2019-08-28 13:24   ` Michael Ellerman
  2019-08-28  1:09 ` Nicholas Piggin
  1 sibling, 1 reply; 4+ messages in thread
From: Laurent Dufour @ 2019-08-27 15:52 UTC
  To: Vaibhav Jain, linuxppc-dev
  Cc: Aneesh Kumar K . V, msuchanek, Oliver O'Halloran, David Gibson

On 27/08/2019 at 17:23, Vaibhav Jain wrote:
> This doc patch provides an initial description of the hcall op-codes
> that are used by Linux kernel running as a guest (LPAR) on top of
> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
> 
> Apart from documenting the hcalls the doc-patch also provides a
> rudimentary overview of how hcall ABI, how they are issued with the
> Linux kernel and how information/control flows between the guest and
> hypervisor.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

Hi Vaibhav,

Thanks for documenting this.

Besides my few remarks below, please consider:

Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>

> ---
> Change-log:
> 
> Initial version of this doc-patch was posted and reviewed as part of
> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
> failure of drc bind after kexec"
> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
> the original patch:
> 
> * Replaced the of term PHYP with Hypervisor to indicate both
> PowerVM/Qemu [Laurent]
> * Emphasized that In/Out arguments to hcalls are in Big-endian format
> [Laurent]
> * Fixed minor word repetition, spell issues and grammatical error
> [Michal, Mpe]
> * Replaced various variant of term 'hcall' with a single
> variant. [Mpe]
> * Changed the documentation format from txt to ReST. [Mpe]
> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
> * Updated the section describing privileged operation by hypervisor
> to be more accurate [Mpe].
> * Fixed up mention of register notation used for describing
> hcalls. [Mpe]
> * s/NVDimm/NVDIMM [Mpe]
> * Added section on return values from hcall [Mpe]
> * Described H_CONTINUE return-value for long running hcalls.
> ---
>   Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>   1 file changed, 200 insertions(+)
>   create mode 100644 Documentation/powerpc/papr_hcalls.rst
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> new file mode 100644
> index 000000000000..7afc0310de29
> --- /dev/null
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -0,0 +1,200 @@
> +===========================
> +Hypercall Op-codes (hcalls)
> +===========================
> +
> +Overview
> +=========
> +
> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
> +specification [1]_ which describes the run-time environment for a guest
> +operating system and how it should interact with the hypervisor for
> +privileged operations. Currently there are two PAPR compliant hypervisors:
> +
> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
> +  or LPARS). It supports the full PAPR specification.
> +
> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
> +
> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
> +issue hypercalls to the hypervisor whenever it needs to perform an action
> +that is hypervisor priviledged [3]_ or for other services managed by the
> +hypervisor.
> +
> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
> +asking hypervisor to perform a privileged operation on behalf of the guest. The
> +guest issues a with necessary input operands. The hypervisor after performing
                  ^ hcall ?

> +the privilege operation returns a status code and output operands back to the
> +guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.
Indeed, register values are not byte ordered; only values passed through a
buffer in memory are byte ordered.

Should it explicitly say that Big-endian order only concerns data
stored in memory?
What about something like this:
"...any in-arguments for the hcall are provided in registers *r4-r12*. If 
values have to be passed through a memory buffer, the data stored in that 
buffer are in Big-endian order."
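
To illustrate the distinction, here is a minimal user-space sketch (not
kernel code). The helper name store_be64 is an assumption made up for this
illustration; in the kernel one would normally reach for cpu_to_be64()
instead. A value handed over in a register is just a 64-bit integer, whereas
a value placed in a memory buffer is laid out most-significant byte first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit value into a memory buffer in
 * Big-endian byte order, independent of the endianness of the CPU
 * that runs this code.
 */
static void store_be64(uint8_t *buf, uint64_t val)
{
	assert(buf != NULL);
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}
```

The same integer passed as a register argument needs no such conversion; only
the copy written to the buffer does.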

> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.
Same would apply here.

> +
> +Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
> +in a arch specific header [4]_ to issue hcalls from the linux kernel
> +running as pseries guest.
> +
> +DRC & DRC Indexes
> +=================
> +::
> +
> +     DR1                                  Guest
> +     +--+        +------------+         +---------+
> +     |  | <----> |            |         |  User   |
> +     +--+  DRC1  |            |   DRC   |  Space  |
> +                 |    PAPR    |  Index  +---------+
> +     DR2         | Hypervisor |         |         |
> +     +--+        |            | <-----> |  Kernel |
> +     |  | <----> |            |  Hcall  |         |
> +     +--+  DRC2  +------------+         +---------+
> +
> +PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc
> +available for use by LPARs as Dynamic Resource (DR). When a DR is allocated to
> +an LPAR, PHYP creates a data-structure called Dynamic Resource Connector (DRC)
> +to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number
> +called DRC-Index. The DRC-index value is provided to the LPAR via device-tree
> +where its present as an attribute in the device tree node associated with the
> +DR.
> +
> +HCALL Return-values
> +===================
> +
> +After servicing the hcall, hypervisor sets the return-value in *r3* indicating
> +success or failure of the hcall. In case of a failure an error code indicates
> +the cause for error. These codes are defined and documented in arch specific
> +header [4]_.
> +
> +In some cases a hcall can potentially take a long time and need to be issued
> +multiple times in order to be completely serviced. These hcalls will usually
> +accept an opaque value *continue-token* within there argument list and a
> +return value of *H_CONTINUE* indicates that hypervisor hasn't still finished
> +servicing the hcall yet.
> +
> +To make such hcalls the guest need to set *continue-token == 0* for the
> +initial call and use the hypervisor returned value of *continue-token*
> +for each subsequent hcall until hypervisor returns a non *H_CONTINUE*
> +return value.
> +
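
A minimal sketch of the retry loop described in the quoted paragraph above,
written as stand-alone C rather than kernel code. Everything named here is an
assumption for illustration: mock_long_hcall stands in for a real long-running
hcall, and the SKETCH_* values are placeholders, not the codes defined in
arch/powerpc/include/asm/hvcall.h.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder return codes for this sketch only. */
#define SKETCH_H_SUCCESS  0L
#define SKETCH_H_CONTINUE 1L

/* Hypothetical hypervisor state: how many calls the work still needs. */
struct mock_hv {
	unsigned int steps_left;
	unsigned int calls;
};

/* Hypothetical stand-in for a long-running hcall taking a continue-token. */
static long mock_long_hcall(struct mock_hv *hv, uint64_t token_in,
			    uint64_t *token_out)
{
	hv->calls++;
	/* First call must carry token 0; later calls must echo the last token. */
	assert(hv->calls == 1 ? token_in == 0 : token_in == hv->calls - 1);

	if (hv->steps_left > 1) {
		hv->steps_left--;
		*token_out = hv->calls;	/* opaque to the guest */
		return SKETCH_H_CONTINUE;
	}
	hv->steps_left = 0;
	*token_out = 0;
	return SKETCH_H_SUCCESS;
}

/* Guest side: start with token 0, feed back the returned token until done. */
static long issue_until_done(struct mock_hv *hv)
{
	uint64_t token = 0;
	long rc;

	do {
		rc = mock_long_hcall(hv, token, &token);
	} while (rc == SKETCH_H_CONTINUE);

	return rc;
}
```

The guest never interprets the token; it only passes back whatever the
previous call returned.
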
> +HCALL Op-codes
> +==============
> +
> +Below is a partial list of HCALLs that are supported by PHYP. For the
> +corresponding opcode values please look into the arch specific header [4]_:
> +
> +**H_SCM_READ_METADATA**
> +
> +| Input: *drcIndex, offset, buffer-address, numBytesToRead*
> +| Out: *numBytesRead*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*
> +
> +Given a DRC Index of an NVDIMM, read N-bytes from the the metadata area
> +associated with it, at a specified offset and copy it to provided buffer.
> +The metadata area stores configuration information such as label information,
> +bad-blocks etc. The metadata area is located out-of-band of NVDIMM storage
> +area hence a separate access semantics is provided.
> +
> +**H_SCM_WRITE_METADATA**
> +
> +| Input: *drcIndex, offset, data, numBytesToWrite*
> +| Out: *None*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*
> +
> +Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
> +associated with it, at the specified offset and from the provided buffer.
> +
> +**H_SCM_BIND_MEM**
> +
> +| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
> +| *targetLogicalMemoryAddress, continue-token*
> +| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
> +| *H_Too_Big, H_P5, H_Busy*
> +
> +Given a DRC-Index of an NVDIMM, map a continuous SCM blocks range
> +*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
> +at *targetLogicalMemoryAddress* within guest physical address space. In
> +case *targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then hypervisor
> +assigns a target address to the guest. The HCALL can fail if the Guest has
> +an active PTE entry to the SCM block being bound.
> +
> +**H_SCM_UNBIND_MEM**
> +| Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
> +| Out: numScmBlocksUnbound
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
> +| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
> +
> +Given a DRC-Index of an NVDimm, unmap *numScmBlocksToUnbind* SCM blocks starting
> +at *startingScmLogicalMemoryAddress* from guest physical address space. The
> +HCALL can fail if the Guest has an active PTE entry to the SCM block being
> +unbound.
> +
> +**H_SCM_QUERY_BLOCK_MEM_BINDING**
> +
> +| Input: *drcIndex, scmBlockIndex*
> +| Out: *Guest-Physical-Address*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
> +
> +Given a DRC-Index and an SCM Block index return the guest physical address to
> +which the SCM block is mapped to.
> +
> +**H_SCM_QUERY_LOGICAL_MEM_BINDING**
> +
> +| Input: *Guest-Physical-Address*
> +| Out: *drcIndex, scmBlockIndex*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*
> +
> +Given a guest physical address return which DRC Index and SCM block is mapped
> +to that address.
> +
> +**H_SCM_UNBIND_ALL**
> +
> +| Input: *scmTargetScope, drcIndex*
> +| Out: *None*
> +| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
> +| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*
> +
> +Depending on the Target scope unmap all SCM blocks belonging to all NVDIMMs
> +or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
> +from the LPAR memory.
> +
> +**H_SCM_HEALTH**
> +
> +| Input: drcIndex
> +| Out: *health-bitmap, health-bit-valid-bitmap*
> +| Return Value: *H_Success, H_Parameter, H_Hardware*
> +
> +Given a DRC Index return the info on predictive failure and overall health of
> +the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
> +failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
> +valid.
> +
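
One way to read the two outputs quoted above, as a small stand-alone sketch:
a health bit is only meaningful if the matching bit in
health-bit-valid-bitmap is set. The function names and the bit numbering
(bit 0 = least significant) are assumptions for this illustration and are not
taken from PAPR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Keep only the health bits that the hypervisor flagged as valid. */
static uint64_t valid_health_bits(uint64_t health_bitmap, uint64_t valid_bitmap)
{
	return health_bitmap & valid_bitmap;
}

/* Hypothetical helper: is a given bit both valid and asserted? */
static bool health_bit_asserted(uint64_t health_bitmap, uint64_t valid_bitmap,
				unsigned int bit)
{
	assert(bit < 64);
	return (valid_health_bits(health_bitmap, valid_bitmap) >> bit) & 1;
}
```
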
> +**H_SCM_PERFORMANCE_STATS**
> +
> +| Input: drcIndex, resultBuffer Addr
> +| Out: None
> +| Return Value:  *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*
> +
> +Given a DRC Index collect the performance statistics for NVDIMM and copy them
> +to the resultBuffer.
> +
> +References
> +==========
> +.. [1] "Power Architecture Platform Reference"
> +       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
> +.. [2] "Linux on Power Architecture Platform Reference"
> +       https://members.openpowerfoundation.org/document/dl/469
> +.. [3] "Definitions and Notation" Book III-Section 14.5.3
> +       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
> +.. [4] arch/powerpc/include/asm/hvcall.h
> 



* Re: [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls
  2019-08-27 15:23 [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls Vaibhav Jain
  2019-08-27 15:52 ` Laurent Dufour
@ 2019-08-28  1:09 ` Nicholas Piggin
  1 sibling, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2019-08-28  1:09 UTC
  To: linuxppc-dev, Vaibhav Jain
  Cc: msuchanek, Aneesh Kumar K . V, Laurent Dufour,
	Oliver O'Halloran, David Gibson

Vaibhav Jain's on August 28, 2019 1:23 am:
> This doc patch provides an initial description of the hcall op-codes
> that are used by Linux kernel running as a guest (LPAR) on top of
> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
> 
> Apart from documenting the hcalls the doc-patch also provides a
> rudimentary overview of how hcall ABI, how they are issued with the
> Linux kernel and how information/control flows between the guest and
> hypervisor.
> 
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
> Change-log:
> 
> Initial version of this doc-patch was posted and reviewed as part of
> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
> failure of drc bind after kexec"
> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
> the original patch:
> 
> * Replaced the of term PHYP with Hypervisor to indicate both
> PowerVM/Qemu [Laurent]
> * Emphasized that In/Out arguments to hcalls are in Big-endian format
> [Laurent]
> * Fixed minor word repetition, spell issues and grammatical error
> [Michal, Mpe]
> * Replaced various variant of term 'hcall' with a single
> variant. [Mpe]
> * Changed the documentation format from txt to ReST. [Mpe]
> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
> * Updated the section describing privileged operation by hypervisor
> to be more accurate [Mpe].
> * Fixed up mention of register notation used for describing
> hcalls. [Mpe]
> * s/NVDimm/NVDIMM [Mpe]
> * Added section on return values from hcall [Mpe]
> * Described H_CONTINUE return-value for long running hcalls.
> ---
>  Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>  1 file changed, 200 insertions(+)
>  create mode 100644 Documentation/powerpc/papr_hcalls.rst
> 
> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
> new file mode 100644
> index 000000000000..7afc0310de29
> --- /dev/null
> +++ b/Documentation/powerpc/papr_hcalls.rst
> @@ -0,0 +1,200 @@
> +===========================
> +Hypercall Op-codes (hcalls)
> +===========================
> +
> +Overview
> +=========
> +
> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
> +specification [1]_ which describes the run-time environment for a guest
> +operating system and how it should interact with the hypervisor for
> +privileged operations. Currently there are two PAPR compliant hypervisors:
> +
> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
> +  or LPARS). It supports the full PAPR specification.
> +
> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
> +
> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
> +issue hypercalls to the hypervisor whenever it needs to perform an action
> +that is hypervisor priviledged [3]_ or for other services managed by the
> +hypervisor.
> +
> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
> +asking hypervisor to perform a privileged operation on behalf of the guest. The
> +guest issues a with necessary input operands. The hypervisor after performing
> +the privilege operation returns a status code and output operands back to the
> +guest.
> +
> +HCALL ABI
> +=========
> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
> +and any in-arguments for the hcall are provided in registers *r4-r12* in
> +Big-endian byte order.
> +
> +Once control is returns back to the guest after hypervisor has serviced the
> +'HVCS' instruction the return value of the hcall is available in *r3* and any
> +out values are returned in registers *r4-r12*. Again like in-arguments, all the
> +out value are in Big-endian byte order.
> +
> +Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx** defined
> +in a arch specific header [4]_ to issue hcalls from the linux kernel
> +running as pseries guest.

Thanks for this. Any chance you could replace the hcall convention in
exception-64s.S with a link to this document, and add it in here? It
needs a small fix or two as well, I think I put an ePAPR convention of
r11 for number in there.

Thanks,
Nick


* Re: [DOC][PATCH] powerpc: Provide initial documentation for PAPR hcalls
  2019-08-27 15:52 ` Laurent Dufour
@ 2019-08-28 13:24   ` Michael Ellerman
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2019-08-28 13:24 UTC
  To: Laurent Dufour, Vaibhav Jain, linuxppc-dev
  Cc: Aneesh Kumar K . V, msuchanek, Oliver O'Halloran, David Gibson

Laurent Dufour <ldufour@linux.vnet.ibm.com> writes:
> On 27/08/2019 at 17:23, Vaibhav Jain wrote:
>> This doc patch provides an initial description of the hcall op-codes
>> that are used by Linux kernel running as a guest (LPAR) on top of
>> PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).
>> 
>> Apart from documenting the hcalls the doc-patch also provides a
>> rudimentary overview of how hcall ABI, how they are issued with the
>> Linux kernel and how information/control flows between the guest and
>> hypervisor.
>> 
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>
> Hi Vaibhav,
>
> Thanks for documenting this.
>
> Besides my few remarks below, please consider:
>
> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
>
>> ---
>> Change-log:
>> 
>> Initial version of this doc-patch was posted and reviewed as part of
>> the patch-series "[PATCH v5 0/4] powerpc/papr_scm: Workaround for
>> failure of drc bind after kexec"
>> https://patchwork.ozlabs.org/patch/1136022/. Changes introduced on top
>> the original patch:
>> 
>> * Replaced the of term PHYP with Hypervisor to indicate both
>> PowerVM/Qemu [Laurent]
>> * Emphasized that In/Out arguments to hcalls are in Big-endian format
>> [Laurent]
>> * Fixed minor word repetition, spell issues and grammatical error
>> [Michal, Mpe]
>> * Replaced various variant of term 'hcall' with a single
>> variant. [Mpe]
>> * Changed the documentation format from txt to ReST. [Mpe]
>> * Changed the name of documentation file to papr_hcalls.rst. [Mpe]
>> * Updated the section describing privileged operation by hypervisor
>> to be more accurate [Mpe].
>> * Fixed up mention of register notation used for describing
>> hcalls. [Mpe]
>> * s/NVDimm/NVDIMM [Mpe]
>> * Added section on return values from hcall [Mpe]
>> * Described H_CONTINUE return-value for long running hcalls.
>> ---
>>   Documentation/powerpc/papr_hcalls.rst | 200 ++++++++++++++++++++++++++
>>   1 file changed, 200 insertions(+)
>>   create mode 100644 Documentation/powerpc/papr_hcalls.rst
>> 
>> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
>> new file mode 100644
>> index 000000000000..7afc0310de29
>> --- /dev/null
>> +++ b/Documentation/powerpc/papr_hcalls.rst
>> @@ -0,0 +1,200 @@
>> +===========================
>> +Hypercall Op-codes (hcalls)
>> +===========================
>> +
>> +Overview
>> +=========
>> +
>> +Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
>> +specification [1]_ which describes the run-time environment for a guest
>> +operating system and how it should interact with the hypervisor for
>> +privileged operations. Currently there are two PAPR compliant hypervisors:
>> +
>> +- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
>> +  IBM-i and  Linux as supported guests (termed as Logical Partitions
>> +  or LPARS). It supports the full PAPR specification.
>> +
>> +- **Qemu/KVM**: Supports PPC64 linux guests running on a PPC64 linux host.
>> +  Though it only implements a subset of PAPR specification called LoPAPR [2]_.
>> +
>> +On PPC64 arch a guest kernel running on top of a PAPR hypervisor is called
>> +a *pSeries guest*. A pseries guest runs in a supervisor mode (HV=0) and must
>> +issue hypercalls to the hypervisor whenever it needs to perform an action
>> +that is hypervisor priviledged [3]_ or for other services managed by the
>> +hypervisor.
>> +
>> +Hence a Hypercall (hcall) is essentially a request by the pSeries guest
>> +asking hypervisor to perform a privileged operation on behalf of the guest. The
>> +guest issues a with necessary input operands. The hypervisor after performing
>                   ^ hcall ?
>
>> +the privilege operation returns a status code and output operands back to the
>> +guest.
>> +
>> +HCALL ABI
>> +=========
>> +The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
>> +is covered in section 14.5.3 of ref [2]_. Switch to the  Hypervisor context is
>> +done via the instruction **HVCS** that expects the Opcode for hcall is set in *r3*
>> +and any in-arguments for the hcall are provided in registers *r4-r12* in
>> +Big-endian byte order.
> Indeed, register valuer are not byte ordered, only values passed through 
> buffer in memory are byte ordered.
>
> Should it be explicitly said that Big-endian order is only concerning data 
> stored in memory?
> What about something like that:
> "...any in-arguments for the hcall are provided in registers *r4-r12*. If 
> values have to be passed through a memory buffer, the data stored in that 
> buffer are in Big-endian order."

Yes that would be better.

I guess to be pedantic every structure passed in memory needs to be
defined in PAPR and could have some arbitrary ordering, but in practice
everything is big endian.

cheers
