linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments
@ 2022-07-05 15:43 Michael Kelley
  2022-07-05 15:43 ` [PATCH 1/3] Documentation: hyperv: Add overview of " Michael Kelley
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Michael Kelley @ 2022-07-05 15:43 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv
  Cc: mikelley

This documentation is a high level overview to explain the basics
of Linux running as a guest on Hyper-V. The intent is to document
the forest, not the trees. The Hyper-V Top Level Functional Spec
provides conceptual material and API details for the core Hyper-V
hypervisor, and this documentation provides additional info on
how that functionality is applied to Linux. Also, there's no
public documentation on VMbus or the VMbus synthetic devices, so
this documentation helps fill that gap at a conceptual level. This
documentation is not API-level documentation, which can be seen
in the code and associated comments.

More topics will be added in future patches, including:

* Miscellaneous synthetic devices like KVP, timesync, VSS, etc.
* Virtual PCI support
* Isolated/Confidential VMs

If you think I'm missing a topic that fits into the overall
approach as described, feel free to suggest text, or let me
know and I can add it to my list.

Michael Kelley (3):
  Documentation: hyperv: Add overview of Hyper-V enlightenments
  Documentation: hyperv: Add overview of VMbus
  Documentation: hyperv: Add overview of clocks and timers

 Documentation/virt/hyperv/clocks.rst   |  73 ++++++++
 Documentation/virt/hyperv/index.rst    |  12 ++
 Documentation/virt/hyperv/overview.rst | 207 ++++++++++++++++++++++
 Documentation/virt/hyperv/vmbus.rst    | 303 +++++++++++++++++++++++++++++++++
 Documentation/virt/index.rst           |   1 +
 MAINTAINERS                            |   1 +
 6 files changed, 597 insertions(+)
 create mode 100644 Documentation/virt/hyperv/clocks.rst
 create mode 100644 Documentation/virt/hyperv/index.rst
 create mode 100644 Documentation/virt/hyperv/overview.rst
 create mode 100644 Documentation/virt/hyperv/vmbus.rst

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] Documentation: hyperv: Add overview of Hyper-V enlightenments
  2022-07-05 15:43 [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Michael Kelley
@ 2022-07-05 15:43 ` Michael Kelley
  2022-07-05 15:43 ` [PATCH 2/3] Documentation: hyperv: Add overview of VMbus Michael Kelley
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Michael Kelley @ 2022-07-05 15:43 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv
  Cc: mikelley

Add an initial documentation topic for Linux enlightenments to
run as a guest on Microsoft's Hyper-V hypervisor, linked under
the "virt" documentation area. Update the virt doc index.rst
and the MAINTAINERS file.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 Documentation/virt/hyperv/index.rst    |  10 ++
 Documentation/virt/hyperv/overview.rst | 207 +++++++++++++++++++++++++++++++++
 Documentation/virt/index.rst           |   1 +
 MAINTAINERS                            |   1 +
 4 files changed, 219 insertions(+)
 create mode 100644 Documentation/virt/hyperv/index.rst
 create mode 100644 Documentation/virt/hyperv/overview.rst

diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst
new file mode 100644
index 0000000..991bee4
--- /dev/null
+++ b/Documentation/virt/hyperv/index.rst
@@ -0,0 +1,10 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Hyper-V Enlightenments
+======================
+
+.. toctree::
+   :maxdepth: 1
+
+   overview
diff --git a/Documentation/virt/hyperv/overview.rst b/Documentation/virt/hyperv/overview.rst
new file mode 100644
index 0000000..cd49333
--- /dev/null
+++ b/Documentation/virt/hyperv/overview.rst
@@ -0,0 +1,207 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Overview
+========
+The Linux kernel contains a variety of code for running as a fully
+enlightened guest on Microsoft's Hyper-V hypervisor.  Hyper-V
+consists primarily of a bare-metal hypervisor plus a virtual machine
+management service running in the parent partition (roughly
+equivalent to KVM and QEMU, for example).  Guest VMs run in child
+partitions.  In this documentation, references to Hyper-V usually
+encompass both the hypervisor and the VMM service without making a
+distinction about which functionality is provided by which
+component.
+
+Hyper-V runs on x86/x64 and arm64 architectures, and Linux guests
+are supported on both.  The functionality and behavior of Hyper-V is
+generally the same on both architectures unless noted otherwise.
+
+Linux Guest Communication with Hyper-V
+--------------------------------------
+Linux guests communicate with Hyper-V in four different ways:
+
+* Implicit traps: As defined by the x86/x64 or arm64 architecture,
+  some guest actions trap to Hyper-V.  Hyper-V emulates the action and
+  returns control to the guest.  This behavior is generally invisible
+  to the Linux kernel.
+
+* Explicit hypercalls: Linux makes an explicit function call to
+  Hyper-V, passing parameters.  Hyper-V performs the requested action
+  and returns control to the caller.  Parameters are passed in
+  processor registers or in memory shared between the Linux guest and
+  Hyper-V.   On x86/x64, hypercalls use a Hyper-V specific calling
+  sequence.  On arm64, hypercalls use the ARM standard SMCCC calling
+  sequence.
+
+* Synthetic register access: Hyper-V implements a variety of
+  synthetic registers.  On x86/x64 these registers appear as MSRs in
+  the guest, and the Linux kernel can read or write these MSRs using
+  the normal mechanisms defined by the x86/x64 architecture.  On
+  arm64, these synthetic registers must be accessed using explicit
+  hypercalls.
+
+* VMbus: VMbus is a higher-level software construct that is built on
+  the other 3 mechanisms.  It is a message passing interface between
+  the Hyper-V host and the Linux guest.  It uses memory that is shared
+  between Hyper-V and the guest, along with various signaling
+  mechanisms.
+
+The first three communication mechanisms are documented in the
+`Hyper-V Top Level Functional Spec (TLFS)`_.  The TLFS describes
+general Hyper-V functionality and provides details on the hypercalls
+and synthetic registers.  The TLFS is currently written for the
+x86/x64 architecture only.
+
+.. _Hyper-V Top Level Functional Spec (TLFS): https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/tlfs
+
+VMbus is not documented.  This documentation provides a high-level
+overview of VMbus and how it works, but the details can be discerned
+only from the code.
+
+Sharing Memory
+--------------
+Many aspects are communication between Hyper-V and Linux are based
+on sharing memory.  Such sharing is generally accomplished as
+follows:
+
+* Linux allocates memory from its physical address space using
+  standard Linux mechanisms.
+
+* Linux tells Hyper-V the guest physical address (GPA) of the
+  allocated memory.  Many shared areas are kept to 1 page so that a
+  single GPA is sufficient.   Larger shared areas require a list of
+  GPAs, which usually do not need to be contiguous in the guest
+  physical address space.  How Hyper-V is told about the GPA or list
+  of GPAs varies.  In some cases, a single GPA is written to a
+  synthetic register.  In other cases, a GPA or list of GPAs is sent
+  in a VMbus message.
+
+* Hyper-V translates the GPAs into "real" physical memory addresses,
+  and creates a virtual mapping that it can use to access the memory.
+
+* Linux can later revoke sharing it has previously established by
+  telling Hyper-V to set the shared GPA to zero.
+
+Hyper-V operates with a page size of 4 Kbytes. GPAs communicated to
+Hyper-V may be in the form of page numbers, and always describe a
+range of 4 Kbytes.  Since the Linux guest page size on x86/x64 is
+also 4 Kbytes, the mapping from guest page to Hyper-V page is 1-to-1.
+On arm64, Hyper-V supports guests with 4/16/64 Kbyte pages as
+defined by the arm64 architecture.   If Linux is using 16 or 64
+Kbyte pages, Linux code must be careful to communicate with Hyper-V
+only in terms of 4 Kbyte pages.  HV_HYP_PAGE_SIZE and related macros
+are used in code that communicates with Hyper-V so that it works
+correctly in all configurations.
+
+As described in the TLFS, a few memory pages shared between Hyper-V
+and the Linux guest are "overlay" pages.  With overlay pages, Linux
+uses the usual approach of allocating guest memory and telling
+Hyper-V the GPA of the allocated memory.  But Hyper-V then replaces
+that physical memory page with a page it has allocated, and the
+original physical memory page is no longer accessible in the guest
+VM.  Linux may access the memory normally as if it were the memory
+that it originally allocated.  The "overlay" behavior is visible
+only because the contents of the page (as seen by Linux) change at
+the time that Linux originally establishes the sharing and the
+overlay page is inserted.  Similarly, the contents change if Linux
+revokes the sharing, in which case Hyper-V removes the overlay page,
+and the guest page originally allocated by Linux becomes visible
+again.
+
+Before Linux does a kexec to a kdump kernel or any other kernel,
+memory shared with Hyper-V should be revoked.  Hyper-V could modify
+a shared page or remove an overlay page after the new kernel is
+using the page for a different purpose, corrupting the new kernel.
+Hyper-V does not provide a single "set everything" operation to
+guest VMs, so Linux code must individually revoke all sharing before
+doing kexec.   See hv_kexec_handler() and hv_crash_handler().  But
+the crash/panic path still has holes in cleanup because some shared
+pages are set using per-CPU synthetic registers and there's no
+mechanism to revoke the shared pages for CPUs other than the CPU
+running the panic path.
+
+CPU Management
+--------------
+Hyper-V does not have a ability to hot-add or hot-remove a CPU
+from a running VM.  However, Windows Server 2019 Hyper-V and
+earlier versions may provide guests with ACPI tables that indicate
+more CPUs than are actually present in the VM.  As is normal, Linux
+treats these additional CPUs as potential hot-add CPUs, and reports
+them as such even though Hyper-V will never actually hot-add them.
+Starting in Windows Server 2022 Hyper-V, the ACPI tables reflect
+only the CPUs actually present in the VM, so Linux does not report
+any hot-add CPUs.
+
+A Linux guest CPU may be taken offline using the normal Linux
+mechanisms, provided no VMbus channel interrupts are assigned to
+the CPU.  See the section on VMbus Interrupts for more details
+on how VMbus channel interrupts can be re-assigned to permit
+taking a CPU offline.
+
+32-bit and 64-bit
+-----------------
+On x86/x64, Hyper-V supports 32-bit and 64-bit guests, and Linux
+will build and run in either version. While the 32-bit version is
+expected to work, it is used rarely and may suffer from undetected
+regressions.
+
+On arm64, Hyper-V supports only 64-bit guests.
+
+Endian-ness
+-----------
+All communication between Hyper-V and guest VMs uses Little-Endian
+format on both x86/x64 and arm64.  Big-endian format on arm64 is not
+supported by Hyper-V, and Linux code does not use endian-ness macros
+when accessing data shared with Hyper-V.
+
+Versioning
+----------
+Current Linux kernels operate correctly with older versions of
+Hyper-V back to Windows Server 2012 Hyper-V. Support for running
+on the original Hyper-V release in Windows Server 2008/2008 R2
+has been removed.
+
+A Linux guest on Hyper-V outputs in dmesg the version of Hyper-V
+it is running on.  This version is in the form of a Windows build
+number and is for display purposes only. Linux code does not
+test this version number at runtime to determine available features
+and functionality. Hyper-V indicates feature/function availability
+via flags in synthetic MSRs that Hyper-V provides to the guest,
+and the guest code tests these flags.
+
+VMbus has its own protocol version that is negotiated during the
+initial VMbus connection from the guest to Hyper-V. This version
+number is also output to dmesg during boot.  This version number
+is checked in a few places in the code to determine if specific
+functionality is present.
+
+Furthermore, each synthetic device on VMbus also has a protocol
+version that is separate from the VMbus protocol version. Device
+drivers for these synthetic devices typically negotiate the device
+protocol version, and may test that protocol version to determine
+if specific device functionality is present.
+
+Code Packaging
+--------------
+Hyper-V related code appears in the Linux kernel code tree in three
+main areas:
+
+1. drivers/hv
+
+2. arch/x86/hyperv and arch/arm64/hyperv
+
+3. individual device driver areas such as drivers/scsi, drivers/net,
+   drivers/clocksource, etc.
+
+A few miscellaneous files appear elsewhere. See the full list under
+"Hyper-V/Azure CORE AND DRIVERS" and "DRM DRIVER FOR HYPERV
+SYNTHETIC VIDEO DEVICE" in the MAINTAINERS file.
+
+The code in #1 and #2 is built only when CONFIG_HYPERV is set.
+Similarly, the code for most Hyper-V related drivers is built only
+when CONFIG_HYPERV is set.
+
+Most Hyper-V related code in #1 and #3 can be built as a module.
+The architecture specific code in #2 must be built-in.  Also,
+drivers/hv/hv_common.c is low-level code that is common across
+architectures and must be built-in.
diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
index 492f092..2f1cffa 100644
--- a/Documentation/virt/index.rst
+++ b/Documentation/virt/index.rst
@@ -14,6 +14,7 @@ Linux Virtualization Support
    ne_overview
    acrn/index
    coco/sev-guest
+   hyperv/index
 
 .. only:: html and subproject
 
diff --git a/MAINTAINERS b/MAINTAINERS
index a6d3bd9..fd2e5ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9172,6 +9172,7 @@ S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
 F:	Documentation/ABI/stable/sysfs-bus-vmbus
 F:	Documentation/ABI/testing/debugfs-hyperv
+F:	Documentation/virt/hyperv
 F:	Documentation/networking/device_drivers/ethernet/microsoft/netvsc.rst
 F:	arch/arm64/hyperv
 F:	arch/arm64/include/asm/hyperv-tlfs.h
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] Documentation: hyperv: Add overview of VMbus
  2022-07-05 15:43 [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Michael Kelley
  2022-07-05 15:43 ` [PATCH 1/3] Documentation: hyperv: Add overview of " Michael Kelley
@ 2022-07-05 15:43 ` Michael Kelley
  2022-07-05 15:43 ` [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers Michael Kelley
  2022-07-06 18:12 ` [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Wei Liu
  3 siblings, 0 replies; 8+ messages in thread
From: Michael Kelley @ 2022-07-05 15:43 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv
  Cc: mikelley

Add documentation topic for using VMbus when running as a guest
on Hyper-V.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 Documentation/virt/hyperv/index.rst |   1 +
 Documentation/virt/hyperv/vmbus.rst | 303 ++++++++++++++++++++++++++++++++++++
 2 files changed, 304 insertions(+)
 create mode 100644 Documentation/virt/hyperv/vmbus.rst

diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst
index 991bee4..caa43ab 100644
--- a/Documentation/virt/hyperv/index.rst
+++ b/Documentation/virt/hyperv/index.rst
@@ -8,3 +8,4 @@ Hyper-V Enlightenments
    :maxdepth: 1
 
    overview
+   vmbus
diff --git a/Documentation/virt/hyperv/vmbus.rst b/Documentation/virt/hyperv/vmbus.rst
new file mode 100644
index 0000000..a9de881
--- /dev/null
+++ b/Documentation/virt/hyperv/vmbus.rst
@@ -0,0 +1,303 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+VMbus
+=====
+VMbus is a software construct provided by Hyper-V to guest VMs.  It
+consists of a control path and common facilities used by synthetic
+devices that Hyper-V presents to guest VMs.   The control path is
+used to offer synthetic devices to the guest VM and, in some cases,
+to rescind those devices.   The common facilities include software
+channels for communicating between the device driver in the guest VM
+and the synthetic device implementation that is part of Hyper-V, and
+signaling primitives to allow Hyper-V and the guest to interrupt
+each other.
+
+VMbus is modeled in Linux as a bus, with the expected /sys/bus/vmbus
+entry in a running Linux guest.  The VMbus driver (drivers/hv/vmbus_drv.c)
+establishes the VMbus control path with the Hyper-V host, then
+registers itself as a Linux bus driver.  It implements the standard
+bus functions for adding and removing devices to/from the bus.
+
+Most synthetic devices offered by Hyper-V have a corresponding Linux
+device driver.  These devices include:
+
+* SCSI controller
+* NIC
+* Graphics frame buffer
+* Keyboard
+* Mouse
+* PCI device pass-thru
+* Heartbeat
+* Time Sync
+* Shutdown
+* Memory balloon
+* Key/Value Pair (KVP) exchange with Hyper-V
+* Hyper-V online backup (a.k.a. VSS)
+
+Guest VMs may have multiple instances of the synthetic SCSI
+controller, synthetic NIC, and PCI pass-thru devices.  Other
+synthetic devices are limited to a single instance per VM.  Not
+listed above are a small number of synthetic devices offered by
+Hyper-V that are used only by Windows guests and for which Linux
+does not have a driver.
+
+Hyper-V uses the terms "VSP" and "VSC" in describing synthetic
+devices.  "VSP" refers to the Hyper-V code that implements a
+particular synthetic device, while "VSC" refers to the driver for
+the device in the guest VM.  For example, the Linux driver for the
+synthetic NIC is referred to as "netvsc" and the Linux driver for
+the synthetic SCSI controller is "storvsc".  These drivers contain
+functions with names like "storvsc_connect_to_vsp".
+
+VMbus channels
+--------------
+An instance of a synthetic device uses VMbus channels to communicate
+between the VSP and the VSC.  Channels are bi-directional and used
+for passing messages.   Most synthetic devices use a single channel,
+but the synthetic SCSI controller and synthetic NIC may use multiple
+channels to achieve higher performance and greater parallelism.
+
+Each channel consists of two ring buffers.  These are classic ring
+buffers from a university data structures textbook.  If the read
+and writes pointers are equal, the ring buffer is considered to be
+empty, so a full ring buffer always has at least one byte unused.
+The "in" ring buffer is for messages from the Hyper-V host to the
+guest, and the "out" ring buffer is for messages from the guest to
+the Hyper-V host.  In Linux, the "in" and "out" designations are as
+viewed by the guest side.  The ring buffers are memory that is
+shared between the guest and the host, and they follow the standard
+paradigm where the memory is allocated by the guest, with the list
+of GPAs that make up the ring buffer communicated to the host.  Each
+ring buffer consists of a header page (4 Kbytes) with the read and
+write indices and some control flags, followed by the memory for the
+actual ring.  The size of the ring is determined by the VSC in the
+guest and is specific to each synthetic device.   The list of GPAs
+making up the ring is communicated to the Hyper-V host over the
+VMbus control path as a GPA Descriptor List (GPADL).  See function
+vmbus_establish_gpadl().
+
+Each ring buffer is mapped into contiguous Linux kernel virtual
+space in three parts:  1) the 4 Kbyte header page, 2) the memory
+that makes up the ring itself, and 3) a second mapping of the memory
+that makes up the ring itself.  Because (2) and (3) are contiguous
+in kernel virtual space, the code that copies data to and from the
+ring buffer need not be concerned with ring buffer wrap-around.
+Once a copy operation has completed, the read or write index may
+need to be reset to point back into the first mapping, but the
+actual data copy does not need to be broken into two parts.  This
+approach also allows complex data structures to be easily accessed
+directly in the ring without handling wrap-around.
+
+On arm64 with page sizes > 4 Kbytes, the header page must still be
+passed to Hyper-V as a 4 Kbyte area.  But the memory for the actual
+ring must be aligned to PAGE_SIZE and have a size that is a multiple
+of PAGE_SIZE so that the duplicate mapping trick can be done.  Hence
+a portion of the header page is unused and not communicated to
+Hyper-V.  This case is handled by vmbus_establish_gpadl().
+
+Hyper-V enforces a limit on the aggregate amount of guest memory
+that can be shared with the host via GPADLs.  This limit ensures
+that a rogue guest can't force the consumption of excessive host
+resources.  For Windows Server 2019 and later, this limit is
+approximately 1280 Mbytes.  For versions prior to Windows Server
+2019, the limit is approximately 384 Mbytes.
+
+VMbus messages
+--------------
+All VMbus messages have a standard header that includes the message
+length, the offset of the message payload, some flags, and a
+transactionID.  The portion of the message after the header is
+unique to each VSP/VSC pair.
+
+Messages follow one of two patterns:
+
+* Unidirectional:  Either side sends a message and does not
+  expect a response message
+* Request/response:  One side (usually the guest) sends a message
+  and expects a response
+
+The transactionID (a.k.a. "requestID") is for matching requests &
+responses.  Some synthetic devices allow multiple requests to be in-
+flight simultaneously, so the guest specifies a transactionID when
+sending a request.  Hyper-V sends back the same transactionID in the
+matching response.
+
+Messages passed between the VSP and VSC are control messages.  For
+example, a message sent from the storvsc driver might be "execute
+this SCSI command".   If a message also implies some data transfer
+between the guest and the Hyper-V host, the actual data to be
+transferred may be embedded with the control message, or it may be
+specified as a separate data buffer that the Hyper-V host will
+access as a DMA operation.  The former case is used when the size of
+the data is small and the cost of copying the data to and from the
+ring buffer is minimal.  For example, time sync messages from the
+Hyper-V host to the guest contain the actual time value.  When the
+data is larger, a separate data buffer is used.  In this case, the
+control message contains a list of GPAs that describe the data
+buffer.  For example, the storvsc driver uses this approach to
+specify the data buffers to/from which disk I/O is done.
+
+Three functions exist to send VMbus messages:
+
+1. vmbus_sendpacket():  Control-only messages and messages with
+   embedded data -- no GPAs
+2. vmbus_sendpacket_pagebuffer(): Message with list of GPAs
+   identifying data to transfer.  An offset and length is
+   associated with each GPA so that multiple discontinuous areas
+   of guest memory can be targeted.
+3. vmbus_sendpacket_mpb_desc(): Message with list of GPAs
+   identifying data to transfer.  A single offset and length is
+   associated with a list of GPAs.  The GPAs must describe a
+   single logical area of guest memory to be targeted.
+
+Historically, Linux guests have trusted Hyper-V to send well-formed
+and valid messages, and Linux drivers for synthetic devices did not
+fully validate messages.  With the introduction of processor
+technologies that fully encrypt guest memory and that allow the
+guest to not trust the hypervisor (AMD SNP-SEV, Intel TDX), trusting
+the Hyper-V host is no longer a valid assumption.  The drivers for
+VMbus synthetic devices are being updated to fully validate any
+values read from memory that is shared with Hyper-V, which includes
+messages from VMbus devices.  To facilitate such validation,
+messages read by the guest from the "in" ring buffer are copied to a
+temporary buffer that is not shared with Hyper-V.  Validation is
+performed in this temporary buffer without the risk of Hyper-V
+maliciously modifying the message after it is validated but before
+it is used.
+
+VMbus interrupts
+----------------
+VMbus provides a mechanism for the guest to interrupt the host when
+the guest has queued new messages in a ring buffer.  The host
+expects that the guest will send an interrupt only when an "out"
+ring buffer transitions from empty to non-empty.  If the guest sends
+interrupts at other times, the host deems such interrupts to be
+unnecessary.  If a guest sends an excessive number of unnecessary
+interrupts, the host may throttle that guest by suspending its
+execution for a few seconds to prevent a denial-of-service attack.
+
+Similarly, the host will interrupt the guest when it sends a new
+message on the VMbus control path, or when a VMbus channel "in" ring
+buffer transitions from empty to non-empty.  Each CPU in the guest
+may receive VMbus interrupts, so they are best modeled as per-CPU
+interrupts in Linux.  This model works well on arm64 where a single
+per-CPU IRQ is allocated for VMbus.  Since x86/x64 lacks support for
+per-CPU IRQs, an x86 interrupt vector is statically allocated (see
+HYPERVISOR_CALLBACK_VECTOR) across all CPUs and explicitly coded to
+call the VMbus interrupt service routine.  These interrupts are
+visible in /proc/interrupts on the "HYP" line.
+
+The guest CPU that a VMbus channel will interrupt is selected by the
+guest when the channel is created, and the host is informed of that
+selection.  VMbus devices are broadly grouped into two categories:
+
+1. "Slow" devices that need only one VMbus channel.  The devices
+   (such as keyboard, mouse, heartbeat, and timesync) generate
+   relatively few interrupts.  Their VMbus channels are all
+   assigned to interrupt the VMBUS_CONNECT_CPU, which is always
+   CPU 0.
+
+2. "High speed" devices that may use multiple VMbus channels for
+   higher parallelism and performance.  These devices include the
+   synthetic SCSI controller and synthetic NIC.  Their VMbus
+   channels interrupts are assigned to CPUs that are spread out
+   among the available CPUs in the VM so that interrupts on
+   multiple channels can be processed in parallel.
+
+The assignment of VMbus channel interrupts to CPUs is done in the
+function init_vp_index().  This assignment is done outside of the
+normal Linux interrupt affinity mechanism, so the interrupts are
+neither "unmanaged" nor "managed" interrupts.
+
+The CPU that a VMbus channel will interrupt can be seen in
+/sys/bus/vmbus/devices/<deviceGUID>/ channels/<channelRelID>/cpu.
+When running on later versions of Hyper-V, the CPU can be changed
+by writing a new value to this sysfs entry.  Because the interrupt
+assignment is done outside of the normal Linux affinity mechanism,
+there are no entries in /proc/irq corresponding to individual
+VMbus channel interrupts.
+
+An online CPU in a Linux guest may not be taken offline if it has
+VMbus channel interrupts assigned to it.  Any such channel
+interrupts must first be manually reassigned to another CPU as
+described above.  When no channel interrupts are assigned to the
+CPU, it can be taken offline.
+
+When a guest CPU receives a VMbus interrupt from the host, the
+function vmbus_isr() handles the interrupt.  It first checks for
+channel interrupts by calling vmbus_chan_sched(), which looks at a
+bitmap setup by the host to determine which channels have pending
+interrupts on this CPU.  If multiple channels have pending
+interrupts for this CPU, they are processed sequentially.  When all
+channel interrupts have been processed, vmbus_isr() checks for and
+processes any message received on the VMbus control path.
+
+The VMbus channel interrupt handling code is designed to work
+correctly even if an interrupt is received on a CPU other than the
+CPU assigned to the channel.  Specifically, the code does not use
+CPU-based exclusion for correctness.  In normal operation, Hyper-V
+will interrupt the assigned CPU.  But when the CPU assigned to a
+channel is being changed via sysfs, the guest doesn't know exactly
+when Hyper-V will make the transition.  The code must work correctly
+even if there is a time lag before Hyper-V starts interrupting the
+new CPU.  See comments in target_cpu_store().
+
+VMbus device creation/deletion
+------------------------------
+Hyper-V and the Linux guest have a separate message-passing path
+that is used for synthetic device creation and deletion. This
+path does not use a VMbus channel.  See vmbus_post_msg() and
+vmbus_on_msg_dpc().
+
+The first step is for the guest to connect to the generic
+Hyper-V VMbus mechanism.  As part of establishing this connection,
+the guest and Hyper-V agree on a VMbus protocol version they will
+use.  This negotiation allows newer Linux kernels to run on older
+Hyper-V versions, and vice versa.
+
+The guest then tells Hyper-v to "send offers".  Hyper-V sends an
+offer message to the guest for each synthetic device that the VM
+is configured to have. Each VMbus device type has a fixed GUID
+known as the "class ID", and each VMbus device instance is also
+identified by a GUID. The offer message from Hyper-V contains
+both GUIDs to uniquely (within the VM) identify the device.
+There is one offer message for each device instance, so a VM with
+two synthetic NICs will get two offers messages with the NIC
+class ID. The ordering of offer messages can vary from boot-to-boot
+and must not be assumed to be consistent in Linux code. Offer
+message may also arrive long after Linux has initially booted
+because Hyper-V supports adding devices, such as synthetic NICs,
+to running VMs. A new offer message is processed by
+vmbus_process_offer(), which indirectly invokes vmbus_add_channel_work().
+
+Upon receipt of an offer message, the guest identifies the device
+type based on the class ID, and invokes the correct driver to set up
+the device.  Driver/device matching is performed using the standard
+Linux mechanism.
+
+The device driver probe function opens the primary VMbus channel to
+the corresponding VSP. It allocates guest memory for the channel
+ring buffers and shares the ring buffer with the Hyper-V host by
+giving the host a list of GPAs for the ring buffer memory.  See
+vmbus_establish_gpadl().
+
+Once the ring buffer is set up, the device driver and VSP exchange
+setup messages via the primary channel.  These messages may include
+negotiating the device protocol version to be used between the Linux
+VSC and the VSP on the Hyper-V host.  The setup messages may also
+include creating additional VMbus channels, which are somewhat
+mis-named as "sub-channels" since they are functionally
+equivalent to the primary channel once they are created.
+
+Finally, the device driver may create entries in /dev as with
+any device driver.
+
+The Hyper-V host can send a "rescind" message to the guest to
+remove a device that was previously offered. Linux drivers must
+handle such a rescind message at any time. Rescinding a device
+invokes the device driver "remove" function to cleanly shut
+down the device and remove it. Once a synthetic device is
+rescinded, neither Hyper-V nor Linux retains any state about
+its previous existence. Such a device might be re-added later,
+in which case it is treated as an entirely new device. See
+vmbus_onoffer_rescind().
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers
  2022-07-05 15:43 [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Michael Kelley
  2022-07-05 15:43 ` [PATCH 1/3] Documentation: hyperv: Add overview of " Michael Kelley
  2022-07-05 15:43 ` [PATCH 2/3] Documentation: hyperv: Add overview of VMbus Michael Kelley
@ 2022-07-05 15:43 ` Michael Kelley
  2022-07-06 18:25   ` Wei Liu
  2022-07-06 18:12 ` [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Wei Liu
  3 siblings, 1 reply; 8+ messages in thread
From: Michael Kelley @ 2022-07-05 15:43 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv
  Cc: mikelley

Add documentation topic for clocks and timers when running as a
guest on Hyper-V.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 Documentation/virt/hyperv/clocks.rst | 73 ++++++++++++++++++++++++++++++++++++
 Documentation/virt/hyperv/index.rst  |  1 +
 2 files changed, 74 insertions(+)
 create mode 100644 Documentation/virt/hyperv/clocks.rst

diff --git a/Documentation/virt/hyperv/clocks.rst b/Documentation/virt/hyperv/clocks.rst
new file mode 100644
index 0000000..e4ba2890
--- /dev/null
+++ b/Documentation/virt/hyperv/clocks.rst
@@ -0,0 +1,73 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Clocks and Timers
+-----------------
+
+arm64
+~~~~~
+On arm64, Hyper-V virtualizes the ARMv8 architectural system counter
+and timer. Guest VMs use this virtualized hardware as the Linux
+clocksource and clockevents via the standard arm_arch_timer.c
+driver, just as they would on bare metal. Linux vDSO support for the
+architectural system counter is functional in guest VMs on Hyper-V.
+While Hyper-V also provides a synthetic system clock and four synthetic
+per-CPU timers as described in the TLFS, they are not used by the
+Linux kernel in a Hyper-V guest on arm64.  However, older versions
+of Hyper-V for arm64 only partially virtualize the ARMv8
+architectural timer, such that the timer does not generate
+interrupts in the VM. Because of this limitation, running current
+Linux kernel versions on these older Hyper-V versions requires an
+out-of-tree patch to use the Hyper-V synthetic clocks/timers instead.
+
+x86/x64
+~~~~~~~
+On x86/x64, Hyper-V provides guest VMs with a synthetic system clock
+and four synthetic per-CPU timers as described in the TLFS. Hyper-V
+also provides access to the virtualized TSC via the RDTSC and
+related instructions. These TSC instructions do not trap to
+the hypervisor and so provide excellent performance in a VM.
+Hyper-V performs TSC calibration, and provides the TSC frequency
+to the guest VM via a synthetic MSR.  Hyper-V initialization code
+in Linux reads this MSR to get the frequency, so it skips TSC
+calibration and sets tsc_reliable. Hyper-V provides virtualized
+versions of the PIT (in Hyper-V  Generation 1 VMs only), local
+APIC timer, and RTC. Hyper-V does not provide a virtualized HPET in
+guest VMs.
+
+The Hyper-V synthetic system clock can be read via a synthetic MSR,
+but this access traps to the hypervisor. As a faster alternative,
+the guest can configure a memory page to be shared between the guest
+and the hypervisor.  Hyper-V populates this memory page with a
+64-bit scale value and offset value. To read the synthetic clock
+value, the guest reads the TSC and then applies the scale and offset
+as described in the Hyper-V TLFS. The resulting value advances
+at a constant 10 MHz frequency. In the case of a live migration
+to a host with a different TSC frequency, Hyper-V adjusts the
+scale and offset values in the shared page so that the 10 MHz
+frequency is maintained.
+
+Starting with Windows Server 2022 Hyper-V, Hyper-V uses hardware
+support for TSC frequency scaling to enable live migration of VMs
+across Hyper-V hosts where the TSC frequency may be different.
+When a Linux guest detects that this Hyper-V functionality is
+available, it prefers to use Linux's standard TSC-based clocksource.
+Otherwise, it uses the clocksource for the Hyper-V synthetic system
+clock implemented via the shared page (identified as
+"hyperv_clocksource_tsc_page").
+
+The Hyper-V synthetic system clock is available to user space via
+vDSO, and gettimeofday() and related system calls can execute
+entirely in user space.  The vDSO is implemented by mapping the
+shared page with scale and offset values into user space.  User
+space code performs the same algorithm of reading the TSC and
+appying the scale and offset to get the constant 10 MHz clock.
+
+Linux clockevents are based on Hyper-V synthetic timer 0. While
+Hyper-V offers 4 synthetic timers for each CPU, Linux only uses
+timer 0. Interrupts from stimer0 are recorded on the "HVS" line in
+/proc/interrupts.  Clockevents based on the virtualized PIT and
+local APIC timer also work, but the Hyper-V synthetic timer is
+preferred.
+
+The driver for the Hyper-V synthetic system clock and timers is
+drivers/clocksource/hyperv_timer.c.
diff --git a/Documentation/virt/hyperv/index.rst b/Documentation/virt/hyperv/index.rst
index caa43ab..4a7a1b7 100644
--- a/Documentation/virt/hyperv/index.rst
+++ b/Documentation/virt/hyperv/index.rst
@@ -9,3 +9,4 @@ Hyper-V Enlightenments
 
    overview
    vmbus
+   clocks
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments
  2022-07-05 15:43 [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Michael Kelley
                   ` (2 preceding siblings ...)
  2022-07-05 15:43 ` [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers Michael Kelley
@ 2022-07-06 18:12 ` Wei Liu
  2022-07-06 18:43   ` Michael Kelley (LINUX)
  3 siblings, 1 reply; 8+ messages in thread
From: Wei Liu @ 2022-07-06 18:12 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv

On Tue, Jul 05, 2022 at 08:43:39AM -0700, Michael Kelley wrote:
> This documentation is a high level overview to explain the basics
> of Linux running as a guest on Hyper-V. The intent is to document
> the forest, not the trees. The Hyper-V Top Level Functional Spec
> provides conceptual material and API details for the core Hyper-V
> hypervisor, and this documentation provides additional info on
> how that functionality is applied to Linux. Also, there's no
> public documentation on VMbus or the VMbus synthetic devices, so
> this documentation helps fill that gap at a conceptual level. This
> documentation is not API-level documentation, which can be seen
> in the code and associated comments.
> 
> More topics will be added in future patches, including:
> 
> * Miscellaneous synthetic devices like KVP, timesync, VSS, etc.

There is an UIO driver for Hyper-V. I think that falls under this
category. Not sure if that's on your radar to cover?

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers
  2022-07-05 15:43 ` [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers Michael Kelley
@ 2022-07-06 18:25   ` Wei Liu
  2022-07-06 18:47     ` Michael Kelley (LINUX)
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Liu @ 2022-07-06 18:25 UTC (permalink / raw)
  To: Michael Kelley
  Cc: kys, haiyangz, sthemmin, wei.liu, decui, corbet, linux-kernel,
	linux-doc, linux-hyperv

On Tue, Jul 05, 2022 at 08:43:42AM -0700, Michael Kelley wrote:
> Add documentation topic for clocks and timers when running as a
> guest on Hyper-V.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---
>  Documentation/virt/hyperv/clocks.rst | 73 ++++++++++++++++++++++++++++++++++++
>  Documentation/virt/hyperv/index.rst  |  1 +
>  2 files changed, 74 insertions(+)
>  create mode 100644 Documentation/virt/hyperv/clocks.rst
> 
> diff --git a/Documentation/virt/hyperv/clocks.rst b/Documentation/virt/hyperv/clocks.rst
> new file mode 100644
> index 0000000..e4ba2890
> --- /dev/null
> +++ b/Documentation/virt/hyperv/clocks.rst
> @@ -0,0 +1,73 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Clocks and Timers
> +-----------------

This seems to be inconsistent with regard to the other two files -- they
use "=" signs for the title.

> +
> +arm64
> +~~~~~

And the other files use "-" for this (second?) level.

Thanks,
Wei.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments
  2022-07-06 18:12 ` [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Wei Liu
@ 2022-07-06 18:43   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Kelley (LINUX) @ 2022-07-06 18:43 UTC (permalink / raw)
  To: Wei Liu
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	corbet, linux-kernel, linux-doc, linux-hyperv

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, July 6, 2022 11:12 AM
> 
> On Tue, Jul 05, 2022 at 08:43:39AM -0700, Michael Kelley wrote:
> > This documentation is a high level overview to explain the basics
> > of Linux running as a guest on Hyper-V. The intent is to document
> > the forest, not the trees. The Hyper-V Top Level Functional Spec
> > provides conceptual material and API details for the core Hyper-V
> > hypervisor, and this documentation provides additional info on
> > how that functionality is applied to Linux. Also, there's no
> > public documentation on VMbus or the VMbus synthetic devices, so
> > this documentation helps fill that gap at a conceptual level. This
> > documentation is not API-level documentation, which can be seen
> > in the code and associated comments.
> >
> > More topics will be added in future patches, including:
> >
> > * Miscellaneous synthetic devices like KVP, timesync, VSS, etc.
> 
> There is an UIO driver for Hyper-V. I think that falls under this
> category. Not sure if that's on your radar to cover?
> 

Good point.  I'll add it to my list. :-)

Michael

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers
  2022-07-06 18:25   ` Wei Liu
@ 2022-07-06 18:47     ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Kelley (LINUX) @ 2022-07-06 18:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Dexuan Cui,
	corbet, linux-kernel, linux-doc, linux-hyperv

From: Wei Liu <wei.liu@kernel.org> Sent: Wednesday, July 6, 2022 11:26 AM
> 
> On Tue, Jul 05, 2022 at 08:43:42AM -0700, Michael Kelley wrote:
> > Add documentation topic for clocks and timers when running as a
> > guest on Hyper-V.
> >
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > ---
> >  Documentation/virt/hyperv/clocks.rst | 73
> ++++++++++++++++++++++++++++++++++++
> >  Documentation/virt/hyperv/index.rst  |  1 +
> >  2 files changed, 74 insertions(+)
> >  create mode 100644 Documentation/virt/hyperv/clocks.rst
> >
> > diff --git a/Documentation/virt/hyperv/clocks.rst
> b/Documentation/virt/hyperv/clocks.rst
> > new file mode 100644
> > index 0000000..e4ba2890
> > --- /dev/null
> > +++ b/Documentation/virt/hyperv/clocks.rst
> > @@ -0,0 +1,73 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +Clocks and Timers
> > +-----------------
> 
> This seems to be inconsistent with regard to the other two files -- they
> use "=" signs for the title.
> 
> > +
> > +arm64
> > +~~~~~
> 
> And the other files use "-" for this (second?) level.
> 

Fair point.  From what I can see, it actually doesn't make any difference
in how the HTML docs look.  I didn't succeed in generating a PDF, so I'm
not sure there.   But I'll change the clocks section to match the others
just for consistency.

Michael

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-07-06 18:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-05 15:43 [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Michael Kelley
2022-07-05 15:43 ` [PATCH 1/3] Documentation: hyperv: Add overview of " Michael Kelley
2022-07-05 15:43 ` [PATCH 2/3] Documentation: hyperv: Add overview of VMbus Michael Kelley
2022-07-05 15:43 ` [PATCH 3/3] Documentation: hyperv: Add overview of clocks and timers Michael Kelley
2022-07-06 18:25   ` Wei Liu
2022-07-06 18:47     ` Michael Kelley (LINUX)
2022-07-06 18:12 ` [PATCH 0/3] Documentation: hyperv: Add basic info on Hyper-V enlightenments Wei Liu
2022-07-06 18:43   ` Michael Kelley (LINUX)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).