All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-comment] [PATCH 0/3] Large shared memory regions
@ 2019-01-11 11:41 Dr. David Alan Gilbert (git)
  2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-01-11 11:41 UTC (permalink / raw)
  To: virtio-comment; +Cc: stefanha

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Hi,
  This series formalises the idea of shared memory regions I'd
previously discussed and is intended for use with virtio-fs.

I've split it into three parts:

  a) A general definition of the idea

  b) A definition on virtio-pci using the layout previously
discussed; this still uses the list of capabilities.

  c) A definition on virtio-mmio; this is untested and I worry
it does use quite a bit of register space; however I thought best
to include it since it shows that (a) isn't tied to PCI.

Dave


Dr. David Alan Gilbert (3):
  shared memory: Define shared memory regions
  shared memory: Define PCI capability
  shared memory: Define mmio registers

 content.tex    | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
 shared-mem.tex | 25 ++++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 shared-mem.tex

-- 
2.20.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 11:41 [virtio-comment] [PATCH 0/3] Large shared memory regions Dr. David Alan Gilbert (git)
@ 2019-01-11 11:41 ` Dr. David Alan Gilbert (git)
  2019-01-11 12:15   ` Cornelia Huck
  2019-02-13  2:25   ` [virtio-comment] " Stefan Hajnoczi
  2019-01-11 11:41 ` [virtio-comment] [PATCH 2/3] shared memory: Define PCI capability Dr. David Alan Gilbert (git)
  2019-01-11 11:42 ` [virtio-comment] [PATCH 3/3] shared memory: Define mmio registers Dr. David Alan Gilbert (git)
  2 siblings, 2 replies; 44+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-01-11 11:41 UTC (permalink / raw)
  To: virtio-comment; +Cc: stefanha

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define the requirements and idea behind shared memory regions.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 content.tex    |  3 +++
 shared-mem.tex | 25 +++++++++++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 shared-mem.tex

diff --git a/content.tex b/content.tex
index b101d1b..321a2f4 100644
--- a/content.tex
+++ b/content.tex
@@ -331,6 +331,9 @@ Virtqueue format, or both.
 \input{split-ring.tex}
 
 \input{packed-ring.tex}
+
+\input{shared-mem.tex}
+
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
diff --git a/shared-mem.tex b/shared-mem.tex
new file mode 100644
index 0000000..6da249c
--- /dev/null
+++ b/shared-mem.tex
@@ -0,0 +1,25 @@
+\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
+
+Shared memory regions are an additional facility
+available to devices that need a region of memory that's
+continuously shared between the host and the guest, rather
+than passed between them in the way virtqueue elements are.
+
+Example uses include shared caches and version pools for versioned
+data structures.
+
+Shared memory regions MUST NOT be used to control the operation
+of the device, nor to stream data; those should still be performed
+using virtqueues.
+
+A device may have multiple shared memory regions associated with
+it.  Each region has a \field{shmid} to identify it, the meaning
+of which is device specific.
+
+Enumeration and location of shared memory regions is performed
+using a transport-specific data structure.
+
+The guest physical address and the host virtual address MUST NOT
+be used to identify structures within the memory regions; all
+addressing MUST be relative to the start of a particular region.
+
-- 
2.20.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [virtio-comment] [PATCH 2/3] shared memory: Define PCI capability
  2019-01-11 11:41 [virtio-comment] [PATCH 0/3] Large shared memory regions Dr. David Alan Gilbert (git)
  2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
@ 2019-01-11 11:41 ` Dr. David Alan Gilbert (git)
  2019-02-13  2:30   ` [virtio-comment] " Stefan Hajnoczi
  2019-01-11 11:42 ` [virtio-comment] [PATCH 3/3] shared memory: Define mmio registers Dr. David Alan Gilbert (git)
  2 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-01-11 11:41 UTC (permalink / raw)
  To: virtio-comment; +Cc: stefanha

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define the PCI capability used for enumerating shared memory regions.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 content.tex | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/content.tex b/content.tex
index 321a2f4..776162a 100644
--- a/content.tex
+++ b/content.tex
@@ -649,6 +649,8 @@ The fields are interpreted as follows:
 #define VIRTIO_PCI_CAP_DEVICE_CFG        4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG           5
+/* Shared memory region */
+#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
 \end{lstlisting}
 
         Any other value is reserved for future use.
@@ -987,6 +989,41 @@ any device type which has a device-specific configuration.
 
 The \field{offset} for the device-specific configuration MUST be 4-byte aligned.
 
+\subsubsection{Shared memory capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
+
+Shared memory regions \ref{sec:Basic Facilities of a Virtio
+Device / Shared Memory Regions} are enumerated on the PCI transport
+as a sequence of VIRTIO_PCI_CAP_SHARED_MEMORY_CFG capabilities, one per region.
+
+The capability is immediately followed by an additional field like so:
+
+\begin{lstlisting}
+struct virtio_pci_shm_cap {
+        struct virtio_pci_cap cap;
+        u32 offset_hi;
+        u32 length_hi;
+        u8  id;
+};
+\end{lstlisting}
+
+Given that the \field{cap.length} and \field{cap.offset} fields
+are only 32 bit, the additional \field{offset_hi} and \field {length_hi}
+fields provide the most significant 32 bits of a total 64 bit offset and
+length within the bar specified by \field{cap.bar}.  \field{id}
+uniquely identifies the given shared memory region using a device
+specific identifier.
+
+While most capabilities can not be repeated, multiple
+VIRTIO_PCI_CAP_SHARED_MEMORY_CFG capabilities are allowed as long
+as the \field{id} is unique.
+
+\devicenormative{\paragraph}{Device-specific configuration}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Shared memory capability}
+
+The region defined by the combination of the offsets and length
+fields MUST be contained within the declared bar.
+
+The \field{id} MUST be unique for any one device instance.
+
 \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI configuration access capability}
 
 The VIRTIO_PCI_CAP_PCI_CFG capability
-- 
2.20.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [virtio-comment] [PATCH 3/3] shared memory: Define mmio registers
  2019-01-11 11:41 [virtio-comment] [PATCH 0/3] Large shared memory regions Dr. David Alan Gilbert (git)
  2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
  2019-01-11 11:41 ` [virtio-comment] [PATCH 2/3] shared memory: Define PCI capability Dr. David Alan Gilbert (git)
@ 2019-01-11 11:42 ` Dr. David Alan Gilbert (git)
  2019-02-13  2:33   ` [virtio-comment] " Stefan Hajnoczi
  2 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert (git) @ 2019-01-11 11:42 UTC (permalink / raw)
  To: virtio-comment; +Cc: stefanha

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Define an MMIO interface to enumerate and map shared
memory regions.

Note: This hasn't been implemented, I also worrying it's using
too much of the register address space.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 content.tex | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/content.tex b/content.tex
index 776162a..e2435ff 100644
--- a/content.tex
+++ b/content.tex
@@ -1641,6 +1641,33 @@ All register values are organized as Little Endian.
     selected by writing to \field{QueueSel}.
   }
   \hline 
+  \mmioreg{SHMSel}{Shared memory region select}{0x0b0}{W}{%
+    Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
+    following operations on \field{SHMId}, \field{SHMLenLow}, \field{SHMLenHigh},
+    \field{SHMBaseLow} and \field{SHMBaseHigh} apply to. The index
+    number of the first region is zero (0x0); note that it is a
+    simple index that does not correspond to the \field{id}.
+  }
+  \hline 
+  \mmioreg{SHMId}{Shared memory id}{0x0b4}{R}{%
+    This register returns the id value of the selected shared
+    memory region.
+  }
+  \hline 
+  \mmiodreg{SHMLenLow}{SHMLenHigh}{Shared memory region 64 bit long length}{0x0b8}{0x0xbc}{R}{%
+    These registers return the length of the shared memory
+    region, as defined by the device for the region selected by
+    the \field{SHMSel} register.  Reading from a non-existent
+    region (i.e. a too large \field{SHMSel} value) results in a
+    length of -1.  This should be used to enumerate the list of
+    regions.
+  }
+  \hline 
+  \mmiodreg{SHMBaseLow}{SHMBaseHigh}{Shared memory region 64 bit long physical address}{0x0c0}{0x0xc4}{W}{%
+    The driver writes these registers to indicate where it wishes
+    the device to map the shared memory region currently selected.
+  }
+  \hline 
   \mmioreg{ConfigGeneration}{Configuration atomicity value}{0x0fc}{R}{
     Reading from this register returns a value describing a version of the device-specific configuration space (see \field{Config}).
     The driver can then access the configuration space and, when finished, read \field{ConfigGeneration} again.
@@ -1676,6 +1703,9 @@ driver seeing an inconsistent configuration state.
 
 The device MUST NOT access virtual queue contents when \field{QueueReady} is zero (0x0).
 
+The device MUST NOT respond to SHM Base writes for none-existent
+shared memory regions.
+
 \drivernormative{\subsubsection}{MMIO Device Register Layout}{Virtio Transport Options / Virtio Over MMIO / MMIO Device Register Layout}
 The driver MUST NOT access memory locations not described in the
 table \ref{tab:Virtio Trasport Options / Virtio Over MMIO / MMIO Device Register Layout}
@@ -1730,6 +1760,13 @@ If both values are valid, it MUST read \field{DeviceID}
 and if its value is zero (0x0) MUST abort initialization and
 MUST NOT access any other register.
 
+Devices expecting shared memory regions MUST enumerate the
+regions by selecting each region in tern and checking that the
+length is not -1.
+
+Devices not expecting shared memory MUST NOT use the shared
+memory registers.
+
 Further initialization MUST follow the procedure described in
 \ref{sec:General Initialization And Device Operation / Device Initialization}~\nameref{sec:General Initialization And Device Operation / Device Initialization}.
 
-- 
2.20.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
@ 2019-01-11 12:15   ` Cornelia Huck
  2019-01-11 12:26     ` Dr. David Alan Gilbert
  2019-01-11 15:29     ` Halil Pasic
  2019-02-13  2:25   ` [virtio-comment] " Stefan Hajnoczi
  1 sibling, 2 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-01-11 12:15 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-comment, stefanha, Halil Pasic

On Fri, 11 Jan 2019 11:41:58 +0000
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define the requirements and idea behind shared memory regions.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  content.tex    |  3 +++
>  shared-mem.tex | 25 +++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
>  create mode 100644 shared-mem.tex
> 
> diff --git a/content.tex b/content.tex
> index b101d1b..321a2f4 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -331,6 +331,9 @@ Virtqueue format, or both.
>  \input{split-ring.tex}
>  
>  \input{packed-ring.tex}
> +
> +\input{shared-mem.tex}
> +
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>  
>  We start with an overview of device initialization, then expand on the
> diff --git a/shared-mem.tex b/shared-mem.tex
> new file mode 100644
> index 0000000..6da249c
> --- /dev/null
> +++ b/shared-mem.tex
> @@ -0,0 +1,25 @@
> +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> +
> +Shared memory regions are an additional facility
> +available to devices that need a region of memory that's
> +continuously shared between the host and the guest, rather
> +than passed between them in the way virtqueue elements are.
> +
> +Example uses include shared caches and version pools for versioned
> +data structures.
> +
> +Shared memory regions MUST NOT be used to control the operation
> +of the device, nor to stream data; those should still be performed
> +using virtqueues.
> +
> +A device may have multiple shared memory regions associated with
> +it.  Each region has a \field{shmid} to identify it, the meaning
> +of which is device specific.
> +
> +Enumeration and location of shared memory regions is performed
> +using a transport-specific data structure.

"data structure and mechanism"?

> +
> +The guest physical address and the host virtual address MUST NOT
> +be used to identify structures within the memory regions; all
> +addressing MUST be relative to the start of a particular region.
> +

Is the intended implementation that the device provides a certain
memory region (in host memory) and exposes it to the driver? Are there
supposed to be any notifications of writes? Or do both simply write to
the region and get whatever updates the other side has made when they
read from the region again?

I'm a bit unsure how to implement this for the ccw transport. Maybe a
new pair of ccws to read/write shared memory regions? But we'd also
need a mechanism to discover the ids of those shared memory regions, I
think.

Halil, do you have any thoughts?

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 12:15   ` Cornelia Huck
@ 2019-01-11 12:26     ` Dr. David Alan Gilbert
  2019-01-15 10:10       ` Cornelia Huck
  2019-01-11 15:29     ` Halil Pasic
  1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-11 12:26 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, stefanha, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Fri, 11 Jan 2019 11:41:58 +0000
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Define the requirements and idea behind shared memory regions.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  content.tex    |  3 +++
> >  shared-mem.tex | 25 +++++++++++++++++++++++++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 shared-mem.tex
> > 
> > diff --git a/content.tex b/content.tex
> > index b101d1b..321a2f4 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -331,6 +331,9 @@ Virtqueue format, or both.
> >  \input{split-ring.tex}
> >  
> >  \input{packed-ring.tex}
> > +
> > +\input{shared-mem.tex}
> > +
> >  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> >  
> >  We start with an overview of device initialization, then expand on the
> > diff --git a/shared-mem.tex b/shared-mem.tex
> > new file mode 100644
> > index 0000000..6da249c
> > --- /dev/null
> > +++ b/shared-mem.tex
> > @@ -0,0 +1,25 @@
> > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > +
> > +Shared memory regions are an additional facility
> > +available to devices that need a region of memory that's
> > +continuously shared between the host and the guest, rather
> > +than passed between them in the way virtqueue elements are.
> > +
> > +Example uses include shared caches and version pools for versioned
> > +data structures.
> > +
> > +Shared memory regions MUST NOT be used to control the operation
> > +of the device, nor to stream data; those should still be performed
> > +using virtqueues.
> > +
> > +A device may have multiple shared memory regions associated with
> > +it.  Each region has a \field{shmid} to identify it, the meaning
> > +of which is device specific.
> > +
> > +Enumeration and location of shared memory regions is performed
> > +using a transport-specific data structure.
> 
> "data structure and mechanism"?

Changed; thanks.

> > +
> > +The guest physical address and the host virtual address MUST NOT
> > +be used to identify structures within the memory regions; all
> > +addressing MUST be relative to the start of a particular region.
> > +
> 
> Is the intended implementation that the device provides a certain
> memory region (in host memory) and exposes it to the driver? Are there
> supposed to be any notifications of writes? Or do both simply write to
> the region and get whatever updates the other side has made when they
> read from the region again?

There's no notification;  in our case we have two main uses:
  a) Direct mapping of host files into the guests memory

  b) Mapping of a version table with quickly updated version numbers for
     data structures to do quick invalidation

> I'm a bit unsure how to implement this for the ccw transport. Maybe a
> new pair of ccws to read/write shared memory regions?

Without knowing anything about CCW itself, I don't think you'd want
to do calls to perform the reads/writes - remember these are entirely
emulated devices, and the shared memory regions just correspond to
memory regions in the hypervisor; so in most ways they just behave
like a region of RAM.  If the drivers can't treat them like RAM there's
probably no point in using this feature in that environment.

> But we'd also
> need a mechanism to discover the ids of those shared memory regions, I
> think.

Yes, I'm assuming you'll need a call to enumerate them.

Dave

> 
> Halil, do you have any thoughts?
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 12:15   ` Cornelia Huck
  2019-01-11 12:26     ` Dr. David Alan Gilbert
@ 2019-01-11 15:29     ` Halil Pasic
  2019-01-11 16:07       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 44+ messages in thread
From: Halil Pasic @ 2019-01-11 15:29 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dr. David Alan Gilbert (git), virtio-comment, stefanha

On Fri, 11 Jan 2019 13:15:40 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Fri, 11 Jan 2019 11:41:58 +0000
> "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Define the requirements and idea behind shared memory regions.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  content.tex    |  3 +++
> >  shared-mem.tex | 25 +++++++++++++++++++++++++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 shared-mem.tex
> > 
> > diff --git a/content.tex b/content.tex
> > index b101d1b..321a2f4 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -331,6 +331,9 @@ Virtqueue format, or both.
> >  \input{split-ring.tex}
> >  
> >  \input{packed-ring.tex}
> > +
> > +\input{shared-mem.tex}
> > +
> >  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> >  
> >  We start with an overview of device initialization, then expand on the
> > diff --git a/shared-mem.tex b/shared-mem.tex
> > new file mode 100644
> > index 0000000..6da249c
> > --- /dev/null
> > +++ b/shared-mem.tex
> > @@ -0,0 +1,25 @@
> > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > +
> > +Shared memory regions are an additional facility
> > +available to devices that need a region of memory that's
> > +continuously shared between the host and the guest, rather
> > +than passed between them in the way virtqueue elements are.
> > +

Kind of like the rings, just without any (common) pre-defined purpose and
semantics?

> > +Example uses include shared caches and version pools for versioned
> > +data structures.
> > +
> > +Shared memory regions MUST NOT be used to control the operation
> > +of the device, nor to stream data; those should still be performed
> > +using virtqueues.
> > +

I will have to think about this paragraph some more...

> > +A device may have multiple shared memory regions associated with
> > +it.  Each region has a \field{shmid} to identify it, the meaning
> > +of which is device specific.
> > +
> > +Enumeration and location of shared memory regions is performed
> > +using a transport-specific data structure.
> 
> "data structure and mechanism"?
> 
> > +
> > +The guest physical address and the host virtual address MUST NOT
> > +be used to identify structures within the memory regions; all
> > +addressing MUST be relative to the start of a particular region.
> > +
> 
> Is the intended implementation that the device provides a certain
> memory region (in host memory) and exposes it to the driver? Are there
> supposed to be any notifications of writes? Or do both simply write to
> the region and get whatever updates the other side has made when they
> read from the region again?
> 
> I'm a bit unsure how to implement this for the ccw transport. Maybe a
> new pair of ccws to read/write shared memory regions? But we'd also
> need a mechanism to discover the ids of those shared memory regions, I
> think.
> 
> Halil, do you have any thoughts?
> 

I hope to develop more some. I've missed these discussions unfortunately,
and the memory stuff is not my forte. But we do seem to need a mechanism
to discover/expose (driver/device) these.

Do we want to change the device initialization (3.1) subsection? I'm not
sure if this shared  memory region discovery is something that's
supposed to be a part of the initialization. At the moment, I would guess
is the device not supposed to be able to provide new regions at any time
(as I don't see how the device is supposed to tell the driver: hey
please re-do discovery). 

Regards,
Halil


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 15:29     ` Halil Pasic
@ 2019-01-11 16:07       ` Dr. David Alan Gilbert
  2019-01-11 17:57         ` Halil Pasic
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-11 16:07 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Cornelia Huck, virtio-comment, stefanha

* Halil Pasic (pasic@linux.ibm.com) wrote:
> On Fri, 11 Jan 2019 13:15:40 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > On Fri, 11 Jan 2019 11:41:58 +0000
> > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> > 
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Define the requirements and idea behind shared memory regions.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  content.tex    |  3 +++
> > >  shared-mem.tex | 25 +++++++++++++++++++++++++
> > >  2 files changed, 28 insertions(+)
> > >  create mode 100644 shared-mem.tex
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index b101d1b..321a2f4 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -331,6 +331,9 @@ Virtqueue format, or both.
> > >  \input{split-ring.tex}
> > >  
> > >  \input{packed-ring.tex}
> > > +
> > > +\input{shared-mem.tex}
> > > +
> > >  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > >  
> > >  We start with an overview of device initialization, then expand on the
> > > diff --git a/shared-mem.tex b/shared-mem.tex
> > > new file mode 100644
> > > index 0000000..6da249c
> > > --- /dev/null
> > > +++ b/shared-mem.tex
> > > @@ -0,0 +1,25 @@
> > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > +
> > > +Shared memory regions are an additional facility
> > > +available to devices that need a region of memory that's
> > > +continuously shared between the host and the guest, rather
> > > +than passed between them in the way virtqueue elements are.
> > > +
> 
> Kind of like the rings, just without any (common) pre-defined purpose and
> semantics?

Yes, the difference is that all the data in the rings involves handing
over 'ownership' of data from one side to the other and then doing
something and handing it back; these regions are designed for data
that's in use concurrently by both sides.

> > > +Example uses include shared caches and version pools for versioned
> > > +data structures.
> > > +
> > > +Shared memory regions MUST NOT be used to control the operation
> > > +of the device, nor to stream data; those should still be performed
> > > +using virtqueues.
> > > +
> 
> I will have to think about this paragraph some more...

It's intended to stop people using these regions for things that should
be done in the rings.

> > > +A device may have multiple shared memory regions associated with
> > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > +of which is device specific.
> > > +
> > > +Enumeration and location of shared memory regions is performed
> > > +using a transport-specific data structure.
> > 
> > "data structure and mechanism"?
> > 
> > > +
> > > +The guest physical address and the host virtual address MUST NOT
> > > +be used to identify structures within the memory regions; all
> > > +addressing MUST be relative to the start of a particular region.
> > > +
> > 
> > Is the intended implementation that the device provides a certain
> > memory region (in host memory) and exposes it to the driver? Are there
> > supposed to be any notifications of writes? Or do both simply write to
> > the region and get whatever updates the other side has made when they
> > read from the region again?
> > 
> > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > new pair of ccws to read/write shared memory regions? But we'd also
> > need a mechanism to discover the ids of those shared memory regions, I
> > think.
> > 
> > Halil, do you have any thoughts?
> > 
> 
> I hope to develop more some. I've missed these discussions unfortunately,
> and the memory stuff is not my forte. But we do seem to need a mechanism
> to discover/expose (driver/device) these.

Yes, agreed; I added the PCI and mmio discovery in the next patches
but didn't know where to start for ccw.

> Do we want to change the device initialization (3.1) subsection? I'm not
> sure if this shared  memory region discovery is something that's
> supposed to be a part of the initialization. At the moment, I would guess
> is the device not supposed to be able to provide new regions at any time
> (as I don't see how the device is supposed to tell the driver: hey
> please re-do discovery). 

Yes, it's part of initialisation; although since the enumeration is
specific to the transport and the use is specific to the device, I'm not
sure what goes in a common initialization section.

Dave

> Regards,
> Halil
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 16:07       ` Dr. David Alan Gilbert
@ 2019-01-11 17:57         ` Halil Pasic
  2019-01-15  9:33           ` Cornelia Huck
  0 siblings, 1 reply; 44+ messages in thread
From: Halil Pasic @ 2019-01-11 17:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Cornelia Huck, virtio-comment, stefanha

On Fri, 11 Jan 2019 16:07:23 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> > Do we want to change the device initialization (3.1) subsection? I'm not
> > sure if this shared  memory region discovery is something that's
> > supposed to be a part of the initialization. At the moment, I would guess
> > is the device not supposed to be able to provide new regions at any time
> > (as I don't see how the device is supposed to tell the driver: hey
> > please re-do discovery).   
> 
> Yes, it's part of initialisation; although since the enumeration is
> specific to the transport and the use is specific to the device, I'm not
> sure what goes in a common initialization section.

I think it does go in a common initialization. Virtqueues are also
discovered in a transport specific way, and same goes for reading/writing config.

So I would add somethng to 
"7. Perform device-specific setup, including discovery of virtqueues for
the device, optional per-bus setup, reading and possibly writing the
device’s virtio configuration space, and population of virtqueues." in
subsection 3.1.1.

Regards,
Halil


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 17:57         ` Halil Pasic
@ 2019-01-15  9:33           ` Cornelia Huck
  0 siblings, 0 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-01-15  9:33 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dr. David Alan Gilbert, virtio-comment, stefanha

On Fri, 11 Jan 2019 18:57:17 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Fri, 11 Jan 2019 16:07:23 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > > Do we want to change the device initialization (3.1) subsection? I'm not
> > > sure if this shared  memory region discovery is something that's
> > > supposed to be a part of the initialization. At the moment, I would guess
> > > is the device not supposed to be able to provide new regions at any time
> > > (as I don't see how the device is supposed to tell the driver: hey
> > > please re-do discovery).     
> > 
> > Yes, it's part of initialisation; although since the enumeration is
> > specific to the transport and the use is specific to the device, I'm not
> > sure what goes in a common initialization section.  
> 
> I think it does go in a common initialization. Virtqueues are also
> discovered in a transport specific way, and same goes for reading/writing config.
> 
> So I would add somethng to 
> "7. Perform device-specific setup, including discovery of virtqueues for
> the device, optional per-bus setup, reading and possibly writing the
> device’s virtio configuration space, and population of virtqueues." in
> subsection 3.1.1.

Agreed, it probably makes sense to add shared memory region discovery
as an extra item to this list of things to be setup.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 12:26     ` Dr. David Alan Gilbert
@ 2019-01-15 10:10       ` Cornelia Huck
  2019-01-15 11:23         ` Dr. David Alan Gilbert
  2019-01-23 15:12         ` Michael S. Tsirkin
  0 siblings, 2 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-01-15 10:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-comment, stefanha, Halil Pasic

On Fri, 11 Jan 2019 12:26:54 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Fri, 11 Jan 2019 11:41:58 +0000
> > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > +
> > > +Shared memory regions are an additional facility
> > > +available to devices that need a region of memory that's
> > > +continuously shared between the host and the guest, rather
> > > +than passed between them in the way virtqueue elements are.

I think we probably need to clarify the expectations (consistency etc.)
a bit more, see my remarks below.

> > > +
> > > +Example uses include shared caches and version pools for versioned
> > > +data structures.
> > > +
> > > +Shared memory regions MUST NOT be used to control the operation
> > > +of the device, nor to stream data; those should still be performed
> > > +using virtqueues.
> > > +
> > > +A device may have multiple shared memory regions associated with
> > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > +of which is device specific.
> > > +
> > > +Enumeration and location of shared memory regions is performed
> > > +using a transport-specific data structure.  
> > 
> > "data structure and mechanism"?  
> 
> Changed; thanks.
> 
> > > +
> > > +The guest physical address and the host virtual address MUST NOT
> > > +be used to identify structures within the memory regions; all
> > > +addressing MUST be relative to the start of a particular region.
> > > +  
> > 
> > Is the intended implementation that the device provides a certain
> > memory region (in host memory) and exposes it to the driver? Are there
> > supposed to be any notifications of writes? Or do both simply write to
> > the region and get whatever updates the other side has made when they
> > read from the region again?  
> 
> There's no notification;  in our case we have two main uses:
>   a) Direct mapping of host files into the guests memory
> 
>   b) Mapping of a version table with quickly updated version numbers for
>      data structures to do quick invalidation

This sounds a lot like "we have a memory area, and both device and
driver may write to or read from it at any time". Are there any
expectations regarding consistency when reading data, or is there
supposed to be a device-type specific mechanism to get certain
consistent values?

> > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > new pair of ccws to read/write shared memory regions?  
> 
> Without knowing anything about CCW itself, I don't think you'd want
> to do calls to perform the reads/writes - remember these are entirely
> emulated devices, and the shared memory regions just correspond to
> memory regions in the hypervisor; so in most ways they just behave
> like a region of RAM.  If the drivers can't treat them like RAM there's
> probably no point in using this feature in that environment.

The main issue here is that s390 does not have memory mapped I/O --
even PCI uses some specialized instructions. This means we need to
figure out how to model some stuff that Just Works on other platforms.

So, basically there are two options:
- Have the device set aside a memory area; the host maps this into the
  guest and the driver can access it. No notifications, only discovery
  is needed.
- Have the device set aside a memory area; the driver can only access
  this via special operations, which the host can trap. This needs two
  more commands to be set aside, and any driver accesses need to be
  forced through these commands (that's a bit like config space).

If I've understood the intended usage correctly, we can use the simpler
first option. The drawback is that we can't add interception
possibilities (that we get via the second option) should we need them
later on.

> 
> > But we'd also
> > need a mechanism to discover the ids of those shared memory regions, I
> > think.  
> 
> Yes, I'm assuming you'll need a call to enumerate them.

Agreed.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-15 10:10       ` Cornelia Huck
@ 2019-01-15 11:23         ` Dr. David Alan Gilbert
  2019-01-16 10:56           ` Cornelia Huck
  2019-01-23 15:12         ` Michael S. Tsirkin
  1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-15 11:23 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, stefanha, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Fri, 11 Jan 2019 12:26:54 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Fri, 11 Jan 2019 11:41:58 +0000
> > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > > +
> > > > +Shared memory regions are an additional facility
> > > > +available to devices that need a region of memory that's
> > > > +continuously shared between the host and the guest, rather
> > > > +than passed between them in the way virtqueue elements are.
> 
> I think we probably need to clarify the expectations (consistency etc.)
> a bit more, see my remarks below.
> 
> > > > +
> > > > +Example uses include shared caches and version pools for versioned
> > > > +data structures.
> > > > +
> > > > +Shared memory regions MUST NOT be used to control the operation
> > > > +of the device, nor to stream data; those should still be performed
> > > > +using virtqueues.
> > > > +
> > > > +A device may have multiple shared memory regions associated with
> > > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > > +of which is device specific.
> > > > +
> > > > +Enumeration and location of shared memory regions is performed
> > > > +using a transport-specific data structure.  
> > > 
> > > "data structure and mechanism"?  
> > 
> > Changed; thanks.
> > 
> > > > +
> > > > +The guest physical address and the host virtual address MUST NOT
> > > > +be used to identify structures within the memory regions; all
> > > > +addressing MUST be relative to the start of a particular region.
> > > > +  
> > > 
> > > Is the intended implementation that the device provides a certain
> > > memory region (in host memory) and exposes it to the driver? Are there
> > > supposed to be any notifications of writes? Or do both simply write to
> > > the region and get whatever updates the other side has made when they
> > > read from the region again?  
> > 
> > There's no notification;  in our case we have two main uses:
> >   a) Direct mapping of host files into the guests memory
> > 
> >   b) Mapping of a version table with quickly updated version numbers for
> >      data structures to do quick invalidation
> 
> This sounds a lot like "we have a memory area, and both device and
> driver may write to or read from it at any time". Are there any
> expectations regarding consistency when reading data, or is there
> supposed to be a device-type specific mechanism to get certain
> consistent values?

It's device-type specific; and potentially different for different
shared memory regions associated with that device.
In the virtio-fs usecase we've really got two separate regions; one is
a direct mapping of files on the host, the other is a structure
containing flags/version numbers for data structures; the later
probably has much more strict ordering semantics.

> > > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > > new pair of ccws to read/write shared memory regions?  
> > 
> > Without knowing anything about CCW itself, I don't think you'd want
> > to do calls to perform the reads/writes - remember these are entirely
> > emulated devices, and the shared memory regions just correspond to
> > memory regions in the hypervisor; so in most ways they just behave
> > like a region of RAM.  If the drivers can't treat them like RAM there's
> > probably no point in using this feature in that environment.
> 
> The main issue here is that s390 does not have memory mapped I/O --
> even PCI uses some specialized instructions. This means we need to
> figure out how to model some stuff that Just Works on other platforms.
> 
> So, basically there are two options:
> - Have the device set aside a memory area; the host maps this into the
>   guest and the driver can access it. No notifications, only discovery
>   is needed.
> - Have the device set aside a memory area; the driver can only access
>   this via special operations, which the host can trap. This needs two
>   more commands to be set aside, and any driver accesses need to be
>   forced through these commands (that's a bit like config space).
> 
> If I've understood the intended usage correctly, we can use the simpler
> first option. The drawback is that we can't add interception
> possibilities (that we get via the second option) should we need them
> later on.

Right, and it's the first option we need.

> > 
> > > But we'd also
> > > need a mechanism to discover the ids of those shared memory regions, I
> > > think.  
> > 
> > Yes, I'm assuming you'll need a call to enumerate them.
> 
> Agreed.


Dave

> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-15 11:23         ` Dr. David Alan Gilbert
@ 2019-01-16 10:56           ` Cornelia Huck
  2019-01-16 20:06             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Cornelia Huck @ 2019-01-16 10:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-comment, stefanha, Halil Pasic

On Tue, 15 Jan 2019 11:23:03 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Fri, 11 Jan 2019 12:26:54 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > On Fri, 11 Jan 2019 11:41:58 +0000
> > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:  
> >   
> > > > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > > > +
> > > > > +Shared memory regions are an additional facility
> > > > > +available to devices that need a region of memory that's
> > > > > +continuously shared between the host and the guest, rather
> > > > > +than passed between them in the way virtqueue elements are.  
> > 
> > I think we probably need to clarify the expectations (consistency etc.)
> > a bit more, see my remarks below.
> >   
> > > > > +
> > > > > +Example uses include shared caches and version pools for versioned
> > > > > +data structures.
> > > > > +
> > > > > +Shared memory regions MUST NOT be used to control the operation
> > > > > +of the device, nor to stream data; those should still be performed
> > > > > +using virtqueues.
> > > > > +
> > > > > +A device may have multiple shared memory regions associated with
> > > > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > > > +of which is device specific.
> > > > > +
> > > > > +Enumeration and location of shared memory regions is performed
> > > > > +using a transport-specific data structure.    
> > > > 
> > > > "data structure and mechanism"?    
> > > 
> > > Changed; thanks.
> > >   
> > > > > +
> > > > > +The guest physical address and the host virtual address MUST NOT
> > > > > +be used to identify structures within the memory regions; all
> > > > > +addressing MUST be relative to the start of a particular region.
> > > > > +    
> > > > 
> > > > Is the intended implementation that the device provides a certain
> > > > memory region (in host memory) and exposes it to the driver? Are there
> > > > supposed to be any notifications of writes? Or do both simply write to
> > > > the region and get whatever updates the other side has made when they
> > > > read from the region again?    
> > > 
> > > There's no notification;  in our case we have two main uses:
> > >   a) Direct mapping of host files into the guests memory
> > > 
> > >   b) Mapping of a version table with quickly updated version numbers for
> > >      data structures to do quick invalidation  
> > 
> > This sounds a lot like "we have a memory area, and both device and
> > driver may write to or read from it at any time". Are there any
> > expectations regarding consistency when reading data, or is there
> > supposed to be a device-type specific mechanism to get certain
> > consistent values?  
> 
> It's device-type specific; and potentially different for different
> shared memory regions associated with that device.

Ok; it might make sense to put a sentence like that into the generic
section.

> In the virtio-fs usecase we've really got two separate regions; one is
> a direct mapping of files on the host, the other is a structure
> containing flags/version numbers for data structures; the later
> probably has much more strict ordering semantics.

Another question: Do we expect the set of regions to be unchanged
during the lifetime of the device, or we need a mechanism to trigger
rediscovery?

> 
> > > > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > > > new pair of ccws to read/write shared memory regions?    
> > > 
> > > Without knowing anything about CCW itself, I don't think you'd want
> > > to do calls to perform the reads/writes - remember these are entirely
> > > emulated devices, and the shared memory regions just correspond to
> > > memory regions in the hypervisor; so in most ways they just behave
> > > like a region of RAM.  If the drivers can't treat them like RAM there's
> > > probably no point in using this feature in that environment.  
> > 
> > The main issue here is that s390 does not have memory mapped I/O --
> > even PCI uses some specialized instructions. This means we need to
> > figure out how to model some stuff that Just Works on other platforms.
> > 
> > So, basically there are two options:
> > - Have the device set aside a memory area; the host maps this into the
> >   guest and the driver can access it. No notifications, only discovery
> >   is needed.
> > - Have the device set aside a memory area; the driver can only access
> >   this via special operations, which the host can trap. This needs two
> >   more commands to be set aside, and any driver accesses need to be
> >   forced through these commands (that's a bit like config space).
> > 
> > If I've understood the intended usage correctly, we can use the simpler
> > first option. The drawback is that we can't add interception
> > possibilities (that we get via the second option) should we need them
> > later on.  
> 
> Right, and it's the first option we need.

That makes things easier. Thanks!

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-16 10:56           ` Cornelia Huck
@ 2019-01-16 20:06             ` Dr. David Alan Gilbert
  2019-02-11 21:52               ` Cornelia Huck
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-16 20:06 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, stefanha, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Tue, 15 Jan 2019 11:23:03 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Fri, 11 Jan 2019 12:26:54 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > On Fri, 11 Jan 2019 11:41:58 +0000
> > > > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:  
> > >   
> > > > > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > > > > +
> > > > > > +Shared memory regions are an additional facility
> > > > > > +available to devices that need a region of memory that's
> > > > > > +continuously shared between the host and the guest, rather
> > > > > > +than passed between them in the way virtqueue elements are.  
> > > 
> > > I think we probably need to clarify the expectations (consistency etc.)
> > > a bit more, see my remarks below.
> > >   
> > > > > > +
> > > > > > +Example uses include shared caches and version pools for versioned
> > > > > > +data structures.
> > > > > > +
> > > > > > +Shared memory regions MUST NOT be used to control the operation
> > > > > > +of the device, nor to stream data; those should still be performed
> > > > > > +using virtqueues.
> > > > > > +
> > > > > > +A device may have multiple shared memory regions associated with
> > > > > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > > > > +of which is device specific.
> > > > > > +
> > > > > > +Enumeration and location of shared memory regions is performed
> > > > > > +using a transport-specific data structure.    
> > > > > 
> > > > > "data structure and mechanism"?    
> > > > 
> > > > Changed; thanks.
> > > >   
> > > > > > +
> > > > > > +The guest physical address and the host virtual address MUST NOT
> > > > > > +be used to identify structures within the memory regions; all
> > > > > > +addressing MUST be relative to the start of a particular region.
> > > > > > +    
> > > > > 
> > > > > Is the intended implementation that the device provides a certain
> > > > > memory region (in host memory) and exposes it to the driver? Are there
> > > > > supposed to be any notifications of writes? Or do both simply write to
> > > > > the region and get whatever updates the other side has made when they
> > > > > read from the region again?    
> > > > 
> > > > There's no notification;  in our case we have two main uses:
> > > >   a) Direct mapping of host files into the guests memory
> > > > 
> > > >   b) Mapping of a version table with quickly updated version numbers for
> > > >      data structures to do quick invalidation  
> > > 
> > > This sounds a lot like "we have a memory area, and both device and
> > > driver may write to or read from it at any time". Are there any
> > > expectations regarding consistency when reading data, or is there
> > > supposed to be a device-type specific mechanism to get certain
> > > consistent values?  
> > 
> > It's device-type specific; and potentially different for different
> > shared memory regions associated with that device.
> 
> Ok; it might make sense to put a sentence like that into the generic
> section.

I've added:
   Memory consistency rules vary depending on the region and the
   device.  Devices MUST define the required behaviour for each
   region.

> > In the virtio-fs usecase we've really got two separate regions; one is
> > a direct mapping of files on the host, the other is a structure
> > containing flags/version numbers for data structures; the later
> > probably has much more strict ordering semantics.
> 
> Another question: Do we expect the set of regions to be unchanged
> during the lifetime of the device, or we need a mechanism to trigger
> rediscovery?

In our uses we've got fixed sizes; they're similar to caches, where
they're fixed size and we evict data and mappings and fill them again later.
I can imagine that someone might want to dynamically change it, but
perhaps it's best to wait until someone has a case like that to
understand their requirements.

> > 
> > > > > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > > > > new pair of ccws to read/write shared memory regions?    
> > > > 
> > > > Without knowing anything about CCW itself, I don't think you'd want
> > > > to do calls to perform the reads/writes - remember these are entirely
> > > > emulated devices, and the shared memory regions just correspond to
> > > > memory regions in the hypervisor; so in most ways they just behave
> > > > like a region of RAM.  If the drivers can't treat them like RAM there's
> > > > probably no point in using this feature in that environment.  
> > > 
> > > The main issue here is that s390 does not have memory mapped I/O --
> > > even PCI uses some specialized instructions. This means we need to
> > > figure out how to model some stuff that Just Works on other platforms.
> > > 
> > > So, basically there are two options:
> > > - Have the device set aside a memory area; the host maps this into the
> > >   guest and the driver can access it. No notifications, only discovery
> > >   is needed.
> > > - Have the device set aside a memory area; the driver can only access
> > >   this via special operations, which the host can trap. This needs two
> > >   more commands to be set aside, and any driver accesses need to be
> > >   forced through these commands (that's a bit like config space).
> > > 
> > > If I've understood the intended usage correctly, we can use the simpler
> > > first option. The drawback is that we can't add interception
> > > possibilities (that we get via the second option) should we need them
> > > later on.  
> > 
> > Right, and it's the first option we need.
> 
> That makes things easier. Thanks!

Great.

So these are all moving this 1/3 forward - has anyone got comments on
the transport specific implementations?

Dave

> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-15 10:10       ` Cornelia Huck
  2019-01-15 11:23         ` Dr. David Alan Gilbert
@ 2019-01-23 15:12         ` Michael S. Tsirkin
  1 sibling, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2019-01-23 15:12 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Dr. David Alan Gilbert, virtio-comment, stefanha, Halil Pasic

On Tue, Jan 15, 2019 at 11:10:38AM +0100, Cornelia Huck wrote:
> On Fri, 11 Jan 2019 12:26:54 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Fri, 11 Jan 2019 11:41:58 +0000
> > > "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> 
> > > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > > +
> > > > +Shared memory regions are an additional facility
> > > > +available to devices that need a region of memory that's
> > > > +continuously shared between the host and the guest, rather
> > > > +than passed between them in the way virtqueue elements are.
> 
> I think we probably need to clarify the expectations (consistency etc.)
> a bit more, see my remarks below.
> 
> > > > +
> > > > +Example uses include shared caches and version pools for versioned
> > > > +data structures.
> > > > +
> > > > +Shared memory regions MUST NOT be used to control the operation
> > > > +of the device, nor to stream data; those should still be performed
> > > > +using virtqueues.
> > > > +
> > > > +A device may have multiple shared memory regions associated with
> > > > +it.  Each region has a \field{shmid} to identify it, the meaning
> > > > +of which is device specific.
> > > > +
> > > > +Enumeration and location of shared memory regions is performed
> > > > +using a transport-specific data structure.  
> > > 
> > > "data structure and mechanism"?  
> > 
> > Changed; thanks.
> > 
> > > > +
> > > > +The guest physical address and the host virtual address MUST NOT
> > > > +be used to identify structures within the memory regions; all
> > > > +addressing MUST be relative to the start of a particular region.
> > > > +  
> > > 
> > > Is the intended implementation that the device provides a certain
> > > memory region (in host memory) and exposes it to the driver? Are there
> > > supposed to be any notifications of writes? Or do both simply write to
> > > the region and get whatever updates the other side has made when they
> > > read from the region again?  
> > 
> > There's no notification;  in our case we have two main uses:
> >   a) Direct mapping of host files into the guests memory
> > 
> >   b) Mapping of a version table with quickly updated version numbers for
> >      data structures to do quick invalidation
> 
> This sounds a lot like "we have a memory area, and both device and
> driver may write to or read from it at any time". Are there any
> expectations regarding consistency when reading data, or is there
> supposed to be a device-type specific mechanism to get certain
> consistent values?
> 
> > > I'm a bit unsure how to implement this for the ccw transport. Maybe a
> > > new pair of ccws to read/write shared memory regions?  
> > 
> > Without knowing anything about CCW itself, I don't think you'd want
> > to do calls to perform the reads/writes - remember these are entirely
> > emulated devices, and the shared memory regions just correspond to
> > memory regions in the hypervisor; so in most ways they just behave
> > like a region of RAM.  If the drivers can't treat them like RAM there's
> > probably no point in using this feature in that environment.
> 
> The main issue here is that s390 does not have memory mapped I/O --
> even PCI uses some specialized instructions. This means we need to
> figure out how to model some stuff that Just Works on other platforms.
> 
> So, basically there are two options:
> - Have the device set aside a memory area; the host maps this into the
>   guest and the driver can access it. No notifications, only discovery
>   is needed.
> - Have the device set aside a memory area; the driver can only access
>   this via special operations, which the host can trap. This needs two
>   more commands to be set aside, and any driver accesses need to be
>   forced through these commands (that's a bit like config space).
> 
> If I've understood the intended usage correctly, we can use the simpler
> first option. The drawback is that we can't add interception
> possibilities (that we get via the second option) should we need them
> later on.

Won't write-protecting the memory make writes cause a trap that we can
then handle?

> > 
> > > But we'd also
> > > need a mechanism to discover the ids of those shared memory regions, I
> > > think.  
> > 
> > Yes, I'm assuming you'll need a call to enumerate them.
> 
> Agreed.
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-16 20:06             ` Dr. David Alan Gilbert
@ 2019-02-11 21:52               ` Cornelia Huck
  2019-02-13 18:37                 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Cornelia Huck @ 2019-02-11 21:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-comment, stefanha, Halil Pasic

On Wed, 16 Jan 2019 20:06:25 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> So these are all moving this 1/3 forward - has anyone got comments on
> the transport specific implementations?

No comment on pci or mmio, but I've hacked something together for ccw.
Basically, one sense-type ccw for discovery and a control-type ccw for
activation of the regions (no idea if we really need the latter), both
available with ccw revision 3.

No idea whether this will work this way, though...

diff --git a/content.tex b/content.tex
index 836ee5236939..7f379bca932e 100644
--- a/content.tex
+++ b/content.tex
@@ -2078,6 +2078,8 @@ virtio:
 #define CCW_CMD_READ_VQ_CONF 0x32
 #define CCW_CMD_SET_VIRTIO_REV 0x83
 #define CCW_CMD_READ_STATUS 0x72
+#define CCW_CMD_GET_REGIONS 0x14
+#define CCW_CMD_MAP_REGIONS 0x93
 \end{lstlisting}
 
 \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
@@ -2170,7 +2172,9 @@ The following values are supported:
 \hline
 2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
 \hline
-3-n      &        &           & reserved for later revisions \\
+3        & 0      & <empty>   & CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS support \\
+\hline
+4-n      &        &           & reserved for later revisions \\
 \hline
 \end{tabular}
 
@@ -2449,6 +2453,72 @@ command. Some legacy devices will support two-stage queue indicators, though,
 and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
 them up.
 
+\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
+
+The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
+regions provided by the device, if any.
+
+The driver provides a pointer to a 4096 byte buffer that is filled out by
+the device:
+
+\begin{lstlisting}
+  struct shared_regions_info {
+    be64 num_regions;
+    struct shared_region_desc regions[];
+  };
+\end{lstlisting}
+
+The buffer contains 0 or more shared region descriptors, as specified
+by \field{num_regions}. If the devices does not provide shared regions,
+\field{num_regions} is 0. Otherwise, the shared region descriptors have
+the following format:
+
+\begin{lstlisting}
+struct shared_region_desc {
+        be64 addr;
+        be64 len;
+        u8 id;
+        u8 pad[3];
+}
+\end{lstlisting}
+
+\field{addr} is the guest-physical address of the region with a length of
+\field{len}, identified by \field{id}. The contents of \field{pad} are
+unpredictable, although it is recommended that the device fills in zeroes.
+
+To activate or deactivate a shared memory region, the device uses the
+CCW_CMD_MAP_REGIONS command. It takes the following payload:
+
+\begin{lstlisting}
+struct shared_region_ctrl {
+        u8 id;
+        u8 activate;
+        u8 pad[2];
+}
+\end{lstlisting}
+
+\field{id} denotes the shared memory region that is the target of the command,
+while \field{activate} specifies whether the region should be activated (if 1)
+or deactivated (if 0). When activated, the device makes the guest-physical
+address of the region available as a shared memory region.
+
+\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
+
+The device MUST reject the CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS
+commands if not at least revision 3 has been negotiated.
+
+The device MUST NOT read from or write to the region before it has been
+activated by the driver or after it has been deactivated by the driver.
+
+If the driver reads from or writes to an address specified to a region that is
+not activated by the driver, it MUST treat this read or write as a normal
+read or write operation.
+
+\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
+
+The driver MUST NOT treat the guest-physical address of a region as a shared
+memory region before it has activated it or after it has deactivated it.
+
 \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
 
 \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [virtio-comment] Re: [PATCH 1/3] shared memory: Define shared memory regions
  2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
  2019-01-11 12:15   ` Cornelia Huck
@ 2019-02-13  2:25   ` Stefan Hajnoczi
  2019-02-13 10:44     ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2019-02-13  2:25 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-comment

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

On Fri, Jan 11, 2019 at 11:41:58AM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define the requirements and idea behind shared memory regions.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  content.tex    |  3 +++
>  shared-mem.tex | 25 +++++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
>  create mode 100644 shared-mem.tex
> 
> diff --git a/content.tex b/content.tex
> index b101d1b..321a2f4 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -331,6 +331,9 @@ Virtqueue format, or both.
>  \input{split-ring.tex}
>  
>  \input{packed-ring.tex}
> +
> +\input{shared-mem.tex}
> +
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>  
>  We start with an overview of device initialization, then expand on the
> diff --git a/shared-mem.tex b/shared-mem.tex
> new file mode 100644
> index 0000000..6da249c
> --- /dev/null
> +++ b/shared-mem.tex
> @@ -0,0 +1,25 @@
> +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> +
> +Shared memory regions are an additional facility
> +available to devices that need a region of memory that's
> +continuously shared between the host and the guest, rather
> +than passed between them in the way virtqueue elements are.
> +
> +Example uses include shared caches and version pools for versioned
> +data structures.

Another motivation for shared memory regions is to accommodate memory
resources that must be allocated by the device rather than the driver.

For example, some GPU hardware restrictions make it impossible for the
driver to select a suitable RAM region.

> +
> +Shared memory regions MUST NOT be used to control the operation
> +of the device, nor to stream data; those should still be performed
> +using virtqueues.
> +
> +A device may have multiple shared memory regions associated with
> +it.  Each region has a \field{shmid} to identify it, the meaning
> +of which is device specific.

s/device specific/device-specific/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [virtio-comment] Re: [PATCH 2/3] shared memory: Define PCI capability
  2019-01-11 11:41 ` [virtio-comment] [PATCH 2/3] shared memory: Define PCI capability Dr. David Alan Gilbert (git)
@ 2019-02-13  2:30   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2019-02-13  2:30 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-comment

[-- Attachment #1: Type: text/plain, Size: 436 bytes --]

On Fri, Jan 11, 2019 at 11:41:59AM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Define the PCI capability used for enumerating shared memory regions.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  content.tex | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [virtio-comment] Re: [PATCH 3/3] shared memory: Define mmio registers
  2019-01-11 11:42 ` [virtio-comment] [PATCH 3/3] shared memory: Define mmio registers Dr. David Alan Gilbert (git)
@ 2019-02-13  2:33   ` Stefan Hajnoczi
  2019-02-13 16:52     ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2019-02-13  2:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert (git); +Cc: virtio-comment

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

On Fri, Jan 11, 2019 at 11:42:00AM +0000, Dr. David Alan Gilbert (git) wrote:
> diff --git a/content.tex b/content.tex
> index 776162a..e2435ff 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -1641,6 +1641,33 @@ All register values are organized as Little Endian.
>      selected by writing to \field{QueueSel}.
>    }
>    \hline 
> +  \mmioreg{SHMSel}{Shared memory region select}{0x0b0}{W}{%
> +    Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> +    following operations on \field{SHMId}, \field{SHMLenLow}, \field{SHMLenHigh},
> +    \field{SHMBaseLow} and \field{SHMBaseHigh} apply to. The index
> +    number of the first region is zero (0x0); note that it is a
> +    simple index that does not correspond to the \field{id}.

Why use a separate virtio-mmio index instead of the shared memory ID?
The driver would map those IDs it knows about, so I'm not sure if
enumerating them via a separate index is useful.  Plus using the ID lets
you drop one register from the interface.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [virtio-comment] Re: [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-13  2:25   ` [virtio-comment] " Stefan Hajnoczi
@ 2019-02-13 10:44     ` Dr. David Alan Gilbert
  2019-02-14  3:43       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-13 10:44 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Fri, Jan 11, 2019 at 11:41:58AM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Define the requirements and idea behind shared memory regions.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  content.tex    |  3 +++
> >  shared-mem.tex | 25 +++++++++++++++++++++++++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 shared-mem.tex
> > 
> > diff --git a/content.tex b/content.tex
> > index b101d1b..321a2f4 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -331,6 +331,9 @@ Virtqueue format, or both.
> >  \input{split-ring.tex}
> >  
> >  \input{packed-ring.tex}
> > +
> > +\input{shared-mem.tex}
> > +
> >  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> >  
> >  We start with an overview of device initialization, then expand on the
> > diff --git a/shared-mem.tex b/shared-mem.tex
> > new file mode 100644
> > index 0000000..6da249c
> > --- /dev/null
> > +++ b/shared-mem.tex
> > @@ -0,0 +1,25 @@
> > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > +
> > +Shared memory regions are an additional facility
> > +available to devices that need a region of memory that's
> > +continuously shared between the host and the guest, rather
> > +than passed between them in the way virtqueue elements are.
> > +
> > +Example uses include shared caches and version pools for versioned
> > +data structures.
> 
> Another motivation for shared memory regions is to accommodate memory
> resources that must be allocated by the device rather than the driver.
> 
> For example, some GPU hardware restrictions make it impossible for the
> driver to select a suitable RAM region.

I've added:

  The region is chosen by the host and presented to the guest, as
  such it is useful in situations where the memory is accessed on
  the host by other libraries that can't safely access guest RAM.


> > +
> > +Shared memory regions MUST NOT be used to control the operation
> > +of the device, nor to stream data; those should still be performed
> > +using virtqueues.
> > +
> > +A device may have multiple shared memory regions associated with
> > +it.  Each region has a \field{shmid} to identify it, the meaning
> > +of which is device specific.
> 
> s/device specific/device-specific/

Fixed.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] Re: [PATCH 3/3] shared memory: Define mmio registers
  2019-02-13  2:33   ` [virtio-comment] " Stefan Hajnoczi
@ 2019-02-13 16:52     ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-13 16:52 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Fri, Jan 11, 2019 at 11:42:00AM +0000, Dr. David Alan Gilbert (git) wrote:
> > diff --git a/content.tex b/content.tex
> > index 776162a..e2435ff 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -1641,6 +1641,33 @@ All register values are organized as Little Endian.
> >      selected by writing to \field{QueueSel}.
> >    }
> >    \hline 
> > +  \mmioreg{SHMSel}{Shared memory region select}{0x0b0}{W}{%
> > +    Writing to this register selects the shared memory region \ref{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > +    following operations on \field{SHMId}, \field{SHMLenLow}, \field{SHMLenHigh},
> > +    \field{SHMBaseLow} and \field{SHMBaseHigh} apply to. The index
> > +    number of the first region is zero (0x0); note that it is a
> > +    simple index that does not correspond to the \field{id}.
> 
> Why use a separate virtio-mmio index instead of the shared memory ID?
> The driver would map those IDs it knows about, so I'm not sure if
> enumerating them via a separate index is useful.  Plus using the ID lets
> you drop one register from the interface.

Yes, I think if we don't need enumeration that's right; we just shange
SHMId to be the writeable register and drop the SHMSel register.

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-11 21:52               ` Cornelia Huck
@ 2019-02-13 18:37                 ` Dr. David Alan Gilbert
  2019-02-14 10:58                   ` Cornelia Huck
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-13 18:37 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, stefanha, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Wed, 16 Jan 2019 20:06:25 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > So these are all moving this 1/3 forward - has anyone got comments on
> > the transport specific implementations?
> 
> No comment on pci or mmio, but I've hacked something together for ccw.
> Basically, one sense-type ccw for discovery and a control-type ccw for
> activation of the regions (no idea if we really need the latter), both
> available with ccw revision 3.
> 
> No idea whether this will work this way, though...

That sounds (from a shm perspective) reasonable; can I ask why the
'activate' is needed?

Dave

> diff --git a/content.tex b/content.tex
> index 836ee5236939..7f379bca932e 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2078,6 +2078,8 @@ virtio:
>  #define CCW_CMD_READ_VQ_CONF 0x32
>  #define CCW_CMD_SET_VIRTIO_REV 0x83
>  #define CCW_CMD_READ_STATUS 0x72
> +#define CCW_CMD_GET_REGIONS 0x14
> +#define CCW_CMD_MAP_REGIONS 0x93
>  \end{lstlisting}
>  
>  \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
> @@ -2170,7 +2172,9 @@ The following values are supported:
>  \hline
>  2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
>  \hline
> -3-n      &        &           & reserved for later revisions \\
> +3        & 0      & <empty>   & CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS support \\
> +\hline
> +4-n      &        &           & reserved for later revisions \\
>  \hline
>  \end{tabular}
>  
> @@ -2449,6 +2453,72 @@ command. Some legacy devices will support two-stage queue indicators, though,
>  and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
>  them up.
>  
> +\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> +
> +The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
> +regions provided by the device, if any.
> +
> +The driver provides a pointer to a 4096 byte buffer that is filled out by
> +the device:
> +
> +\begin{lstlisting}
> +  struct shared_regions_info {
> +    be64 num_regions;
> +    struct shared_region_desc regions[];
> +  };
> +\end{lstlisting}
> +
> +The buffer contains 0 or more shared region descriptors, as specified
> +by \field{num_regions}. If the devices does not provide shared regions,
> +\field{num_regions} is 0. Otherwise, the shared region descriptors have
> +the following format:
> +
> +\begin{lstlisting}
> +struct shared_region_desc {
> +        be64 addr;
> +        be64 len;
> +        u8 id;
> +        u8 pad[3];
> +}
> +\end{lstlisting}
> +
> +\field{addr} is the guest-physical address of the region with a length of
> +\field{len}, identified by \field{id}. The contents of \field{pad} are
> +unpredictable, although it is recommended that the device fills in zeroes.
> +
> +To activate or deactivate a shared memory region, the device uses the
> +CCW_CMD_MAP_REGIONS command. It takes the following payload:
> +
> +\begin{lstlisting}
> +struct shared_region_ctrl {
> +        u8 id;
> +        u8 activate;
> +        u8 pad[2];
> +}
> +\end{lstlisting}
> +
> +\field{id} denotes the shared memory region that is the target of the command,
> +while \field{activate} specifies whether the region should be activated (if 1)
> +or deactivated (if 0). When activated, the device makes the guest-physical
> +address of the region available as a shared memory region.
> +
> +\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> +
> +The device MUST reject the CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS
> +commands if not at least revision 3 has been negotiated.
> +
> +The device MUST NOT read from or write to the region before it has been
> +activated by the driver or after it has been deactivated by the driver.
> +
> +If the driver reads from or writes to an address specified to a region that is
> +not activated by the driver, it MUST treat this read or write as a normal
> +read or write operation.
> +
> +\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> +
> +The driver MUST NOT treat the guest-physical address of a region as a shared
> +memory region before it has activated it or after it has deactivated it.
> +
>  \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
>  
>  \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [virtio-comment] Re: [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-13 10:44     ` Dr. David Alan Gilbert
@ 2019-02-14  3:43       ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2019-02-14  3:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-comment

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

On Wed, Feb 13, 2019 at 10:44:27AM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > On Fri, Jan 11, 2019 at 11:41:58AM +0000, Dr. David Alan Gilbert (git) wrote:
 > >  We start with an overview of device initialization, then expand on the
> > > diff --git a/shared-mem.tex b/shared-mem.tex
> > > new file mode 100644
> > > index 0000000..6da249c
> > > --- /dev/null
> > > +++ b/shared-mem.tex
> > > @@ -0,0 +1,25 @@
> > > +\section{Shared Memory Regions}\label{sec:Basic Facilities of a Virtio Device / Shared Memory Regions}
> > > +
> > > +Shared memory regions are an additional facility
> > > +available to devices that need a region of memory that's
> > > +continuously shared between the host and the guest, rather
> > > +than passed between them in the way virtqueue elements are.
> > > +
> > > +Example uses include shared caches and version pools for versioned
> > > +data structures.
> > 
> > Another motivation for shared memory regions is to accommodate memory
> > resources that must be allocated by the device rather than the driver.
> > 
> > For example, some GPU hardware restrictions make it impossible for the
> > driver to select a suitable RAM region.
> 
> I've added:
> 
>   The region is chosen by the host and presented to the guest, as
>   such it is useful in situations where the memory is accessed on
>   the host by other libraries that can't safely access guest RAM.

Great!

"libraries" can be made more general:
s/libraries/components/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-13 18:37                 ` Dr. David Alan Gilbert
@ 2019-02-14 10:58                   ` Cornelia Huck
  2019-02-14 16:37                     ` Dr. David Alan Gilbert
  2019-02-15 12:51                     ` Halil Pasic
  0 siblings, 2 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-02-14 10:58 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-comment, stefanha, Halil Pasic

On Wed, 13 Feb 2019 18:37:56 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Wed, 16 Jan 2019 20:06:25 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > So these are all moving this 1/3 forward - has anyone got comments on
> > > the transport specific implementations?  
> > 
> > No comment on pci or mmio, but I've hacked something together for ccw.
> > Basically, one sense-type ccw for discovery and a control-type ccw for
> > activation of the regions (no idea if we really need the latter), both
> > available with ccw revision 3.
> > 
> > No idea whether this will work this way, though...  
> 
> That sounds (from a shm perspective) reasonable; can I ask why the
> 'activate' is needed?

The activate interface is actually what I'm most unsure about; maybe
Halil can chime in.

My basic concern is that we don't have any idea how the guest will use
the available memory. If the shared memory areas are supposed to be
mapped into an inconvenient place, the activate interface gives the
guest a chance to clear up that area before the host starts writing to
it.

I'm not really enthusiastic about that interface... for one, I'm not
sure how this plays out at the device type level, which should not
really concern itself with transport-specific handling.

Another option would be to map these into a special memory area that
the guest won't use for its normal operation... the original s390
(non-ccw) virtio transport mapped everything into two special pages
above the guest memory, but that was quite painful, and I don't think
we want to go down that road again.

> 
> Dave
> 
> > diff --git a/content.tex b/content.tex
> > index 836ee5236939..7f379bca932e 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -2078,6 +2078,8 @@ virtio:
> >  #define CCW_CMD_READ_VQ_CONF 0x32
> >  #define CCW_CMD_SET_VIRTIO_REV 0x83
> >  #define CCW_CMD_READ_STATUS 0x72
> > +#define CCW_CMD_GET_REGIONS 0x14
> > +#define CCW_CMD_MAP_REGIONS 0x93
> >  \end{lstlisting}
> >  
> >  \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
> > @@ -2170,7 +2172,9 @@ The following values are supported:
> >  \hline
> >  2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
> >  \hline
> > -3-n      &        &           & reserved for later revisions \\
> > +3        & 0      & <empty>   & CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS support \\
> > +\hline
> > +4-n      &        &           & reserved for later revisions \\
> >  \hline
> >  \end{tabular}
> >  
> > @@ -2449,6 +2453,72 @@ command. Some legacy devices will support two-stage queue indicators, though,
> >  and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
> >  them up.
> >  
> > +\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > +
> > +The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
> > +regions provided by the device, if any.
> > +
> > +The driver provides a pointer to a 4096 byte buffer that is filled out by
> > +the device:
> > +
> > +\begin{lstlisting}
> > +  struct shared_regions_info {
> > +    be64 num_regions;
> > +    struct shared_region_desc regions[];
> > +  };
> > +\end{lstlisting}
> > +
> > +The buffer contains 0 or more shared region descriptors, as specified
> > +by \field{num_regions}. If the devices does not provide shared regions,
> > +\field{num_regions} is 0. Otherwise, the shared region descriptors have
> > +the following format:
> > +
> > +\begin{lstlisting}
> > +struct shared_region_desc {
> > +        be64 addr;
> > +        be64 len;
> > +        u8 id;
> > +        u8 pad[3];
> > +}
> > +\end{lstlisting}
> > +
> > +\field{addr} is the guest-physical address of the region with a length of
> > +\field{len}, identified by \field{id}. The contents of \field{pad} are
> > +unpredictable, although it is recommended that the device fills in zeroes.
> > +
> > +To activate or deactivate a shared memory region, the device uses the
> > +CCW_CMD_MAP_REGIONS command. It takes the following payload:
> > +
> > +\begin{lstlisting}
> > +struct shared_region_ctrl {
> > +        u8 id;
> > +        u8 activate;
> > +        u8 pad[2];
> > +}
> > +\end{lstlisting}
> > +
> > +\field{id} denotes the shared memory region that is the target of the command,
> > +while \field{activate} specifies whether the region should be activated (if 1)
> > +or deactivated (if 0). When activated, the device makes the guest-physical
> > +address of the region available as a shared memory region.
> > +
> > +\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > +
> > +The device MUST reject the CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS
> > +commands if not at least revision 3 has been negotiated.
> > +
> > +The device MUST NOT read from or write to the region before it has been
> > +activated by the driver or after it has been deactivated by the driver.
> > +
> > +If the driver reads from or writes to an address specified to a region that is
> > +not activated by the driver, it MUST treat this read or write as a normal
> > +read or write operation.
> > +
> > +\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > +
> > +The driver MUST NOT treat the guest-physical address of a region as a shared
> > +memory region before it has activated it or after it has deactivated it.
> > +
> >  \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
> >  
> >  \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}  
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-14 10:58                   ` Cornelia Huck
@ 2019-02-14 16:37                     ` Dr. David Alan Gilbert
  2019-02-14 17:43                       ` Frank Yang
  2019-02-15 12:51                     ` Halil Pasic
  1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-14 16:37 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, stefanha, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Wed, 13 Feb 2019 18:37:56 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > So these are all moving this 1/3 forward - has anyone got comments on
> > > > the transport specific implementations?  
> > > 
> > > No comment on pci or mmio, but I've hacked something together for ccw.
> > > Basically, one sense-type ccw for discovery and a control-type ccw for
> > > activation of the regions (no idea if we really need the latter), both
> > > available with ccw revision 3.
> > > 
> > > No idea whether this will work this way, though...  
> > 
> > That sounds (from a shm perspective) reasonable; can I ask why the
> > 'activate' is needed?
> 
> The activate interface is actually what I'm most unsure about; maybe
> Halil can chime in.
> 
> My basic concern is that we don't have any idea how the guest will use
> the available memory. If the shared memory areas are supposed to be
> mapped into an inconvenient place, the activate interface gives the
> guest a chance to clear up that area before the host starts writing to
> it.

I'm expecting the host to map it into an area of GPA that is out of the
way - it doesn't overlap with RAM.
Given that, I'm not sure why the guest would have to do any 'clear up' -
it probably wants to make a virtual mapping somewhere, but again that's
upto the guest to do when it feels like it.

> I'm not really enthusiastic about that interface... for one, I'm not
> sure how this plays out at the device type level, which should not
> really concern itself with transport-specific handling.

I'd expect the host side code to give an area of memory to the transport
and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
I think).
Similarly in the guest, I'm expecting the driver for the device to
ask for a pointer to a region with a particular ID and that goes
down to the transport code.

> Another option would be to map these into a special memory area that
> the guest won't use for its normal operation... the original s390
> (non-ccw) virtio transport mapped everything into two special pages
> above the guest memory, but that was quite painful, and I don't think
> we want to go down that road again.

Can you explain why?

Dave

> > 
> > Dave
> > 
> > > diff --git a/content.tex b/content.tex
> > > index 836ee5236939..7f379bca932e 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -2078,6 +2078,8 @@ virtio:
> > >  #define CCW_CMD_READ_VQ_CONF 0x32
> > >  #define CCW_CMD_SET_VIRTIO_REV 0x83
> > >  #define CCW_CMD_READ_STATUS 0x72
> > > +#define CCW_CMD_GET_REGIONS 0x14
> > > +#define CCW_CMD_MAP_REGIONS 0x93
> > >  \end{lstlisting}
> > >  
> > >  \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
> > > @@ -2170,7 +2172,9 @@ The following values are supported:
> > >  \hline
> > >  2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
> > >  \hline
> > > -3-n      &        &           & reserved for later revisions \\
> > > +3        & 0      & <empty>   & CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS support \\
> > > +\hline
> > > +4-n      &        &           & reserved for later revisions \\
> > >  \hline
> > >  \end{tabular}
> > >  
> > > @@ -2449,6 +2453,72 @@ command. Some legacy devices will support two-stage queue indicators, though,
> > >  and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
> > >  them up.
> > >  
> > > +\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
> > > +regions provided by the device, if any.
> > > +
> > > +The driver provides a pointer to a 4096 byte buffer that is filled out by
> > > +the device:
> > > +
> > > +\begin{lstlisting}
> > > +  struct shared_regions_info {
> > > +    be64 num_regions;
> > > +    struct shared_region_desc regions[];
> > > +  };
> > > +\end{lstlisting}
> > > +
> > > +The buffer contains 0 or more shared region descriptors, as specified
> > > +by \field{num_regions}. If the devices does not provide shared regions,
> > > +\field{num_regions} is 0. Otherwise, the shared region descriptors have
> > > +the following format:
> > > +
> > > +\begin{lstlisting}
> > > +struct shared_region_desc {
> > > +        be64 addr;
> > > +        be64 len;
> > > +        u8 id;
> > > +        u8 pad[3];
> > > +}
> > > +\end{lstlisting}
> > > +
> > > +\field{addr} is the guest-physical address of the region with a length of
> > > +\field{len}, identified by \field{id}. The contents of \field{pad} are
> > > +unpredictable, although it is recommended that the device fills in zeroes.
> > > +
> > > +To activate or deactivate a shared memory region, the device uses the
> > > +CCW_CMD_MAP_REGIONS command. It takes the following payload:
> > > +
> > > +\begin{lstlisting}
> > > +struct shared_region_ctrl {
> > > +        u8 id;
> > > +        u8 activate;
> > > +        u8 pad[2];
> > > +}
> > > +\end{lstlisting}
> > > +
> > > +\field{id} denotes the shared memory region that is the target of the command,
> > > +while \field{activate} specifies whether the region should be activated (if 1)
> > > +or deactivated (if 0). When activated, the device makes the guest-physical
> > > +address of the region available as a shared memory region.
> > > +
> > > +\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The device MUST reject the CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS
> > > +commands if not at least revision 3 has been negotiated.
> > > +
> > > +The device MUST NOT read from or write to the region before it has been
> > > +activated by the driver or after it has been deactivated by the driver.
> > > +
> > > +If the driver reads from or writes to an address specified to a region that is
> > > +not activated by the driver, it MUST treat this read or write as a normal
> > > +read or write operation.
> > > +
> > > +\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The driver MUST NOT treat the guest-physical address of a region as a shared
> > > +memory region before it has activated it or after it has deactivated it.
> > > +
> > >  \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
> > >  
> > >  \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}  
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-14 16:37                     ` Dr. David Alan Gilbert
@ 2019-02-14 17:43                       ` Frank Yang
  2019-02-15 11:07                         ` Cornelia Huck
  0 siblings, 1 reply; 44+ messages in thread
From: Frank Yang @ 2019-02-14 17:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Cornelia Huck, virtio-comment, Stefan Hajnoczi, Halil Pasic

[-- Attachment #1: Type: text/plain, Size: 10031 bytes --]

On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Wed, 13 Feb 2019 18:37:56 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >
> > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >
> > > > > So these are all moving this 1/3 forward - has anyone got comments
> on
> > > > > the transport specific implementations?
> > > >
> > > > No comment on pci or mmio, but I've hacked something together for
> ccw.
> > > > Basically, one sense-type ccw for discovery and a control-type ccw
> for
> > > > activation of the regions (no idea if we really need the latter),
> both
> > > > available with ccw revision 3.
> > > >
> > > > No idea whether this will work this way, though...
> > >
> > > That sounds (from a shm perspective) reasonable; can I ask why the
> > > 'activate' is needed?
> >
> > The activate interface is actually what I'm most unsure about; maybe
> > Halil can chime in.
> >
> > My basic concern is that we don't have any idea how the guest will use
> > the available memory. If the shared memory areas are supposed to be
> > mapped into an inconvenient place, the activate interface gives the
> > guest a chance to clear up that area before the host starts writing to
> > it.
>
> I'm expecting the host to map it into an area of GPA that is out of the
> way - it doesn't overlap with RAM.
> Given that, I'm not sure why the guest would have to do any 'clear up' -
> it probably wants to make a virtual mapping somewhere, but again that's
> upto the guest to do when it feels like it.
>
>
This is what we do with Vulkan as well.


> > I'm not really enthusiastic about that interface... for one, I'm not
> > sure how this plays out at the device type level, which should not
> > really concern itself with transport-specific handling.
>
> I'd expect the host side code to give an area of memory to the transport
> and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
> I think).
>

I wonder if this could help: the way we're running Vulkan at the moment,
what we do is add a the concept of a MemoryRegion with no actual backing:

https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)

and it would be connected to the entire PCI address space on the shared
memory address space realization. So it's kind of like a sparse or deferred
MemoryRegion.

When the guest actually wants to map a subregion associated with the host
memory,
on the host side, we can call the hypervisor to map the region, based on
giving the device implementation the functions KVM_SET_USER_MEMORY_REGION
and analogs.

This has the advantage of a smaller contact area between shm and qemu,
where the device level stuff can operate at a separate layer from
MemoryRegions which is more transport level.


> Similarly in the guest, I'm expecting the driver for the device to
> ask for a pointer to a region with a particular ID and that goes
> down to the transport code.
>
> Another option would be to map these into a special memory area that
> > the guest won't use for its normal operation... the original s390
> > (non-ccw) virtio transport mapped everything into two special pages
> > above the guest memory, but that was quite painful, and I don't think
> > we want to go down that road again.
>
> Can you explain why?
>
> Dave
>
> > >
> > > Dave
> > >
> > > > diff --git a/content.tex b/content.tex
> > > > index 836ee5236939..7f379bca932e 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -2078,6 +2078,8 @@ virtio:
> > > >  #define CCW_CMD_READ_VQ_CONF 0x32
> > > >  #define CCW_CMD_SET_VIRTIO_REV 0x83
> > > >  #define CCW_CMD_READ_STATUS 0x72
> > > > +#define CCW_CMD_GET_REGIONS 0x14
> > > > +#define CCW_CMD_MAP_REGIONS 0x93
> > > >  \end{lstlisting}
> > > >
> > > >  \subsubsection{Notifications}\label{sec:Virtio Transport Options /
> Virtio
> > > > @@ -2170,7 +2172,9 @@ The following values are supported:
> > > >  \hline
> > > >  2        & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
> > > >  \hline
> > > > -3-n      &        &           & reserved for later revisions \\
> > > > +3        & 0      & <empty>   & CCW_CMD_GET_REGIONS and
> CCW_CMD_MAP_REGIONS support \\
> > > > +\hline
> > > > +4-n      &        &           & reserved for later revisions \\
> > > >  \hline
> > > >  \end{tabular}
> > > >
> > > > @@ -2449,6 +2453,72 @@ command. Some legacy devices will support
> two-stage queue indicators, though,
> > > >  and a driver will be able to successfully use
> CCW_CMD_SET_IND_ADAPTER to set
> > > >  them up.
> > > >
> > > > +\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio
> Transport Options / Virtio over channel I/O / Device Initialization /
> Handling Shared Memory Regions}
> > > > +
> > > > +The CCW_CMD_GET_REGIONS command allows the driver to discover
> shared memory
> > > > +regions provided by the device, if any.
> > > > +
> > > > +The driver provides a pointer to a 4096 byte buffer that is filled
> out by
> > > > +the device:
> > > > +
> > > > +\begin{lstlisting}
> > > > +  struct shared_regions_info {
> > > > +    be64 num_regions;
> > > > +    struct shared_region_desc regions[];
> > > > +  };
> > > > +\end{lstlisting}
> > > > +
> > > > +The buffer contains 0 or more shared region descriptors, as
> specified
> > > > +by \field{num_regions}. If the devices does not provide shared
> regions,
> > > > +\field{num_regions} is 0. Otherwise, the shared region descriptors
> have
> > > > +the following format:
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct shared_region_desc {
> > > > +        be64 addr;
> > > > +        be64 len;
> > > > +        u8 id;
> > > > +        u8 pad[3];
> > > > +}
> > > > +\end{lstlisting}
> > > > +
> > > > +\field{addr} is the guest-physical address of the region with a
> length of
> > > > +\field{len}, identified by \field{id}. The contents of \field{pad}
> are
> > > > +unpredictable, although it is recommended that the device fills in
> zeroes.
> > > > +
> > > > +To activate or deactivate a shared memory region, the device uses
> the
> > > > +CCW_CMD_MAP_REGIONS command. It takes the following payload:
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct shared_region_ctrl {
> > > > +        u8 id;
> > > > +        u8 activate;
> > > > +        u8 pad[2];
> > > > +}
> > > > +\end{lstlisting}
> > > > +
> > > > +\field{id} denotes the shared memory region that is the target of
> the command,
> > > > +while \field{activate} specifies whether the region should be
> activated (if 1)
> > > > +or deactivated (if 0). When activated, the device makes the
> guest-physical
> > > > +address of the region available as a shared memory region.
> > > > +
> > > > +\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio
> Transport Options / Virtio over channel I/O / Device Initialization /
> Handling Shared Memory Regions}
> > > > +
> > > > +The device MUST reject the CCW_CMD_GET_REGIONS and
> CCW_CMD_MAP_REGIONS
> > > > +commands if not at least revision 3 has been negotiated.
> > > > +
> > > > +The device MUST NOT read from or write to the region before it has
> been
> > > > +activated by the driver or after it has been deactivated by the
> driver.
> > > > +
> > > > +If the driver reads from or writes to an address specified to a
> region that is
> > > > +not activated by the driver, it MUST treat this read or write as a
> normal
> > > > +read or write operation.
> > > > +
> > > > +\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio
> Transport Options / Virtio over channel I/O / Device Initialization /
> Handling Shared Memory Regions}
> > > > +
> > > > +The driver MUST NOT treat the guest-physical address of a region as
> a shared
> > > > +memory region before it has activated it or after it has
> deactivated it.
> > > > +
> > > >  \subsection{Device Operation}\label{sec:Virtio Transport Options /
> Virtio over channel I/O / Device Operation}
> > > >
> > > >  \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport
> Options / Virtio over channel I/O / Device Operation / Host->Guest
> Notification}
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> > This publicly archived list offers a means to provide input to the
> > OASIS Virtual I/O Device (VIRTIO) TC.
> >
> > In order to verify user consent to the Feedback License terms and
> > to minimize spam in the list archive, subscription is required
> > before posting.
> >
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License:
> https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > List Guidelines:
> https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines:
> https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>
>

[-- Attachment #2: Type: text/html, Size: 14860 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-14 17:43                       ` Frank Yang
@ 2019-02-15 11:07                         ` Cornelia Huck
  2019-02-15 11:19                           ` Dr. David Alan Gilbert
  2019-02-15 11:26                           ` David Hildenbrand
  0 siblings, 2 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-02-15 11:07 UTC (permalink / raw)
  To: Frank Yang
  Cc: Dr. David Alan Gilbert, virtio-comment, Stefan Hajnoczi, Halil Pasic

On Thu, 14 Feb 2019 09:43:10 -0800
Frank Yang <lfy@google.com> wrote:

> On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > On Wed, 13 Feb 2019 18:37:56 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >  
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >  
> > > > > > So these are all moving this 1/3 forward - has anyone got comments  
> > on  
> > > > > > the transport specific implementations?  
> > > > >
> > > > > No comment on pci or mmio, but I've hacked something together for  
> > ccw.  
> > > > > Basically, one sense-type ccw for discovery and a control-type ccw  
> > for  
> > > > > activation of the regions (no idea if we really need the latter),  
> > both  
> > > > > available with ccw revision 3.
> > > > >
> > > > > No idea whether this will work this way, though...  
> > > >
> > > > That sounds (from a shm perspective) reasonable; can I ask why the
> > > > 'activate' is needed?  
> > >
> > > The activate interface is actually what I'm most unsure about; maybe
> > > Halil can chime in.
> > >
> > > My basic concern is that we don't have any idea how the guest will use
> > > the available memory. If the shared memory areas are supposed to be
> > > mapped into an inconvenient place, the activate interface gives the
> > > guest a chance to clear up that area before the host starts writing to
> > > it.  
> >
> > I'm expecting the host to map it into an area of GPA that is out of the
> > way - it doesn't overlap with RAM.

My issue here is that I'm not sure how to model something like that on
s390...

> > Given that, I'm not sure why the guest would have to do any 'clear up' -
> > it probably wants to make a virtual mapping somewhere, but again that's
> > upto the guest to do when it feels like it.
> >
> >  
> This is what we do with Vulkan as well.
> 
> 
> > > I'm not really enthusiastic about that interface... for one, I'm not
> > > sure how this plays out at the device type level, which should not
> > > really concern itself with transport-specific handling.  
> >
> > I'd expect the host side code to give an area of memory to the transport
> > and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
> > I think).

My main issue is the 'somewhere'.

> >  
> 
> I wonder if this could help: the way we're running Vulkan at the moment,
> what we do is add a the concept of a MemoryRegion with no actual backing:
> 
> https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)
> 
> and it would be connected to the entire PCI address space on the shared
> memory address space realization. So it's kind of like a sparse or deferred
> MemoryRegion.
> 
> When the guest actually wants to map a subregion associated with the host
> memory,
> on the host side, we can call the hypervisor to map the region, based on
> giving the device implementation the functions KVM_SET_USER_MEMORY_REGION
> and analogs.
> 
> This has the advantage of a smaller contact area between shm and qemu,
> where the device level stuff can operate at a separate layer from
> MemoryRegions which is more transport level.

That sounds like an interesting concept, but I'm not quite sure how it
would help with my problem. Read on for more explanation below...

> 
> 
> > Similarly in the guest, I'm expecting the driver for the device to
> > ask for a pointer to a region with a particular ID and that goes
> > down to the transport code.
> >
> > Another option would be to map these into a special memory area that  
> > > the guest won't use for its normal operation... the original s390
> > > (non-ccw) virtio transport mapped everything into two special pages
> > > above the guest memory, but that was quite painful, and I don't think
> > > we want to go down that road again.  
> >
> > Can you explain why?

The background here is that s390 traditionally does not have any
concept of memory-mapped I/O. IOW, you don't just write to or read from
a special memory area; instead, I/O operations use special instructions.

The mechanism I'm trying to extend here is channel I/O: the driver
builds a channel program with commands that point to guest memory areas
and hands it to the channel subsystem (which means, in our case, the
host) via a special instruction. The channel subsystem and the device
(the host, in our case) translate the memory addresses and execute the
commands. The one place where we write shared memory directly in the
virtio case are the virtqueues -- which are allocated in guest memory,
so the guest decides which memory addresses are special. Accessing the
config space of a virtio device via the ccw transport does not
read/write a memory location directly, but instead uses a channel
program that performs the read/write.

For pci, the memory accesses are mapped to special instructions:
reading or writing the config space of a pci device does not perform
reads or writes of a memory location, either; the driver uses special
instructions to access the config space (which are also
interpreted/emulated by QEMU, for example.)

The old s390 (pre-virtio-ccw) virtio transport had to rely on the
knowledge that there were two pages containing the virtqueues etc.
right above the normal memory (probed by checking whether accessing
that memory gave an exception or not). The main problems were that this
was inflexible (the guest had no easy way to find out how many
'special' pages were present, other than trying to access them), and
that it was different from whatever other mechanisms are common on s390.

We might be able to come up with another scheme, but I wouldn't hold my
breath. Would be great if someone else with s390 knowledge could chime
in here.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 11:07                         ` Cornelia Huck
@ 2019-02-15 11:19                           ` Dr. David Alan Gilbert
  2019-02-15 12:31                             ` Cornelia Huck
  2019-02-18 15:28                             ` Halil Pasic
  2019-02-15 11:26                           ` David Hildenbrand
  1 sibling, 2 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-15 11:19 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Thu, 14 Feb 2019 09:43:10 -0800
> Frank Yang <lfy@google.com> wrote:
> 
> > On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
> > wrote:
> > 
> > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > On Wed, 13 Feb 2019 18:37:56 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >  
> > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > >  
> > > > > > > So these are all moving this 1/3 forward - has anyone got comments  
> > > on  
> > > > > > > the transport specific implementations?  
> > > > > >
> > > > > > No comment on pci or mmio, but I've hacked something together for  
> > > ccw.  
> > > > > > Basically, one sense-type ccw for discovery and a control-type ccw  
> > > for  
> > > > > > activation of the regions (no idea if we really need the latter),  
> > > both  
> > > > > > available with ccw revision 3.
> > > > > >
> > > > > > No idea whether this will work this way, though...  
> > > > >
> > > > > That sounds (from a shm perspective) reasonable; can I ask why the
> > > > > 'activate' is needed?  
> > > >
> > > > The activate interface is actually what I'm most unsure about; maybe
> > > > Halil can chime in.
> > > >
> > > > My basic concern is that we don't have any idea how the guest will use
> > > > the available memory. If the shared memory areas are supposed to be
> > > > mapped into an inconvenient place, the activate interface gives the
> > > > guest a chance to clear up that area before the host starts writing to
> > > > it.  
> > >
> > > I'm expecting the host to map it into an area of GPA that is out of the
> > > way - it doesn't overlap with RAM.
> 
> My issue here is that I'm not sure how to model something like that on
> s390...
> 
> > > Given that, I'm not sure why the guest would have to do any 'clear up' -
> > > it probably wants to make a virtual mapping somewhere, but again that's
> > > upto the guest to do when it feels like it.
> > >
> > >  
> > This is what we do with Vulkan as well.
> > 
> > 
> > > > I'm not really enthusiastic about that interface... for one, I'm not
> > > > sure how this plays out at the device type level, which should not
> > > > really concern itself with transport-specific handling.  
> > >
> > > I'd expect the host side code to give an area of memory to the transport
> > > and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
> > > I think).
> 
> My main issue is the 'somewhere'.
> 
> > >  
> > 
> > I wonder if this could help: the way we're running Vulkan at the moment,
> > what we do is add a the concept of a MemoryRegion with no actual backing:
> > 
> > https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)
> > 
> > and it would be connected to the entire PCI address space on the shared
> > memory address space realization. So it's kind of like a sparse or deferred
> > MemoryRegion.
> > 
> > When the guest actually wants to map a subregion associated with the host
> > memory,
> > on the host side, we can call the hypervisor to map the region, based on
> > giving the device implementation the functions KVM_SET_USER_MEMORY_REGION
> > and analogs.
> > 
> > This has the advantage of a smaller contact area between shm and qemu,
> > where the device level stuff can operate at a separate layer from
> > MemoryRegions which is more transport level.
> 
> That sounds like an interesting concept, but I'm not quite sure how it
> would help with my problem. Read on for more explanation below...
> 
> > 
> > 
> > > Similarly in the guest, I'm expecting the driver for the device to
> > > ask for a pointer to a region with a particular ID and that goes
> > > down to the transport code.
> > >
> > > Another option would be to map these into a special memory area that  
> > > > the guest won't use for its normal operation... the original s390
> > > > (non-ccw) virtio transport mapped everything into two special pages
> > > > above the guest memory, but that was quite painful, and I don't think
> > > > we want to go down that road again.  
> > >
> > > Can you explain why?
> 
> The background here is that s390 traditionally does not have any
> concept of memory-mapped I/O. IOW, you don't just write to or read from
> a special memory area; instead, I/O operations use special instructions.
> 
> The mechanism I'm trying to extend here is channel I/O: the driver
> builds a channel program with commands that point to guest memory areas
> and hands it to the channel subsystem (which means, in our case, the
> host) via a special instruction. The channel subsystem and the device
> (the host, in our case) translate the memory addresses and execute the
> commands. The one place where we write shared memory directly in the
> virtio case are the virtqueues -- which are allocated in guest memory,
> so the guest decides which memory addresses are special. Accessing the
> config space of a virtio device via the ccw transport does not
> read/write a memory location directly, but instead uses a channel
> program that performs the read/write.
> 
> For pci, the memory accesses are mapped to special instructions:
> reading or writing the config space of a pci device does not perform
> reads or writes of a memory location, either; the driver uses special
> instructions to access the config space (which are also
> interpreted/emulated by QEMU, for example.)
> 
> The old s390 (pre-virtio-ccw) virtio transport had to rely on the
> knowledge that there were two pages containing the virtqueues etc.
> right above the normal memory (probed by checking whether accessing
> that memory gave an exception or not). The main problems were that this
> was inflexible (the guest had no easy way to find out how many
> 'special' pages were present, other than trying to access them), and
> that it was different from whatever other mechanisms are common on s390.
> 
> We might be able to come up with another scheme, but I wouldn't hold my
> breath. Would be great if someone else with s390 knowledge could chime
> in here.

What I'm missing here is why the behaviour of the s390's traditional channel program
matters to the design of an entirely emulated device.

As long as the s390 allows:
  a) The host to map a region of HVA into GPA at an arbitrary GPA
address
  b) Not tell the guest that (a) is RAM
  c) Find a non-RAM GPA for (a)
  d) Allow the guest to set up a page table pointing to (c)
  e) Discover (c) via the scheme you described

Then that's all that's needed - and I'm not seeing what is different on
s390 about a-d from any other architecture.

Dave



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 11:07                         ` Cornelia Huck
  2019-02-15 11:19                           ` Dr. David Alan Gilbert
@ 2019-02-15 11:26                           ` David Hildenbrand
  2019-02-15 12:28                             ` Cornelia Huck
  1 sibling, 1 reply; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 11:26 UTC (permalink / raw)
  To: Cornelia Huck, Frank Yang
  Cc: Dr. David Alan Gilbert, virtio-comment, Stefan Hajnoczi, Halil Pasic

On 15.02.19 12:07, Cornelia Huck wrote:
> On Thu, 14 Feb 2019 09:43:10 -0800
> Frank Yang <lfy@google.com> wrote:
> 
>> On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
>> wrote:
>>
>>> * Cornelia Huck (cohuck@redhat.com) wrote:  
>>>> On Wed, 13 Feb 2019 18:37:56 +0000
>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>  
>>>>> * Cornelia Huck (cohuck@redhat.com) wrote:  
>>>>>> On Wed, 16 Jan 2019 20:06:25 +0000
>>>>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>>>>  
>>>>>>> So these are all moving this 1/3 forward - has anyone got comments  
>>> on  
>>>>>>> the transport specific implementations?  
>>>>>>
>>>>>> No comment on pci or mmio, but I've hacked something together for  
>>> ccw.  
>>>>>> Basically, one sense-type ccw for discovery and a control-type ccw  
>>> for  
>>>>>> activation of the regions (no idea if we really need the latter),  
>>> both  
>>>>>> available with ccw revision 3.
>>>>>>
>>>>>> No idea whether this will work this way, though...  
>>>>>
>>>>> That sounds (from a shm perspective) reasonable; can I ask why the
>>>>> 'activate' is needed?  
>>>>
>>>> The activate interface is actually what I'm most unsure about; maybe
>>>> Halil can chime in.
>>>>
>>>> My basic concern is that we don't have any idea how the guest will use
>>>> the available memory. If the shared memory areas are supposed to be
>>>> mapped into an inconvenient place, the activate interface gives the
>>>> guest a chance to clear up that area before the host starts writing to
>>>> it.  
>>>
>>> I'm expecting the host to map it into an area of GPA that is out of the
>>> way - it doesn't overlap with RAM.
> 
> My issue here is that I'm not sure how to model something like that on
> s390...
> 
>>> Given that, I'm not sure why the guest would have to do any 'clear up' -
>>> it probably wants to make a virtual mapping somewhere, but again that's
>>> upto the guest to do when it feels like it.
>>>
>>>  
>> This is what we do with Vulkan as well.
>>
>>
>>>> I'm not really enthusiastic about that interface... for one, I'm not
>>>> sure how this plays out at the device type level, which should not
>>>> really concern itself with transport-specific handling.  
>>>
>>> I'd expect the host side code to give an area of memory to the transport
>>> and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
>>> I think).
> 
> My main issue is the 'somewhere'.
> 
>>>  
>>
>> I wonder if this could help: the way we're running Vulkan at the moment,
>> what we do is add a the concept of a MemoryRegion with no actual backing:
>>
>> https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)
>>
>> and it would be connected to the entire PCI address space on the shared
>> memory address space realization. So it's kind of like a sparse or deferred
>> MemoryRegion.
>>
>> When the guest actually wants to map a subregion associated with the host
>> memory,
>> on the host side, we can call the hypervisor to map the region, based on
>> giving the device implementation the functions KVM_SET_USER_MEMORY_REGION
>> and analogs.
>>
>> This has the advantage of a smaller contact area between shm and qemu,
>> where the device level stuff can operate at a separate layer from
>> MemoryRegions which is more transport level.
> 
> That sounds like an interesting concept, but I'm not quite sure how it
> would help with my problem. Read on for more explanation below...
> 
>>
>>
>>> Similarly in the guest, I'm expecting the driver for the device to
>>> ask for a pointer to a region with a particular ID and that goes
>>> down to the transport code.
>>>
>>> Another option would be to map these into a special memory area that  
>>>> the guest won't use for its normal operation... the original s390
>>>> (non-ccw) virtio transport mapped everything into two special pages
>>>> above the guest memory, but that was quite painful, and I don't think
>>>> we want to go down that road again.  
>>>
>>> Can you explain why?
> 
> The background here is that s390 traditionally does not have any
> concept of memory-mapped I/O. IOW, you don't just write to or read from
> a special memory area; instead, I/O operations use special instructions.
> 
> The mechanism I'm trying to extend here is channel I/O: the driver
> builds a channel program with commands that point to guest memory areas
> and hands it to the channel subsystem (which means, in our case, the
> host) via a special instruction. The channel subsystem and the device
> (the host, in our case) translate the memory addresses and execute the
> commands. The one place where we write shared memory directly in the
> virtio case are the virtqueues -- which are allocated in guest memory,
> so the guest decides which memory addresses are special. Accessing the
> config space of a virtio device via the ccw transport does not
> read/write a memory location directly, but instead uses a channel
> program that performs the read/write.
> 
> For pci, the memory accesses are mapped to special instructions:
> reading or writing the config space of a pci device does not perform
> reads or writes of a memory location, either; the driver uses special
> instructions to access the config space (which are also
> interpreted/emulated by QEMU, for example.)
> 
> The old s390 (pre-virtio-ccw) virtio transport had to rely on the
> knowledge that there were two pages containing the virtqueues etc.
> right above the normal memory (probed by checking whether accessing
> that memory gave an exception or not). The main problems were that this
> was inflexible (the guest had no easy way to find out how many
> 'special' pages were present, other than trying to access them), and
> that it was different from whatever other mechanisms are common on s390.

Probing is always ugly. But I think we can add something like
 the x86 PCI hole between 3 and 4 GB after our initial boot memory.
So there, we would have a memory region just like e.g. x86 has.

This should even work with other mechanism I am working on. E.g.
for memory devices, we will add yet another memory region above
the special PCI region.

The layout of the guest would then be something like

[0x000000000000000]
... Memory region containing RAM
[ram_size         ]
... Memory region for e.g. special PCI devices
[ram_size +1 GB   ]
... Memory region for memory devices (virtio-pmem, virtio-mem ...)
[maxram_size - ram_size + 1GB]

We would have to create proper page tables for guest backing that take
care of the new guest size (not just ram_size). Also, to the guest we
would indicate "maximum ram size == ram_size" so it does not try to
probe the "special" memory.

As we are using paravirtualized features here, this should be fine for.
Unmodified guests will never touch/probe anything beyond ram_size.

> 
> We might be able to come up with another scheme, but I wouldn't hold my
> breath. Would be great if someone else with s390 knowledge could chime
> in here.

-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 11:26                           ` David Hildenbrand
@ 2019-02-15 12:28                             ` Cornelia Huck
  2019-02-15 12:33                               ` David Hildenbrand
  0 siblings, 1 reply; 44+ messages in thread
From: Cornelia Huck @ 2019-02-15 12:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Frank Yang, Dr. David Alan Gilbert, virtio-comment,
	Stefan Hajnoczi, Halil Pasic

On Fri, 15 Feb 2019 12:26:00 +0100
David Hildenbrand <david@redhat.com> wrote:

> Probing is always ugly. But I think we can add something like
>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> So there, we would have a memory region just like e.g. x86 has.

A special region is probably the best way out of this pickle. We would
only need the discovery ccw for virtio, then.

> 
> This should even work with other mechanism I am working on. E.g.
> for memory devices, we will add yet another memory region above
> the special PCI region.
> 
> The layout of the guest would then be something like
> 
> [0x000000000000000]
> ... Memory region containing RAM
> [ram_size         ]
> ... Memory region for e.g. special PCI devices
> [ram_size +1 GB   ]
> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> [maxram_size - ram_size + 1GB]
> 
> We would have to create proper page tables for guest backing that take
> care of the new guest size (not just ram_size). Also, to the guest we
> would indicate "maximum ram size == ram_size" so it does not try to
> probe the "special" memory.

Hm... so that would be:
- 0..ram_size: just like it is handled now
- ram_size..ram_size + 1GB: guest does not treat it as ram, but does
  build page tables for it
- ram_size + 1GB..maxram_size: for whatever memory devices do with it

How does the guest probe this? (SCLP?) Or does the guest simply know
via some kind of probable feature that there's a 1GB region there?

> As we are using paravirtualized features here, this should be fine for.
> Unmodified guests will never touch/probe anything beyond ram_size.

Yes, that gives us more freedom.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 11:19                           ` Dr. David Alan Gilbert
@ 2019-02-15 12:31                             ` Cornelia Huck
  2019-02-18 15:28                             ` Halil Pasic
  1 sibling, 0 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-02-15 12:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

On Fri, 15 Feb 2019 11:19:09 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> What I'm missing here is why the behaviour of the s390's traditional channel program
> matters to the design of an entirely emulated device.

I mostly wrote this up as an explanation...

> 
> As long as the s390 allows:
>   a) The host to map a region of HVA into GPA at an arbitrary GPA
> address
>   b) Not tell the guest that (a) is RAM
>   c) Find a non-RAM GPA for (a)

I think (b) and (c) are what we need to come up with a solution for...
DavidH's idea looks like a good starting point for that.

>   d) Allow the guest to set up a page table pointing to (c)
>   e) Discover (c) via the scheme you described
> 
> Then that's all that's needed - and I'm not seeing what is different on
> s390 about a-d from any other architecture.
> 
> Dave
> 
> 
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 12:28                             ` Cornelia Huck
@ 2019-02-15 12:33                               ` David Hildenbrand
  2019-02-15 12:37                                 ` Cornelia Huck
  0 siblings, 1 reply; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 12:33 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Frank Yang, Dr. David Alan Gilbert, virtio-comment,
	Stefan Hajnoczi, Halil Pasic

On 15.02.19 13:28, Cornelia Huck wrote:
> On Fri, 15 Feb 2019 12:26:00 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Probing is always ugly. But I think we can add something like
>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
>> So there, we would have a memory region just like e.g. x86 has.
> 
> A special region is probably the best way out of this pickle. We would
> only need the discovery ccw for virtio, then.
> 
>>
>> This should even work with other mechanism I am working on. E.g.
>> for memory devices, we will add yet another memory region above
>> the special PCI region.
>>
>> The layout of the guest would then be something like
>>
>> [0x000000000000000]
>> ... Memory region containing RAM
>> [ram_size         ]
>> ... Memory region for e.g. special PCI devices
>> [ram_size +1 GB   ]
>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
>> [maxram_size - ram_size + 1GB]
>>
>> We would have to create proper page tables for guest backing that take
>> care of the new guest size (not just ram_size). Also, to the guest we
>> would indicate "maximum ram size == ram_size" so it does not try to
>> probe the "special" memory.
> 
> Hm... so that would be:
> - 0..ram_size: just like it is handled now
> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
>   build page tables for it
> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> 
> How does the guest probe this? (SCLP?) Or does the guest simply know
> via some kind of probable feature that there's a 1GB region there?

As the guest only "knowns" ram, there is a "maximum ram size" specified
via SCLP. An unmodified guest will not probe beyond that.

The parts of the 1GB used by a device should be communicated via the
paravirtualized device I guess. PCI bars don't really fit I assume, so
we might need some virtio-ccw thingy (you're the expert :)) on top. That
is one part to be clarified.

I guess the guest does not need to know about the whole 1GB, only per
device about the used part. We can then built page tables in the guest
for that part when plugging.

-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 12:33                               ` David Hildenbrand
@ 2019-02-15 12:37                                 ` Cornelia Huck
  2019-02-15 12:59                                   ` David Hildenbrand
  2019-02-15 13:50                                   ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-02-15 12:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Frank Yang, Dr. David Alan Gilbert, virtio-comment,
	Stefan Hajnoczi, Halil Pasic

On Fri, 15 Feb 2019 13:33:06 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 15.02.19 13:28, Cornelia Huck wrote:
> > On Fri, 15 Feb 2019 12:26:00 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> Probing is always ugly. But I think we can add something like
> >>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> >> So there, we would have a memory region just like e.g. x86 has.  
> > 
> > A special region is probably the best way out of this pickle. We would
> > only need the discovery ccw for virtio, then.
> >   
> >>
> >> This should even work with other mechanism I am working on. E.g.
> >> for memory devices, we will add yet another memory region above
> >> the special PCI region.
> >>
> >> The layout of the guest would then be something like
> >>
> >> [0x000000000000000]
> >> ... Memory region containing RAM
> >> [ram_size         ]
> >> ... Memory region for e.g. special PCI devices
> >> [ram_size +1 GB   ]
> >> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> >> [maxram_size - ram_size + 1GB]
> >>
> >> We would have to create proper page tables for guest backing that take
> >> care of the new guest size (not just ram_size). Also, to the guest we
> >> would indicate "maximum ram size == ram_size" so it does not try to
> >> probe the "special" memory.  
> > 
> > Hm... so that would be:
> > - 0..ram_size: just like it is handled now
> > - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> >   build page tables for it
> > - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> > 
> > How does the guest probe this? (SCLP?) Or does the guest simply know
> > via some kind of probable feature that there's a 1GB region there?  
> 
> As the guest only "knowns" ram, there is a "maximum ram size" specified
> via SCLP. An unmodified guest will not probe beyond that.

Nod.

> The parts of the 1GB used by a device should be communicated via the
> paravirtualized device I guess. PCI bars don't really fit I assume, so
> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> is one part to be clarified.
> 
> I guess the guest does not need to know about the whole 1GB, only per
> device about the used part. We can then built page tables in the guest
> for that part when plugging.

Hm. With my proposal, the guest would get a list of region addresses
from the device via a new ccw. It could then proceed to set up page
tables for it and start to use it. As long as it is aware that the
addresses it will get are beyond max_ram, that should be fine, I think.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-14 10:58                   ` Cornelia Huck
  2019-02-14 16:37                     ` Dr. David Alan Gilbert
@ 2019-02-15 12:51                     ` Halil Pasic
  2019-02-15 13:33                       ` Cornelia Huck
  1 sibling, 1 reply; 44+ messages in thread
From: Halil Pasic @ 2019-02-15 12:51 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Dr. David Alan Gilbert, virtio-comment, stefanha

On Thu, 14 Feb 2019 11:58:07 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> Tags: inv, virtio 
> From: Cornelia Huck <cohuck@redhat.com>
> To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: virtio-comment@lists.oasis-open.org, stefanha@redhat.com, Halil Pasic <pasic@linux.ibm.com>
> Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
> Date: Thu, 14 Feb 2019 11:58:07 +0100
> Sender: <virtio-comment@lists.oasis-open.org>
> Organization: Red Hat GmbH
> 
> On Wed, 13 Feb 2019 18:37:56 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >     
> > > > So these are all moving this 1/3 forward - has anyone got comments on
> > > > the transport specific implementations?    
> > > 
> > > No comment on pci or mmio, but I've hacked something together for ccw.
> > > Basically, one sense-type ccw for discovery and a control-type ccw for
> > > activation of the regions (no idea if we really need the latter), both
> > > available with ccw revision 3.
> > > 
> > > No idea whether this will work this way, though...    
> > 
> > That sounds (from a shm perspective) reasonable; can I ask why the
> > 'activate' is needed?  
> 
> The activate interface is actually what I'm most unsure about; maybe
> Halil can chime in.

Sorry, I unfortunately kept deferring looking into this. Will have to do
some more thinking. I promise to give you my feedback by Monday end of
day if not sooner. Does that work with you?

Regards,
Halil


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 12:37                                 ` Cornelia Huck
@ 2019-02-15 12:59                                   ` David Hildenbrand
  2019-02-15 13:50                                   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 12:59 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Frank Yang, Dr. David Alan Gilbert, virtio-comment,
	Stefan Hajnoczi, Halil Pasic

On 15.02.19 13:37, Cornelia Huck wrote:
> On Fri, 15 Feb 2019 13:33:06 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 15.02.19 13:28, Cornelia Huck wrote:
>>> On Fri, 15 Feb 2019 12:26:00 +0100
>>> David Hildenbrand <david@redhat.com> wrote:
>>>   
>>>> Probing is always ugly. But I think we can add something like
>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
>>>> So there, we would have a memory region just like e.g. x86 has.  
>>>
>>> A special region is probably the best way out of this pickle. We would
>>> only need the discovery ccw for virtio, then.
>>>   
>>>>
>>>> This should even work with other mechanism I am working on. E.g.
>>>> for memory devices, we will add yet another memory region above
>>>> the special PCI region.
>>>>
>>>> The layout of the guest would then be something like
>>>>
>>>> [0x000000000000000]
>>>> ... Memory region containing RAM
>>>> [ram_size         ]
>>>> ... Memory region for e.g. special PCI devices
>>>> [ram_size +1 GB   ]
>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
>>>> [maxram_size - ram_size + 1GB]
>>>>
>>>> We would have to create proper page tables for guest backing that take
>>>> care of the new guest size (not just ram_size). Also, to the guest we
>>>> would indicate "maximum ram size == ram_size" so it does not try to
>>>> probe the "special" memory.  
>>>
>>> Hm... so that would be:
>>> - 0..ram_size: just like it is handled now
>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
>>>   build page tables for it
>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
>>>
>>> How does the guest probe this? (SCLP?) Or does the guest simply know
>>> via some kind of probable feature that there's a 1GB region there?  
>>
>> As the guest only "knowns" ram, there is a "maximum ram size" specified
>> via SCLP. An unmodified guest will not probe beyond that.
> 
> Nod.
> 
>> The parts of the 1GB used by a device should be communicated via the
>> paravirtualized device I guess. PCI bars don't really fit I assume, so
>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
>> is one part to be clarified.
>>
>> I guess the guest does not need to know about the whole 1GB, only per
>> device about the used part. We can then built page tables in the guest
>> for that part when plugging.
> 
> Hm. With my proposal, the guest would get a list of region addresses
> from the device via a new ccw. It could then proceed to set up page
> tables for it and start to use it. As long as it is aware that the
> addresses it will get are beyond max_ram, that should be fine, I think.
> 

Sounds like this could work in general.

-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 12:51                     ` Halil Pasic
@ 2019-02-15 13:33                       ` Cornelia Huck
  0 siblings, 0 replies; 44+ messages in thread
From: Cornelia Huck @ 2019-02-15 13:33 UTC (permalink / raw)
  To: Halil Pasic; +Cc: Dr. David Alan Gilbert, virtio-comment, stefanha

On Fri, 15 Feb 2019 13:51:34 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> On Thu, 14 Feb 2019 11:58:07 +0100
> Cornelia Huck <cohuck@redhat.com> wrote:
> 
> > Tags: inv, virtio 
> > From: Cornelia Huck <cohuck@redhat.com>
> > To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > Cc: virtio-comment@lists.oasis-open.org, stefanha@redhat.com, Halil Pasic <pasic@linux.ibm.com>
> > Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
> > Date: Thu, 14 Feb 2019 11:58:07 +0100
> > Sender: <virtio-comment@lists.oasis-open.org>
> > Organization: Red Hat GmbH
> > 
> > On Wed, 13 Feb 2019 18:37:56 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Cornelia Huck (cohuck@redhat.com) wrote:    
> > > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >       
> > > > > So these are all moving this 1/3 forward - has anyone got comments on
> > > > > the transport specific implementations?      
> > > > 
> > > > No comment on pci or mmio, but I've hacked something together for ccw.
> > > > Basically, one sense-type ccw for discovery and a control-type ccw for
> > > > activation of the regions (no idea if we really need the latter), both
> > > > available with ccw revision 3.
> > > > 
> > > > No idea whether this will work this way, though...      
> > > 
> > > That sounds (from a shm perspective) reasonable; can I ask why the
> > > 'activate' is needed?    
> > 
> > The activate interface is actually what I'm most unsure about; maybe
> > Halil can chime in.  
> 
> Sorry, I unfortunately kept deferring looking into this. Will have to do
> some more thinking. I promise to give you my feedback by Monday end of
> day if not sooner. Does that work with you?

Sure; please also take a look at DavidH's proposal.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 12:37                                 ` Cornelia Huck
  2019-02-15 12:59                                   ` David Hildenbrand
@ 2019-02-15 13:50                                   ` Dr. David Alan Gilbert
  2019-02-15 13:56                                     ` David Hildenbrand
  1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-15 13:50 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: David Hildenbrand, Frank Yang, virtio-comment, Stefan Hajnoczi,
	Halil Pasic

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Fri, 15 Feb 2019 13:33:06 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
> > On 15.02.19 13:28, Cornelia Huck wrote:
> > > On Fri, 15 Feb 2019 12:26:00 +0100
> > > David Hildenbrand <david@redhat.com> wrote:
> > >   
> > >> Probing is always ugly. But I think we can add something like
> > >>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> > >> So there, we would have a memory region just like e.g. x86 has.  
> > > 
> > > A special region is probably the best way out of this pickle. We would
> > > only need the discovery ccw for virtio, then.
> > >   
> > >>
> > >> This should even work with other mechanism I am working on. E.g.
> > >> for memory devices, we will add yet another memory region above
> > >> the special PCI region.
> > >>
> > >> The layout of the guest would then be something like
> > >>
> > >> [0x000000000000000]
> > >> ... Memory region containing RAM
> > >> [ram_size         ]
> > >> ... Memory region for e.g. special PCI devices
> > >> [ram_size +1 GB   ]
> > >> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > >> [maxram_size - ram_size + 1GB]
> > >>
> > >> We would have to create proper page tables for guest backing that take
> > >> care of the new guest size (not just ram_size). Also, to the guest we
> > >> would indicate "maximum ram size == ram_size" so it does not try to
> > >> probe the "special" memory.  
> > > 
> > > Hm... so that would be:
> > > - 0..ram_size: just like it is handled now
> > > - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> > >   build page tables for it
> > > - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> > > 
> > > How does the guest probe this? (SCLP?) Or does the guest simply know
> > > via some kind of probable feature that there's a 1GB region there?  
> > 
> > As the guest only "knowns" ram, there is a "maximum ram size" specified
> > via SCLP. An unmodified guest will not probe beyond that.
> 
> Nod.
> 
> > The parts of the 1GB used by a device should be communicated via the
> > paravirtualized device I guess. PCI bars don't really fit I assume, so
> > we might need some virtio-ccw thingy (you're the expert :)) on top. That
> > is one part to be clarified.
> > 
> > I guess the guest does not need to know about the whole 1GB, only per
> > device about the used part. We can then built page tables in the guest
> > for that part when plugging.
> 
> Hm. With my proposal, the guest would get a list of region addresses
> from the device via a new ccw. It could then proceed to set up page
> tables for it and start to use it. As long as it is aware that the
> addresses it will get are beyond max_ram, that should be fine, I think.

Which is the same as my virtio-mmio proposal; the host gets to put it
where ever it sees fit (outside ram) and you've just got a way of
telling the guest where it lives.

Davidh's 1GB window is pretty much how older PCs worked I think;
the problem is that 1GB is never enough and you still need a way
to enumarate what devices are where, so it doesn't help you.
(Our current virtio-fs dax mappings we're using are a few GB).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 13:50                                   ` Dr. David Alan Gilbert
@ 2019-02-15 13:56                                     ` David Hildenbrand
  2019-02-15 14:02                                       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 13:56 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Cornelia Huck
  Cc: Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> * Cornelia Huck (cohuck@redhat.com) wrote:
>> On Fri, 15 Feb 2019 13:33:06 +0100
>> David Hildenbrand <david@redhat.com> wrote:
>>
>>> On 15.02.19 13:28, Cornelia Huck wrote:
>>>> On Fri, 15 Feb 2019 12:26:00 +0100
>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>   
>>>>> Probing is always ugly. But I think we can add something like
>>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
>>>>> So there, we would have a memory region just like e.g. x86 has.  
>>>>
>>>> A special region is probably the best way out of this pickle. We would
>>>> only need the discovery ccw for virtio, then.
>>>>   
>>>>>
>>>>> This should even work with other mechanism I am working on. E.g.
>>>>> for memory devices, we will add yet another memory region above
>>>>> the special PCI region.
>>>>>
>>>>> The layout of the guest would then be something like
>>>>>
>>>>> [0x000000000000000]
>>>>> ... Memory region containing RAM
>>>>> [ram_size         ]
>>>>> ... Memory region for e.g. special PCI devices
>>>>> [ram_size +1 GB   ]
>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
>>>>> [maxram_size - ram_size + 1GB]
>>>>>
>>>>> We would have to create proper page tables for guest backing that take
>>>>> care of the new guest size (not just ram_size). Also, to the guest we
>>>>> would indicate "maximum ram size == ram_size" so it does not try to
>>>>> probe the "special" memory.  
>>>>
>>>> Hm... so that would be:
>>>> - 0..ram_size: just like it is handled now
>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
>>>>   build page tables for it
>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
>>>>
>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
>>>> via some kind of probable feature that there's a 1GB region there?  
>>>
>>> As the guest only "knowns" ram, there is a "maximum ram size" specified
>>> via SCLP. An unmodified guest will not probe beyond that.
>>
>> Nod.
>>
>>> The parts of the 1GB used by a device should be communicated via the
>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
>>> is one part to be clarified.
>>>
>>> I guess the guest does not need to know about the whole 1GB, only per
>>> device about the used part. We can then built page tables in the guest
>>> for that part when plugging.
>>
>> Hm. With my proposal, the guest would get a list of region addresses
>> from the device via a new ccw. It could then proceed to set up page
>> tables for it and start to use it. As long as it is aware that the
>> addresses it will get are beyond max_ram, that should be fine, I think.
> 
> Which is the same as my virtio-mmio proposal; the host gets to put it
> where ever it sees fit (outside ram) and you've just got a way of
> telling the guest where it lives.
> 
> Davidh's 1GB window is pretty much how older PCs worked I think;
> the problem is that 1GB is never enough and you still need a way
> to enumarate what devices are where, so it doesn't help you.
> (Our current virtio-fs dax mappings we're using are a few GB).
> 

How does that work on x86? You cannot suddenly move stuff into the
memory device memory region and potentially mess with DIMMs to be
plugged later. QEMU wise, this sounds wrong.

> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 


-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 13:56                                     ` David Hildenbrand
@ 2019-02-15 14:02                                       ` Dr. David Alan Gilbert
  2019-02-15 14:13                                         ` David Hildenbrand
  0 siblings, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-15 14:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Cornelia Huck, Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

* David Hildenbrand (david@redhat.com) wrote:
> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> >> On Fri, 15 Feb 2019 13:33:06 +0100
> >> David Hildenbrand <david@redhat.com> wrote:
> >>
> >>> On 15.02.19 13:28, Cornelia Huck wrote:
> >>>> On Fri, 15 Feb 2019 12:26:00 +0100
> >>>> David Hildenbrand <david@redhat.com> wrote:
> >>>>   
> >>>>> Probing is always ugly. But I think we can add something like
> >>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> >>>>> So there, we would have a memory region just like e.g. x86 has.  
> >>>>
> >>>> A special region is probably the best way out of this pickle. We would
> >>>> only need the discovery ccw for virtio, then.
> >>>>   
> >>>>>
> >>>>> This should even work with other mechanism I am working on. E.g.
> >>>>> for memory devices, we will add yet another memory region above
> >>>>> the special PCI region.
> >>>>>
> >>>>> The layout of the guest would then be something like
> >>>>>
> >>>>> [0x000000000000000]
> >>>>> ... Memory region containing RAM
> >>>>> [ram_size         ]
> >>>>> ... Memory region for e.g. special PCI devices
> >>>>> [ram_size +1 GB   ]
> >>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> >>>>> [maxram_size - ram_size + 1GB]
> >>>>>
> >>>>> We would have to create proper page tables for guest backing that take
> >>>>> care of the new guest size (not just ram_size). Also, to the guest we
> >>>>> would indicate "maximum ram size == ram_size" so it does not try to
> >>>>> probe the "special" memory.  
> >>>>
> >>>> Hm... so that would be:
> >>>> - 0..ram_size: just like it is handled now
> >>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> >>>>   build page tables for it
> >>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> >>>>
> >>>> How does the guest probe this? (SCLP?) Or does the guest simply know
> >>>> via some kind of probable feature that there's a 1GB region there?  
> >>>
> >>> As the guest only "knowns" ram, there is a "maximum ram size" specified
> >>> via SCLP. An unmodified guest will not probe beyond that.
> >>
> >> Nod.
> >>
> >>> The parts of the 1GB used by a device should be communicated via the
> >>> paravirtualized device I guess. PCI bars don't really fit I assume, so
> >>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> >>> is one part to be clarified.
> >>>
> >>> I guess the guest does not need to know about the whole 1GB, only per
> >>> device about the used part. We can then built page tables in the guest
> >>> for that part when plugging.
> >>
> >> Hm. With my proposal, the guest would get a list of region addresses
> >> from the device via a new ccw. It could then proceed to set up page
> >> tables for it and start to use it. As long as it is aware that the
> >> addresses it will get are beyond max_ram, that should be fine, I think.
> > 
> > Which is the same as my virtio-mmio proposal; the host gets to put it
> > where ever it sees fit (outside ram) and you've just got a way of
> > telling the guest where it lives.
> > 
> > Davidh's 1GB window is pretty much how older PCs worked I think;
> > the problem is that 1GB is never enough and you still need a way
> > to enumarate what devices are where, so it doesn't help you.
> > (Our current virtio-fs dax mappings we're using are a few GB).
> > 
> 
> How does that work on x86? You cannot suddenly move stuff into the
> memory device memory region and potentially mess with DIMMs to be
> plugged later. QEMU wise, this sounds wrong.

Because it's PCI based, it becomes the guests problem - the guest
sets the PCI BARs which set the GPA of the PCI devices;  I assume
there's some protection that happens if it gets mapped over RAM (?!)

I think that varies by firmware as well, with EFI mapping
them differently from our bios.
I think the guest knows the total number of DIMM slots and max-ram
limit, so knows where not-to-map.

Dave

> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> 
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 14:02                                       ` Dr. David Alan Gilbert
@ 2019-02-15 14:13                                         ` David Hildenbrand
  2019-02-15 15:14                                           ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 14:13 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Cornelia Huck, Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

On 15.02.19 15:02, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
>>> * Cornelia Huck (cohuck@redhat.com) wrote:
>>>> On Fri, 15 Feb 2019 13:33:06 +0100
>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>> On 15.02.19 13:28, Cornelia Huck wrote:
>>>>>> On Fri, 15 Feb 2019 12:26:00 +0100
>>>>>> David Hildenbrand <david@redhat.com> wrote:
>>>>>>   
>>>>>>> Probing is always ugly. But I think we can add something like
>>>>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
>>>>>>> So there, we would have a memory region just like e.g. x86 has.  
>>>>>>
>>>>>> A special region is probably the best way out of this pickle. We would
>>>>>> only need the discovery ccw for virtio, then.
>>>>>>   
>>>>>>>
>>>>>>> This should even work with other mechanism I am working on. E.g.
>>>>>>> for memory devices, we will add yet another memory region above
>>>>>>> the special PCI region.
>>>>>>>
>>>>>>> The layout of the guest would then be something like
>>>>>>>
>>>>>>> [0x000000000000000]
>>>>>>> ... Memory region containing RAM
>>>>>>> [ram_size         ]
>>>>>>> ... Memory region for e.g. special PCI devices
>>>>>>> [ram_size +1 GB   ]
>>>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
>>>>>>> [maxram_size - ram_size + 1GB]
>>>>>>>
>>>>>>> We would have to create proper page tables for guest backing that take
>>>>>>> care of the new guest size (not just ram_size). Also, to the guest we
>>>>>>> would indicate "maximum ram size == ram_size" so it does not try to
>>>>>>> probe the "special" memory.  
>>>>>>
>>>>>> Hm... so that would be:
>>>>>> - 0..ram_size: just like it is handled now
>>>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
>>>>>>   build page tables for it
>>>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
>>>>>>
>>>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
>>>>>> via some kind of probable feature that there's a 1GB region there?  
>>>>>
>>>>> As the guest only "knowns" ram, there is a "maximum ram size" specified
>>>>> via SCLP. An unmodified guest will not probe beyond that.
>>>>
>>>> Nod.
>>>>
>>>>> The parts of the 1GB used by a device should be communicated via the
>>>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
>>>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
>>>>> is one part to be clarified.
>>>>>
>>>>> I guess the guest does not need to know about the whole 1GB, only per
>>>>> device about the used part. We can then built page tables in the guest
>>>>> for that part when plugging.
>>>>
>>>> Hm. With my proposal, the guest would get a list of region addresses
>>>> from the device via a new ccw. It could then proceed to set up page
>>>> tables for it and start to use it. As long as it is aware that the
>>>> addresses it will get are beyond max_ram, that should be fine, I think.
>>>
>>> Which is the same as my virtio-mmio proposal; the host gets to put it
>>> where ever it sees fit (outside ram) and you've just got a way of
>>> telling the guest where it lives.
>>>
>>> Davidh's 1GB window is pretty much how older PCs worked I think;
>>> the problem is that 1GB is never enough and you still need a way
>>> to enumarate what devices are where, so it doesn't help you.
>>> (Our current virtio-fs dax mappings we're using are a few GB).
>>>
>>
>> How does that work on x86? You cannot suddenly move stuff into the
>> memory device memory region and potentially mess with DIMMs to be
>> plugged later. QEMU wise, this sounds wrong.
> 
> Because it's PCI based, it becomes the guests problem - the guest
> sets the PCI BARs which set the GPA of the PCI devices;  I assume
> there's some protection that happens if it gets mapped over RAM (?!)
> 
> I think that varies by firmware as well, with EFI mapping
> them differently from our bios.
> I think the guest knows the total number of DIMM slots and max-ram
> limit, so knows where not-to-map.

On s390x, we have to define the size of the host->guest page table when
starting the guest. So we need some upper limit. Mapping anywhere, I
really don't like. Letting the guest define the mapping, I really don't
like.

We can of course switch the order of mappings

[0x000000000000000      ]
... Memory region containing RAM
[ram_size         	]
... Memory region for memory devices (virtio-pmem, virtio-mem ...)
[maxram_size - ram_size ]
... Memory region for e.g. special PCI/CCW devices
[                    TBD]

We can size TBD in a way that we e.g. max out the current page table
size before having to switch to more levels.

> 
> Dave


-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 14:13                                         ` David Hildenbrand
@ 2019-02-15 15:14                                           ` Dr. David Alan Gilbert
  2019-02-15 21:42                                             ` Halil Pasic
  2019-02-15 22:08                                             ` David Hildenbrand
  0 siblings, 2 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-15 15:14 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Cornelia Huck, Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

* David Hildenbrand (david@redhat.com) wrote:
> On 15.02.19 15:02, Dr. David Alan Gilbert wrote:
> > * David Hildenbrand (david@redhat.com) wrote:
> >> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> >>> * Cornelia Huck (cohuck@redhat.com) wrote:
> >>>> On Fri, 15 Feb 2019 13:33:06 +0100
> >>>> David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>>> On 15.02.19 13:28, Cornelia Huck wrote:
> >>>>>> On Fri, 15 Feb 2019 12:26:00 +0100
> >>>>>> David Hildenbrand <david@redhat.com> wrote:
> >>>>>>   
> >>>>>>> Probing is always ugly. But I think we can add something like
> >>>>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> >>>>>>> So there, we would have a memory region just like e.g. x86 has.  
> >>>>>>
> >>>>>> A special region is probably the best way out of this pickle. We would
> >>>>>> only need the discovery ccw for virtio, then.
> >>>>>>   
> >>>>>>>
> >>>>>>> This should even work with other mechanism I am working on. E.g.
> >>>>>>> for memory devices, we will add yet another memory region above
> >>>>>>> the special PCI region.
> >>>>>>>
> >>>>>>> The layout of the guest would then be something like
> >>>>>>>
> >>>>>>> [0x000000000000000]
> >>>>>>> ... Memory region containing RAM
> >>>>>>> [ram_size         ]
> >>>>>>> ... Memory region for e.g. special PCI devices
> >>>>>>> [ram_size +1 GB   ]
> >>>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> >>>>>>> [maxram_size - ram_size + 1GB]
> >>>>>>>
> >>>>>>> We would have to create proper page tables for guest backing that take
> >>>>>>> care of the new guest size (not just ram_size). Also, to the guest we
> >>>>>>> would indicate "maximum ram size == ram_size" so it does not try to
> >>>>>>> probe the "special" memory.  
> >>>>>>
> >>>>>> Hm... so that would be:
> >>>>>> - 0..ram_size: just like it is handled now
> >>>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> >>>>>>   build page tables for it
> >>>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> >>>>>>
> >>>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
> >>>>>> via some kind of probable feature that there's a 1GB region there?  
> >>>>>
> >>>>> As the guest only "knowns" ram, there is a "maximum ram size" specified
> >>>>> via SCLP. An unmodified guest will not probe beyond that.
> >>>>
> >>>> Nod.
> >>>>
> >>>>> The parts of the 1GB used by a device should be communicated via the
> >>>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
> >>>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> >>>>> is one part to be clarified.
> >>>>>
> >>>>> I guess the guest does not need to know about the whole 1GB, only per
> >>>>> device about the used part. We can then built page tables in the guest
> >>>>> for that part when plugging.
> >>>>
> >>>> Hm. With my proposal, the guest would get a list of region addresses
> >>>> from the device via a new ccw. It could then proceed to set up page
> >>>> tables for it and start to use it. As long as it is aware that the
> >>>> addresses it will get are beyond max_ram, that should be fine, I think.
> >>>
> >>> Which is the same as my virtio-mmio proposal; the host gets to put it
> >>> where ever it sees fit (outside ram) and you've just got a way of
> >>> telling the guest where it lives.
> >>>
> >>> Davidh's 1GB window is pretty much how older PCs worked I think;
> >>> the problem is that 1GB is never enough and you still need a way
> >>> to enumarate what devices are where, so it doesn't help you.
> >>> (Our current virtio-fs dax mappings we're using are a few GB).
> >>>
> >>
> >> How does that work on x86? You cannot suddenly move stuff into the
> >> memory device memory region and potentially mess with DIMMs to be
> >> plugged later. QEMU wise, this sounds wrong.
> > 
> > Because it's PCI based, it becomes the guests problem - the guest
> > sets the PCI BARs which set the GPA of the PCI devices;  I assume
> > there's some protection that happens if it gets mapped over RAM (?!)
> > 
> > I think that varies by firmware as well, with EFI mapping
> > them differently from our bios.
> > I think the guest knows the total number of DIMM slots and max-ram
> > limit, so knows where not-to-map.
> 
> On s390x, we have to define the size of the host->guest page table when
> starting the guest. So we need some upper limit.

That's OK; x86 also has that because they have a limited physical
and virtual address size [which may or may not be correctly passed to
the guest!].

> Mapping anywhere, I
> really don't like. Letting the guest define the mapping, I really don't
> like.

Well it's OK to have a hole for it, but letting the guest choose where
those mappings go in the hole is the norm for PCI (there are
exceptions).

> We can of course switch the order of mappings
> 
> [0x000000000000000      ]
> ... Memory region containing RAM
> [ram_size         	]
> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> [maxram_size - ram_size ]
> ... Memory region for e.g. special PCI/CCW devices
> [                    TBD]
> 
> We can size TBD in a way that we e.g. max out the current page table
> size before having to switch to more levels.

Yes, that's fine to set some upper limit; you've just got to make sure
that the hypervisor knows where it can put stuff and if the guest
does PCI that it knows where it's allowed to put stuff and as long
as the two don't overlap everyone is happy.

[We should probably take this level of detail off this list - it's
parsecs away from the detail of virtio]

Dave

> > 
> > Dave
> 
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 15:14                                           ` Dr. David Alan Gilbert
@ 2019-02-15 21:42                                             ` Halil Pasic
  2019-02-15 22:08                                             ` David Hildenbrand
  1 sibling, 0 replies; 44+ messages in thread
From: Halil Pasic @ 2019-02-15 21:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: David Hildenbrand, Cornelia Huck, Frank Yang, virtio-comment,
	Stefan Hajnoczi

On Fri, 15 Feb 2019 15:14:25 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * David Hildenbrand (david@redhat.com) wrote:
> > On 15.02.19 15:02, Dr. David Alan Gilbert wrote:
> > > * David Hildenbrand (david@redhat.com) wrote:
> > >> On 15.02.19 14:50, Dr. David Alan Gilbert wrote:
> > >>> * Cornelia Huck (cohuck@redhat.com) wrote:
> > >>>> On Fri, 15 Feb 2019 13:33:06 +0100
> > >>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>
> > >>>>> On 15.02.19 13:28, Cornelia Huck wrote:
> > >>>>>> On Fri, 15 Feb 2019 12:26:00 +0100
> > >>>>>> David Hildenbrand <david@redhat.com> wrote:
> > >>>>>>   
> > >>>>>>> Probing is always ugly. But I think we can add something like
> > >>>>>>>  the x86 PCI hole between 3 and 4 GB after our initial boot memory.
> > >>>>>>> So there, we would have a memory region just like e.g. x86 has.  
> > >>>>>>
> > >>>>>> A special region is probably the best way out of this pickle. We would
> > >>>>>> only need the discovery ccw for virtio, then.
> > >>>>>>   
> > >>>>>>>
> > >>>>>>> This should even work with other mechanism I am working on. E.g.
> > >>>>>>> for memory devices, we will add yet another memory region above
> > >>>>>>> the special PCI region.
> > >>>>>>>
> > >>>>>>> The layout of the guest would then be something like
> > >>>>>>>
> > >>>>>>> [0x000000000000000]
> > >>>>>>> ... Memory region containing RAM
> > >>>>>>> [ram_size         ]
> > >>>>>>> ... Memory region for e.g. special PCI devices
> > >>>>>>> [ram_size +1 GB   ]
> > >>>>>>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > >>>>>>> [maxram_size - ram_size + 1GB]
> > >>>>>>>
> > >>>>>>> We would have to create proper page tables for guest backing that take
> > >>>>>>> care of the new guest size (not just ram_size). Also, to the guest we
> > >>>>>>> would indicate "maximum ram size == ram_size" so it does not try to
> > >>>>>>> probe the "special" memory.  
> > >>>>>>
> > >>>>>> Hm... so that would be:
> > >>>>>> - 0..ram_size: just like it is handled now
> > >>>>>> - ram_size..ram_size + 1GB: guest does not treat it as ram, but does
> > >>>>>>   build page tables for it
> > >>>>>> - ram_size + 1GB..maxram_size: for whatever memory devices do with it
> > >>>>>>
> > >>>>>> How does the guest probe this? (SCLP?) Or does the guest simply know
> > >>>>>> via some kind of probable feature that there's a 1GB region there?  
> > >>>>>
> > >>>>> As the guest only "knowns" ram, there is a "maximum ram size" specified
> > >>>>> via SCLP. An unmodified guest will not probe beyond that.
> > >>>>
> > >>>> Nod.
> > >>>>
> > >>>>> The parts of the 1GB used by a device should be communicated via the
> > >>>>> paravirtualized device I guess. PCI bars don't really fit I assume, so
> > >>>>> we might need some virtio-ccw thingy (you're the expert :)) on top. That
> > >>>>> is one part to be clarified.
> > >>>>>
> > >>>>> I guess the guest does not need to know about the whole 1GB, only per
> > >>>>> device about the used part. We can then built page tables in the guest
> > >>>>> for that part when plugging.
> > >>>>
> > >>>> Hm. With my proposal, the guest would get a list of region addresses
> > >>>> from the device via a new ccw. It could then proceed to set up page
> > >>>> tables for it and start to use it. As long as it is aware that the
> > >>>> addresses it will get are beyond max_ram, that should be fine, I think.
> > >>>
> > >>> Which is the same as my virtio-mmio proposal; the host gets to put it
> > >>> where ever it sees fit (outside ram) and you've just got a way of
> > >>> telling the guest where it lives.
> > >>>
> > >>> Davidh's 1GB window is pretty much how older PCs worked I think;
> > >>> the problem is that 1GB is never enough and you still need a way
> > >>> to enumarate what devices are where, so it doesn't help you.
> > >>> (Our current virtio-fs dax mappings we're using are a few GB).
> > >>>
> > >>
> > >> How does that work on x86? You cannot suddenly move stuff into the
> > >> memory device memory region and potentially mess with DIMMs to be
> > >> plugged later. QEMU wise, this sounds wrong.
> > > 
> > > Because it's PCI based, it becomes the guests problem - the guest
> > > sets the PCI BARs which set the GPA of the PCI devices;  I assume
> > > there's some protection that happens if it gets mapped over RAM (?!)
> > > 
> > > I think that varies by firmware as well, with EFI mapping
> > > them differently from our bios.
> > > I think the guest knows the total number of DIMM slots and max-ram
> > > limit, so knows where not-to-map.
> > 
> > On s390x, we have to define the size of the host->guest page table when
> > starting the guest. So we need some upper limit.
> 
> That's OK; x86 also has that because they have a limited physical
> and virtual address size [which may or may not be correctly passed to
> the guest!].
> 
> > Mapping anywhere, I
> > really don't like. Letting the guest define the mapping, I really don't
> > like.
> 
> Well it's OK to have a hole for it, but letting the guest choose where
> those mappings go in the hole is the norm for PCI (there are
> exceptions).
> 
> > We can of course switch the order of mappings
> > 
> > [0x000000000000000      ]
> > ... Memory region containing RAM
> > [ram_size         	]
> > ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
> > [maxram_size - ram_size ]
> > ... Memory region for e.g. special PCI/CCW devices
> > [                    TBD]
> > 
> > We can size TBD in a way that we e.g. max out the current page table
> > size before having to switch to more levels.
> 
> Yes, that's fine to set some upper limit; you've just got to make sure
> that the hypervisor knows where it can put stuff and if the guest
> does PCI that it knows where it's allowed to put stuff and as long
> as the two don't overlap everyone is happy.
> 
> [We should probably take this level of detail off this list - it's
> parsecs away from the detail of virtio]

If you do take the in detail discussion off is list please keep me in the
loop.

Regards,
Halil


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 15:14                                           ` Dr. David Alan Gilbert
  2019-02-15 21:42                                             ` Halil Pasic
@ 2019-02-15 22:08                                             ` David Hildenbrand
  1 sibling, 0 replies; 44+ messages in thread
From: David Hildenbrand @ 2019-02-15 22:08 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Cornelia Huck, Frank Yang, virtio-comment, Stefan Hajnoczi, Halil Pasic

>> On s390x, we have to define the size of the host->guest page table when
>> starting the guest. So we need some upper limit.
> 
> That's OK; x86 also has that because they have a limited physical
> and virtual address size [which may or may not be correctly passed to
> the guest!].
> 
>> Mapping anywhere, I
>> really don't like. Letting the guest define the mapping, I really don't
>> like.
> 
> Well it's OK to have a hole for it, but letting the guest choose where
> those mappings go in the hole is the norm for PCI (there are
> exceptions).

Yes, but at least if we need a new interface for this for s390x (what I
assume), not allowing the guest to map is obviously easier. However if
there is good reason why it is needed, nothing speaks against it. We
just have to offer the guest enough information to do it.

> 
>> We can of course switch the order of mappings
>>
>> [0x000000000000000      ]
>> ... Memory region containing RAM
>> [ram_size         	]
>> ... Memory region for memory devices (virtio-pmem, virtio-mem ...)
>> [maxram_size - ram_size ]
>> ... Memory region for e.g. special PCI/CCW devices
>> [                    TBD]
>>
>> We can size TBD in a way that we e.g. max out the current page table
>> size before having to switch to more levels.
> 
> Yes, that's fine to set some upper limit; you've just got to make sure
> that the hypervisor knows where it can put stuff and if the guest
> does PCI that it knows where it's allowed to put stuff and as long
> as the two don't overlap everyone is happy.
> 
> [We should probably take this level of detail off this list - it's
> parsecs away from the detail of virtio]

Yes indeed, I guess this gives Conny an idea on how it could work, so
now she can design an interface - and clarify all the details ;)

I guess we'll need something for both, virtio-ccw and virtio-pci on s390x.

I'll be happy to discuss this off-list.

*flies away*

Have a great weekend Dave!


-- 

Thanks,

David / dhildenb

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
  2019-02-15 11:19                           ` Dr. David Alan Gilbert
  2019-02-15 12:31                             ` Cornelia Huck
@ 2019-02-18 15:28                             ` Halil Pasic
  1 sibling, 0 replies; 44+ messages in thread
From: Halil Pasic @ 2019-02-18 15:28 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Cornelia Huck, Frank Yang, virtio-comment, Stefan Hajnoczi

On Fri, 15 Feb 2019 11:19:09 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Thu, 14 Feb 2019 09:43:10 -0800
> > Frank Yang <lfy@google.com> wrote:
> > 
> > > On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > wrote:
> > > 
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:  

[..]

> > 
> > The background here is that s390 traditionally does not have any
> > concept of memory-mapped I/O. IOW, you don't just write to or read from
> > a special memory area; instead, I/O operations use special instructions.
> > 
> > The mechanism I'm trying to extend here is channel I/O: the driver
> > builds a channel program with commands that point to guest memory areas
> > and hands it to the channel subsystem (which means, in our case, the
> > host) via a special instruction. The channel subsystem and the device
> > (the host, in our case) translate the memory addresses and execute the
> > commands. The one place where we write shared memory directly in the
> > virtio case are the virtqueues -- which are allocated in guest memory,
> > so the guest decides which memory addresses are special. Accessing the
> > config space of a virtio device via the ccw transport does not
> > read/write a memory location directly, but instead uses a channel
> > program that performs the read/write.
> > 
> > For pci, the memory accesses are mapped to special instructions:
> > reading or writing the config space of a pci device does not perform
> > reads or writes of a memory location, either; the driver uses special
> > instructions to access the config space (which are also
> > interpreted/emulated by QEMU, for example.)
> > 
> > The old s390 (pre-virtio-ccw) virtio transport had to rely on the
> > knowledge that there were two pages containing the virtqueues etc.
> > right above the normal memory (probed by checking whether accessing
> > that memory gave an exception or not). The main problems were that this
> > was inflexible (the guest had no easy way to find out how many
> > 'special' pages were present, other than trying to access them), and
> > that it was different from whatever other mechanisms are common on s390.
> > 
> > We might be able to come up with another scheme, but I wouldn't hold my
> > breath. Would be great if someone else with s390 knowledge could chime
> > in here.
> 
> What I'm missing here is why the behaviour of the s390's traditional channel program
> matters to the design of an entirely emulated device.
> 

Let me try to explain. Right at the beginning of the virtio spec one can
read "These devices are found in virtual environments, yet by design
they look like physical devices to the guest within the virtual machine
- and this document treats them as such." (section 1 Introduction).

That means any virtio-pci device is supposed to be a perfectly legit pci
device, and same should in theory hold for any virtio-ccw (in practice
it is a bit complicated with ccw).

As far as I understand the idea behind the shm region is to do IO via
instructions that read/write normal RAM memory. s390 does not have stuff
like: if the address is in this range it is an IO address, and if in that
range it is a RAM address. Instead it is cast in stone, that a certain
argument of a certain operation (instruction) is a RAM address (usually),
or an IO address (for certain instructions like PCI LOAD).

Furthermore to my best knowledge on s390 the OS has control over what
parts of RAM can be written/read by the device. For ccw those bits of
RAM are designated by the channel program. zPCI does this via the
translation infrastructure AFAIU -- I'm not awfully familiar with zPCI.

@Connie: We have virtio-pci on s390x. How is a virtio-pci device
supposed to access an shm region?

Thus I do have some sympathy for the guest opt-in to a shared memory
region. BTW isn't your proposal for MMIO including something similar:

"""
+ \mmiodreg{SHMBaseLow}{SHMBaseHigh}{Shared memory region 64 bit long
physical address}{0x0c0}{0x0xc4}{W}{%
+ The driver writes these registers to indicate where it wishes
+ the device to map the shared memory region currently selected.
+ }
"""

Did I misunderstand something?

> As long as the s390 allows:
>   a) The host to map a region of HVA into GPA at an arbitrary GPA
> address
>   b) Not tell the guest that (a) is RAM
>   c) Find a non-RAM GPA for (a)
>   d) Allow the guest to set up a page table pointing to (c)
>   e) Discover (c) via the scheme you described
> 

As David wrote before, yes this could work. All we need is make sure the
respective virtio driver has exclusive access to the given memory region
in the guest. And virtio-ccw moving away from normal ccw is not
necessarily a bad thing, but we need to be careful about it.

> Then that's all that's needed - and I'm not seeing what is different on
> s390 about a-d from any other architecture.
> 

I've tried to clarify that. To sum it up on s390 the full 64 bit is
conceptually RAM. For access to IO regions (PCI) s390 uses specialized
instructions. Was my explanation helpful?

Another question: Is, and if how is a virtio shared memory region
different then let's say memory that used in a 'shared' fashion by two
CPUs?

Regards,
Halil


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-02-18 15:28 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11 11:41 [virtio-comment] [PATCH 0/3] Large shared memory regions Dr. David Alan Gilbert (git)
2019-01-11 11:41 ` [virtio-comment] [PATCH 1/3] shared memory: Define " Dr. David Alan Gilbert (git)
2019-01-11 12:15   ` Cornelia Huck
2019-01-11 12:26     ` Dr. David Alan Gilbert
2019-01-15 10:10       ` Cornelia Huck
2019-01-15 11:23         ` Dr. David Alan Gilbert
2019-01-16 10:56           ` Cornelia Huck
2019-01-16 20:06             ` Dr. David Alan Gilbert
2019-02-11 21:52               ` Cornelia Huck
2019-02-13 18:37                 ` Dr. David Alan Gilbert
2019-02-14 10:58                   ` Cornelia Huck
2019-02-14 16:37                     ` Dr. David Alan Gilbert
2019-02-14 17:43                       ` Frank Yang
2019-02-15 11:07                         ` Cornelia Huck
2019-02-15 11:19                           ` Dr. David Alan Gilbert
2019-02-15 12:31                             ` Cornelia Huck
2019-02-18 15:28                             ` Halil Pasic
2019-02-15 11:26                           ` David Hildenbrand
2019-02-15 12:28                             ` Cornelia Huck
2019-02-15 12:33                               ` David Hildenbrand
2019-02-15 12:37                                 ` Cornelia Huck
2019-02-15 12:59                                   ` David Hildenbrand
2019-02-15 13:50                                   ` Dr. David Alan Gilbert
2019-02-15 13:56                                     ` David Hildenbrand
2019-02-15 14:02                                       ` Dr. David Alan Gilbert
2019-02-15 14:13                                         ` David Hildenbrand
2019-02-15 15:14                                           ` Dr. David Alan Gilbert
2019-02-15 21:42                                             ` Halil Pasic
2019-02-15 22:08                                             ` David Hildenbrand
2019-02-15 12:51                     ` Halil Pasic
2019-02-15 13:33                       ` Cornelia Huck
2019-01-23 15:12         ` Michael S. Tsirkin
2019-01-11 15:29     ` Halil Pasic
2019-01-11 16:07       ` Dr. David Alan Gilbert
2019-01-11 17:57         ` Halil Pasic
2019-01-15  9:33           ` Cornelia Huck
2019-02-13  2:25   ` [virtio-comment] " Stefan Hajnoczi
2019-02-13 10:44     ` Dr. David Alan Gilbert
2019-02-14  3:43       ` Stefan Hajnoczi
2019-01-11 11:41 ` [virtio-comment] [PATCH 2/3] shared memory: Define PCI capability Dr. David Alan Gilbert (git)
2019-02-13  2:30   ` [virtio-comment] " Stefan Hajnoczi
2019-01-11 11:42 ` [virtio-comment] [PATCH 3/3] shared memory: Define mmio registers Dr. David Alan Gilbert (git)
2019-02-13  2:33   ` [virtio-comment] " Stefan Hajnoczi
2019-02-13 16:52     ` Dr. David Alan Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.