From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: virtio-comment-return-614-cohuck=redhat.com@lists.oasis-open.org
Sender:
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Received: from lists.oasis-open.org (oasis.ws5.connectedcommunity.org [10.110.1.242])
 by lists.oasis-open.org (Postfix) with ESMTP id C2DA3986097
 for ; Thu, 14 Feb 2019 17:43:23 +0000 (UTC)
MIME-Version: 1.0
References: <20190111114200.10026-2-dgilbert@redhat.com>
 <20190111131540.12a8abca.cohuck@redhat.com>
 <20190111122654.GE2738@work-vm>
 <20190115111038.6769d292.cohuck@redhat.com>
 <20190115112303.GB2135@work-vm>
 <20190116115638.152d6797.cohuck@redhat.com>
 <20190116200625.GG2351@work-vm>
 <20190211225225.7c39154d.cohuck@redhat.com>
 <20190213183755.GF2601@work-vm>
 <20190214115807.2bab4efb.cohuck@redhat.com>
 <20190214163707.GE2617@work-vm>
In-Reply-To: <20190214163707.GE2617@work-vm>
From: Frank Yang
Date: Thu, 14 Feb 2019 09:43:10 -0800
Message-ID:
Content-Type: multipart/alternative; boundary="000000000000c88c100581de310c"
Subject: Re: [virtio-comment] [PATCH 1/3] shared memory: Define shared memory regions
To: "Dr. David Alan Gilbert"
Cc: Cornelia Huck, virtio-comment@lists.oasis-open.org, Stefan Hajnoczi, Halil Pasic
List-ID:

--000000000000c88c100581de310c
Content-Type: text/plain; charset="UTF-8"


On Thu, Feb 14, 2019 at 8:37 AM Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Wed, 13 Feb 2019 18:37:56 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Wed, 16 Jan 2019 20:06:25 +0000
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > So these are all moving this 1/3 forward - has anyone got comments on
> > > > the transport specific implementations?
> > >
> > > No comment on pci or mmio, but I've hacked something together for ccw.
> > > Basically, one sense-type ccw for discovery and a control-type ccw for
> > > activation of the regions (no idea if we really need the latter), both
> > > available with ccw revision 3.
> > >
> > > No idea whether this will work this way, though...
> >
> > That sounds (from a shm perspective) reasonable; can I ask why the
> > 'activate' is needed?
>
> The activate interface is actually what I'm most unsure about; maybe
> Halil can chime in.
>
> My basic concern is that we don't have any idea how the guest will use
> the available memory. If the shared memory areas are supposed to be
> mapped into an inconvenient place, the activate interface gives the
> guest a chance to clear up that area before the host starts writing to
> it.

I'm expecting the host to map it into an area of GPA that is out of the
way - it doesn't overlap with RAM.
Given that, I'm not sure why the guest would have to do any 'clear up' -
it probably wants to make a virtual mapping somewhere, but again that's
up to the guest to do when it feels like it.


This is what we do with Vulkan as well.

> I'm not really enthusiastic about that interface... for one, I'm not
> sure how this plays out at the device type level, which should not
> really concern itself with transport-specific handling.

I'd expect the host side code to give an area of memory to the transport
and tell it to map it somewhere (in the QEMU terminology a MemoryRegion
I think).

I wonder if this could help: the way we're running Vulkan at the moment,
we add the concept of a MemoryRegion with no actual backing:

https://android-review.googlesource.com/q/topic:%22qemu-user-controlled-hv-mappings%22+(status:open%20OR%20status:merged)

and it would be connected to the entire PCI address space on the shared
memory address space realization. So it's kind of like a sparse or
deferred MemoryRegion.

When the guest actually wants to map a subregion associated with the
host memory, on the host side, we can call the hypervisor to map the
region, by giving the device implementation the functions
KVM_SET_USER_MEMORY_REGION and analogs.

This has the advantage of a smaller contact area between shm and QEMU,
where the device level stuff can operate at a separate layer from the
MemoryRegion layer, which is more transport level.
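
To make the deferred-mapping idea above a bit more concrete, here is a
minimal sketch in C. It is not the code from the linked changes:
deferred_region, hv_map_fn, and fake_hv_map are hypothetical names made
up for illustration, and the callback merely stands in for whatever the
hypervisor provides (KVM_SET_USER_MEMORY_REGION or an analog). It only
shows the shape of the idea: the region reserves a window of
guest-physical addresses up front, and backing is requested from the
hypervisor only when the guest asks for a subrange.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for the hypervisor call, e.g. a
 * wrapper around the KVM_SET_USER_MEMORY_REGION ioctl or an analog. */
typedef int (*hv_map_fn)(uint64_t gpa, void *hva, uint64_t len);

/* A region that reserves guest-physical address space but has no
 * backing until the guest asks for a subregion (illustrative only). */
struct deferred_region {
    uint64_t gpa_base;  /* start of the reserved GPA window */
    uint64_t size;      /* size of the window in bytes */
    hv_map_fn map;      /* handed to the device implementation */
};

/* Back [offset, offset + len) of the window with host memory at hva.
 * Returns 0 on success, -1 if the range is empty, falls outside the
 * window, or the hypervisor callback fails. */
int deferred_region_map(const struct deferred_region *r, uint64_t offset,
                        void *hva, uint64_t len)
{
    if (len == 0 || offset > r->size || len > r->size - offset)
        return -1;
    return r->map(r->gpa_base + offset, hva, len);
}

/* Stand-in for the hypervisor: it only records the last request. */
static uint64_t last_gpa, last_len;

static int fake_hv_map(uint64_t gpa, void *hva, uint64_t len)
{
    (void)hva;
    last_gpa = gpa;
    last_len = len;
    return 0;
}
```

Passing the callback in keeps the device code free of any direct
dependency on a particular hypervisor API.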


Similarly in the guest, I'm expecting the driver for the device to
ask for a pointer to a region with a particular ID and that goes
down to the transport code.
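
A rough sketch of that split, in C, might look like the following. All
names here (shm_region, transport_ops, vdev_get_shm_region, and the toy
table-backed transport) are hypothetical and are not taken from any
existing driver; the point is only that the device driver asks by ID
and the transport decides how the answer is found.

```c
#include <stddef.h>
#include <stdint.h>

/* What the driver gets back for a region ID (illustrative). */
struct shm_region {
    uint64_t addr;  /* guest-physical address */
    uint64_t len;   /* length in bytes */
};

/* Hypothetical per-transport hook: pci, mmio and ccw would each
 * provide their own implementation. */
struct transport_ops {
    int (*get_shm_region)(void *transport, uint8_t id,
                          struct shm_region *out);
};

struct vdev {
    const struct transport_ops *ops;
    void *transport;  /* transport-private state */
};

/* Transport-agnostic entry point used by the device driver.
 * Returns 0 and fills *out on success, -1 if the ID is unknown. */
int vdev_get_shm_region(struct vdev *d, uint8_t id, struct shm_region *out)
{
    if (!d->ops || !d->ops->get_shm_region)
        return -1;
    return d->ops->get_shm_region(d->transport, id, out);
}

/* Toy transport that answers from a fixed table. */
struct table_entry {
    uint8_t id;
    struct shm_region region;
};

struct table_transport {
    const struct table_entry *entries;
    size_t count;
};

static int table_get_shm_region(void *transport, uint8_t id,
                                struct shm_region *out)
{
    const struct table_transport *t = transport;

    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].id == id) {
            *out = t->entries[i].region;
            return 0;
        }
    }
    return -1;
}

const struct transport_ops table_transport_ops = { table_get_shm_region };
```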
> Another option would be to map these into a special memory area that
> the guest won't use for its normal operation... the original s390
> (non-ccw) virtio transport mapped everything into two special pages
> above the guest memory, but that was quite painful, and I don't think
> we want to go down that road again.

Can you explain why?

Dave

> >
> > Dave
> >
> > > diff --git a/content.tex b/content.tex
> > > index 836ee5236939..7f379bca932e 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -2078,6 +2078,8 @@ virtio:
> > >  #define CCW_CMD_READ_VQ_CONF 0x32
> > >  #define CCW_CMD_SET_VIRTIO_REV 0x83
> > >  #define CCW_CMD_READ_STATUS 0x72
> > > +#define CCW_CMD_GET_REGIONS 0x14
> > > +#define CCW_CMD_MAP_REGIONS 0x93
> > >  \end{lstlisting}
> > >
> > >  \subsubsection{Notifications}\label{sec:Virtio Transport Options / Virtio
> > > @@ -2170,7 +2172,9 @@ The following values are supported:
> > >  \hline
> > >  2       & 0      & <empty>   & CCW_CMD_READ_STATUS support \\
> > >  \hline
> > > -3-n     &        &           & reserved for later revisions \\
> > > +3       & 0      & <empty>   & CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS support \\
> > > +\hline
> > > +4-n     &        &           & reserved for later revisions \\
> > >  \hline
> > >  \end{tabular}
> > >
> > > @@ -2449,6 +2453,72 @@ command. Some legacy devices will support two-stage queue indicators, though,
> > >  and a driver will be able to successfully use CCW_CMD_SET_IND_ADAPTER to set
> > >  them up.
> > >
> > > +\subsubsection{Handling Shared Memory Regions}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The CCW_CMD_GET_REGIONS command allows the driver to discover shared memory
> > > +regions provided by the device, if any.
> > > +
> > > +The driver provides a pointer to a 4096 byte buffer that is filled out by
> > > +the device:
> > > +
> > > +\begin{lstlisting}
> > > +  struct shared_regions_info {
> > > +    be64 num_regions;
> > > +    struct shared_region_desc regions[];
> > > +  };
> > > +\end{lstlisting}
> > > +
> > > +The buffer contains 0 or more shared region descriptors, as specified
> > > +by \field{num_regions}. If the device does not provide shared regions,
> > > +\field{num_regions} is 0. Otherwise, the shared region descriptors have
> > > +the following format:
> > > +
> > > +\begin{lstlisting}
> > > +struct shared_region_desc {
> > > +        be64 addr;
> > > +        be64 len;
> > > +        u8 id;
> > > +        u8 pad[3];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\field{addr} is the guest-physical address of the region with a length of
> > > +\field{len}, identified by \field{id}. The contents of \field{pad} are
> > > +unpredictable, although it is recommended that the device fills in zeroes.
> > > +
> > > +To activate or deactivate a shared memory region, the driver uses the
> > > +CCW_CMD_MAP_REGIONS command. It takes the following payload:
> > > +
> > > +\begin{lstlisting}
> > > +struct shared_region_ctrl {
> > > +        u8 id;
> > > +        u8 activate;
> > > +        u8 pad[2];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\field{id} denotes the shared memory region that is the target of the command,
> > > +while \field{activate} specifies whether the region should be activated (if 1)
> > > +or deactivated (if 0). When activated, the device makes the guest-physical
> > > +address of the region available as a shared memory region.
> > > +
> > > +\devicenormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The device MUST reject the CCW_CMD_GET_REGIONS and CCW_CMD_MAP_REGIONS
> > > +commands if not at least revision 3 has been negotiated.
> > > +
> > > +The device MUST NOT read from or write to the region before it has been
> > > +activated by the driver or after it has been deactivated by the driver.
> > > +
> > > +If the driver reads from or writes to an address within a region that has
> > > +not been activated by the driver, the device MUST treat this read or write
> > > +as a normal read or write operation.
> > > +
> > > +\drivernormative{\paragraph}{Handling Shared Memory Regions}{Virtio Transport Options / Virtio over channel I/O / Device Initialization / Handling Shared Memory Regions}
> > > +
> > > +The driver MUST NOT treat the guest-physical address of a region as a shared
> > > +memory region before it has activated it or after it has deactivated it.
> > > +
> > >  \subsection{Device Operation}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation}
> > >
> > >  \subsubsection{Host->Guest Notification}\label{sec:Virtio Transport Options / Virtio over channel I/O / Device Operation / Host->Guest Notification}
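
For what it's worth, here is a small C sketch of how a driver might
decode the CCW_CMD_GET_REGIONS buffer described in the patch above. It
is only an illustration under stated assumptions: parse_regions,
read_be64, and struct region are hypothetical names, and the sketch
assumes the descriptors are packed back to back at 8 + 8 + 1 + 3 = 20
bytes each. That is worth spelling out, because on common 64-bit ABIs a
C compiler would pad such a struct to 24 bytes. Under the 20-byte
assumption, a 4096 byte buffer holds at most (4096 - 8) / 20 = 204
descriptors, so a larger num_regions is rejected.

```c
#include <stddef.h>
#include <stdint.h>

#define REGIONS_BUF_SIZE 4096u  /* buffer size stated in the patch */
#define REGION_DESC_SIZE 20u    /* 8 + 8 + 1 + 3; assumes no implicit padding */

/* Host-endian copy of one descriptor (pad bytes are ignored). */
struct region {
    uint64_t addr;
    uint64_t len;
    uint8_t id;
};

/* Read a big-endian 64-bit value byte by byte, so the result does not
 * depend on the endianness of the machine running the driver. */
static uint64_t read_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a 4096 byte buffer into out[]. Returns the number of regions,
 * or -1 if num_regions does not fit in the buffer or in out[]. */
int parse_regions(const uint8_t *buf, struct region *out, size_t max_out)
{
    uint64_t n = read_be64(buf);

    if (n > (REGIONS_BUF_SIZE - 8u) / REGION_DESC_SIZE || n > max_out)
        return -1;

    for (uint64_t i = 0; i < n; i++) {
        const uint8_t *d = buf + 8 + i * REGION_DESC_SIZE;

        out[i].addr = read_be64(d);
        out[i].len = read_be64(d + 8);
        out[i].id = d[16];
    }
    return (int)n;
}
```

If the intended layout is the 24-byte padded one instead, only
REGION_DESC_SIZE changes; either way the wire layout probably wants to
be stated explicitly in the spec text.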
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


--000000000000c88c100581de310c--