All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
@ 2020-08-10 16:15 Michael S. Tsirkin
  2020-08-10 16:59 ` Cornelia Huck
  2022-03-29  8:33 ` [virtio-dev] " Stefan Hajnoczi
  0 siblings, 2 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2020-08-10 16:15 UTC (permalink / raw)
  To: virtio-comment, virtio-dev; +Cc: virtio

Devices that normally use buffers in order can
benefit from ability to temporarily switch to handle
some buffers out of order.

As a case in point, a networking device might handle
RX buffers in order normally. However, should
an access to an RX buffer cause a page fault
(e.g. when using PRI), the device could benefit from
ability to temporarily keep using following
buffers in the ring (possibly with higher overhead)
until the fault has been resolved.

Page faults allow more features such as THP, auto-NUMA,
live migration.

Out of order is of course already possible, however,
IN_ORDER is currently required for descriptor batching where
device marks a whole batch of buffers used in one go.

The idea behind this proposal is to relax that requirement,
allowing batching without asking device to be in orde rat all times,
as follows:

Device uses buffers in any order. Eventually when device detects that it
has used all previously outstanding buffers, it sets a FLUSH flag on the
last buffer used. If it set this flag on the last buffer used
previously, and now uses a batch of descriptors in-order, it can now
signal the last buffer used again setting the FLUSH flag.

Driver can detect in-order when it sees two FLUSH flags one after
another. In other respects the feature is similar to IN_ORDER
from the driver implementation POV.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 content.tex     |  9 ++++++++-
 packed-ring.tex | 23 +++++++++++++++++++++++
 split-ring.tex  | 26 ++++++++++++++++++++++++--
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/content.tex b/content.tex
index 91735e3..8494eb6 100644
--- a/content.tex
+++ b/content.tex
@@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
 
 Some devices always use descriptors in the same order in which
 they have been made available. These devices can offer the
-VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
+VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
+descriptors in the same order in which they have been made
+available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
+feature. If one of the features VIRTIO_F_IN_ORDER or
+VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
 might allow optimizations or simplify driver and/or device code.
 
 Each virtqueue can consist of up to 3 parts:
@@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   that the driver passes extra data (besides identifying the virtqueue)
   in its device notifications.
   See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
+  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
+  that device has ability to indicate use of (some of) buffers by the device in the same
+  order in which they have been made available.
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
diff --git a/packed-ring.tex b/packed-ring.tex
index ea92543..a120a19 100644
--- a/packed-ring.tex
+++ b/packed-ring.tex
@@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
 only writing out a single used descriptor with the Buffer ID
 corresponding to the last descriptor in the batch.
 
+Other devices sometimes use
+descriptors in the same order in which they have been made
+available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
+feature. If negotiated, whenever device has used all buffers
+since the previous used buffer in the same order
+in which they have been made available, device can set the
+VIRTQ_DESC_F_FLUSH flag in the used descriptor.
+\begin{lstlisting}
+#define VIRTQ_DESC_F_FLUSH      8
+\end{lstlisting}
+
+This knowledge allows
+devices to notify the use of a batch of buffers to the driver by
+only writing out a single used descriptor with the Buffer ID
+corresponding to the last descriptor in the batch,
+and VIRTQ_DESC_F_FLUSH set.
+
+Note that device is only allowed to batch buffers in this way
+if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
+flag set, as a result, considering the group of buffers
+used between two buffers with VIRTQ_DESC_F_FLUSH set,
+either all of them constitute a batch, or none at all.
+
 The device then skips forward in the ring according to the size of
 the batch. The driver needs to look up the used Buffer ID and
 calculate the batch size to be able to advance to where the next
diff --git a/split-ring.tex b/split-ring.tex
index 123ac9f..cf197f8 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
         le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
 };
 
-/* le32 is used here for ids for padding reasons. */
 struct virtq_used_elem {
         /* Index of start of used descriptor chain. */
-        le32 id;
+        le16 id;
+#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
+        le16 flags;
         /* Total length of the descriptor chain which was used (written to) */
         le32 len;
 };
@@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
 corresponding to the head entry of the
 descriptor chain describing the last buffer in the batch.
 
+Other devices sometimes use
+descriptors in the same order in which they have been made
+available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
+feature. If negotiated, whenever device has used all buffers
+since the previous used buffer in the same order
+in which they have been made available, device can set the
+VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
+
+This knowledge allows
+devices to notify the use of a batch of buffers to the driver by
+only writing out single used ring entry with the \field{id}
+corresponding to the head entry of the
+descriptor chain describing the last buffer in the batch,
+and VIRTQ_USED_ELEM_F_FLUSH set.
+
+Note that device is only allowed to batch buffers in this way
+if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
+flag set, as a result, considering the group of buffers
+used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
+either all of them constitute a batch, or none at all.
+
 The device then skips forward in the ring according to the size of
 the batch. Accordingly, it increments the used \field{idx} by the
 size of the batch.
-- 
MST


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-10 16:15 [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling Michael S. Tsirkin
@ 2020-08-10 16:59 ` Cornelia Huck
  2020-08-11  8:23   ` Michael S. Tsirkin
  2022-03-29  8:33 ` [virtio-dev] " Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Cornelia Huck @ 2020-08-10 16:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-comment, virtio-dev, virtio

On Mon, 10 Aug 2020 12:15:15 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Devices that normally use buffers in order can
> benefit from ability to temporarily switch to handle
> some buffers out of order.
> 
> As a case in point, a networking device might handle
> RX buffers in order normally. However, should
> an access to an RX buffer cause a page fault
> (e.g. when using PRI), the device could benefit from
> ability to temporarily keep using following
> buffers in the ring (possibly with higher overhead)
> until the fault has been resolved.
> 
> Page faults allow more features such as THP, auto-NUMA,
> live migration.
> 
> Out of order is of course already possible, however,
> IN_ORDER is currently required for descriptor batching where
> device marks a whole batch of buffers used in one go.
> 
> The idea behind this proposal is to relax that requirement,
> allowing batching without asking device to be in orde rat all times,
> as follows:
> 
> Device uses buffers in any order. Eventually when device detects that it
> has used all previously outstanding buffers, it sets a FLUSH flag on the
> last buffer used. If it set this flag on the last buffer used
> previously, and now uses a batch of descriptors in-order, it can now
> signal the last buffer used again setting the FLUSH flag.
> 
> Driver can detect in-order when it sees two FLUSH flags one after
> another. In other respects the feature is similar to IN_ORDER
> from the driver implementation POV.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  content.tex     |  9 ++++++++-
>  packed-ring.tex | 23 +++++++++++++++++++++++
>  split-ring.tex  | 26 ++++++++++++++++++++++++--
>  3 files changed, 55 insertions(+), 3 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index 91735e3..8494eb6 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If one of the features VIRTIO_F_IN_ORDER or
> +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge

Do these two features conflict with each other? I.e., at most one of
them may be negotiated (or offered?) at a time?

>  might allow optimizations or simplify driver and/or device code.
>  
>  Each virtqueue can consist of up to 3 parts:
> @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that the driver passes extra data (besides identifying the virtqueue)
>    in its device notifications.
>    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> +  that device has ability to indicate use of (some of) buffers by the device in the same

"use of a subset of buffers" ?

> +  order in which they have been made available.
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/packed-ring.tex b/packed-ring.tex
> index ea92543..a120a19 100644
> --- a/packed-ring.tex
> +++ b/packed-ring.tex
> @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
>  only writing out a single used descriptor with the Buffer ID
>  corresponding to the last descriptor in the batch.
>  
> +Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If negotiated, whenever device has used all buffers

s/device/the device/

> +since the previous used buffer in the same order
> +in which they have been made available, device can set the

s/device/the device/

> +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> +\begin{lstlisting}
> +#define VIRTQ_DESC_F_FLUSH      8
> +\end{lstlisting}
> +
> +This knowledge allows
> +devices to notify the use of a batch of buffers to the driver by
> +only writing out a single used descriptor with the Buffer ID
> +corresponding to the last descriptor in the batch,
> +and VIRTQ_DESC_F_FLUSH set.
> +
> +Note that device is only allowed to batch buffers in this way

s/device/the device/

> +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> +flag set, as a result, considering the group of buffers

s/set, as/set. As/

> +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> +either all of them constitute a batch, or none at all.
> +
>  The device then skips forward in the ring according to the size of
>  the batch. The driver needs to look up the used Buffer ID and
>  calculate the batch size to be able to advance to where the next
> diff --git a/split-ring.tex b/split-ring.tex
> index 123ac9f..cf197f8 100644
> --- a/split-ring.tex
> +++ b/split-ring.tex
> @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
>          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
>  };
>  
> -/* le32 is used here for ids for padding reasons. */
>  struct virtq_used_elem {
>          /* Index of start of used descriptor chain. */
> -        le32 id;
> +        le16 id;
> +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> +        le16 flags;

This is a bit confusing for legacy, as this will only work for
little-endian. The flags field will only be used for non-legacy, but I
think the id field should still be treated as 32 bit for legacy.
(We have a wholesale "it's guest endian when legacy" further up.)

>          /* Total length of the descriptor chain which was used (written to) */
>          le32 len;
>  };
> @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
>  corresponding to the head entry of the
>  descriptor chain describing the last buffer in the batch.
>  
> +Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If negotiated, whenever device has used all buffers

s/device/the device/

> +since the previous used buffer in the same order
> +in which they have been made available, device can set the

s/device/the device/

> +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> +
> +This knowledge allows
> +devices to notify the use of a batch of buffers to the driver by
> +only writing out single used ring entry with the \field{id}

s/single/a single/

> +corresponding to the head entry of the
> +descriptor chain describing the last buffer in the batch,
> +and VIRTQ_USED_ELEM_F_FLUSH set.
> +
> +Note that device is only allowed to batch buffers in this way

s/device/the device/

> +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> +flag set, as a result, considering the group of buffers

s/set, as/set. As/

> +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> +either all of them constitute a batch, or none at all.
> +
>  The device then skips forward in the ring according to the size of
>  the batch. Accordingly, it increments the used \field{idx} by the
>  size of the batch.


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-10 16:59 ` Cornelia Huck
@ 2020-08-11  8:23   ` Michael S. Tsirkin
  2020-08-11  8:39     ` Cornelia Huck
  2020-08-11 14:53     ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
  0 siblings, 2 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2020-08-11  8:23 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-comment, virtio-dev, virtio

On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> On Mon, 10 Aug 2020 12:15:15 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > Devices that normally use buffers in order can
> > benefit from ability to temporarily switch to handle
> > some buffers out of order.
> > 
> > As a case in point, a networking device might handle
> > RX buffers in order normally. However, should
> > an access to an RX buffer cause a page fault
> > (e.g. when using PRI), the device could benefit from
> > ability to temporarily keep using following
> > buffers in the ring (possibly with higher overhead)
> > until the fault has been resolved.
> > 
> > Page faults allow more features such as THP, auto-NUMA,
> > live migration.
> > 
> > Out of order is of course already possible, however,
> > IN_ORDER is currently required for descriptor batching where
> > device marks a whole batch of buffers used in one go.
> > 
> > The idea behind this proposal is to relax that requirement,
> > allowing batching without asking device to be in orde rat all times,
> > as follows:
> > 
> > Device uses buffers in any order. Eventually when device detects that it
> > has used all previously outstanding buffers, it sets a FLUSH flag on the
> > last buffer used. If it set this flag on the last buffer used
> > previously, and now uses a batch of descriptors in-order, it can now
> > signal the last buffer used again setting the FLUSH flag.
> > 
> > Driver can detect in-order when it sees two FLUSH flags one after
> > another. In other respects the feature is similar to IN_ORDER
> > from the driver implementation POV.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  content.tex     |  9 ++++++++-
> >  packed-ring.tex | 23 +++++++++++++++++++++++
> >  split-ring.tex  | 26 ++++++++++++++++++++++++--
> >  3 files changed, 55 insertions(+), 3 deletions(-)
> > 
> > diff --git a/content.tex b/content.tex
> > index 91735e3..8494eb6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> >  
> >  Some devices always use descriptors in the same order in which
> >  they have been made available. These devices can offer the
> > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
> 
> Do these two features conflict with each other? I.e., at most one of
> them may be negotiated (or offered?) at a time?

Good point. I think so, yes. Will document.

> >  might allow optimizations or simplify driver and/or device code.
> >  
> >  Each virtqueue can consist of up to 3 parts:
> > @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >    that the driver passes extra data (besides identifying the virtqueue)
> >    in its device notifications.
> >    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> > +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> > +  that device has ability to indicate use of (some of) buffers by the device in the same
> 
> "use of a subset of buffers" ?


That sounds better, thanks!

> > +  order in which they have been made available.
> >  \end{description}
> >  
> >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > diff --git a/packed-ring.tex b/packed-ring.tex
> > index ea92543..a120a19 100644
> > --- a/packed-ring.tex
> > +++ b/packed-ring.tex
> > @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
> >  only writing out a single used descriptor with the Buffer ID
> >  corresponding to the last descriptor in the batch.
> >  
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> 
> s/device/the device/
> 
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> 
> s/device/the device/
> 
> > +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> > +\begin{lstlisting}
> > +#define VIRTQ_DESC_F_FLUSH      8
> > +\end{lstlisting}
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out a single used descriptor with the Buffer ID
> > +corresponding to the last descriptor in the batch,
> > +and VIRTQ_DESC_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> 
> s/device/the device/
> 
> > +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> 
> s/set, as/set. As/
> 
> > +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. The driver needs to look up the used Buffer ID and
> >  calculate the batch size to be able to advance to where the next
> > diff --git a/split-ring.tex b/split-ring.tex
> > index 123ac9f..cf197f8 100644
> > --- a/split-ring.tex
> > +++ b/split-ring.tex
> > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> >  };
> >  
> > -/* le32 is used here for ids for padding reasons. */
> >  struct virtq_used_elem {
> >          /* Index of start of used descriptor chain. */
> > -        le32 id;
> > +        le16 id;
> > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > +        le16 flags;
> 
> This is a bit confusing for legacy, as this will only work for
> little-endian. The flags field will only be used for non-legacy, but I
> think the id field should still be treated as 32 bit for legacy.
> (We have a wholesale "it's guest endian when legacy" further up.)

True. How do you suggest handling this? Just say that id is
OR with VIRTQ_USED_ELEM_F_FLUSH ? Or maybe just add
struct virtq_used_elem_legacy?


> >          /* Total length of the descriptor chain which was used (written to) */
> >          le32 len;
> >  };
> > @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
> >  corresponding to the head entry of the
> >  descriptor chain describing the last buffer in the batch.
> >  
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> 
> s/device/the device/
> 
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> 
> s/device/the device/
> 
> > +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out single used ring entry with the \field{id}
> 
> s/single/a single/
> 
> > +corresponding to the head entry of the
> > +descriptor chain describing the last buffer in the batch,
> > +and VIRTQ_USED_ELEM_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> 
> s/device/the device/
> 
> > +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> 
> s/set, as/set. As/
> 
> > +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. Accordingly, it increments the used \field{idx} by the
> >  size of the batch.


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-11  8:23   ` Michael S. Tsirkin
@ 2020-08-11  8:39     ` Cornelia Huck
  2020-08-11 14:53     ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
  1 sibling, 0 replies; 17+ messages in thread
From: Cornelia Huck @ 2020-08-11  8:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-comment, virtio-dev, virtio

On Tue, 11 Aug 2020 04:23:13 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > On Mon, 10 Aug 2020 12:15:15 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:

> > > diff --git a/split-ring.tex b/split-ring.tex
> > > index 123ac9f..cf197f8 100644
> > > --- a/split-ring.tex
> > > +++ b/split-ring.tex
> > > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> > >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> > >  };
> > >  
> > > -/* le32 is used here for ids for padding reasons. */
> > >  struct virtq_used_elem {
> > >          /* Index of start of used descriptor chain. */
> > > -        le32 id;
> > > +        le16 id;
> > > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > > +        le16 flags;  
> > 
> > This is a bit confusing for legacy, as this will only work for
> > little-endian. The flags field will only be used for non-legacy, but I
> > think the id field should still be treated as 32 bit for legacy.
> > (We have a wholesale "it's guest endian when legacy" further up.)  
> 
> True. How do you suggest handling this? Just say that id is
> OR with VIRTQ_USED_ELEM_F_FLUSH ? Or maybe just add
> struct virtq_used_elem_legacy?

I think introducing a legacy struct is the least confusing option.

> 
> 
> > >          /* Total length of the descriptor chain which was used (written to) */
> > >          le32 len;
> > >  };


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio-comment] RE: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-11  8:23   ` Michael S. Tsirkin
  2020-08-11  8:39     ` Cornelia Huck
@ 2020-08-11 14:53     ` Lars Ganrot
  2020-08-11 15:43       ` Lars Ganrot
  2020-08-13 20:45       ` [virtio] " Michael S. Tsirkin
  1 sibling, 2 replies; 17+ messages in thread
From: Lars Ganrot @ 2020-08-11 14:53 UTC (permalink / raw)
  To: Michael S. Tsirkin, Cornelia Huck; +Cc: virtio-comment, virtio-dev, virtio

Hi Michael,

Short comment below.

BR,

-Lars

> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> Behalf Of Michael S. Tsirkin
> Sent: 11. august 2020 10:23
> 
> On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > On Mon, 10 Aug 2020 12:15:15 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > Devices that normally use buffers in order can benefit from ability
> > > to temporarily switch to handle some buffers out of order.
> > >
> > > As a case in point, a networking device might handle RX buffers in
> > > order normally. However, should an access to an RX buffer cause a
> > > page fault (e.g. when using PRI), the device could benefit from
> > > ability to temporarily keep using following buffers in the ring
> > > (possibly with higher overhead) until the fault has been resolved.
> > >
> > > Page faults allow more features such as THP, auto-NUMA, live
> > > migration.
> > >
> > > Out of order is of course already possible, however, IN_ORDER is
> > > currently required for descriptor batching where device marks a
> > > whole batch of buffers used in one go.
> > >
> > > The idea behind this proposal is to relax that requirement, allowing
> > > batching without asking device to be in orde rat all times, as
> > > follows:
> > >
> > > Device uses buffers in any order. Eventually when device detects
> > > that it has used all previously outstanding buffers, it sets a FLUSH
> > > flag on the last buffer used. If it set this flag on the last buffer
> > > used previously, and now uses a batch of descriptors in-order, it
> > > can now signal the last buffer used again setting the FLUSH flag.
> > >
> > > Driver can detect in-order when it sees two FLUSH flags one after
> > > another. In other respects the feature is similar to IN_ORDER from
> > > the driver implementation POV.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  content.tex     |  9 ++++++++-
> > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex  | 26
> > > ++++++++++++++++++++++++--
> > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities
> > > of a Virtio Device / Virtqueues}
> > >
> > >  Some devices always use descriptors in the same order in which
> > > they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use descriptors
> > > +in the same order in which they have been made available. These
> > > +devices can offer the VIRTIO_F_PARTIAL_ORDER feature. If one of the
> > > +features VIRTIO_F_IN_ORDER or VIRTIO_F_PARTIAL_ORDER is
> negotiated,
> > > +this knowledge
> >
> > Do these two features conflict with each other? I.e., at most one of
> > them may be negotiated (or offered?) at a time?
> 
> Good point. I think so, yes. Will document.

Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple case which always maintains ordered access, while the new feature flag allows active control of when descriptors are ordered and when not? To make it backward compatible let VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit set by itself without VIRTIO_F_IN_ORDER set means only active control is offered. I guess a name like VIRTIO_F_CTRL_ORDER would be more appropriate with this interpretation.



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio-comment] RE: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-11 14:53     ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
@ 2020-08-11 15:43       ` Lars Ganrot
  2020-08-12 12:50         ` [virtio] " Cornelia Huck
  2020-08-13 20:45       ` [virtio] " Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Lars Ganrot @ 2020-08-11 15:43 UTC (permalink / raw)
  To: Lars Ganrot, Michael S. Tsirkin, Cornelia Huck
  Cc: virtio-comment, virtio-dev, virtio

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Lars Ganrot
> Sent: 11. august 2020 16:54
> 
> > From: virtio-dev@lists.oasis-open.org
> > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S. Tsirkin
> > Sent: 11. august 2020 10:23
> >
> > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > On Mon, 10 Aug 2020 12:15:15 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > Devices that normally use buffers in order can benefit from
> > > > ability to temporarily switch to handle some buffers out of order.
> > > >
> > > > As a case in point, a networking device might handle RX buffers in
> > > > order normally. However, should an access to an RX buffer cause a
> > > > page fault (e.g. when using PRI), the device could benefit from
> > > > ability to temporarily keep using following buffers in the ring
> > > > (possibly with higher overhead) until the fault has been resolved.
> > > >
> > > > Page faults allow more features such as THP, auto-NUMA, live
> > > > migration.
> > > >
> > > > Out of order is of course already possible, however, IN_ORDER is
> > > > currently required for descriptor batching where device marks a
> > > > whole batch of buffers used in one go.
> > > >
> > > > The idea behind this proposal is to relax that requirement,
> > > > allowing batching without asking device to be in orde rat all
> > > > times, as
> > > > follows:
> > > >
> > > > Device uses buffers in any order. Eventually when device detects
> > > > that it has used all previously outstanding buffers, it sets a
> > > > FLUSH flag on the last buffer used. If it set this flag on the
> > > > last buffer used previously, and now uses a batch of descriptors
> > > > in-order, it can now signal the last buffer used again setting the FLUSH
> flag.
> > > >
> > > > Driver can detect in-order when it sees two FLUSH flags one after
> > > > another. In other respects the feature is similar to IN_ORDER from
> > > > the driver implementation POV.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  content.tex     |  9 ++++++++-
> > > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex  |
> > > > 26
> > > > ++++++++++++++++++++++++--
> > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6
> > > > 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > Facilities of a Virtio Device / Virtqueues}
> > > >
> > > >  Some devices always use descriptors in the same order in which
> > > > they have been made available. These devices can offer the
> > > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > +descriptors in the same order in which they have been made
> > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > > +VIRTIO_F_PARTIAL_ORDER is
> > negotiated,
> > > > +this knowledge
> > >
> > > Do these two features conflict with each other? I.e., at most one of
> > > them may be negotiated (or offered?) at a time?
> >
> > Good point. I think so, yes. Will document.
> 
> Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple case which
> always maintains ordered access, while the new feature flag allows active
> control of when descriptors are ordered and when not? To make it backward
> compatible let VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit
> set by itself without VIRTIO_F_IN_ORDER set means only active control is
> offered. I guess a name like VIRTIO_F_CTRL_ORDER would be more
> appropriate with this interpretation.
> 

On second thought that might be a bit backwards - how about:

Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0
This proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1
Potential future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1

> 
> 
> This publicly archived list offers a means to provide input to theOASIS Virtual
> I/O Device (VIRTIO) TC.In order to verify user consent to the Feedback License
> terms andto minimize spam in the list archive, subscription is requiredbefore
> posting.Subscribe: virtio-comment-subscribe@lists.oasis-
> open.orgUnsubscribe: virtio-comment-unsubscribe@lists.oasis-open.orgList
> help: virtio-comment-help@lists.oasis-open.orgList archive: https://lists.oasis-
> open.org/archives/virtio-comment/Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdfList Guidelines: https://www.oasis-
> open.org/policies-guidelines/mailing-listsCommittee: https://www.oasis-
> open.org/committees/virtio/Join OASIS: https://www.oasis-open.org/join/


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-11 15:43       ` Lars Ganrot
@ 2020-08-12 12:50         ` Cornelia Huck
  2020-08-12 15:55           ` [virtio-comment] " Lars Ganrot
  0 siblings, 1 reply; 17+ messages in thread
From: Cornelia Huck @ 2020-08-12 12:50 UTC (permalink / raw)
  To: Lars Ganrot; +Cc: Michael S. Tsirkin, virtio-comment, virtio-dev, virtio

On Tue, 11 Aug 2020 15:43:44 +0000
Lars Ganrot <lga@napatech.com> wrote:

> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-  
> > open.org> On Behalf Of Lars Ganrot  
> > Sent: 11. august 2020 16:54
> >  
> > > From: virtio-dev@lists.oasis-open.org
> > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S. Tsirkin
> > > Sent: 11. august 2020 10:23
> > >
> > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:  
> > > > On Mon, 10 Aug 2020 12:15:15 -0400
> > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > >  
> > > > > Devices that normally use buffers in order can benefit from
> > > > > ability to temporarily switch to handle some buffers out of order.
> > > > >
> > > > > As a case in point, a networking device might handle RX buffers in
> > > > > order normally. However, should an access to an RX buffer cause a
> > > > > page fault (e.g. when using PRI), the device could benefit from
> > > > > ability to temporarily keep using following buffers in the ring
> > > > > (possibly with higher overhead) until the fault has been resolved.
> > > > >
> > > > > Page faults allow more features such as THP, auto-NUMA, live
> > > > > migration.
> > > > >
> > > > > Out of order is of course already possible, however, IN_ORDER is
> > > > > currently required for descriptor batching where device marks a
> > > > > whole batch of buffers used in one go.
> > > > >
> > > > > The idea behind this proposal is to relax that requirement,
> > > > > allowing batching without asking device to be in orde rat all
> > > > > times, as
> > > > > follows:
> > > > >
> > > > > Device uses buffers in any order. Eventually when device detects
> > > > > that it has used all previously outstanding buffers, it sets a
> > > > > FLUSH flag on the last buffer used. If it set this flag on the
> > > > > last buffer used previously, and now uses a batch of descriptors
> > > > > in-order, it can now signal the last buffer used again setting the FLUSH  
> > flag.  
> > > > >
> > > > > Driver can detect in-order when it sees two FLUSH flags one after
> > > > > another. In other respects the feature is similar to IN_ORDER from
> > > > > the driver implementation POV.
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >  content.tex     |  9 ++++++++-
> > > > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex  |
> > > > > 26
> > > > > ++++++++++++++++++++++++--
> > > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6
> > > > > 100644
> > > > > --- a/content.tex
> > > > > +++ b/content.tex
> > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > Facilities of a Virtio Device / Virtqueues}
> > > > >
> > > > >  Some devices always use descriptors in the same order in which
> > > > > they have been made available. These devices can offer the
> > > > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > > +descriptors in the same order in which they have been made
> > > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > > > +VIRTIO_F_PARTIAL_ORDER is  
> > > negotiated,  
> > > > > +this knowledge  
> > > >
> > > > Do these two features conflict with each other? I.e., at most one of
> > > > them may be negotiated (or offered?) at a time?  
> > >
> > > Good point. I think so, yes. Will document.  
> >
> > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple case which
> > always maintains ordered access, while the new feature flag allows active
> > control of when descriptors are ordered and when not? To make it backward
> > compatible let VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit
> > set by itself without VIRTIO_F_IN_ORDER set means only active control is
> > offered. I guess a name like VIRTIO_F_CTRL_ORDER would be more
> > appropriate with this interpretation.
> >  
> 
> On second thought that might be a bit backwards - how about:
> 
> Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0
> This proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1
> Potential future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1

What happens in the new device/old driver case?
- device offers IN_ORDER and PARTIAL_ORDER
- driver does not know PARTIAL_ORDER, accepts IN_ORDER
- device now only can do complete ordering

Maybe I don't understand the purpose of the new feature correctly, but
I thought it was for those devices that don't do full in-order, but can
do it for a subset of buffers? As such, the two features can't really
imply each other: a device offering IN_ORDER might not know about the
new feature and its mechanism, and a device offering the new feature,
but not IN_ORDER probably does so because it cannot support full
IN_ORDER.

I think it makes the most sense if the device can offer both flags, but
the driver must only accept at most one of them?


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [virtio-comment] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-12 12:50         ` [virtio] " Cornelia Huck
@ 2020-08-12 15:55           ` Lars Ganrot
  2020-08-13 23:17             ` [virtio] " Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Lars Ganrot @ 2020-08-12 15:55 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Michael S. Tsirkin, virtio-comment, virtio-dev, virtio

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Cornelia Huck
> Sent: 12. august 2020 14:51
> 
> On Tue, 11 Aug 2020 15:43:44 +0000
> Lars Ganrot <lga@napatech.com> wrote:
> 
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis- open.org> On Behalf Of Lars Ganrot
> > > Sent: 11. august 2020 16:54
> > >
> > > > From: virtio-dev@lists.oasis-open.org
> > > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S. Tsirkin
> > > > Sent: 11. august 2020 10:23
> > > >
> > > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > > > On Mon, 10 Aug 2020 12:15:15 -0400 "Michael S. Tsirkin"
> > > > > <mst@redhat.com> wrote:
> > > > >
> > > > > > Devices that normally use buffers in order can benefit from
> > > > > > ability to temporarily switch to handle some buffers out of order.
> > > > > >
> > > > > > As a case in point, a networking device might handle RX
> > > > > > buffers in order normally. However, should an access to an RX
> > > > > > buffer cause a page fault (e.g. when using PRI), the device
> > > > > > could benefit from ability to temporarily keep using following
> > > > > > buffers in the ring (possibly with higher overhead) until the fault has
> been resolved.
> > > > > >
> > > > > > Page faults allow more features such as THP, auto-NUMA, live
> > > > > > migration.
> > > > > >
> > > > > > Out of order is of course already possible, however, IN_ORDER
> > > > > > is currently required for descriptor batching where device
> > > > > > marks a whole batch of buffers used in one go.
> > > > > >
> > > > > > The idea behind this proposal is to relax that requirement,
> > > > > > allowing batching without asking device to be in orde rat all
> > > > > > times, as
> > > > > > follows:
> > > > > >
> > > > > > Device uses buffers in any order. Eventually when device
> > > > > > detects that it has used all previously outstanding buffers,
> > > > > > it sets a FLUSH flag on the last buffer used. If it set this
> > > > > > flag on the last buffer used previously, and now uses a batch
> > > > > > of descriptors in-order, it can now signal the last buffer
> > > > > > used again setting the FLUSH
> > > flag.
> > > > > >
> > > > > > Driver can detect in-order when it sees two FLUSH flags one
> > > > > > after another. In other respects the feature is similar to
> > > > > > IN_ORDER from the driver implementation POV.
> > > > > >
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > ---
> > > > > >  content.tex     |  9 ++++++++-
> > > > > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex
> > > > > > |
> > > > > > 26
> > > > > > ++++++++++++++++++++++++--
> > > > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6
> > > > > > 100644
> > > > > > --- a/content.tex
> > > > > > +++ b/content.tex
> > > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > > Facilities of a Virtio Device / Virtqueues}
> > > > > >
> > > > > >  Some devices always use descriptors in the same order in
> > > > > > which they have been made available. These devices can offer
> > > > > > the -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > > > +descriptors in the same order in which they have been made
> > > > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > > > > +VIRTIO_F_PARTIAL_ORDER is
> > > > negotiated,
> > > > > > +this knowledge
> > > > >
> > > > > Do these two features conflict with each other? I.e., at most
> > > > > one of them may be negotiated (or offered?) at a time?
> > > >
> > > > Good point. I think so, yes. Will document.
> > >
> > > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple
> > > case which always maintains ordered access, while the new feature
> > > flag allows active control of when descriptors are ordered and when
> > > not? To make it backward compatible let VIRTIO_F_IN_ORDER imply the
> > > new bit is set, while the new bit set by itself without
> > > VIRTIO_F_IN_ORDER set means only active control is offered. I guess
> > > a name like VIRTIO_F_CTRL_ORDER would be more appropriate with this
> interpretation.
> > >
> >
> > On second thought that might be a bit backwards - how about:
> >
> > Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0 This
> > proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1 Potential
> > future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1
> 
> What happens in the new device/old driver case?
> - device offers IN_ORDER and PARTIAL_ORDER
> - driver does not know PARTIAL_ORDER, accepts IN_ORDER
> - device now only can do complete ordering
> 
> Maybe I don't understand the purpose of the new feature correctly, but I
> thought it was for those devices that don't do full in-order, but can do it for a
> subset of buffers? As such, the two features can't really imply each other: a
> device offering IN_ORDER might not know about the new feature and its
> mechanism, and a device offering the new feature, but not IN_ORDER
> probably does so because it cannot support full IN_ORDER.
> 
> I think it makes the most sense if the device can offer both flags, but the
> driver must only accept at most one of them?
> 

Yes, you are absolutely right. Keeping them as two mutually exclusive options is probably best.

Another aspect of the proposal is that the functionality is only described for the packed ring layout, but the concept should be equally applicable to the split ring. The split ring does not have flags for each used ring element, so the FLUSH flag functionality needs to be a bit different. One way is to add an IN_ORDER-flag to the virtq_used.flags, that allows the device to signal when it starts to adhere to IN_ORDER buffer use. The driver acknowledges this when it is ready to fulfil its part of the IN_ORDER restrictions by setting a new virtq_avail.flag: IN_ORDER_ACK. At this point both device and driver can assume IN_ORDER optimization until the device clears IN_ORDER. After clearing IN_ORDER, the device must wait until IN_ORDER_ACK is cleared before IN_ORDER can be asserted it again.

This is a little less fine-grained than the packed-ring solution with the FLUSH flag, but for infrequent scenarios like migration it should be OK.

> 
> This publicly archived list offers a means to provide input to the OASIS Virtual
> I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and to minimize
> spam in the list archive, subscription is required before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-11 14:53     ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
  2020-08-11 15:43       ` Lars Ganrot
@ 2020-08-13 20:45       ` Michael S. Tsirkin
  1 sibling, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2020-08-13 20:45 UTC (permalink / raw)
  To: Lars Ganrot; +Cc: Cornelia Huck, virtio-comment, virtio-dev, virtio

On Tue, Aug 11, 2020 at 02:53:39PM +0000, Lars Ganrot wrote:
> Hi Michael,
> 
> Short comment below.
> 
> BR,
> 
> -Lars
> 
> > From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> > Behalf Of Michael S. Tsirkin
> > Sent: 11. august 2020 10:23
> >
> > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > On Mon, 10 Aug 2020 12:15:15 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > Devices that normally use buffers in order can benefit from ability
> > > > to temporarily switch to handle some buffers out of order.
> > > >
> > > > As a case in point, a networking device might handle RX buffers in
> > > > order normally. However, should an access to an RX buffer cause a
> > > > page fault (e.g. when using PRI), the device could benefit from
> > > > ability to temporarily keep using following buffers in the ring
> > > > (possibly with higher overhead) until the fault has been resolved.
> > > >
> > > > Page faults allow more features such as THP, auto-NUMA, live
> > > > migration.
> > > >
> > > > Out of order is of course already possible, however, IN_ORDER is
> > > > currently required for descriptor batching where device marks a
> > > > whole batch of buffers used in one go.
> > > >
> > > > The idea behind this proposal is to relax that requirement, allowing
> > > > batching without asking device to be in orde rat all times, as
> > > > follows:
> > > >
> > > > Device uses buffers in any order. Eventually when device detects
> > > > that it has used all previously outstanding buffers, it sets a FLUSH
> > > > flag on the last buffer used. If it set this flag on the last buffer
> > > > used previously, and now uses a batch of descriptors in-order, it
> > > > can now signal the last buffer used again setting the FLUSH flag.
> > > >
> > > > Driver can detect in-order when it sees two FLUSH flags one after
> > > > another. In other respects the feature is similar to IN_ORDER from
> > > > the driver implementation POV.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  content.tex     |  9 ++++++++-
> > > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex  | 26
> > > > ++++++++++++++++++++++++--
> > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities
> > > > of a Virtio Device / Virtqueues}
> > > >
> > > >  Some devices always use descriptors in the same order in which
> > > > they have been made available. These devices can offer the
> > > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use descriptors
> > > > +in the same order in which they have been made available. These
> > > > +devices can offer the VIRTIO_F_PARTIAL_ORDER feature. If one of the
> > > > +features VIRTIO_F_IN_ORDER or VIRTIO_F_PARTIAL_ORDER is
> > negotiated,
> > > > +this knowledge
> > >
> > > Do these two features conflict with each other? I.e., at most one of
> > > them may be negotiated (or offered?) at a time?
> >
> > Good point. I think so, yes. Will document.
> 
> Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple case
> which always maintains ordered access, while the new feature flag
> allows active control of when descriptors are ordered and when not? To
> make it backward compatible let VIRTIO_F_IN_ORDER imply the new bit is
> set, while the new bit set by itself without VIRTIO_F_IN_ORDER set
> means only active control is offered. I guess a name like
> VIRTIO_F_CTRL_ORDER would be more appropriate with this
> interpretation.


So how does the behaviour with both VIRTIO_F_IN_ORDER with VIRTIO_F_PARTIAL_ORDER
differ from just VIRTIO_F_IN_ORDER? If all buffers are used in order,
then what is the extra meaning of FLUSH?



> 
> Disclaimer: This email and any files transmitted with it may contain
> confidential information intended for the addressee(s) only. The
> information is not to be surrendered or copied to unauthorized
> persons. If you have received this communication in error, please
> notify the sender immediately and delete this e-mail from your system.

I think we are supposed to disregard this part as irrlevant, but just a
reminder that you agreed that all material posted to the
virtio mailing lists is governed by the relevant IPR rules.
It is your responsibility not to post confidential information there.

-- 
MST


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-12 15:55           ` [virtio-comment] " Lars Ganrot
@ 2020-08-13 23:17             ` Michael S. Tsirkin
  2020-08-17  8:11               ` Lars Ganrot
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2020-08-13 23:17 UTC (permalink / raw)
  To: Lars Ganrot; +Cc: Cornelia Huck, virtio-comment, virtio-dev, virtio

On Wed, Aug 12, 2020 at 03:55:18PM +0000, Lars Ganrot wrote:
> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> > open.org> On Behalf Of Cornelia Huck
> > Sent: 12. august 2020 14:51
> >
> > On Tue, 11 Aug 2020 15:43:44 +0000
> > Lars Ganrot <lga@napatech.com> wrote:
> >
> > > > From: virtio-comment@lists.oasis-open.org
> > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Lars Ganrot
> > > > Sent: 11. august 2020 16:54
> > > >
> > > > > From: virtio-dev@lists.oasis-open.org
> > > > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S. Tsirkin
> > > > > Sent: 11. august 2020 10:23
> > > > >
> > > > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > > > > On Mon, 10 Aug 2020 12:15:15 -0400 "Michael S. Tsirkin"
> > > > > > <mst@redhat.com> wrote:
> > > > > >
> > > > > > > Devices that normally use buffers in order can benefit from
> > > > > > > ability to temporarily switch to handle some buffers out of order.
> > > > > > >
> > > > > > > As a case in point, a networking device might handle RX
> > > > > > > buffers in order normally. However, should an access to an RX
> > > > > > > buffer cause a page fault (e.g. when using PRI), the device
> > > > > > > could benefit from ability to temporarily keep using following
> > > > > > > buffers in the ring (possibly with higher overhead) until the fault has
> > been resolved.
> > > > > > >
> > > > > > > Page faults allow more features such as THP, auto-NUMA, live
> > > > > > > migration.
> > > > > > >
> > > > > > > Out of order is of course already possible, however, IN_ORDER
> > > > > > > is currently required for descriptor batching where device
> > > > > > > marks a whole batch of buffers used in one go.
> > > > > > >
> > > > > > > The idea behind this proposal is to relax that requirement,
> > > > > > > allowing batching without asking device to be in orde rat all
> > > > > > > times, as
> > > > > > > follows:
> > > > > > >
> > > > > > > Device uses buffers in any order. Eventually when device
> > > > > > > detects that it has used all previously outstanding buffers,
> > > > > > > it sets a FLUSH flag on the last buffer used. If it set this
> > > > > > > flag on the last buffer used previously, and now uses a batch
> > > > > > > of descriptors in-order, it can now signal the last buffer
> > > > > > > used again setting the FLUSH
> > > > flag.
> > > > > > >
> > > > > > > Driver can detect in-order when it sees two FLUSH flags one
> > > > > > > after another. In other respects the feature is similar to
> > > > > > > IN_ORDER from the driver implementation POV.
> > > > > > >
> > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > ---
> > > > > > >  content.tex     |  9 ++++++++-
> > > > > > >  packed-ring.tex | 23 +++++++++++++++++++++++  split-ring.tex
> > > > > > > |
> > > > > > > 26
> > > > > > > ++++++++++++++++++++++++--
> > > > > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/content.tex b/content.tex index 91735e3..8494eb6
> > > > > > > 100644
> > > > > > > --- a/content.tex
> > > > > > > +++ b/content.tex
> > > > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > > > Facilities of a Virtio Device / Virtqueues}
> > > > > > >
> > > > > > >  Some devices always use descriptors in the same order in
> > > > > > > which they have been made available. These devices can offer
> > > > > > > the -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > > > > +descriptors in the same order in which they have been made
> > > > > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > > > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > > > > > +VIRTIO_F_PARTIAL_ORDER is
> > > > > negotiated,
> > > > > > > +this knowledge
> > > > > >
> > > > > > Do these two features conflict with each other? I.e., at most
> > > > > > one of them may be negotiated (or offered?) at a time?
> > > > >
> > > > > Good point. I think so, yes. Will document.
> > > >
> > > > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the simple
> > > > case which always maintains ordered access, while the new feature
> > > > flag allows active control of when descriptors are ordered and when
> > > > not? To make it backward compatible let VIRTIO_F_IN_ORDER imply the
> > > > new bit is set, while the new bit set by itself without
> > > > VIRTIO_F_IN_ORDER set means only active control is offered. I guess
> > > > a name like VIRTIO_F_CTRL_ORDER would be more appropriate with this
> > interpretation.
> > > >
> > >
> > > On second thought that might be a bit backwards - how about:
> > >
> > > Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0 This
> > > proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1 Potential
> > > future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1
> >
> > What happens in the new device/old driver case?
> > - device offers IN_ORDER and PARTIAL_ORDER
> > - driver does not know PARTIAL_ORDER, accepts IN_ORDER
> > - device now only can do complete ordering
> >
> > Maybe I don't understand the purpose of the new feature correctly, but I
> > thought it was for those devices that don't do full in-order, but can do it for a
> > subset of buffers? As such, the two features can't really imply each other: a
> > device offering IN_ORDER might not know about the new feature and its
> > mechanism, and a device offering the new feature, but not IN_ORDER
> > probably does so because it cannot support full IN_ORDER.
> >
> > I think it makes the most sense if the device can offer both flags, but the
> > driver must only accept at most one of them?
> >
> 
> Yes, you are absolutely right. Keeping them as two mutually exclusive options is probably best.
> 
> Another aspect of the proposal is that the functionality is only
> described for the packed ring layout, but the concept should be
> equally applicable to the split ring.

Pls take a look at the changes in split-ring.tex
Are they unclear?

> The split ring does not have
> flags for each used ring element, so the FLUSH flag functionality
> needs to be a bit different. One way is to add an IN_ORDER-flag to the
> virtq_used.flags, that allows the device to signal when it starts to
> adhere to IN_ORDER buffer use. The driver acknowledges this when it is
> ready to fulfil its part of the IN_ORDER restrictions by setting a new
> virtq_avail.flag: IN_ORDER_ACK. At this point both device and driver
> can assume IN_ORDER optimization until the device clears IN_ORDER.
> After clearing IN_ORDER, the device must wait until IN_ORDER_ACK is
> cleared before IN_ORDER can be asserted it again.

It would be tricky to figure out which buffers does the
handshake apply to. My proposal does it a bit differently, pls take
a look.


> 
> This is a little less fine-grained than the packed-ring solution with the FLUSH flag, but for infrequent scenarios like migration it should be OK.
> 
> >
> > This publicly archived list offers a means to provide input to the OASIS Virtual
> > I/O Device (VIRTIO) TC.
> >
> > In order to verify user consent to the Feedback License terms and to minimize
> > spam in the list archive, subscription is required before posting.
> >
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-
> > open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> 
> Disclaimer: This email and any files transmitted with it may contain confidential information intended for the addressee(s) only. The information is not to be surrendered or copied to unauthorized persons. If you have received this communication in error, please notify the sender immediately and delete this e-mail from your system.


---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that 
generates this mail.  Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [virtio-comment] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-13 23:17             ` [virtio] " Michael S. Tsirkin
@ 2020-08-17  8:11               ` Lars Ganrot
  2021-09-06  6:33                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Lars Ganrot @ 2020-08-17  8:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Cornelia Huck, virtio-comment, virtio-dev, virtio

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Michael S. Tsirkin
> Sent: 14. august 2020 01:18
> 
> On Wed, Aug 12, 2020 at 03:55:18PM +0000, Lars Ganrot wrote:
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis- open.org> On Behalf Of Cornelia Huck
> > > Sent: 12. august 2020 14:51
> > >
> > > On Tue, 11 Aug 2020 15:43:44 +0000
> > > Lars Ganrot <lga@napatech.com> wrote:
> > >
> > > > > From: virtio-comment@lists.oasis-open.org
> > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Lars Ganrot
> > > > > Sent: 11. august 2020 16:54
> > > > >
> > > > > > From: virtio-dev@lists.oasis-open.org
> > > > > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S.
> > > > > > Tsirkin
> > > > > > Sent: 11. august 2020 10:23
> > > > > >
> > > > > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > > > > > On Mon, 10 Aug 2020 12:15:15 -0400 "Michael S. Tsirkin"
> > > > > > > <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > > Devices that normally use buffers in order can benefit
> > > > > > > > from ability to temporarily switch to handle some buffers out of
> order.
> > > > > > > >
> > > > > > > > As a case in point, a networking device might handle RX
> > > > > > > > buffers in order normally. However, should an access to an
> > > > > > > > RX buffer cause a page fault (e.g. when using PRI), the
> > > > > > > > device could benefit from ability to temporarily keep
> > > > > > > > using following buffers in the ring (possibly with higher
> > > > > > > > overhead) until the fault has
> > > been resolved.
> > > > > > > >
> > > > > > > > Page faults allow more features such as THP, auto-NUMA,
> > > > > > > > live migration.
> > > > > > > >
> > > > > > > > Out of order is of course already possible, however,
> > > > > > > > IN_ORDER is currently required for descriptor batching
> > > > > > > > where device marks a whole batch of buffers used in one go.
> > > > > > > >
> > > > > > > > The idea behind this proposal is to relax that
> > > > > > > > requirement, allowing batching without asking device to be
> > > > > > > > in orde rat all times, as
> > > > > > > > follows:
> > > > > > > >
> > > > > > > > Device uses buffers in any order. Eventually when device
> > > > > > > > detects that it has used all previously outstanding
> > > > > > > > buffers, it sets a FLUSH flag on the last buffer used. If
> > > > > > > > it set this flag on the last buffer used previously, and
> > > > > > > > now uses a batch of descriptors in-order, it can now
> > > > > > > > signal the last buffer used again setting the FLUSH
> > > > > flag.
> > > > > > > >
> > > > > > > > Driver can detect in-order when it sees two FLUSH flags
> > > > > > > > one after another. In other respects the feature is
> > > > > > > > similar to IN_ORDER from the driver implementation POV.
> > > > > > > >
> > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > ---
> > > > > > > >  content.tex     |  9 ++++++++-
> > > > > > > >  packed-ring.tex | 23 +++++++++++++++++++++++
> > > > > > > > split-ring.tex
> > > > > > > > |
> > > > > > > > 26
> > > > > > > > ++++++++++++++++++++++++--
> > > > > > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/content.tex b/content.tex index
> > > > > > > > 91735e3..8494eb6
> > > > > > > > 100644
> > > > > > > > --- a/content.tex
> > > > > > > > +++ b/content.tex
> > > > > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > > > > Facilities of a Virtio Device / Virtqueues}
> > > > > > > >
> > > > > > > >  Some devices always use descriptors in the same order in
> > > > > > > > which they have been made available. These devices can
> > > > > > > > offer the -VIRTIO_F_IN_ORDER feature. If negotiated, this
> > > > > > > > knowledge
> > > > > > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > > > > > +descriptors in the same order in which they have been
> > > > > > > > +made available. These devices can offer the
> > > > > > > > +VIRTIO_F_PARTIAL_ORDER feature. If one of the features
> > > > > > > > +VIRTIO_F_IN_ORDER or VIRTIO_F_PARTIAL_ORDER is
> > > > > > negotiated,
> > > > > > > > +this knowledge
> > > > > > >
> > > > > > > Do these two features conflict with each other? I.e., at
> > > > > > > most one of them may be negotiated (or offered?) at a time?
> > > > > >
> > > > > > Good point. I think so, yes. Will document.
> > > > >
> > > > > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the
> > > > > simple case which always maintains ordered access, while the new
> > > > > feature flag allows active control of when descriptors are
> > > > > ordered and when not? To make it backward compatible let
> > > > > VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit
> > > > > set by itself without VIRTIO_F_IN_ORDER set means only active
> > > > > control is offered. I guess a name like VIRTIO_F_CTRL_ORDER
> > > > > would be more appropriate with this
> > > interpretation.
> > > > >
> > > >
> > > > On second thought that might be a bit backwards - how about:
> > > >
> > > > Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0
> This
> > > > proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1
> Potential
> > > > future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1
> > >
> > > What happens in the new device/old driver case?
> > > - device offers IN_ORDER and PARTIAL_ORDER
> > > - driver does not know PARTIAL_ORDER, accepts IN_ORDER
> > > - device now only can do complete ordering
> > >
> > > Maybe I don't understand the purpose of the new feature correctly,
> > > but I thought it was for those devices that don't do full in-order,
> > > but can do it for a subset of buffers? As such, the two features
> > > can't really imply each other: a device offering IN_ORDER might not
> > > know about the new feature and its mechanism, and a device offering
> > > the new feature, but not IN_ORDER probably does so because it cannot
> support full IN_ORDER.
> > >
> > > I think it makes the most sense if the device can offer both flags,
> > > but the driver must only accept at most one of them?
> > >
> >
> > Yes, you are absolutely right. Keeping them as two mutually exclusive
> options is probably best.
> >
> > Another aspect of the proposal is that the functionality is only
> > described for the packed ring layout, but the concept should be
> > equally applicable to the split ring.
> 
> Pls take a look at the changes in split-ring.tex Are they unclear?
> 

I guess not. I failed to understand that the used-ring 32-bit ID field is divided into a 16-bit ID and a 16-bit flags. Sorry for the confusion.

Does this mean that for a device that always sets the FLUSH-flag, the PARTIAL_ORDER feature will be identical to IN_ORDER for both packed and split rings? The first write to a Used structure (after creation of the Virtqueue) would not have 2 FLUSH-flags, but would still fulfil the condition.

> > The split ring does not have
> > flags for each used ring element, so the FLUSH flag functionality
> > needs to be a bit different. One way is to add an IN_ORDER-flag to the
> > virtq_used.flags, that allows the device to signal when it starts to
> > adhere to IN_ORDER buffer use. The driver acknowledges this when it is
> > ready to fulfil its part of the IN_ORDER restrictions by setting a new
> > virtq_avail.flag: IN_ORDER_ACK. At this point both device and driver
> > can assume IN_ORDER optimization until the device clears IN_ORDER.
> > After clearing IN_ORDER, the device must wait until IN_ORDER_ACK is
> > cleared before IN_ORDER can be asserted it again.
> 
> It would be tricky to figure out which buffers does the handshake apply to. My
> proposal does it a bit differently, pls take a look.
> 
> 
> >
> > This is a little less fine-grained than the packed-ring solution with the FLUSH
> flag, but for infrequent scenarios like migration it should be OK.
> >
> > >
> > > This publicly archived list offers a means to provide input to the
> > > OASIS Virtual I/O Device (VIRTIO) TC.
> > >
> > > In order to verify user consent to the Feedback License terms and to
> > > minimize spam in the list archive, subscription is required before posting.
> > >
> > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > List help: virtio-comment-help@lists.oasis-open.org
> > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > Feedback License: https://www.oasis-
> > > open.org/who/ipr/feedback_license.pdf
> > > List Guidelines:
> > > https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > Committee: https://www.oasis-open.org/committees/virtio/
> > > Join OASIS: https://www.oasis-open.org/join/
> >
> 
> 
> This publicly archived list offers a means to provide input to the OASIS Virtual
> I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and to minimize
> spam in the list archive, subscription is required before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-
> open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-17  8:11               ` Lars Ganrot
@ 2021-09-06  6:33                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2021-09-06  6:33 UTC (permalink / raw)
  To: Lars Ganrot; +Cc: Cornelia Huck, virtio-comment, virtio-dev, virtio

On Mon, Aug 17, 2020 at 08:11:36AM +0000, Lars Ganrot wrote:
> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> > open.org> On Behalf Of Michael S. Tsirkin
> > Sent: 14. august 2020 01:18
> > 
> > On Wed, Aug 12, 2020 at 03:55:18PM +0000, Lars Ganrot wrote:
> > > > From: virtio-comment@lists.oasis-open.org
> > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Cornelia Huck
> > > > Sent: 12. august 2020 14:51
> > > >
> > > > On Tue, 11 Aug 2020 15:43:44 +0000
> > > > Lars Ganrot <lga@napatech.com> wrote:
> > > >
> > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Lars Ganrot
> > > > > > Sent: 11. august 2020 16:54
> > > > > >
> > > > > > > From: virtio-dev@lists.oasis-open.org
> > > > > > > <virtio-dev@lists.oasis-open.org> On Behalf Of Michael S.
> > > > > > > Tsirkin
> > > > > > > Sent: 11. august 2020 10:23
> > > > > > >
> > > > > > > On Mon, Aug 10, 2020 at 06:59:28PM +0200, Cornelia Huck wrote:
> > > > > > > > On Mon, 10 Aug 2020 12:15:15 -0400 "Michael S. Tsirkin"
> > > > > > > > <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > > Devices that normally use buffers in order can benefit
> > > > > > > > > from ability to temporarily switch to handle some buffers out of
> > order.
> > > > > > > > >
> > > > > > > > > As a case in point, a networking device might handle RX
> > > > > > > > > buffers in order normally. However, should an access to an
> > > > > > > > > RX buffer cause a page fault (e.g. when using PRI), the
> > > > > > > > > device could benefit from ability to temporarily keep
> > > > > > > > > using following buffers in the ring (possibly with higher
> > > > > > > > > overhead) until the fault has
> > > > been resolved.
> > > > > > > > >
> > > > > > > > > Page faults allow more features such as THP, auto-NUMA,
> > > > > > > > > live migration.
> > > > > > > > >
> > > > > > > > > Out of order is of course already possible, however,
> > > > > > > > > IN_ORDER is currently required for descriptor batching
> > > > > > > > > where device marks a whole batch of buffers used in one go.
> > > > > > > > >
> > > > > > > > > The idea behind this proposal is to relax that
> > > > > > > > > requirement, allowing batching without asking device to be
> > > > > > > > > in orde rat all times, as
> > > > > > > > > follows:
> > > > > > > > >
> > > > > > > > > Device uses buffers in any order. Eventually when device
> > > > > > > > > detects that it has used all previously outstanding
> > > > > > > > > buffers, it sets a FLUSH flag on the last buffer used. If
> > > > > > > > > it set this flag on the last buffer used previously, and
> > > > > > > > > now uses a batch of descriptors in-order, it can now
> > > > > > > > > signal the last buffer used again setting the FLUSH
> > > > > > flag.
> > > > > > > > >
> > > > > > > > > Driver can detect in-order when it sees two FLUSH flags
> > > > > > > > > one after another. In other respects the feature is
> > > > > > > > > similar to IN_ORDER from the driver implementation POV.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > ---
> > > > > > > > >  content.tex     |  9 ++++++++-
> > > > > > > > >  packed-ring.tex | 23 +++++++++++++++++++++++
> > > > > > > > > split-ring.tex
> > > > > > > > > |
> > > > > > > > > 26
> > > > > > > > > ++++++++++++++++++++++++--
> > > > > > > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/content.tex b/content.tex index
> > > > > > > > > 91735e3..8494eb6
> > > > > > > > > 100644
> > > > > > > > > --- a/content.tex
> > > > > > > > > +++ b/content.tex
> > > > > > > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic
> > > > > > > > > Facilities of a Virtio Device / Virtqueues}
> > > > > > > > >
> > > > > > > > >  Some devices always use descriptors in the same order in
> > > > > > > > > which they have been made available. These devices can
> > > > > > > > > offer the -VIRTIO_F_IN_ORDER feature. If negotiated, this
> > > > > > > > > knowledge
> > > > > > > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > > > > > > +descriptors in the same order in which they have been
> > > > > > > > > +made available. These devices can offer the
> > > > > > > > > +VIRTIO_F_PARTIAL_ORDER feature. If one of the features
> > > > > > > > > +VIRTIO_F_IN_ORDER or VIRTIO_F_PARTIAL_ORDER is
> > > > > > > negotiated,
> > > > > > > > > +this knowledge
> > > > > > > >
> > > > > > > > Do these two features conflict with each other? I.e., at
> > > > > > > > most one of them may be negotiated (or offered?) at a time?
> > > > > > >
> > > > > > > Good point. I think so, yes. Will document.
> > > > > >
> > > > > > Isn't it more natural to think of VIRTIO_F_IN_ORDER as the
> > > > > > simple case which always maintains ordered access, while the new
> > > > > > feature flag allows active control of when descriptors are
> > > > > > ordered and when not? To make it backward compatible let
> > > > > > VIRTIO_F_IN_ORDER imply the new bit is set, while the new bit
> > > > > > set by itself without VIRTIO_F_IN_ORDER set means only active
> > > > > > control is offered. I guess a name like VIRTIO_F_CTRL_ORDER
> > > > > > would be more appropriate with this
> > > > interpretation.
> > > > > >
> > > > >
> > > > > On second thought that might be a bit backwards - how about:
> > > > >
> > > > > Legacy case: VIRTIO_F_IN_ORDER==0/1 + VIRTIO_F_ORDER_RELAX==0
> > This
> > > > > proposal: VIRTIO_F_IN_ORDER==1 + VIRTIO_F_ORDER_RELAX==1
> > Potential
> > > > > future use: VIRTIO_F_???_ORDER==1 + VIRTIO_F_ORDER_RELAX==0/1
> > > >
> > > > What happens in the new device/old driver case?
> > > > - device offers IN_ORDER and PARTIAL_ORDER
> > > > - driver does not know PARTIAL_ORDER, accepts IN_ORDER
> > > > - device now only can do complete ordering
> > > >
> > > > Maybe I don't understand the purpose of the new feature correctly,
> > > > but I thought it was for those devices that don't do full in-order,
> > > > but can do it for a subset of buffers? As such, the two features
> > > > can't really imply each other: a device offering IN_ORDER might not
> > > > know about the new feature and its mechanism, and a device offering
> > > > the new feature, but not IN_ORDER probably does so because it cannot
> > support full IN_ORDER.
> > > >
> > > > I think it makes the most sense if the device can offer both flags,
> > > > but the driver must only accept at most one of them?
> > > >
> > >
> > > Yes, you are absolutely right. Keeping them as two mutually exclusive
> > options is probably best.
> > >
> > > Another aspect of the proposal is that the functionality is only
> > > described for the packed ring layout, but the concept should be
> > > equally applicable to the split ring.
> > 
> > Pls take a look at the changes in split-ring.tex Are they unclear?
> > 
> 
> I guess not. I failed to understand that the used-ring 32-bit ID field is divided into a 16-bit ID and a 16-bit flags. Sorry for the confusion.
> 
> Does this mean that for a device that always sets the FLUSH-flag, the
> PARTIAL_ORDER feature will be identical to IN_ORDER for both packed
> and split rings?

Yes, I think so.

> The first write to a Used structure (after creation
> of the Virtqueue) would not have 2 FLUSH-flags, but would still fulfil
> the condition.

Needs thought.

> > > The split ring does not have
> > > flags for each used ring element, so the FLUSH flag functionality
> > > needs to be a bit different. One way is to add an IN_ORDER-flag to the
> > > virtq_used.flags, that allows the device to signal when it starts to
> > > adhere to IN_ORDER buffer use. The driver acknowledges this when it is
> > > ready to fulfil its part of the IN_ORDER restrictions by setting a new
> > > virtq_avail.flag: IN_ORDER_ACK. At this point both device and driver
> > > can assume IN_ORDER optimization until the device clears IN_ORDER.
> > > After clearing IN_ORDER, the device must wait until IN_ORDER_ACK is
> > > cleared before IN_ORDER can be asserted it again.
> > 
> > It would be tricky to figure out which buffers does the handshake apply to. My
> > proposal does it a bit differently, pls take a look.
> > 
> > 
> > >
> > > This is a little less fine-grained than the packed-ring solution with the FLUSH
> > flag, but for infrequent scenarios like migration it should be OK.
> > >
> > > >
> > > > This publicly archived list offers a means to provide input to the
> > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > >
> > > > In order to verify user consent to the Feedback License terms and to
> > > > minimize spam in the list archive, subscription is required before posting.
> > > >
> > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > Feedback License: https://www.oasis-
> > > > open.org/who/ipr/feedback_license.pdf
> > > > List Guidelines:
> > > > https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > Join OASIS: https://www.oasis-open.org/join/
> > >
> > 
> > 
> > This publicly archived list offers a means to provide input to the OASIS Virtual
> > I/O Device (VIRTIO) TC.
> > 
> > In order to verify user consent to the Feedback License terms and to minimize
> > spam in the list archive, subscription is required before posting.
> > 
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-
> > open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2020-08-10 16:15 [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling Michael S. Tsirkin
  2020-08-10 16:59 ` Cornelia Huck
@ 2022-03-29  8:33 ` Stefan Hajnoczi
  2022-03-29 10:30   ` Eugenio Perez Martin
  2022-03-29 14:39   ` Michael S. Tsirkin
  1 sibling, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29  8:33 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, virtio, i.maximets,
	Stefano Garzarella, Eugenio Perez Martin

[-- Attachment #1: Type: text/plain, Size: 7568 bytes --]

On Mon, Aug 10, 2020 at 12:15:15PM -0400, Michael S. Tsirkin wrote:
> Devices that normally use buffers in order can
> benefit from ability to temporarily switch to handle
> some buffers out of order.
> 
> As a case in point, a networking device might handle
> RX buffers in order normally. However, should
> an access to an RX buffer cause a page fault
> (e.g. when using PRI), the device could benefit from
> ability to temporarily keep using following
> buffers in the ring (possibly with higher overhead)
> until the fault has been resolved.
> 
> Page faults allow more features such as THP, auto-NUMA,
> live migration.
> 
> Out of order is of course already possible, however,
> IN_ORDER is currently required for descriptor batching where
> device marks a whole batch of buffers used in one go.
> 
> The idea behind this proposal is to relax that requirement,
> allowing batching without asking device to be in orde rat all times,
> as follows:
> 
> Device uses buffers in any order. Eventually when device detects that it
> has used all previously outstanding buffers, it sets a FLUSH flag on the
> last buffer used. If it set this flag on the last buffer used
> previously, and now uses a batch of descriptors in-order, it can now
> signal the last buffer used again setting the FLUSH flag.
> 
> Driver can detect in-order when it sees two FLUSH flags one after
> another. In other respects the feature is similar to IN_ORDER
> from the driver implementation POV.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  content.tex     |  9 ++++++++-
>  packed-ring.tex | 23 +++++++++++++++++++++++
>  split-ring.tex  | 26 ++++++++++++++++++++++++--
>  3 files changed, 55 insertions(+), 3 deletions(-)

Hi Michael,
What is the status of this feature?

There is a Google Summer of Code project to implement VIRTIO_F_IN_ORDER
in QEMU and Linux
(https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices).

I wanted to check if you still want to pursue PARTIAL_ORDER. Maybe
Stefano and Eugenio (the mentors) will find a way to connect it to the
GSoC project.

Thanks to Ilya Maximets for mentioning PARTIAL_ORDER!

Stefan

> 
> diff --git a/content.tex b/content.tex
> index 91735e3..8494eb6 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If one of the features VIRTIO_F_IN_ORDER or
> +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
>  might allow optimizations or simplify driver and/or device code.
>  
>  Each virtqueue can consist of up to 3 parts:
> @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that the driver passes extra data (besides identifying the virtqueue)
>    in its device notifications.
>    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> +  that device has ability to indicate use of (some of) buffers by the device in the same
> +  order in which they have been made available.
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> diff --git a/packed-ring.tex b/packed-ring.tex
> index ea92543..a120a19 100644
> --- a/packed-ring.tex
> +++ b/packed-ring.tex
> @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
>  only writing out a single used descriptor with the Buffer ID
>  corresponding to the last descriptor in the batch.
>  
> +Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If negotiated, whenever device has used all buffers
> +since the previous used buffer in the same order
> +in which they have been made available, device can set the
> +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> +\begin{lstlisting}
> +#define VIRTQ_DESC_F_FLUSH      8
> +\end{lstlisting}
> +
> +This knowledge allows
> +devices to notify the use of a batch of buffers to the driver by
> +only writing out a single used descriptor with the Buffer ID
> +corresponding to the last descriptor in the batch,
> +and VIRTQ_DESC_F_FLUSH set.
> +
> +Note that device is only allowed to batch buffers in this way
> +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> +flag set, as a result, considering the group of buffers
> +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> +either all of them constitute a batch, or none at all.
> +
>  The device then skips forward in the ring according to the size of
>  the batch. The driver needs to look up the used Buffer ID and
>  calculate the batch size to be able to advance to where the next
> diff --git a/split-ring.tex b/split-ring.tex
> index 123ac9f..cf197f8 100644
> --- a/split-ring.tex
> +++ b/split-ring.tex
> @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
>          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
>  };
>  
> -/* le32 is used here for ids for padding reasons. */
>  struct virtq_used_elem {
>          /* Index of start of used descriptor chain. */
> -        le32 id;
> +        le16 id;
> +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> +        le16 flags;
>          /* Total length of the descriptor chain which was used (written to) */
>          le32 len;
>  };
> @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
>  corresponding to the head entry of the
>  descriptor chain describing the last buffer in the batch.
>  
> +Other devices sometimes use
> +descriptors in the same order in which they have been made
> +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> +feature. If negotiated, whenever device has used all buffers
> +since the previous used buffer in the same order
> +in which they have been made available, device can set the
> +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> +
> +This knowledge allows
> +devices to notify the use of a batch of buffers to the driver by
> +only writing out single used ring entry with the \field{id}
> +corresponding to the head entry of the
> +descriptor chain describing the last buffer in the batch,
> +and VIRTQ_USED_ELEM_F_FLUSH set.
> +
> +Note that device is only allowed to batch buffers in this way
> +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> +flag set, as a result, considering the group of buffers
> +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> +either all of them constitute a batch, or none at all.
> +
>  The device then skips forward in the ring according to the size of
>  the batch. Accordingly, it increments the used \field{idx} by the
>  size of the batch.
> -- 
> MST
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2022-03-29  8:33 ` [virtio-dev] " Stefan Hajnoczi
@ 2022-03-29 10:30   ` Eugenio Perez Martin
  2022-03-29 14:40     ` [virtio-comment] " Michael S. Tsirkin
  2022-03-29 14:39   ` Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Eugenio Perez Martin @ 2022-03-29 10:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Michael S. Tsirkin, virtio-comment, Virtio-Dev, virtio,
	i.maximets, Stefano Garzarella

On Tue, Mar 29, 2022 at 10:33 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Aug 10, 2020 at 12:15:15PM -0400, Michael S. Tsirkin wrote:
> > Devices that normally use buffers in order can
> > benefit from ability to temporarily switch to handle
> > some buffers out of order.
> >
> > As a case in point, a networking device might handle
> > RX buffers in order normally. However, should
> > an access to an RX buffer cause a page fault
> > (e.g. when using PRI), the device could benefit from
> > ability to temporarily keep using following
> > buffers in the ring (possibly with higher overhead)
> > until the fault has been resolved.
> >
> > Page faults allow more features such as THP, auto-NUMA,
> > live migration.
> >
> > Out of order is of course already possible, however,
> > IN_ORDER is currently required for descriptor batching where
> > device marks a whole batch of buffers used in one go.
> >
> > The idea behind this proposal is to relax that requirement,
> > allowing batching without asking device to be in orde rat all times,
> > as follows:
> >
> > Device uses buffers in any order. Eventually when device detects that it
> > has used all previously outstanding buffers, it sets a FLUSH flag on the
> > last buffer used. If it set this flag on the last buffer used
> > previously, and now uses a batch of descriptors in-order, it can now
> > signal the last buffer used again setting the FLUSH flag.
> >
> > Driver can detect in-order when it sees two FLUSH flags one after
> > another. In other respects the feature is similar to IN_ORDER
> > from the driver implementation POV.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  content.tex     |  9 ++++++++-
> >  packed-ring.tex | 23 +++++++++++++++++++++++
> >  split-ring.tex  | 26 ++++++++++++++++++++++++--
> >  3 files changed, 55 insertions(+), 3 deletions(-)
>
> Hi Michael,
> What is the status of this feature?
>
> There is a Google Summer of Code project to implement VIRTIO_F_IN_ORDER
> in QEMU and Linux
> (https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices).
>
> I wanted to check if you still want to pursue PARTIAL_ORDER. Maybe
> Stefano and Eugenio (the mentors) will find a way to connect it to the
> GSoC project.
>

If the status is close to being merged I'm totally in to implement
either partial order first or to implement both in the project. In
some sense it superseded IN_ORDER.

Thanks!

> Thanks to Ilya Maximets for mentioning PARTIAL_ORDER!
>
> Stefan
>
> >
> > diff --git a/content.tex b/content.tex
> > index 91735e3..8494eb6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> >
> >  Some devices always use descriptors in the same order in which
> >  they have been made available. These devices can offer the
> > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
> >  might allow optimizations or simplify driver and/or device code.
> >
> >  Each virtqueue can consist of up to 3 parts:
> > @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >    that the driver passes extra data (besides identifying the virtqueue)
> >    in its device notifications.
> >    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> > +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> > +  that device has ability to indicate use of (some of) buffers by the device in the same
> > +  order in which they have been made available.
> >  \end{description}
> >
> >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > diff --git a/packed-ring.tex b/packed-ring.tex
> > index ea92543..a120a19 100644
> > --- a/packed-ring.tex
> > +++ b/packed-ring.tex
> > @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
> >  only writing out a single used descriptor with the Buffer ID
> >  corresponding to the last descriptor in the batch.
> >
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> > +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> > +\begin{lstlisting}
> > +#define VIRTQ_DESC_F_FLUSH      8
> > +\end{lstlisting}
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out a single used descriptor with the Buffer ID
> > +corresponding to the last descriptor in the batch,
> > +and VIRTQ_DESC_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> > +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> > +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. The driver needs to look up the used Buffer ID and
> >  calculate the batch size to be able to advance to where the next
> > diff --git a/split-ring.tex b/split-ring.tex
> > index 123ac9f..cf197f8 100644
> > --- a/split-ring.tex
> > +++ b/split-ring.tex
> > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> >  };
> >
> > -/* le32 is used here for ids for padding reasons. */
> >  struct virtq_used_elem {
> >          /* Index of start of used descriptor chain. */
> > -        le32 id;
> > +        le16 id;
> > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > +        le16 flags;
> >          /* Total length of the descriptor chain which was used (written to) */
> >          le32 len;
> >  };
> > @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
> >  corresponding to the head entry of the
> >  descriptor chain describing the last buffer in the batch.
> >
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> > +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out single used ring entry with the \field{id}
> > +corresponding to the head entry of the
> > +descriptor chain describing the last buffer in the batch,
> > +and VIRTQ_USED_ELEM_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> > +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> > +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. Accordingly, it increments the used \field{idx} by the
> >  size of the batch.
> > --
> > MST
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2022-03-29  8:33 ` [virtio-dev] " Stefan Hajnoczi
  2022-03-29 10:30   ` Eugenio Perez Martin
@ 2022-03-29 14:39   ` Michael S. Tsirkin
  1 sibling, 0 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2022-03-29 14:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-comment, virtio-dev, virtio, i.maximets,
	Stefano Garzarella, Eugenio Perez Martin

On Tue, Mar 29, 2022 at 09:33:39AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 10, 2020 at 12:15:15PM -0400, Michael S. Tsirkin wrote:
> > Devices that normally use buffers in order can
> > benefit from ability to temporarily switch to handle
> > some buffers out of order.
> > 
> > As a case in point, a networking device might handle
> > RX buffers in order normally. However, should
> > an access to an RX buffer cause a page fault
> > (e.g. when using PRI), the device could benefit from
> > ability to temporarily keep using following
> > buffers in the ring (possibly with higher overhead)
> > until the fault has been resolved.
> > 
> > Page faults allow more features such as THP, auto-NUMA,
> > live migration.
> > 
> > Out of order is of course already possible, however,
> > IN_ORDER is currently required for descriptor batching where
> > device marks a whole batch of buffers used in one go.
> > 
> > The idea behind this proposal is to relax that requirement,
> > allowing batching without asking device to be in orde rat all times,
> > as follows:
> > 
> > Device uses buffers in any order. Eventually when device detects that it
> > has used all previously outstanding buffers, it sets a FLUSH flag on the
> > last buffer used. If it set this flag on the last buffer used
> > previously, and now uses a batch of descriptors in-order, it can now
> > signal the last buffer used again setting the FLUSH flag.
> > 
> > Driver can detect in-order when it sees two FLUSH flags one after
> > another. In other respects the feature is similar to IN_ORDER
> > from the driver implementation POV.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  content.tex     |  9 ++++++++-
> >  packed-ring.tex | 23 +++++++++++++++++++++++
> >  split-ring.tex  | 26 ++++++++++++++++++++++++--
> >  3 files changed, 55 insertions(+), 3 deletions(-)
> 
> Hi Michael,
> What is the status of this feature?
> 
> There is a Google Summer of Code project to implement VIRTIO_F_IN_ORDER
> in QEMU and Linux
> (https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices).
> 
> I wanted to check if you still want to pursue PARTIAL_ORDER. Maybe
> Stefano and Eugenio (the mentors) will find a way to connect it to the
> GSoC project.
> 
> Thanks to Ilya Maximets for mentioning PARTIAL_ORDER!
> 
> Stefan

Yes I do. Would be great if we had someone who can work on this!

> > 
> > diff --git a/content.tex b/content.tex
> > index 91735e3..8494eb6 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> >  
> >  Some devices always use descriptors in the same order in which
> >  they have been made available. These devices can offer the
> > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
> >  might allow optimizations or simplify driver and/or device code.
> >  
> >  Each virtqueue can consist of up to 3 parts:
> > @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >    that the driver passes extra data (besides identifying the virtqueue)
> >    in its device notifications.
> >    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> > +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> > +  that device has ability to indicate use of (some of) buffers by the device in the same
> > +  order in which they have been made available.
> >  \end{description}
> >  
> >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > diff --git a/packed-ring.tex b/packed-ring.tex
> > index ea92543..a120a19 100644
> > --- a/packed-ring.tex
> > +++ b/packed-ring.tex
> > @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
> >  only writing out a single used descriptor with the Buffer ID
> >  corresponding to the last descriptor in the batch.
> >  
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> > +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> > +\begin{lstlisting}
> > +#define VIRTQ_DESC_F_FLUSH      8
> > +\end{lstlisting}
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out a single used descriptor with the Buffer ID
> > +corresponding to the last descriptor in the batch,
> > +and VIRTQ_DESC_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> > +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> > +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. The driver needs to look up the used Buffer ID and
> >  calculate the batch size to be able to advance to where the next
> > diff --git a/split-ring.tex b/split-ring.tex
> > index 123ac9f..cf197f8 100644
> > --- a/split-ring.tex
> > +++ b/split-ring.tex
> > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> >  };
> >  
> > -/* le32 is used here for ids for padding reasons. */
> >  struct virtq_used_elem {
> >          /* Index of start of used descriptor chain. */
> > -        le32 id;
> > +        le16 id;
> > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > +        le16 flags;
> >          /* Total length of the descriptor chain which was used (written to) */
> >          le32 len;
> >  };
> > @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
> >  corresponding to the head entry of the
> >  descriptor chain describing the last buffer in the batch.
> >  
> > +Other devices sometimes use
> > +descriptors in the same order in which they have been made
> > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > +feature. If negotiated, whenever device has used all buffers
> > +since the previous used buffer in the same order
> > +in which they have been made available, device can set the
> > +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> > +
> > +This knowledge allows
> > +devices to notify the use of a batch of buffers to the driver by
> > +only writing out single used ring entry with the \field{id}
> > +corresponding to the head entry of the
> > +descriptor chain describing the last buffer in the batch,
> > +and VIRTQ_USED_ELEM_F_FLUSH set.
> > +
> > +Note that device is only allowed to batch buffers in this way
> > +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> > +flag set, as a result, considering the group of buffers
> > +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> > +either all of them constitute a batch, or none at all.
> > +
> >  The device then skips forward in the ring according to the size of
> >  the batch. Accordingly, it increments the used \field{idx} by the
> >  size of the batch.
> > -- 
> > MST
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio-comment] Re: [virtio-dev] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2022-03-29 10:30   ` Eugenio Perez Martin
@ 2022-03-29 14:40     ` Michael S. Tsirkin
  2022-03-30  9:03       ` Eugenio Perez Martin
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2022-03-29 14:40 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Stefan Hajnoczi, virtio-comment, Virtio-Dev, virtio, i.maximets,
	Stefano Garzarella

On Tue, Mar 29, 2022 at 12:30:25PM +0200, Eugenio Perez Martin wrote:
> On Tue, Mar 29, 2022 at 10:33 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Aug 10, 2020 at 12:15:15PM -0400, Michael S. Tsirkin wrote:
> > > Devices that normally use buffers in order can
> > > benefit from ability to temporarily switch to handle
> > > some buffers out of order.
> > >
> > > As a case in point, a networking device might handle
> > > RX buffers in order normally. However, should
> > > an access to an RX buffer cause a page fault
> > > (e.g. when using PRI), the device could benefit from
> > > ability to temporarily keep using following
> > > buffers in the ring (possibly with higher overhead)
> > > until the fault has been resolved.
> > >
> > > Page faults allow more features such as THP, auto-NUMA,
> > > live migration.
> > >
> > > Out of order is of course already possible, however,
> > > IN_ORDER is currently required for descriptor batching where
> > > device marks a whole batch of buffers used in one go.
> > >
> > > The idea behind this proposal is to relax that requirement,
> > > allowing batching without asking device to be in orde rat all times,
> > > as follows:
> > >
> > > Device uses buffers in any order. Eventually when device detects that it
> > > has used all previously outstanding buffers, it sets a FLUSH flag on the
> > > last buffer used. If it set this flag on the last buffer used
> > > previously, and now uses a batch of descriptors in-order, it can now
> > > signal the last buffer used again setting the FLUSH flag.
> > >
> > > Driver can detect in-order when it sees two FLUSH flags one after
> > > another. In other respects the feature is similar to IN_ORDER
> > > from the driver implementation POV.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  content.tex     |  9 ++++++++-
> > >  packed-ring.tex | 23 +++++++++++++++++++++++
> > >  split-ring.tex  | 26 ++++++++++++++++++++++++--
> > >  3 files changed, 55 insertions(+), 3 deletions(-)
> >
> > Hi Michael,
> > What is the status of this feature?
> >
> > There is a Google Summer of Code project to implement VIRTIO_F_IN_ORDER
> > in QEMU and Linux
> > (https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices).
> >
> > I wanted to check if you still want to pursue PARTIAL_ORDER. Maybe
> > Stefano and Eugenio (the mentors) will find a way to connect it to the
> > GSoC project.
> >
> 
> If the status is close to being merged I'm totally in to implement
> either partial order first or to implement both in the project. In
> some sense it superseded IN_ORDER.
> 
> Thanks!

I think it's blocked on lack of implementation actually.
TC was concerned whether there are any correctness issues.
An implementation which can be stress-tested would resolve this.


> > Thanks to Ilya Maximets for mentioning PARTIAL_ORDER!
> >
> > Stefan
> >
> > >
> > > diff --git a/content.tex b/content.tex
> > > index 91735e3..8494eb6 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> > >
> > >  Some devices always use descriptors in the same order in which
> > >  they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > +descriptors in the same order in which they have been made
> > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
> > >  might allow optimizations or simplify driver and/or device code.
> > >
> > >  Each virtqueue can consist of up to 3 parts:
> > > @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >    that the driver passes extra data (besides identifying the virtqueue)
> > >    in its device notifications.
> > >    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> > > +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> > > +  that device has ability to indicate use of (some of) buffers by the device in the same
> > > +  order in which they have been made available.
> > >  \end{description}
> > >
> > >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > diff --git a/packed-ring.tex b/packed-ring.tex
> > > index ea92543..a120a19 100644
> > > --- a/packed-ring.tex
> > > +++ b/packed-ring.tex
> > > @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
> > >  only writing out a single used descriptor with the Buffer ID
> > >  corresponding to the last descriptor in the batch.
> > >
> > > +Other devices sometimes use
> > > +descriptors in the same order in which they have been made
> > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > +feature. If negotiated, whenever device has used all buffers
> > > +since the previous used buffer in the same order
> > > +in which they have been made available, device can set the
> > > +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> > > +\begin{lstlisting}
> > > +#define VIRTQ_DESC_F_FLUSH      8
> > > +\end{lstlisting}
> > > +
> > > +This knowledge allows
> > > +devices to notify the use of a batch of buffers to the driver by
> > > +only writing out a single used descriptor with the Buffer ID
> > > +corresponding to the last descriptor in the batch,
> > > +and VIRTQ_DESC_F_FLUSH set.
> > > +
> > > +Note that device is only allowed to batch buffers in this way
> > > +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> > > +flag set, as a result, considering the group of buffers
> > > +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> > > +either all of them constitute a batch, or none at all.
> > > +
> > >  The device then skips forward in the ring according to the size of
> > >  the batch. The driver needs to look up the used Buffer ID and
> > >  calculate the batch size to be able to advance to where the next
> > > diff --git a/split-ring.tex b/split-ring.tex
> > > index 123ac9f..cf197f8 100644
> > > --- a/split-ring.tex
> > > +++ b/split-ring.tex
> > > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> > >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> > >  };
> > >
> > > -/* le32 is used here for ids for padding reasons. */
> > >  struct virtq_used_elem {
> > >          /* Index of start of used descriptor chain. */
> > > -        le32 id;
> > > +        le16 id;
> > > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > > +        le16 flags;
> > >          /* Total length of the descriptor chain which was used (written to) */
> > >          le32 len;
> > >  };
> > > @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
> > >  corresponding to the head entry of the
> > >  descriptor chain describing the last buffer in the batch.
> > >
> > > +Other devices sometimes use
> > > +descriptors in the same order in which they have been made
> > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > +feature. If negotiated, whenever device has used all buffers
> > > +since the previous used buffer in the same order
> > > +in which they have been made available, device can set the
> > > +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> > > +
> > > +This knowledge allows
> > > +devices to notify the use of a batch of buffers to the driver by
> > > +only writing out single used ring entry with the \field{id}
> > > +corresponding to the head entry of the
> > > +descriptor chain describing the last buffer in the batch,
> > > +and VIRTQ_USED_ELEM_F_FLUSH set.
> > > +
> > > +Note that device is only allowed to batch buffers in this way
> > > +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> > > +flag set, as a result, considering the group of buffers
> > > +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> > > +either all of them constitute a batch, or none at all.
> > > +
> > >  The device then skips forward in the ring according to the size of
> > >  the batch. Accordingly, it increments the used \field{idx} by the
> > >  size of the batch.
> > > --
> > > MST
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling
  2022-03-29 14:40     ` [virtio-comment] " Michael S. Tsirkin
@ 2022-03-30  9:03       ` Eugenio Perez Martin
  0 siblings, 0 replies; 17+ messages in thread
From: Eugenio Perez Martin @ 2022-03-30  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, virtio-comment, Virtio-Dev, virtio,
	Ilya Maximets, Stefano Garzarella

On Tue, Mar 29, 2022 at 4:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 29, 2022 at 12:30:25PM +0200, Eugenio Perez Martin wrote:
> > On Tue, Mar 29, 2022 at 10:33 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Mon, Aug 10, 2020 at 12:15:15PM -0400, Michael S. Tsirkin wrote:
> > > > Devices that normally use buffers in order can
> > > > benefit from ability to temporarily switch to handle
> > > > some buffers out of order.
> > > >
> > > > As a case in point, a networking device might handle
> > > > RX buffers in order normally. However, should
> > > > an access to an RX buffer cause a page fault
> > > > (e.g. when using PRI), the device could benefit from
> > > > ability to temporarily keep using following
> > > > buffers in the ring (possibly with higher overhead)
> > > > until the fault has been resolved.
> > > >
> > > > Page faults allow more features such as THP, auto-NUMA,
> > > > live migration.
> > > >
> > > > Out of order is of course already possible, however,
> > > > IN_ORDER is currently required for descriptor batching where
> > > > device marks a whole batch of buffers used in one go.
> > > >
> > > > The idea behind this proposal is to relax that requirement,
> > > > allowing batching without asking device to be in orde rat all times,
> > > > as follows:
> > > >
> > > > Device uses buffers in any order. Eventually when device detects that it
> > > > has used all previously outstanding buffers, it sets a FLUSH flag on the
> > > > last buffer used. If it set this flag on the last buffer used
> > > > previously, and now uses a batch of descriptors in-order, it can now
> > > > signal the last buffer used again setting the FLUSH flag.
> > > >
> > > > Driver can detect in-order when it sees two FLUSH flags one after
> > > > another. In other respects the feature is similar to IN_ORDER
> > > > from the driver implementation POV.
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  content.tex     |  9 ++++++++-
> > > >  packed-ring.tex | 23 +++++++++++++++++++++++
> > > >  split-ring.tex  | 26 ++++++++++++++++++++++++--
> > > >  3 files changed, 55 insertions(+), 3 deletions(-)
> > >
> > > Hi Michael,
> > > What is the status of this feature?
> > >
> > > There is a Google Summer of Code project to implement VIRTIO_F_IN_ORDER
> > > in QEMU and Linux
> > > (https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices).
> > >
> > > I wanted to check if you still want to pursue PARTIAL_ORDER. Maybe
> > > Stefano and Eugenio (the mentors) will find a way to connect it to the
> > > GSoC project.
> > >
> >
> > If the status is close to being merged I'm totally in to implement
> > either partial order first or to implement both in the project. In
> > some sense it superseded IN_ORDER.
> >
> > Thanks!
>
> I think it's blocked on lack of implementation actually.
> TC was concerned whether there are any correctness issues.
> An implementation which can be stress-tested would resolve this.
>

Got it, so then it is worth trying to allocate some time to implement it IMO.

Thanks!

>
> > > Thanks to Ilya Maximets for mentioning PARTIAL_ORDER!
> > >
> > > Stefan
> > >
> > > >
> > > > diff --git a/content.tex b/content.tex
> > > > index 91735e3..8494eb6 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -296,7 +296,11 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> > > >
> > > >  Some devices always use descriptors in the same order in which
> > > >  they have been made available. These devices can offer the
> > > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > > +VIRTIO_F_IN_ORDER feature.  Other devices sometimes use
> > > > +descriptors in the same order in which they have been made
> > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > +feature. If one of the features VIRTIO_F_IN_ORDER or
> > > > +VIRTIO_F_PARTIAL_ORDER is negotiated, this knowledge
> > > >  might allow optimizations or simplify driver and/or device code.
> > > >
> > > >  Each virtqueue can consist of up to 3 parts:
> > > > @@ -6132,6 +6136,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > > >    that the driver passes extra data (besides identifying the virtqueue)
> > > >    in its device notifications.
> > > >    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> > > > +  \item[VIRTIO_F_PARTIAL_ORDER(39)] This feature indicates
> > > > +  that device has ability to indicate use of (some of) buffers by the device in the same
> > > > +  order in which they have been made available.
> > > >  \end{description}
> > > >
> > > >  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > > diff --git a/packed-ring.tex b/packed-ring.tex
> > > > index ea92543..a120a19 100644
> > > > --- a/packed-ring.tex
> > > > +++ b/packed-ring.tex
> > > > @@ -284,6 +284,29 @@ \subsection{In-order use of descriptors}
> > > >  only writing out a single used descriptor with the Buffer ID
> > > >  corresponding to the last descriptor in the batch.
> > > >
> > > > +Other devices sometimes use
> > > > +descriptors in the same order in which they have been made
> > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > +feature. If negotiated, whenever device has used all buffers
> > > > +since the previous used buffer in the same order
> > > > +in which they have been made available, device can set the
> > > > +VIRTQ_DESC_F_FLUSH flag in the used descriptor.
> > > > +\begin{lstlisting}
> > > > +#define VIRTQ_DESC_F_FLUSH      8
> > > > +\end{lstlisting}
> > > > +
> > > > +This knowledge allows
> > > > +devices to notify the use of a batch of buffers to the driver by
> > > > +only writing out a single used descriptor with the Buffer ID
> > > > +corresponding to the last descriptor in the batch,
> > > > +and VIRTQ_DESC_F_FLUSH set.
> > > > +
> > > > +Note that device is only allowed to batch buffers in this way
> > > > +if the previous used descriptor also has the VIRTQ_DESC_F_FLUSH
> > > > +flag set, as a result, considering the group of buffers
> > > > +used between two buffers with VIRTQ_DESC_F_FLUSH set,
> > > > +either all of them constitute a batch, or none at all.
> > > > +
> > > >  The device then skips forward in the ring according to the size of
> > > >  the batch. The driver needs to look up the used Buffer ID and
> > > >  calculate the batch size to be able to advance to where the next
> > > > diff --git a/split-ring.tex b/split-ring.tex
> > > > index 123ac9f..cf197f8 100644
> > > > --- a/split-ring.tex
> > > > +++ b/split-ring.tex
> > > > @@ -398,10 +398,11 @@ \subsection{The Virtqueue Used Ring}\label{sec:Basic Facilities of a Virtio Devi
> > > >          le16 avail_event; /* Only if VIRTIO_F_EVENT_IDX */
> > > >  };
> > > >
> > > > -/* le32 is used here for ids for padding reasons. */
> > > >  struct virtq_used_elem {
> > > >          /* Index of start of used descriptor chain. */
> > > > -        le32 id;
> > > > +        le16 id;
> > > > +#define VIRTQ_USED_ELEM_F_FLUSH  0x8000
> > > > +        le16 flags;
> > > >          /* Total length of the descriptor chain which was used (written to) */
> > > >          le32 len;
> > > >  };
> > > > @@ -481,6 +482,27 @@ \subsection{In-order use of descriptors}
> > > >  corresponding to the head entry of the
> > > >  descriptor chain describing the last buffer in the batch.
> > > >
> > > > +Other devices sometimes use
> > > > +descriptors in the same order in which they have been made
> > > > +available. These devices can offer the VIRTIO_F_PARTIAL_ORDER
> > > > +feature. If negotiated, whenever device has used all buffers
> > > > +since the previous used buffer in the same order
> > > > +in which they have been made available, device can set the
> > > > +VIRTQ_USED_ELEM_F_FLUSH flag in the used ring entry.
> > > > +
> > > > +This knowledge allows
> > > > +devices to notify the use of a batch of buffers to the driver by
> > > > +only writing out single used ring entry with the \field{id}
> > > > +corresponding to the head entry of the
> > > > +descriptor chain describing the last buffer in the batch,
> > > > +and VIRTQ_USED_ELEM_F_FLUSH set.
> > > > +
> > > > +Note that device is only allowed to batch buffers in this way
> > > > +if the previous used ring entry also has the VIRTQ_USED_ELEM_F_FLUSH
> > > > +flag set, as a result, considering the group of buffers
> > > > +used between two buffers with VIRTQ_USED_ELEM_F_FLUSH set,
> > > > +either all of them constitute a batch, or none at all.
> > > > +
> > > >  The device then skips forward in the ring according to the size of
> > > >  the batch. Accordingly, it increments the used \field{idx} by the
> > > >  size of the batch.
> > > > --
> > > > MST
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-03-30  9:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-10 16:15 [virtio] [PATCH RFC] VIRTIO_F_PARTIAL_ORDER for page fault handling Michael S. Tsirkin
2020-08-10 16:59 ` Cornelia Huck
2020-08-11  8:23   ` Michael S. Tsirkin
2020-08-11  8:39     ` Cornelia Huck
2020-08-11 14:53     ` [virtio-comment] RE: [virtio-dev] " Lars Ganrot
2020-08-11 15:43       ` Lars Ganrot
2020-08-12 12:50         ` [virtio] " Cornelia Huck
2020-08-12 15:55           ` [virtio-comment] " Lars Ganrot
2020-08-13 23:17             ` [virtio] " Michael S. Tsirkin
2020-08-17  8:11               ` Lars Ganrot
2021-09-06  6:33                 ` Michael S. Tsirkin
2020-08-13 20:45       ` [virtio] " Michael S. Tsirkin
2022-03-29  8:33 ` [virtio-dev] " Stefan Hajnoczi
2022-03-29 10:30   ` Eugenio Perez Martin
2022-03-29 14:40     ` [virtio-comment] " Michael S. Tsirkin
2022-03-30  9:03       ` Eugenio Perez Martin
2022-03-29 14:39   ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.