* [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
@ 2011-06-01 10:25 Michael S. Tsirkin
  2011-06-02  1:49 ` Rusty Russell
  2011-06-02  1:49 ` Rusty Russell
  0 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-06-01 10:25 UTC (permalink / raw)
  To: virtualization
  Cc: rusty, habanero, Shirley Ma, Krishna Kumar2, kvm, steved,
	Tom Lendacky, borntraeger, avi, bryanv

Add an option to modify the notification
hand-off in virtio to be basically like Xen:
each side publishes an index, the other side only triggers
an event when it crosses that index value
(Xen event indexes start at 1, ours start at 0 for
backward-compatibility, but that's minor).

Since we've run out of bits in the 32 bit features field,
I added another 32 bit field, and bit 31 enables it.
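
As a rough illustration of the bit 31 scheme (a sketch only: the helper
name virtio_features64 and the FEATURES_HIGH_BIT macro are made up here
and are not part of the patch), a guest could combine the two 32 bit
feature words like this:

```c
#include <stdint.h>

/* Hypothetical name for illustration: bit 31 of the low word gates
 * whether the high word is meaningful at all. */
#define FEATURES_HIGH_BIT 31

static inline uint64_t virtio_features64(uint32_t lo, uint32_t hi)
{
	uint64_t features = lo;

	/* Without bit 31, feature bits 32..63 must be treated as unset. */
	if (lo & (UINT32_C(1) << FEATURES_HIGH_BIT))
		features |= (uint64_t)hi << 32;
	return features;
}
```

With bit 31 clear, whatever is in the high word is ignored, which matches
the "If unset, feature bits 32:63 are unset" wording in Appendix B.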

I started with using both flags and indexes in parallel,
but switched to doing either-or: this means we do
not need to tweak memory access ordering as index access just
replaces flags access.

A note on naming: the index replacing avail->flags is named
used_event, the index replacing used->flags is named
avail_event to stress the fact that these actually
point into the other side of the ring:
event is triggered when avail->idx == used->avail_event + 1
and when used->idx == avail->used_event + 1, respectively.
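
The trigger condition can be checked with wrap-safe 16 bit arithmetic.
The sketch below follows the vring_need_event() helper this patch adds
to Appendix A; old_idx and new_idx are the index values before and after
the update, and event_idx is the value published by the other side:

```c
#include <stdint.h>

/* Returns nonzero if moving the index from old_idx to new_idx crossed
 * event_idx, i.e. the other side asked to be notified at this point.
 * The casts keep all arithmetic modulo 65536, so wrap-around works. */
static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx,
				   uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}
```

For example, with event_idx = 5, advancing the index from 5 to 6 triggers
an event, while advancing it from 3 to 5 does not.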

I also documented the ordering rules for flags/event index
field updates in more detail.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

I switched to fedora 15 so a new lyx - hope it's not a problem.
Sorry, no PDF as lyx 2.0 seems to have trouble producing them for me.


 virtio-spec.lyx |  756 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 729 insertions(+), 27 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index f7c9c38..4f460ce 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1,21 +1,29 @@
-#LyX 1.6.7 created this file. For more info see http://www.lyx.org/
-\lyxformat 345
+#LyX 2.0 created this file. For more info see http://www.lyx.org/
+\lyxformat 413
 \begin_document
 \begin_header
 \textclass report
 \use_default_options false
+\maintain_unincluded_children false
 \language english
+\language_package default
 \inputencoding auto
+\fontencoding global
 \font_roman default
 \font_sans default
 \font_typewriter default
 \font_default_family default
+\use_non_tex_fonts false
 \font_sc false
 \font_osf false
 \font_sf_scale 100
 \font_tt_scale 100
 
 \graphics default
+\default_output_format default
+\output_sync 0
+\bibtex_command default
+\index_command default
 \paperfontsize default
 \spacing single
 \use_hyperref false
@@ -23,9 +31,18 @@
 \use_geometry false
 \use_amsmath 1
 \use_esint 1
+\use_mhchem 1
+\use_mathdots 1
 \cite_engine basic
 \use_bibtopic false
+\use_indices false
 \paperorientation portrait
+\suppress_date false
+\use_refstyle 0
+\index Index
+\shortcut idx
+\color #008000
+\end_index
 \secnumdepth 3
 \tocdepth 3
 \paragraph_separation skip
@@ -36,8 +53,10 @@
 \paperpagestyle default
 \tracking_changes true
 \output_changes true
-\author "" 
-\author "" 
+\html_math_output 0
+\html_css_as_file 0
+\html_be_strict false
+\author 1 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -193,7 +212,7 @@ Each virtqueue occupies two or more physically-contiguous pages (defined,
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="1" columns="4">
-<features>
+<features tabularvalignment="middle">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
@@ -308,7 +327,7 @@ The Subsystem Device ID indicates which virtio device is supported by the
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="8" columns="3">
-<features>
+<features tabularvalignment="middle">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="bottom" width="0">
@@ -661,7 +680,7 @@ The virtio header looks as follows:
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="9">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
@@ -953,6 +972,10 @@ ISR
 
 \size footnotesize
 Features
+\change_inserted 1 1304329091
+ bits 0:31
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -964,6 +987,10 @@ Features
 
 \size footnotesize
 Features
+\change_inserted 1 1304329086
+ bits 0:31
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -1050,7 +1077,7 @@ If MSI-X is enabled for the device, two additional fields immediately follow
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="3">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
@@ -1186,6 +1213,177 @@ Vector
 \end_layout
 
 \begin_layout Standard
+
+\change_inserted 1 1304328924
+Finally, if the VIRTIO_F_FEATURES_HIGH feature bit is negotiated, this is
+ immediately followed by two additional fields:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304328925
+\begin_inset Tabular
+<lyxtabular version="3" rows="4" columns="3">
+<features tabularvalignment="middle">
+<column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Bits
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+32
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+32
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Read/Write
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+R
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+R+W
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Purpose
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\size footnotesize
+Device
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\size footnotesize
+Guest
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329099
+
+\size footnotesize
+Features bits 32:63
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329102
+
+\size footnotesize
+Features bits 32:63
+\end_layout
+
+\end_inset
+</cell>
+</row>
+</lyxtabular>
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
 Immediately following these general headers, there may be device-specific
  headers:
 \end_layout
@@ -1193,7 +1391,7 @@ Immediately following these general headers, there may be device-specific
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="2">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <row>
@@ -1348,7 +1546,20 @@ Feature Bits
 The least significant 31 bits of the first configuration field indicates
  the features that the device supports (the high bit is reserved, and will
  be used to indicate the presence of future feature bits elsewhere).
- The bits are allocated as follows:
+ 
+\change_inserted 1 1304331636
+If more than 31 feature bits are supported, the device indicates so by setting
+ feature bit 31 (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_unchanged
+The bits are allocated as follows:
 \end_layout
 
 \begin_layout Description
@@ -1372,7 +1583,33 @@ to
 \begin_inset space ~
 \end_inset
 
-30 Feature bits reserved for extensions to the queue mechanism
+
+\change_inserted 1 1304329326
+4
+\change_deleted 1 1304329325
+3
+\change_unchanged
+0 Feature bits reserved for extensions to the queue 
+\change_inserted 1 1304540448
+and feature negotiation 
+\change_unchanged
+mechanism
+\change_inserted 1 1304540449
+s
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1 1304329398
+41
+\begin_inset space ~
+\end_inset
+
+to
+\begin_inset space ~
+\end_inset
+
+63 Feature bits reserved for future extensions
 \end_layout
 
 \begin_layout Standard
@@ -1407,6 +1644,19 @@ This allows for forwards and backwards compatibility: if the device is enhanced
  support, it will not see that feature bit in the Device Features field
  and can go into backwards compatibility mode (or, for poor implementations,
  set the FAILED Device Status bit).
+\change_inserted 1 1304329423
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304331742
+Access to feature bits 32 to 63 is enabled by Guest by setting feature bit
+ 31.
+ If this bit is unset, Device must assume that all feature bits > 31 are
+ unset.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsubsection
@@ -1891,7 +2141,38 @@ flags
 
  field is currently 0 or 1: 1 indicating that we do not need an interrupt
  when the device consumes a descriptor from the available ring.
- This interrupt suppression is merely an optimization; it may not suppress
+ 
+\change_inserted 1 1306923367
+Alternatively, the guest can ask the device to delay interrupts until an
+ entry with an index specified by the 
+\begin_inset Quotes eld
+\end_inset
+
+used_event
+\begin_inset Quotes erd
+\end_inset
+
+ field is written in the used ring (equivalently, until the 
+\emph on
+idx
+\emph default
+ field in the used ring will reach the value 
+\emph on
+used_event + 1
+\emph default
+).
+ The method employed by the device is controlled by the VIRTIO_RING_F_EVENT_IDX
+ feature bit (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_unchanged
+This interrupt suppression is merely an optimization; it may not suppress
  interrupts entirely.
 \end_layout
 
@@ -1940,6 +2221,17 @@ struct vring_avail {
 \begin_layout Plain Layout
 
    u16 ring[qsz]; /* qsz is the Queue Size field read from device */
+\change_inserted 1 1304329945
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329957
+
+   u16 used_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -1963,8 +2255,71 @@ The used ring is where the device returns buffers once it is done with them.
 \emph on
 available
 \emph default
- ring (the flag is kept here because this is the only part of the virtqueue
- written by the device).
+ ring
+\change_inserted 1 1304540575
+.
+ Alternatively, the 
+\begin_inset Quotes eld
+\end_inset
+
+avail_event
+\begin_inset Quotes erd
+\end_inset
+
+ field can be used by the device to hint that no notification is necessary
+ until an entry with an index specified by the 
+\begin_inset Quotes eld
+\end_inset
+
+avail_event
+\begin_inset Quotes erd
+\end_inset
+
+ is written in the available ring (equivalently, until the 
+\emph on
+idx
+\emph default
+ field in the available ring will reach the value 
+\emph on
+avail_event + 1
+\emph default
+).
+
+\change_unchanged
+ 
+\change_inserted 1 1304540614
+The method employed by the device is controlled by the guest through the
+ VIRTIO_RING_F_EVENT_IDX feature bit (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_deleted 1 1304331235
+(the flag is kept here because this is the only part of the virtqueue written
+ by the device)
+\change_inserted 1 1304540560
+
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304331235
+These fields are kept here because this is the only part of the virtqueue
+ written by the device
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+.
 \end_layout
 
 \begin_layout Standard
@@ -2046,6 +2401,17 @@ struct vring_used {
 \begin_layout Plain Layout
 
     struct vring_used_elem ring[qsz];
+\change_inserted 1 1304330369
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304330380
+
+    u16 avail_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -2065,9 +2431,13 @@ Helpers for Managing Virtqueues
 \begin_layout Standard
 The Linux Kernel Source code contains the definitions above and helper routines
  in a more usable form, in include/linux/virtio_ring.h.
- This was explicitly licensed by IBM under the (3-clause) BSD license so
- that it can be freely used by all other projects, and is reproduced (with
- slight variation to remove Linux assumptions) in Appendix A.
+ This was explicitly licensed by IBM 
+\change_inserted 1 1304342159
+and Red Hat 
+\change_unchanged
+under the (3-clause) BSD license so that it can be freely used by all other
+ projects, and is reproduced (with slight variation to remove Linux assumptions)
+ in Appendix A.
 \end_layout
 
 \begin_layout Section
@@ -2374,12 +2744,61 @@ before
 \emph default
  checking the suppression flag: it's OK to notify gratuitously, but not
  to omit a required notification.
- So again, we use a memory barrier here before reading the flags.
+ So again, we use a memory barrier here before reading the flags
+\change_inserted 1 1304336099
+ or the avail_event field
+\change_unchanged
+.
 \end_layout
 
 \begin_layout Standard
-If the VRING_USED_F_NOTIFY flag is not set, we go ahead and write to the
- PCI configuration space.
+If 
+\change_inserted 1 1304336234
+the VIRTIO_F_RING_EVENT_IDX feature is not negotiated, and if 
+\change_unchanged
+the VRING_USED_F_NOTIFY flag is not set, we go ahead and write to the PCI
+ configuration space.
+\change_inserted 1 1304336255
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304336617
+If the VIRTIO_F_RING_EVENT_IDX feature is negotiated, we read the avail_event
+ field in the used ring structure.
+ If the available index crossed the 
+\emph on
+avail_event
+\emph default
+ field value since the last notification, we go ahead and write to the PCI
+ configuration space.
+ The 
+\emph on
+avail_event
+\emph default
+ field wraps naturally at 65536 as well:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304336524
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304336569
+
+(u16)(new_idx - avail_event - 1) < (u16)(new_idx - old_idx)
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsection
@@ -2408,8 +2827,66 @@ Update the used ring idx.
 \end_layout
 
 \begin_layout Enumerate
-If the VRING_AVAIL_F_NO_INTERRUPT flag is not set in avail\SpecialChar \nobreakdash-
->flags:
+
+\change_inserted 1 1304336736
+Determine whether an interrupt is necessary:
+\end_layout
+
+\begin_deeper
+\begin_layout Enumerate
+
+\change_inserted 1 1306923440
+If the VIRTIO_F_RING_EVENT_IDX feature is not negotiated: check if 
+\change_deleted 1 1304336781
+I
+\change_unchanged
+f the VRING_AVAIL_F_NO_INTERRUPT flag is not set in avail\SpecialChar \nobreakdash-
+>flags
+\change_inserted 1 1304336788
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_deleted 1 1304336785
+:
+\change_inserted 1 1306923443
+If the VIRTIO_F_RING_EVENT_IDX feature is negotiated: check whether the
+ used index crossed the 
+\emph on
+used_event
+\emph default
+ field value since the last update.
+ The 
+\emph on
+used_event
+\emph default
+ field wraps naturally at 65536 as well:
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304336902
+
+(u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx)
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Enumerate
+
+\change_inserted 1 1304336714
+If an interrupt is necessary:
+\change_unchanged
+
 \end_layout
 
 \begin_deeper
@@ -2464,13 +2941,87 @@ If MSI-X capability is enabled: look through the used rings of each virtqueue
 \end_layout
 
 \begin_layout Standard
+
+\change_inserted 1 1306923408
+For each ring, guest should then disable interrupts by writing VRING_AVAIL_F_NO_
+INTERRUPT flag in avail structure, if required.
+ It can then process used ring entries, finally enabling interrupts by clearing
+ the VRING_AVAIL_F_NO_INTERRUPT flag or updating the EVENT_IDX field in
+ the available structure. The guest should then execute a memory barrier, and
+ then recheck the ring empty condition.
+ This is necessary to handle the case where, after the last check and before
+ enabling interrupts, an interrupt has been suppressed by the device:
+\end_layout
+
+\begin_layout Standard
 \begin_inset listings
 inline false
 status open
 
 \begin_layout Plain Layout
 
-while (vq->last_seen_used != vring->used.idx) {
+\change_inserted 1 1304342051
+
+vring_disable_interrupts(vq);
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341878
+
+for (;;) {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341880
+
+    if 
+\change_deleted 1 1304341882
+while 
+\change_unchanged
+(vq->last_seen_used != vring->used.idx) {
+\change_inserted 1 1304341888
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304342047
+
+		vring_enable_interrupts(vq);
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341986
+
+		mb();
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341964
+
+		if (vq->last_seen_used != vring->used.idx)
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341974
+
+			break;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341887
+
+    }
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -2668,6 +3219,7 @@ Clusters of functionality which are always implemented together can use
 \begin_layout Standard
 \begin_inset CommandInset nomencl_print
 LatexCommand printnomenclature
+set_width "none"
 
 \end_inset
 
@@ -2721,6 +3273,15 @@ status open
 \begin_layout Plain Layout
 
  * Copyright 2007, 2009, IBM Corporation
+\change_inserted 1 1304341032
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341075
+
+ * Copyright 2011, Red Hat, Inc
 \end_layout
 
 \begin_layout Plain Layout
@@ -3019,6 +3580,17 @@ struct vring_avail {
 \begin_layout Plain Layout
 
         uint16_t ring[];
+\change_inserted 1 1304340808
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340816
+
+        uint16_t used_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3090,6 +3662,17 @@ struct vring_used {
 \begin_layout Plain Layout
 
         struct vring_used_elem ring[];
+\change_inserted 1 1304340824
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340831
+
+        uint16_t avail_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3227,7 +3810,13 @@ struct vring {
 
 \begin_layout Plain Layout
 
- *      __u16 used_idx;
+ *      __u16 
+\change_inserted 1 1306923408
+EVENT_IDX
+\change_deleted 1 1306923408
+used_idx
+\change_unchanged
+;
 \end_layout
 
 \begin_layout Plain Layout
@@ -3326,12 +3915,58 @@ static inline unsigned vring_size(unsigned int num, unsigned long align)
 
 \begin_layout Plain Layout
 
-                + sizeof(uint16_t)*2 + sizeof(struct vring_used_elem)*num;
+                + sizeof(uint16_t)*
+\change_deleted 1 1304340844
+2
+\change_inserted 1 1304340844
+3
+\change_unchanged
+ + sizeof(struct vring_used_elem)*num;
+\end_layout
+
+\begin_layout Plain Layout
+
+}
+\change_inserted 1 1304340918
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340918
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340987
+
+static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx,
+ uint16_t old_idx)
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340944
+
+{
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341001
+
+         return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx
+ - old_idx); 
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_inserted 1 1304340938
+
 }
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3355,7 +3990,13 @@ Appendix B: Reserved Feature Bits
 \end_layout
 
 \begin_layout Standard
-Currently there are three device-independent feature bits defined:
+Currently there are 
+\change_inserted 1 1306923235
+five
+\change_deleted 1 1304330657
+three
+\change_unchanged
+ device-independent feature bits defined:
 \end_layout
 
 \begin_layout Description
@@ -3365,7 +4006,11 @@ VIRTIO_F_NOTIFY_ON_EMPTY
 
 (24) Negotiating this feature indicates that the driver wants an interrupt
  if the device runs out of available descriptors on a virtqueue, even though
- interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT flag.
+ interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT flag
+\change_inserted 1 1304341161
+ or the used_event field
+\change_unchanged
+.
  An example of this is the networking driver: it doesn't need to know every
  time a packet is transmitted, but it does need to free the transmitted
  packets a finite time after they are transmitted.
@@ -3390,6 +4035,53 @@ reference "sub:Indirect-Descriptors"
 \end_layout
 
 \begin_layout Description
+
+\change_inserted 1 1306923206
+VIRTIO_F_RING_EVENT_IDX(29) This feature enables the 
+\emph on
+used_event
+\emph default
+ and the 
+\emph on
+avail_event
+\emph default
+ fields.
+ If set, it indicates that the device should ignore the 
+\emph on
+flags
+\emph default
+ field in the available ring structure.
+ Instead, the
+\emph on
+ used_event
+\emph default
+ field in this structure is used by guest to suppress device interrupts.
+ Further, the driver should ignore the 
+\emph on
+flags
+\emph default
+ field in the used ring structure.
+ Instead, the 
+\emph on
+avail_event
+\emph default
+ field in this structure is used by the device to suppress notifications.
+ If unset, the driver should ignore the 
+\emph on
+used_event
+\emph default
+ field; the device should ignore the 
+\emph on
+avail_event
+\emph default
+ field; the 
+\emph on
+flags
+\emph default
+ field is used
+\end_layout
+
+\begin_layout Description
 VIRTIO_F_BAD_FEATURE(30) This feature should never be negotiated by the
  guest; doing so is an indication that the guest is faulty
 \begin_inset Foot
@@ -3403,6 +4095,16 @@ An experimental virtio PCI driver contained in Linux version 2.6.25 had this
 \end_inset
 
 
+\change_inserted 1 1304330854
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1 1304330961
+VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the device supports
+ feature bits 32:63.
+ If unset, feature bits 32:63 are unset.
 \end_layout
 
 \begin_layout Chapter*
-- 
1.7.5.53.gc233e

* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-06-01 10:25 [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes Michael S. Tsirkin
  2011-06-02  1:49 ` Rusty Russell
@ 2011-06-02  1:49 ` Rusty Russell
  2011-08-03 16:05   ` Michael S. Tsirkin
  2011-08-03 16:05   ` Michael S. Tsirkin
  1 sibling, 2 replies; 14+ messages in thread
From: Rusty Russell @ 2011-06-02  1:49 UTC (permalink / raw)
  To: Michael S. Tsirkin, virtualization
  Cc: habanero, Shirley Ma, Krishna Kumar2, kvm, steved, Tom Lendacky,
	borntraeger, avi, bryanv

On Wed, 1 Jun 2011 13:25:48 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> Add an option to modify the notification
> hand-off in virtio to be basically like Xen:
> each side publishes an index, the other side only triggers
> an event when it crosses that index value
> (Xen event indexes start at 1, ours start at 0 for
> backward-compatibility, but that's minor).
> 
> Since we've run out of bits in the 32 bit field,
> I added another 32 bit and bit 31 enables that.

OK.  I've applied this, and published it as the 0.9 draft.

Thanks!
Rusty.

* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-06-02  1:49 ` Rusty Russell
  2011-08-03 16:05   ` Michael S. Tsirkin
@ 2011-08-03 16:05   ` Michael S. Tsirkin
  2011-08-03 16:15     ` Gerd Hoffmann
  2011-08-03 16:15     ` Gerd Hoffmann
  1 sibling, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-08-03 16:05 UTC (permalink / raw)
  To: Rusty Russell
  Cc: virtualization, habanero, Shirley Ma, Krishna Kumar2, kvm,
	steved, Tom Lendacky, borntraeger, avi, bryanv

On Thu, Jun 02, 2011 at 11:19:35AM +0930, Rusty Russell wrote:
> On Wed, 1 Jun 2011 13:25:48 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > Add an option to modify the notification
> > hand-off in virtio to be basically like Xen:
> > each side publishes an index, the other side only triggers
> > an event when it crosses that index value
> > (Xen event indexes start at 1, ours start at 0 for
> > backward-compatibility, but that's minor).
> > 
> > Since we've run out of bits in the 32 bit field,
> > I added another 32 bit and bit 31 enables that.
> 
> OK.  I've applied this, and published it as the 0.9 draft.
> 
> Thanks!
> Rusty.

There's something that is bothering me: each such change
increases the size of the config.
Now, on PCI it's in io space which is much constrained.
It might not seem like 32 or 64 bytes is a lot, but in practice
there's a problem with a bridged setup: bridges
sometimes need resources pre-allocated for all devices
that might land behind them.

E.g. with 32 bridges, and 32 devices behind each one,
the available 64K space gets us only 64 bytes per device.

Thankfully even with this change, we are still below that
limit. But for example, this seems to push the size of the config for
virtio net from 32 to 34 bytes. The PCI BAR is a power of two, so it
will be exactly 64 bytes then.
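
To make the arithmetic concrete, here is a small sketch. The function
names are invented for illustration, and it assumes the I/O space is
split evenly across bridges and devices, which real firmware need not do:

```c
#include <stdint.h>

/* I/O bytes available to each device if the whole I/O space is split
 * evenly across the bridges and the devices behind them. */
static inline uint32_t io_bytes_per_device(uint32_t io_space,
					   uint32_t bridges,
					   uint32_t devs_per_bridge)
{
	return io_space / (bridges * devs_per_bridge);
}

/* Round a config size up to the next power of two, since a PCI BAR
 * size is a power of two. Assumes size <= 2^31. */
static inline uint32_t bar_size(uint32_t size)
{
	uint32_t bar = 1;

	while (bar < size)
		bar <<= 1;
	return bar;
}
```

io_bytes_per_device(64 * 1024, 32, 32) gives 64 and bar_size(34) gives 64,
so a 34 byte config uses up the whole per-device budget in that setup.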

io seems to work much better than memory on kvm, so
we will need to stick to that for datapath.
But maybe it's time to start putting non-datapath in memory?

Thanks,

-- 
MST

* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-08-03 16:05   ` Michael S. Tsirkin
  2011-08-03 16:15     ` Gerd Hoffmann
@ 2011-08-03 16:15     ` Gerd Hoffmann
  2011-08-03 16:29       ` Michael S. Tsirkin
                         ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2011-08-03 16:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, virtualization, habanero, Shirley Ma,
	Krishna Kumar2, kvm, steved, Tom Lendacky, borntraeger, avi,
	bryanv

   Hi,

> E.g. with 32 bridges, and 32 devices behind each one,
> the available 64K space gets us only 64 bytes per device.

15 bridges (with io window enabled) max, the smallest io window you can 
assign to a bridge is 4k, and you need some space for the devices on the 
root bus ...

cheers,
   Gerd


* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-08-03 16:15     ` Gerd Hoffmann
  2011-08-03 16:29       ` Michael S. Tsirkin
@ 2011-08-03 16:29       ` Michael S. Tsirkin
  2011-08-03 16:39       ` Michael S. Tsirkin
  2011-08-03 16:39       ` Michael S. Tsirkin
  3 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-08-03 16:29 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Rusty Russell, virtualization, habanero, Shirley Ma,
	Krishna Kumar2, kvm, steved, Tom Lendacky, borntraeger, avi,
	bryanv

On Wed, Aug 03, 2011 at 06:15:33PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> >E.g. with 32 bridges, and 32 devices behind each one,
> >the available 64K space gets us only 64 bytes per device.
> 
> 15 bridges (with io window enabled) max, the smallest io window you
> can assign to a bridge is 4k,

Hmm true, I missed that. So with 32 devs we get 128 bytes per device.
We can still get low on that space when using
multifunction devices though (there could be up to 256 functions behind
a bridge, which only leaves 16 bytes per device).

BTW, this limitation will be a problem for pci express
devices.

> and you need some space for the
> devices on the root bus ...
> 
> cheers,
>   Gerd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-08-03 16:15     ` Gerd Hoffmann
                         ` (2 preceding siblings ...)
  2011-08-03 16:39       ` Michael S. Tsirkin
@ 2011-08-03 16:39       ` Michael S. Tsirkin
  2011-08-04  7:55         ` Gerd Hoffmann
  2011-08-04  7:55         ` Gerd Hoffmann
  3 siblings, 2 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-08-03 16:39 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Rusty Russell, virtualization, habanero, Shirley Ma,
	Krishna Kumar2, kvm, steved, Tom Lendacky, borntraeger, avi,
	bryanv

On Wed, Aug 03, 2011 at 06:15:33PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> >E.g. with 32 bridges, and 32 devices behind each one,
> >the available 64K space gets us only 64 bytes per device.
> 
> 15 bridges (with io window enabled) max, the smallest io window you
> can assign to a bridge is 4k, and you need some space for the
> devices on the root bus ...
> 
> cheers,
>   Gerd


Hmm, wait, we could go 32 bit, then we are not limited to 32
bridges anymore, right? Does our bios support that?


-- 
MST

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
  2011-08-03 16:39       ` Michael S. Tsirkin
  2011-08-04  7:55         ` Gerd Hoffmann
@ 2011-08-04  7:55         ` Gerd Hoffmann
  1 sibling, 0 replies; 14+ messages in thread
From: Gerd Hoffmann @ 2011-08-04  7:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, virtualization, habanero, Shirley Ma,
	Krishna Kumar2, kvm, steved, Tom Lendacky, borntraeger, avi,
	bryanv

   Hi,

> Hmm, wait, we could go 32 bit, then we are not limited to 32
> bridges anymore, right? Does our bios support that?

How do you want to go to 32 bit? io space (not mmio) is fixed at 16 bit on x86, 
isn't it?

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes
@ 2011-06-01 10:25 Michael S. Tsirkin
  0 siblings, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2011-06-01 10:25 UTC (permalink / raw)
  To: virtualization
  Cc: Krishna Kumar2, habanero, kvm, steved, Shirley Ma, borntraeger,
	Tom Lendacky, avi

Add an option to modify the notification
hand-off in virtio to be basically like Xen:
each side publishes an index, the other side only triggers
an event when it crosses that index value
(Xen event indexes start at 1, ours start at 0 for
backward compatibility, but that's minor).

Since we've run out of bits in the 32 bit field,
I added another 32 bit field, and bit 31 enables it.

I started with using both flags and indexes in parallel,
but switched to doing either-or: this means we do
not need to tweak memory access ordering as index access just
replaces flags access.

A note on naming: the index replacing avail->flags is named
used_event, the index replacing used->flags is named
avail_event to stress the fact that these actually
point into the other side of the ring:
event is triggered when avail->idx == used->avail_event + 1
and when used->idx == avail->used_event + 1, respectively.

I also documented in more detail the ordering rules for updates to
the flags/event index fields.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

I switched to fedora 15 so a new lyx - hope it's not a problem.
Sorry, no PDF as lyx 2.0 seems to have trouble producing them for me.
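
As an illustration of the trigger rule in the description above (this
note is not part of the patch): a minimal C sketch that wraps the same
comparison as the vring_need_event() helper the patch adds to Appendix A,
plus a few spot checks. The names need_event and need_event_examples are
made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as vring_need_event() in the patch: true when
 * event_idx lies in the window [old_idx, new_idx), modulo 65536. */
static inline int need_event(uint16_t event_idx, uint16_t new_idx,
                             uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

static inline void need_event_examples(void)
{
    /* idx moved 5 -> 6 and the other side published event index 5:
     * idx == event + 1, so an event is due. */
    assert(need_event(5, 6, 5));
    /* idx did not move: nothing to signal. */
    assert(!need_event(5, 5, 5));
    /* idx wrapped 65534 -> 1, passing event index 65535. */
    assert(need_event(65535, 1, 65534));
    /* Same move, but event index 10 has not been reached yet. */
    assert(!need_event(10, 1, 65534));
}
```

The casts to a 16 bit type are what make the comparison work across the
wrap at 65536; without them the subtraction results would be compared as
plain ints.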


 virtio-spec.lyx |  756 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 729 insertions(+), 27 deletions(-)

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index f7c9c38..4f460ce 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -1,21 +1,29 @@
-#LyX 1.6.7 created this file. For more info see http://www.lyx.org/
-\lyxformat 345
+#LyX 2.0 created this file. For more info see http://www.lyx.org/
+\lyxformat 413
 \begin_document
 \begin_header
 \textclass report
 \use_default_options false
+\maintain_unincluded_children false
 \language english
+\language_package default
 \inputencoding auto
+\fontencoding global
 \font_roman default
 \font_sans default
 \font_typewriter default
 \font_default_family default
+\use_non_tex_fonts false
 \font_sc false
 \font_osf false
 \font_sf_scale 100
 \font_tt_scale 100
 
 \graphics default
+\default_output_format default
+\output_sync 0
+\bibtex_command default
+\index_command default
 \paperfontsize default
 \spacing single
 \use_hyperref false
@@ -23,9 +31,18 @@
 \use_geometry false
 \use_amsmath 1
 \use_esint 1
+\use_mhchem 1
+\use_mathdots 1
 \cite_engine basic
 \use_bibtopic false
+\use_indices false
 \paperorientation portrait
+\suppress_date false
+\use_refstyle 0
+\index Index
+\shortcut idx
+\color #008000
+\end_index
 \secnumdepth 3
 \tocdepth 3
 \paragraph_separation skip
@@ -36,8 +53,10 @@
 \paperpagestyle default
 \tracking_changes true
 \output_changes true
-\author "" 
-\author "" 
+\html_math_output 0
+\html_css_as_file 0
+\html_be_strict false
+\author 1 "Michael S. Tsirkin" 
 \end_header
 
 \begin_body
@@ -193,7 +212,7 @@ Each virtqueue occupies two or more physically-contiguous pages (defined,
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="1" columns="4">
-<features>
+<features tabularvalignment="middle">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
@@ -308,7 +327,7 @@ The Subsystem Device ID indicates which virtio device is supported by the
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="8" columns="3">
-<features>
+<features tabularvalignment="middle">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="top" width="0">
 <column alignment="center" valignment="bottom" width="0">
@@ -661,7 +680,7 @@ The virtio header looks as follows:
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="9">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
@@ -953,6 +972,10 @@ ISR
 
 \size footnotesize
 Features
+\change_inserted 1 1304329091
+ bits 0:31
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -964,6 +987,10 @@ Features
 
 \size footnotesize
 Features
+\change_inserted 1 1304329086
+ bits 0:31
+\change_unchanged
+
 \end_layout
 
 \end_inset
@@ -1050,7 +1077,7 @@ If MSI-X is enabled for the device, two additional fields immediately follow
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="3">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
@@ -1186,6 +1213,177 @@ Vector
 \end_layout
 
 \begin_layout Standard
+
+\change_inserted 1 1304328924
+Finally, if the feature bit VIRTIO_F_FEATURES_HIGH is set, this is immediately
+ followed by two additional fields:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304328925
+\begin_inset Tabular
+<lyxtabular version="3" rows="4" columns="3">
+<features tabularvalignment="middle">
+<column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
+<column alignment="left" valignment="top" width="0">
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Bits
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+32
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+32
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Read/Write
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+R
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+R+W
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+Purpose
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\size footnotesize
+Device
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\size footnotesize
+Guest
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304328925
+
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329099
+
+\size footnotesize
+Features bits 32:63
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" bottomline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329102
+
+\size footnotesize
+Features bits 32:63
+\end_layout
+
+\end_inset
+</cell>
+</row>
+</lyxtabular>
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
 Immediately following these general headers, there may be device-specific
  headers:
 \end_layout
@@ -1193,7 +1391,7 @@ Immediately following these general headers, there may be device-specific
 \begin_layout Standard
 \begin_inset Tabular
 <lyxtabular version="3" rows="4" columns="2">
-<features>
+<features tabularvalignment="middle">
 <column alignment="left" valignment="top" width="0">
 <column alignment="left" valignment="top" width="0">
 <row>
@@ -1348,7 +1546,20 @@ Feature Bits
 The least significant 31 bits of the first configuration field indicates
  the features that the device supports (the high bit is reserved, and will
  be used to indicate the presence of future feature bits elsewhere).
- The bits are allocated as follows:
+ 
+\change_inserted 1 1304331636
+If more than 31 feature bits are supported, the device indicates so by setting
+ feature bit 31 (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_unchanged
+The bits are allocated as follows:
 \end_layout
 
 \begin_layout Description
@@ -1372,7 +1583,33 @@ to
 \begin_inset space ~
 \end_inset
 
-30 Feature bits reserved for extensions to the queue mechanism
+
+\change_inserted 1 1304329326
+4
+\change_deleted 1 1304329325
+3
+\change_unchanged
+0 Feature bits reserved for extensions to the queue 
+\change_inserted 1 1304540448
+and feature negotiation 
+\change_unchanged
+mechanism
+\change_inserted 1 1304540449
+s
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1 1304329398
+41
+\begin_inset space ~
+\end_inset
+
+to
+\begin_inset space ~
+\end_inset
+
+63 Feature bits reserved for future extensions
 \end_layout
 
 \begin_layout Standard
@@ -1407,6 +1644,19 @@ This allows for forwards and backwards compatibility: if the device is enhanced
  support, it will not see that feature bit in the Device Features field
  and can go into backwards compatibility mode (or, for poor implementations,
  set the FAILED Device Status bit).
+\change_inserted 1 1304329423
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304331742
+Access to feature bits 32 to 63 is enabled by Guest by setting feature bit
+ 31.
+ If this bit is unset, Device must assume that all feature bits > 31 are
+ unset.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsubsection
@@ -1891,7 +2141,38 @@ flags
 
  field is currently 0 or 1: 1 indicating that we do not need an interrupt
  when the device consumes a descriptor from the available ring.
- This interrupt suppression is merely an optimization; it may not suppress
+ 
+\change_inserted 1 1306923367
+Alternatively, the guest can ask the device to delay interrupts until an
+ entry with an index specified by the 
+\begin_inset Quotes eld
+\end_inset
+
+used_event
+\begin_inset Quotes erd
+\end_inset
+
+ field is written in the used ring (equivalently, until the 
+\emph on
+idx
+\emph default
+ field in the used ring reaches the value 
+\emph on
+used_event + 1
+\emph default
+).
+ The method employed by the device is controlled by the VIRTIO_RING_F_EVENT_IDX
+ feature bit (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_unchanged
+This interrupt suppression is merely an optimization; it may not suppress
  interrupts entirely.
 \end_layout
 
@@ -1940,6 +2221,17 @@ struct vring_avail {
 \begin_layout Plain Layout
 
    u16 ring[qsz]; /* qsz is the Queue Size field read from device */
+\change_inserted 1 1304329945
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304329957
+
+   u16 used_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -1963,8 +2255,71 @@ The used ring is where the device returns buffers once it is done with them.
 \emph on
 available
 \emph default
- ring (the flag is kept here because this is the only part of the virtqueue
- written by the device).
+ ring
+\change_inserted 1 1304540575
+.
+ Alternatively, the 
+\begin_inset Quotes eld
+\end_inset
+
+avail_event
+\begin_inset Quotes erd
+\end_inset
+
+ field can be used by the device to hint that no notification is necessary
+ until an entry with an index specified by the 
+\begin_inset Quotes eld
+\end_inset
+
+avail_event
+\begin_inset Quotes erd
+\end_inset
+
+ is written in the available ring (equivalently, until the 
+\emph on
+idx
+\emph default
+ field in the available ring reaches the value 
+\emph on
+avail_event + 1
+\emph default
+).
+
+\change_unchanged
+ 
+\change_inserted 1 1304540614
+The method employed by the device is controlled by the guest through the
+ VIRTIO_RING_F_EVENT_IDX feature bit (see 
+\begin_inset CommandInset ref
+LatexCommand ref
+reference "cha:Reserved-Feature-Bits"
+
+\end_inset
+
+).
+ 
+\change_deleted 1 1304331235
+(the flag is kept here because this is the only part of the virtqueue written
+ by the device)
+\change_inserted 1 1304540560
+
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304331235
+These fields are kept here because this is the only part of the virtqueue
+ written by the device
+\change_unchanged
+
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+.
 \end_layout
 
 \begin_layout Standard
@@ -2046,6 +2401,17 @@ struct vring_used {
 \begin_layout Plain Layout
 
     struct vring_used_elem ring[qsz];
+\change_inserted 1 1304330369
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304330380
+
+    u16 avail_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -2065,9 +2431,13 @@ Helpers for Managing Virtqueues
 \begin_layout Standard
 The Linux Kernel Source code contains the definitions above and helper routines
  in a more usable form, in include/linux/virtio_ring.h.
- This was explicitly licensed by IBM under the (3-clause) BSD license so
- that it can be freely used by all other projects, and is reproduced (with
- slight variation to remove Linux assumptions) in Appendix A.
+ This was explicitly licensed by IBM 
+\change_inserted 1 1304342159
+and Red Hat 
+\change_unchanged
+under the (3-clause) BSD license so that it can be freely used by all other
+ projects, and is reproduced (with slight variation to remove Linux assumptions)
+ in Appendix A.
 \end_layout
 
 \begin_layout Section
@@ -2374,12 +2744,61 @@ before
 \emph default
  checking the suppression flag: it's OK to notify gratuitously, but not
  to omit a required notification.
- So again, we use a memory barrier here before reading the flags.
+ So again, we use a memory barrier here before reading the flags
+\change_inserted 1 1304336099
+ or the avail_event field
+\change_unchanged
+.
 \end_layout
 
 \begin_layout Standard
-If the VRING_USED_F_NOTIFY flag is not set, we go ahead and write to the
- PCI configuration space.
+If 
+\change_inserted 1 1304336234
+the VIRTIO_RING_F_EVENT_IDX feature is not negotiated, and if 
+\change_unchanged
+the VRING_USED_F_NOTIFY flag is not set, we go ahead and write to the PCI
+ configuration space.
+\change_inserted 1 1304336255
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304336617
+If the VIRTIO_RING_F_EVENT_IDX feature is negotiated, we read the avail_event
+ field in the used ring structure.
+ If the available index crossed the 
+\emph on
+avail_event
+\emph default
+ field value since the last notification, we go ahead and write to the PCI
+ configuration space.
+ The 
+\emph on
+avail_event
+\emph default
+ field wraps naturally at 65536 as well:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1 1304336524
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304336569
+
+(u16)(new_idx - avail_event - 1) < (u16)(new_idx - old_idx)
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Subsection
@@ -2408,8 +2827,66 @@ Update the used ring idx.
 \end_layout
 
 \begin_layout Enumerate
-If the VRING_AVAIL_F_NO_INTERRUPT flag is not set in avail\SpecialChar \nobreakdash-
->flags:
+
+\change_inserted 1 1304336736
+Determine whether an interrupt is necessary:
+\end_layout
+
+\begin_deeper
+\begin_layout Enumerate
+
+\change_inserted 1 1306923440
+If the VIRTIO_RING_F_EVENT_IDX feature is not negotiated: check if 
+\change_deleted 1 1304336781
+I
+\change_unchanged
+f the VRING_AVAIL_F_NO_INTERRUPT flag is not set in avail\SpecialChar \nobreakdash-
+>flags
+\change_inserted 1 1304336788
+
+\end_layout
+
+\begin_layout Enumerate
+
+\change_deleted 1 1304336785
+:
+\change_inserted 1 1306923443
+If the VIRTIO_RING_F_EVENT_IDX feature is negotiated: check whether the
+ used index crossed the 
+\emph on
+used_event
+\emph default
+ field value since the last update.
+ The 
+\emph on
+used_event
+\emph default
+ field wraps naturally at 65536 as well:
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304336902
+
+(u16)(new_idx - used_event - 1) < (u16)(new_idx - old_idx)
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Enumerate
+
+\change_inserted 1 1304336714
+If an interrupt is necessary:
+\change_unchanged
+
 \end_layout
 
 \begin_deeper
@@ -2464,13 +2941,87 @@ If MSI-X capability is enabled: look through the used rings of each virtqueue
 \end_layout
 
 \begin_layout Standard
+
+\change_inserted 1 1306923408
+For each ring, guest should then disable interrupts by writing VRING_AVAIL_F_NO_
+INTERRUPT flag in avail structure, if required.
+ It can then process used ring entries, finally enabling interrupts by clearing
+ the VRING_AVAIL_F_NO_INTERRUPT flag or updating the used_event field in the
+ available structure.
+ Guest should then execute a memory barrier, and then recheck the ring empty condition.
+ This is necessary to handle the case where, after the last check and before
+ enabling interrupts, an interrupt has been suppressed by the device:
+\end_layout
+
+\begin_layout Standard
 \begin_inset listings
 inline false
 status open
 
 \begin_layout Plain Layout
 
-while (vq->last_seen_used != vring->used.idx) {
+\change_inserted 1 1304342051
+
+vring_disable_interrupts(vq);
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341878
+
+for (;;) {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341880
+
+    if 
+\change_deleted 1 1304341882
+while 
+\change_unchanged
+(vq->last_seen_used != vring->used.idx) {
+\change_inserted 1 1304341888
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304342047
+
+		vring_enable_interrupts(vq);
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341986
+
+		mb();
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341964
+
+		if (vq->last_seen_used != vring->used.idx)
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341974
+
+			break;
+\change_unchanged
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341887
+
+    }
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -2668,6 +3219,7 @@ Clusters of functionality which are always implemented together can use
 \begin_layout Standard
 \begin_inset CommandInset nomencl_print
 LatexCommand printnomenclature
+set_width "none"
 
 \end_inset
 
@@ -2721,6 +3273,15 @@ status open
 \begin_layout Plain Layout
 
  * Copyright 2007, 2009, IBM Corporation
+\change_inserted 1 1304341032
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341075
+
+ * Copyright 2011, Red Hat, Inc
 \end_layout
 
 \begin_layout Plain Layout
@@ -3019,6 +3580,17 @@ struct vring_avail {
 \begin_layout Plain Layout
 
         uint16_t ring[];
+\change_inserted 1 1304340808
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340816
+
+        uint16_t used_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3090,6 +3662,17 @@ struct vring_used {
 \begin_layout Plain Layout
 
         struct vring_used_elem ring[];
+\change_inserted 1 1304340824
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340831
+
+        uint16_t avail_event;
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3227,7 +3810,13 @@ struct vring {
 
 \begin_layout Plain Layout
 
- *      __u16 used_idx;
+ *      __u16 
+\change_inserted 1 1306923408
+EVENT_IDX
+\change_deleted 1 1306923408
+used_idx
+\change_unchanged
+;
 \end_layout
 
 \begin_layout Plain Layout
@@ -3326,12 +3915,58 @@ static inline unsigned vring_size(unsigned int num, unsigned long align)
 
 \begin_layout Plain Layout
 
-                + sizeof(uint16_t)*2 + sizeof(struct vring_used_elem)*num;
+                + sizeof(uint16_t)*
+\change_deleted 1 1304340844
+2
+\change_inserted 1 1304340844
+3
+\change_unchanged
+ + sizeof(struct vring_used_elem)*num;
+\end_layout
+
+\begin_layout Plain Layout
+
+}
+\change_inserted 1 1304340918
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340918
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340987
+
+static inline int vring_need_event(uint16_t event_idx, uint16_t new_idx,
+ uint16_t old_idx)
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304340944
+
+{
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1 1304341001
+
+         return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx
+ - old_idx); 
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_inserted 1 1304340938
+
 }
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -3355,7 +3990,13 @@ Appendix B: Reserved Feature Bits
 \end_layout
 
 \begin_layout Standard
-Currently there are three device-independent feature bits defined:
+Currently there are 
+\change_inserted 1 1306923235
+five
+\change_deleted 1 1304330657
+three
+\change_unchanged
+ device-independent feature bits defined:
 \end_layout
 
 \begin_layout Description
@@ -3365,7 +4006,11 @@ VIRTIO_F_NOTIFY_ON_EMPTY
 
 (24) Negotiating this feature indicates that the driver wants an interrupt
  if the device runs out of available descriptors on a virtqueue, even though
- interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT flag.
+ interrupts are suppressed using the VRING_AVAIL_F_NO_INTERRUPT flag
+\change_inserted 1 1304341161
+ or the used_event field
+\change_unchanged
+.
  An example of this is the networking driver: it doesn't need to know every
  time a packet is transmitted, but it does need to free the transmitted
  packets a finite time after they are transmitted.
@@ -3390,6 +4035,53 @@ reference "sub:Indirect-Descriptors"
 \end_layout
 
 \begin_layout Description
+
+\change_inserted 1 1306923206
+VIRTIO_RING_F_EVENT_IDX(29) This feature enables the 
+\emph on
+used_event
+\emph default
+ and the 
+\emph on
+avail_event
+\emph default
+ fields.
+ If set, it indicates that the device should ignore the 
+\emph on
+flags
+\emph default
+ field in the available ring structure.
+ Instead, the
+\emph on
+ used_event
+\emph default
+ field in this structure is used by guest to suppress device interrupts.
+ Further, the driver should ignore the 
+\emph on
+flags
+\emph default
+ field in the used ring structure.
+ Instead, the 
+\emph on
+avail_event
+\emph default
+ field in this structure is used by the device to suppress notifications.
+ If unset, the driver should ignore the 
+\emph on
+used_event
+\emph default
+ field; the device should ignore the 
+\emph on
+avail_event
+\emph default
+ field; the 
+\emph on
+flags
+\emph default
+ field is used.
+\end_layout
+
+\begin_layout Description
 VIRTIO_F_BAD_FEATURE(30) This feature should never be negotiated by the
  guest; doing so is an indication that the guest is faulty
 \begin_inset Foot
@@ -3403,6 +4095,16 @@ An experimental virtio PCI driver contained in Linux version 2.6.25 had this
 \end_inset
 
 
+\change_inserted 1 1304330854
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1 1304330961
+VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the device supports
+ feature bits 32:63.
+ If unset, feature bits 32:63 are unset.
 \end_layout
 
 \begin_layout Chapter*
-- 
1.7.5.53.gc233e
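
A note on the feature-bit hunks above (not part of the patch): a minimal
C sketch of how the two 32 bit feature words might be combined under the
rule "if bit 31 is unset, all feature bits > 31 are unset". The macro and
function names are invented for this note; only the bit number and the
rule come from the patch text.

```c
#include <assert.h>
#include <stdint.h>

#define FEATURES_HIGH_BIT 31 /* VIRTIO_F_FEATURES_HIGH in this patch */

/* Hypothetical helper: the features as the device should see them.
 * lo holds bits 0:31, hi holds bits 32:63. If bit 31 is clear in lo,
 * the high word is ignored entirely. */
static inline uint64_t effective_features(uint32_t lo, uint32_t hi)
{
    uint64_t features = lo;

    if (lo & (UINT32_C(1) << FEATURES_HIGH_BIT))
        features |= (uint64_t)hi << 32;
    return features;
}

static inline void effective_features_examples(void)
{
    /* Bit 31 set: the high word is honoured. */
    assert(effective_features(0x80000001u, 0x1u) == 0x180000001ULL);
    /* Bit 31 clear: the high word is dropped, whatever it contains. */
    assert(effective_features(0x00000001u, 0xFFFFFFFFu) == 0x1ULL);
}
```

A guest that predates the extra fields never sets bit 31, so under this
rule the device falls back to the low 32 bits without special handling.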

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-08-04  7:55 UTC | newest]

Thread overview: 14+ messages
2011-06-01 10:25 [PATCHv3] virtio-spec: 64 bit features, used/avail event, fixes Michael S. Tsirkin
2011-06-02  1:49 ` Rusty Russell
2011-06-02  1:49 ` Rusty Russell
2011-08-03 16:05   ` Michael S. Tsirkin
2011-08-03 16:05   ` Michael S. Tsirkin
2011-08-03 16:15     ` Gerd Hoffmann
2011-08-03 16:15     ` Gerd Hoffmann
2011-08-03 16:29       ` Michael S. Tsirkin
2011-08-03 16:29       ` Michael S. Tsirkin
2011-08-03 16:39       ` Michael S. Tsirkin
2011-08-03 16:39       ` Michael S. Tsirkin
2011-08-04  7:55         ` Gerd Hoffmann
2011-08-04  7:55         ` Gerd Hoffmann
2011-06-01 10:25 Michael S. Tsirkin
