All of lore.kernel.org
 help / color / mirror / Atom feed
* packed ring layout proposal v2
@ 2017-02-08  3:20 Michael S. Tsirkin
  0 siblings, 0 replies; 4+ messages in thread
From: Michael S. Tsirkin @ 2017-02-08  3:20 UTC (permalink / raw)
  To: virtio-dev; +Cc: virtualization

This is an update from v1 version.
Changes:
- update event suppression mechanism
- separate options for indirect and direct s/g
- lots of new features

---

Performance analysis of this is in my kvm forum 2016 presentation.
The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.

* Descriptor ring:

Guest adds descriptors with unique index values and DESC_HW set in flags.
Host overwrites used descriptors with correct len, index, and DESC_HW
clear.  Flags are always set/cleared last.

#define DESC_HW 0x0080

struct desc {
        __le64 addr;
        __le32 len;
        __le16 index;
        __le16 flags;
};

When DESC_HW is set, descriptor belongs to device. When it is clear,
it belongs to the driver.

We can use 1 bit to set direction
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE      2

* Scatter/gather support

We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT       1

Unlike virtio 1.0, all descriptors must have distinct ID values.

Also unlike virtio 1.0, use of this flag will be an optional feature
(e.g. VIRTIO_F_DESC_NEXT) so both devices and drivers can opt out of it.

* Indirect buffers

Can be marked like in virtio 1.0:

/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT   4

Unlike virtio 1.0, this is a table, not a list:
struct indirect_descriptor_table {
        /* The actual descriptors (16 bytes each) */
        struct virtq_desc desc[len / 16];
};

The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored.  id is also set to 0
and should be ignored.

virtio 1.0 seems to allow a s/g entry followed by
an indirect descriptor. This does not seem useful,
so we do not allow that anymore.

This support would be an optional feature, same as in virtio 1.0

* Batching descriptors:

virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1.  We can support this by
chaining a list of descriptors through a bit the flags field.
To allow use together with s/g, a different bit will be used.

#define VRING_DESC_F_BATCH_NEXT 0x0010

Batching works for both driver and device descriptors.



* Processing descriptors in and out of order

Device processing all descriptors in order can simply flip
the DESC_HW bit as it is done with descriptors.

Device can write descriptors out in order as they are used, overwriting
descriptors that are there.

Device must not use a descriptor until DESC_HW is set.
It is only required to look at the first descriptor
submitted.

Driver must not overwrite a descriptor until DESC_HW is clear.
It is only required to look at the first descriptor
submitted.

* Device specific descriptor flags
We have a lot of unused space in the descriptor.  This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.

Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.

* Descriptor length in device descriptors

virtio 1.0 places strict requirements on descriptor length. For example
it must be 0 in used ring of TX VQ of a network device since nothing is
written.  In practice guests do not seem to use this, so we can simplify
devices a bit by removing this requirement - if length is unused it
should be ignored by driver.

Some devices use identically-sized buffers in all descriptors.
Ignoring length for driver descriptors there could be an option too.

* Writing at an offset

Some devices might want to write into some descriptors
at an offset, the offset would be in config space,
and a descriptor flag could indicate this:

#define VRING_DESC_F_OFFSET 0x0020

How exactly to use the offset would be device specific,
for example it can be used to align beginning of packet
in the 1st buffer for mergeable buffers to cache line boundary
while also aligning rest of buffers.

* Non power-of-2 ring sizes

As the ring simply wraps around, there's no reason to
require ring size to be power of two.
It can be made a separate feature though.


* Interrupt/event suppression

virtio 1.0 has two mechanisms for suppression but only
one can be used at a time. we pack them together
in a structure - one for interrupts, one for notifications:

struct event {
	__le16 idx;
	__le16 flags;
}

Both fields would be optional, with a feature bit:
VIRTIO_F_EVENT_IDX
VIRTIO_F_EVENT_FLAGS

* Flags can be used like in virtio 1.0, by storing a special
value there:

#define VRING_F_EVENT_ENABLE  0x0

#define VRING_F_EVENT_DISABLE 0x1

* Event index would be in the range 0 to 2 * Queue Size
(to detect wrap arounds) and wrap to 0 after that.

The assumption is that each side maintains an internal
descriptor counter 0 to 2 * Queue Size that wraps to 0.
In that case, interrupt triggers when counter reaches
the given value.

* If both features are negotiated, a special flags value
can be used to switch to event idx:

#define VRING_F_EVENT_IDX     0x2


* Prototype

A partial prototype can be found under
tools/virtio/ringtest/ring.c

Test run:
[mst@tuck ringtest]$ time ./ring 
real    0m0.399s
user    0m0.791s
sys     0m0.000s
[mst@tuck ringtest]$ time ./virtio_ring_0_9
real    0m0.503s
user    0m0.999s
sys     0m0.000s

It is planned to update it to this interface. Future changes and
enhancements can (and should) be tested against this prototype.

* Feature sets
In particular are we going overboard with feature bits?  It becomes hard
to support all combinations in drivers and devices.  Maybe we should
document reasonable feature sets to be supported for each device.

* Known issues/ideas

This layout is optimized for host/guest communication,
in a sense even more aggressively than virtio 1.0.
It might be suboptimal for PCI hardware implementations.
However, one notes that current virtio pci drivers are
unlikely to work with PCI hardware implementations anyway
(e.g. due to use of SMP barriers for ordering).

Suggestions for improving this are welcome but need to be tested to make
sure our main use case does not regress.  Of course some improvements
might be made optional, but if we add too many of these driver becomes
unmanageable.

---

Note: should this proposal be accepted and approved, one or more
      claims disclosed to the TC admin and listed on the Virtio TC
      IPR page https://www.oasis-open.org/committees/virtio/ipr.php
      might become Essential Claims.

-- 
MST

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: packed ring layout proposal v2
       [not found]   ` <20170209181955-mutt-send-email-mst@kernel.org>
@ 2017-02-09 18:27     ` Christian Borntraeger
  0 siblings, 0 replies; 4+ messages in thread
From: Christian Borntraeger @ 2017-02-09 18:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, virtualization

On 02/09/2017 06:43 PM, Michael S. Tsirkin wrote:
[...]
>> [...] 
>>> Note: should this proposal be accepted and approved, one or more
>>>       claims disclosed to the TC admin and listed on the Virtio TC
>>>       IPR page https://www.oasis-open.org/committees/virtio/ipr.php
>>>       might become Essential Claims.
>>>
>>
>> I have trouble parsing that legal stuff. Do I read that right, that
>> these claims can be implemented as part of a virtio implementation without
>> any worries (e.g. non open source HW implementation or non open source
>> hypervisor)?
> 
> By that legal stuff do you mean the IPR or the Note?

Both in combination.
 
> Not representing Red Hat here, and definitely not legal advice, below is
> just my personal understanding of the IPR requirements.
> 
> Virtio TC operates under the Non-Assertion Mode of the OASIS IPR Policy:
> https://www.oasis-open.org/policies-guidelines/ipr#Non-Assertion-Mode
> 
> As far as I can tell, the hardware and software question is covered
> by that policy, since it states:
> 
> 	Covered Product - includes only those specific portions of a
> 	product (hardware, software or combinations thereof)
> 
> Also as far as I can tell IPR does not mention source at all, so I think
> virtio IPR would apply to open and closed source software equally.
> 
> The Note is included to satisfy the disclosure requirements.
> 
> Does this answer the question?

Yes, thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: packed ring layout proposal v2
  2017-02-08 13:37 ` packed ring layout proposal v2 Christian Borntraeger
@ 2017-02-09 17:43   ` Michael S. Tsirkin
       [not found]   ` <20170209181955-mutt-send-email-mst@kernel.org>
  1 sibling, 0 replies; 4+ messages in thread
From: Michael S. Tsirkin @ 2017-02-09 17:43 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: virtio-dev, virtualization

On Wed, Feb 08, 2017 at 02:37:57PM +0100, Christian Borntraeger wrote:
> On 02/08/2017 04:20 AM, Michael S. Tsirkin wrote:
> [...]
> > * Prototype
> > 
> > A partial prototype can be found under
> > tools/virtio/ringtest/ring.c
> > 
> > Test run:
> > [mst@tuck ringtest]$ time ./ring 
> > real    0m0.399s
> > user    0m0.791s
> > sys     0m0.000s
> > [mst@tuck ringtest]$ time ./virtio_ring_0_9
> > real    0m0.503s
> > user    0m0.999s
> > sys     0m0.000s
> 
> I see similar improvements on s390, so I think this would be a very nice 
> improvement.
> 
> [...] 
> > Note: should this proposal be accepted and approved, one or more
> >       claims disclosed to the TC admin and listed on the Virtio TC
> >       IPR page https://www.oasis-open.org/committees/virtio/ipr.php
> >       might become Essential Claims.
> > 
> 
> I have trouble parsing that legal stuff. Do I read that right, that
> these claims can be implemented as part of a virtio implementation without
> any worries (e.g. non open source HW implementation or non open source
> hypervisor)?

By that legal stuff do you mean the IPR or the Note?

Not representing Red Hat here, and definitely not legal advice, below is
just my personal understanding of the IPR requirements.

Virtio TC operates under the Non-Assertion Mode of the OASIS IPR Policy:
https://www.oasis-open.org/policies-guidelines/ipr#Non-Assertion-Mode

As far as I can tell, the hardware and software question is covered
by that policy, since it states:

	Covered Product - includes only those specific portions of a
	product (hardware, software or combinations thereof)

Also as far as I can tell IPR does not mention source at all, so I think
virtio IPR would apply to open and closed source software equally.

The Note is included to satisfy the disclosure requirements.

Does this answer the question?

-- 
MST

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: packed ring layout proposal v2
  2017-09-10  5:06 [virtio-dev] packed ring layout proposal v3 Michael S. Tsirkin
@ 2017-02-08 13:37 ` Christian Borntraeger
  2017-02-09 17:43   ` Michael S. Tsirkin
       [not found]   ` <20170209181955-mutt-send-email-mst@kernel.org>
  0 siblings, 2 replies; 4+ messages in thread
From: Christian Borntraeger @ 2017-02-08 13:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, virtio-dev; +Cc: virtualization

On 02/08/2017 04:20 AM, Michael S. Tsirkin wrote:
[...]
> * Prototype
> 
> A partial prototype can be found under
> tools/virtio/ringtest/ring.c
> 
> Test run:
> [mst@tuck ringtest]$ time ./ring 
> real    0m0.399s
> user    0m0.791s
> sys     0m0.000s
> [mst@tuck ringtest]$ time ./virtio_ring_0_9
> real    0m0.503s
> user    0m0.999s
> sys     0m0.000s

I see similar improvements on s390, so I think this would be a very nice 
improvement.

[...] 
> Note: should this proposal be accepted and approved, one or more
>       claims disclosed to the TC admin and listed on the Virtio TC
>       IPR page https://www.oasis-open.org/committees/virtio/ipr.php
>       might become Essential Claims.
> 

I have trouble parsing that legal stuff. Do I read that right, that
these claims can be implemented as part of a virtio implementation without
any worries (e.g. non open source HW implementation or non open source
hypervisor)?

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-02-09 18:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-08  3:20 packed ring layout proposal v2 Michael S. Tsirkin
2017-09-10  5:06 [virtio-dev] packed ring layout proposal v3 Michael S. Tsirkin
2017-02-08 13:37 ` packed ring layout proposal v2 Christian Borntraeger
2017-02-09 17:43   ` Michael S. Tsirkin
     [not found]   ` <20170209181955-mutt-send-email-mst@kernel.org>
2017-02-09 18:27     ` Christian Borntraeger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.