* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
@ 2022-09-28  1:43 Xuan Zhuo
  2022-09-28  4:05 ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-28  1:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

References: <1663297006-64248-1-git-send-email-hengqi@linux.alibaba.com>
 <3c3cc916-c605-1ed2-d3ff-d8d8ce668f17@redhat.com>
 <20220920032824.GA125047@h68b04307.sqa.eu95>
 <CACGkMEtOHXtNtG13Zxpc+bhYJVyefp999kttz4YcPi22nMBSpQ@mail.gmail.com>
 <1663903426.8765974-1-xuanzhuo@linux.alibaba.com>
 <CACGkMEsmqngS-HYPzzV8b9dmv95=W6dJebAa=asXGmo7VEAnoA@mail.gmail.com>
 <20220923013341-mutt-send-email-mst@kernel.org>
 <619af5b3-4c38-955f-0f7f-f351f5a9527e@redhat.com>
In-Reply-To: <619af5b3-4c38-955f-0f7f-f351f5a9527e@redhat.com>


Hi Michael,

I see you replied to v7; our discussion is here.

Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-28  1:43 [virtio-dev] [PATCH v8] virtio_net: support for split transport header Xuan Zhuo
@ 2022-09-28  4:05 ` Michael S. Tsirkin
  0 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-28  4:05 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Wed, Sep 28, 2022 at 09:43:36AM +0800, Xuan Zhuo wrote:
> References: <1663297006-64248-1-git-send-email-hengqi@linux.alibaba.com>
>  <3c3cc916-c605-1ed2-d3ff-d8d8ce668f17@redhat.com>
>  <20220920032824.GA125047@h68b04307.sqa.eu95>
>  <CACGkMEtOHXtNtG13Zxpc+bhYJVyefp999kttz4YcPi22nMBSpQ@mail.gmail.com>
>  <1663903426.8765974-1-xuanzhuo@linux.alibaba.com>
>  <CACGkMEsmqngS-HYPzzV8b9dmv95=W6dJebAa=asXGmo7VEAnoA@mail.gmail.com>
>  <20220923013341-mutt-send-email-mst@kernel.org>
>  <619af5b3-4c38-955f-0f7f-f351f5a9527e@redhat.com>
> In-Reply-To: <619af5b3-4c38-955f-0f7f-f351f5a9527e@redhat.com>
> 
> 
> Hi Michael,
> 
> I see you replied to v7; our discussion is here.
> 
> Thanks.

Sure, I know. I responded to v7 because Jason mentioned it.

-- 
MST



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-08  4:37                     ` Jason Wang
                                         ` (2 preceding siblings ...)
  2022-10-18  3:07                       ` Xuan Zhuo
@ 2022-10-20  8:16                       ` Heng Qi
  3 siblings, 0 replies; 31+ messages in thread
From: Heng Qi @ 2022-10-20  8:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Xuan Zhuo, Virtio-Dev, Kangjie Xu

On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > > to not split the header when the layout doesn't fit for header splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > >         indicates to both the device and the driver that no
> > > >         assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver know what kind of packet it can receive
> 2) protocol have fixed length of the header
> 
> This is probably not true consider:
> 
> 1) TCP and UDP have different header length
> 2) IPv6 has an variable length of the header
> 
> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> >         If a receive packet is spread over multiple buffers, the device
> >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> >         1 buffers) completely up to the full length of each buffer
> >         supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 


We've talked a bit more about split header so far, but there still seem to
be some issues, so let's recap.

I. Review of the methods discussed

To work with Eric's TCP receive interface and achieve zero copy, the header
and payload must be stored separately, and the payload must be page aligned.
We have therefore discussed the following options for split header:

1. Method A (depends on the descriptor chain)
|                         receive buffer                            |
|              0th descriptor                      | 1st descriptor |
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|      payload   |
Method A allocates each receive buffer as a header buffer plus a separate
page. This ensures that every payload is placed in a page of its own, which
is very beneficial for the zerocopy implemented by the upper layer.

The advantage of method A is that the implementation is clearer: it supports
both the normal header split and the rollback case, and it can also easily
support XDP. The downside is that a device operating directly on the
descriptor chain may cause a layering violation and may also affect
performance.
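
As a rough illustration of the 0th descriptor in method A, the sketch below
computes where the headers end for an IPv4/TCP frame and how many "hold"
bytes remain. The 12-byte virtio-net header, the untagged Ethernet header,
and the names split_hdr_len() and hold_len() are assumptions made for this
example only; they are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VNET_HDR_LEN 12u  /* assumed: virtio-net header including num_buffers */
#define ETH_HLEN     14u  /* assumed: untagged Ethernet header */

/* Length of virtio-net hdr + mac + ip hdr + tcp hdr for an IPv4/TCP frame.
 * ihl and doff are the IPv4 IHL and TCP data-offset fields, in 32-bit words. */
static size_t split_hdr_len(uint8_t ihl, uint8_t doff)
{
    assert(ihl >= 5 && ihl <= 15 && doff >= 5 && doff <= 15);
    return VNET_HDR_LEN + ETH_HLEN + (size_t)ihl * 4 + (size_t)doff * 4;
}

/* Unused ("hold") bytes left in a header descriptor of hdr_buf_len bytes,
 * or -1 if the headers do not fit and the packet cannot be split. */
static long hold_len(size_t hdr_buf_len, uint8_t ihl, uint8_t doff)
{
    size_t used = split_hdr_len(ihl, doff);

    return used <= hdr_buf_len ? (long)(hdr_buf_len - used) : -1;
}
```

With option-free IPv4 and TCP headers (ihl = doff = 5) the headers take 66
bytes, so a hypothetical 256-byte header descriptor would keep 190 bytes of
hold, while a 64-byte one would force the no-split path.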

2. Method B (depends on mergeable buffers)
|                   receive buffer (page)                                 | receive buffer (page) | 
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|         payload       | 
^
|
pointer to device

Method B is based on your previous suggestion. It is built on mergeable
buffers, and the driver fills the ring with a separate page each time.

If split header is negotiated and the device can successfully split the packet,
the device needs at least two buffers, that is, two pages: one for the
virtio-net header and transport header, and the other for the payload.

The advantage of method B is that it relies on mergeable buffers instead of the
descriptor chain. It overcomes the shortcomings of method A and lets the device
work on buffers rather than descriptors. Its disadvantage is that it wastes
memory.
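
To make the memory cost of method B concrete, here is a small sketch that
counts the pages one split packet consumes and the bytes of the header page
that carry no packet data. The 4096-byte page size and the function names are
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed page size */

/* Pages one split packet consumes under method B:
 * one page for the headers plus enough pages for the payload. */
static size_t method_b_pages(size_t payload_len)
{
    assert(payload_len <= SIZE_MAX - PAGE_SZ);  /* guard the round-up */
    return 1 + (payload_len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Bytes of the header page that hold no packet data, counting both
 * the leading offset and the trailing hold area. */
static size_t method_b_hdr_page_unused(size_t hdr_len)
{
    assert(hdr_len <= PAGE_SZ);
    return PAGE_SZ - hdr_len;
}
```

For a 66-byte header area, 4030 of the 4096 bytes in the header page stay
unused, which is the waste referred to above.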

3. Method C (depends on mergeable buffers)
| small buffer | data buffer (page) | small buffer | data buffer (page) | small buffer | data buffer (page) |

Method B fills a separate page each time, while method C fills small buffers
and page buffers separately. Method C puts the headers in a small buffer and the
payload in a page.

The advantage of method C is that dedicated buffers are filled for headers and
for data, which reduces the memory waste of method B. However, it is difficult
to balance the number of header buffers and data buffers filled, and a poor
ratio will hurt performance. For example, with many large packets, too many
header buffers will hurt performance; with many small packets, too many data
buffers can hurt performance as well. Likewise, if a large share of the traffic
uses protocols that do not support split header, the header buffers will also
hurt performance.
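
The balancing problem in method C can be sketched as follows, assuming the
strictly alternating fill from the diagram (small buffer, page, small buffer,
page, ...) and a device that skips header buffers to keep the payload page
aligned. The 4096-byte page size and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed size of each data buffer */

/* Data buffers (pages) needed to hold a payload of payload_len bytes. */
static size_t data_bufs_needed(size_t payload_len)
{
    assert(payload_len <= SIZE_MAX - PAGE_SZ);  /* guard the round-up */
    return (payload_len + PAGE_SZ - 1) / PAGE_SZ;
}

/* Header buffers left unused by one packet when the ring alternates
 * small buffer / page and the device skips the small buffers that sit
 * between the pages of a multi-page payload. */
static size_t hdr_bufs_skipped(size_t payload_len)
{
    size_t n = data_bufs_needed(payload_len);

    return n > 1 ? n - 1 : 0;
}
```

Under these assumptions a 64 KiB payload uses 16 pages and passes over 15
header buffers, whereas a 1448-byte payload uses one page and skips none.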

II. Points of agreement and disagreement

1. What we have agreed on so far:
None of the three methods breaks VIRTIO_F_ANY_LAYOUT; they all store the
virtio-net header and the packet headers together.

We have also agreed to relax the following statement in the split header scenario:
	"indicates to both the device and the driver that no assumptions were made about framing."
The reason is that when a bigger packet arrives and one data buffer is not
enough to hold it, the device must either skip the next header buffer, which
goes against the spec text above, or use the header buffer, in which case the
payload can no longer be page aligned. Therefore, all three methods need this
requirement to be relaxed.

2. What we have not yet agreed on:
We have not had a precise discussion of which approach to take, and we are still
bouncing between approaches. At present, all three approaches seem to meet our
requirements, but each has advantages and disadvantages. Should we choose based
on the most important points, such as performance? It seems a little difficult
to cover everything.

III. Two forms of receive zerocopy

Eric's TCP receive interface requires the header and payload to be stored in
separate buffers, with the payload page aligned.

io_uring now also proposes a new receive zerocopy method, which requires the
header and payload to be stored in separate buffers but does not require the
payload to be page aligned:
https://lore.kernel.org/io-uring/20221007211713.170714-1-jonathan.lemon@gmail.com/T/#m678770d1fa7040fd76ed35026b93dfcbf25f6196


Thanks.

> > If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> >         \field{len} is particularly useful
> >         for drivers using untrusted buffers: if a driver does not know exactly
> >         how much has been written by the device, the driver would have to zero
> >         the buffer in advance to ensure no data leakage occurs.
> >
> >         For example, a network driver may hand a received buffer directly to
> >         an unprivileged userspace application.  If the network device has not
> >         overwritten the bytes which were in that buffer, this could leak the
> >         contents of freed memory from other processes to the application.
> 
> This should be a bug of the driver. For example, if the driver wants
> to implement zerocopy, it must guarantee that every byte was written
> by the device before mapping it to the userspace, if it can't it
> should do the copy instead.
> 
> >
> >
> > so all buffers have to be initialized completely up to the reported
> > used length.
> >
> > It could be that the guarantee is not relevant in some use-cases.
> > We would have to specify that this is an exception to the rule,
> > explain that drivers must be careful about information leaks.
> > Let's say there's a feature bit that adds uninitialized space
> > somewhere. How much was added? We can find out by parsing the
> > packet but once you start doing that just to assemble the skb
> > you have already lost performance.
> 
> I don't get here, for those uninitialized spaces, it looks just
> tailroom for the skb.
> 
> Thanks
> 
> > So lots of spec work, some security risks, and unclear performance.
> >
> >
> >
> >
> > Is above a fair summary?
> >
> >
> >
> > If yes I would say let's address A, but make sure we ask drivers
> > to clear the new feature bit if there's no mergeable
> > (as opposed to e.g. failing probe) so we can add
> > support for !mergeable down the road.
> >
> >
> >
> >
> >
> > --
> > MST
> >


* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-08  4:37                     ` Jason Wang
  2022-10-09  1:49                       ` Xuan Zhuo
  2022-10-10 17:11                       ` Michael S. Tsirkin
@ 2022-10-18  3:07                       ` Xuan Zhuo
  2022-10-20  8:16                       ` Heng Qi
  3 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2022-10-18  3:07 UTC (permalink / raw)
  To: Jason Wang; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Michael S. Tsirkin



FYI:

A new rx zerocopy idea that differs from Eric's TCP mmap zerocopy: the
buffers come from user space and are posted to the device. It no longer
requires page alignment, but it still depends on split header.

   https://lore.kernel.org/io-uring/20221007211713.170714-1-jonathan.lemon@gmail.com/T/#m678770d1fa7040fd76ed35026b93dfcbf25f6196


Thanks.



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-13  6:47                             ` Jason Wang
@ 2022-10-13 14:33                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-10-13 14:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Thu, Oct 13, 2022 at 02:47:55PM +0800, Jason Wang wrote:
> On Wed, Oct 12, 2022 at 1:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> > > On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > > > > > > to not split the header when the layout doesn't fit for header splitting.
> > > > > > > > > (And this seems the case even if we're using buffers).
> > > > > > > >
> > > > > > > > Well spec says:
> > > > > > > >
> > > > > > > >         indicates to both the device and the driver that no
> > > > > > > >         assumptions were made about framing.
> > > > > > > >
> > > > > > > > if device assumes that descriptor boundaries are where
> > > > > > > > driver wants packet to be stored that is clearly
> > > > > > > > an assumption.
> > > > > > >
> > > > > > > Yes but what I want to say is, the device can choose to not split the
> > > > > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > > > > description?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > > > For example, if driver wants to split the header at some specific
> > > > > > offset this is already possible without extra functionality.
> > > > >
> > > > > I'm not sure how this would work without the support from the device.
> > > > > This probably can only work if:
> > > > >
> > > > > 1) the driver know what kind of packet it can receive
> > > > > 2) protocol have fixed length of the header
> > > > >
> > > > > This is probably not true consider:
> > > > >
> > > > > 1) TCP and UDP have different header length
> > > > > 2) IPv6 has an variable length of the header
> > > >
> > > > We currently say:
> > > >
> > > > If a receive packet is spread over multiple buffers, the device
> > > > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > > 1 buffers) completely up to the full length of each buffer
> > > > supplied by the driver.
> > > >
> > > > the idea is basically just to lift this requirement for specific
> > > > packets. Things certainly worked before, this is just an
> > > > optimization.
> > >
> > > The problem is, the offset is not fixed and can be determined by the
> > > driver. It's variable so it requires the device to parse the packet to
> > > know.
> > >
> > > Thanks
> >
> > IIUC device parsing packet and splitting the header is exactly what the
> > feature is supposed to do. It's benefitial if device is a hardware one.
> 
> Right, so either way (even with mergeable buffers), if the device
> parses and splits, we need to tweak the section you quote above from
> the spec, since there will be some space left in a buffer other than
> the last one?
> 
> Thanks

Exactly. My point was, we know relaxing this does not conflict with
other parts of the spec since we didn't use to require this.
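
For reference, the rule under discussion and the proposed exception for the
first buffer can be expressed as a small check. This is only a sketch:
mergeable_fill_ok() and its parameters are hypothetical names, and the
hdr_exempt flag stands in for whatever feature bit ends up being defined.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the used lengths obey "the device MUST use all buffers but
 * the last completely". If hdr_exempt is nonzero, buffer 0 (the header
 * buffer) is also allowed to be partially filled. */
static int mergeable_fill_ok(const size_t *buf_len, const size_t *used_len,
                             size_t num_buffers, int hdr_exempt)
{
    assert(buf_len != NULL && used_len != NULL);

    for (size_t i = 0; i < num_buffers; i++) {
        if (used_len[i] > buf_len[i])
            return 0;               /* device overran a buffer */
        if (i + 1 == num_buffers)
            continue;               /* last buffer may be partial */
        if (i == 0 && hdr_exempt)
            continue;               /* relaxed rule: header buffer */
        if (used_len[i] != buf_len[i])
            return 0;               /* middle buffer not filled */
    }
    return 1;
}
```

A packet that puts 66 bytes of headers into a 256-byte first buffer and then
fills pages fails the current rule but passes once the first buffer is
exempted.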

> >
> >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > Let's keep it that way.
> > > > > >
> > > > > > Now, let's formulate what are some of the problems with the current way.
> > > > > >
> > > > > >
> > > > > >
> > > > > > A- mergeable buffers is even more flexible, since a single packet
> > > > > >   is built up of multiple buffers. And in theory device can
> > > > > >   choose arbitrary set of buffers to store a packet.
> > > > > >   So you could supply a small buffer for headers followed by a bigger
> > > > > >   one for payload, in theory even without any changes.
> > > > > >   Problem 1: However since this is not how devices currently operate,
> > > > > >   a feature bit would be helpful.
> > > > >
> > > > > How do we know the bigger buffer is sufficient for the packet? If we
> > > > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > > > effort of the mergeable buffer:
> > > > >
> > > > > header buffer #1
> > > > > payload buffer #1
> > > > > header buffer #2
> > > > > payload buffer #2
> > > > >
> > > > > Is the device expected to
> > > > >
> > > > > 1) fill payload in header buffer #2, this breaks the effort that we
> > > > > want to make payload page aligned
> > > > > 2) skip header buffer #2, in this case, the device assumes the framing
> > > > > when it breaks any layout
> > > > >
> > > > > >
> > > > > >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> > > > > >   packet fits in a single buffer without looking at the header.
> > > > > >   For this reason, we have this text:
> > > > > >
> > > > > >         If a receive packet is spread over multiple buffers, the device
> > > > > >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > > > >         1 buffers) completely up to the full length of each buffer
> > > > > >         supplied by the driver.
> > > > > >
> > > > > >   if we want to keep this optimization and allow using a separate
> > > > > >   buffer for headers, then I think we could rely on the feature bit
> > > > > >   from Problem 1 and just make an exception for the first buffer.
> > > > > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > > > > >   confusion.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > B- without mergeable, there's no flexibility. In particular, there can
> > > > > > not be uninitialized space between header and data.
> > > > >
> > > > > I had two questions
> > > > >
> > > > > 1) why is this not a problem of mergeable? There's no guarantee that
> > > > > the header is just the length of what the driver allocates for header
> > > > > buffer anyhow
> > > > >
> > > > > E.g the header length could be smaller than the header buffer, the
> > > > > device still needs to skip part of the space in the header buffer.
> > > > >
> > > > > 2) it should be the responsibility of the driver to handle the
> > > > > uninitialized space, it should do anything that is necessary for
> > > > > security, more below
> > > > >
> > > > > > If we had flexibility here, this could be
> > > > > > helpful for alignment, security, etc.
> > > > > > Unfortunately, our hands are tied:
> > > > > >
> > > > > >
> > > > > >         \field{len} is particularly useful
> > > > > >         for drivers using untrusted buffers: if a driver does not know exactly
> > > > > >         how much has been written by the device, the driver would have to zero
> > > > > >         the buffer in advance to ensure no data leakage occurs.
> > > > > >
> > > > > >         For example, a network driver may hand a received buffer directly to
> > > > > >         an unprivileged userspace application.  If the network device has not
> > > > > >         overwritten the bytes which were in that buffer, this could leak the
> > > > > >         contents of freed memory from other processes to the application.
> > > > >
> > > > > This should be a bug of the driver. For example, if the driver wants
> > > > > to implement zerocopy, it must guarantee that every byte was written
> > > > > by the device before mapping it to the userspace, if it can't it
> > > > > should do the copy instead.
> > > > >
> > > > > >
> > > > > >
> > > > > > so all buffers have to be initialized completely up to the reported
> > > > > > used length.
> > > > > >
> > > > > > It could be that the guarantee is not relevant in some use-cases.
> > > > > > We would have to specify that this is an exception to the rule,
> > > > > > explain that drivers must be careful about information leaks.
> > > > > > Let's say there's a feature bit that adds uninitialized space
> > > > > > somewhere. How much was added? We can find out by parsing the
> > > > > > packet but once you start doing that just to assemble the skb
> > > > > > you have already lost performance.
> > > > >
> > > > > I don't get here, for those uninitialized spaces, it looks just
> > > > > tailroom for the skb.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > So lots of spec work, some security risks, and unclear performance.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Is above a fair summary?
> > > > > >
> > > > > >
> > > > > >
> > > > > > If yes I would say let's address A, but make sure we ask drivers
> > > > > > to clear the new feature bit if there's no mergeable
> > > > > > (as opposed to e.g. failing probe) so we can add
> > > > > > support for !mergeable down the road.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > MST
> > > > > >
> > > >
> >



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-12  5:05                           ` Michael S. Tsirkin
@ 2022-10-13  6:47                             ` Jason Wang
  2022-10-13 14:33                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-10-13  6:47 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Wed, Oct 12, 2022 at 1:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> > On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > > > > > to not split the header when the layout doesn't fit for header splitting.
> > > > > > > > (And this seems the case even if we're using buffers).
> > > > > > >
> > > > > > > Well spec says:
> > > > > > >
> > > > > > >         indicates to both the device and the driver that no
> > > > > > >         assumptions were made about framing.
> > > > > > >
> > > > > > > if device assumes that descriptor boundaries are where
> > > > > > > driver wants packet to be stored that is clearly
> > > > > > > an assumption.
> > > > > >
> > > > > > Yes but what I want to say is, the device can choose to not split the
> > > > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > > > description?
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > > For example, if driver wants to split the header at some specific
> > > > > offset this is already possible without extra functionality.
> > > >
> > > > I'm not sure how this would work without the support from the device.
> > > > This probably can only work if:
> > > >
> > > > 1) the driver know what kind of packet it can receive
> > > > 2) protocol have fixed length of the header
> > > >
> > > > This is probably not true consider:
> > > >
> > > > 1) TCP and UDP have different header length
> > > > 2) IPv6 has an variable length of the header
> > >
> > > We currently say:
> > >
> > > If a receive packet is spread over multiple buffers, the device
> > > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > 1 buffers) completely up to the full length of each buffer
> > > supplied by the driver.
> > >
> > > the idea is basically just to lift this requirement for specific
> > > packets. Things certainly worked before, this is just an
> > > optimization.
> >
> > The problem is, the offset is not fixed and can be determined by the
> > driver. It's variable so it requires the device to parse the packet to
> > know.
> >
> > Thanks
>
> IIUC device parsing packet and splitting the header is exactly what the
> feature is supposed to do. It's benefitial if device is a hardware one.

Right, so either way (even with mergeable buffers), if the device
parses and splits, we need to tweak the section you quote above from
the spec, since there will be some space left in a buffer other than
the last one?

Thanks

>
>
> > >
> > >
> > >
> > > >
> > > > >
> > > > > Let's keep it that way.
> > > > >
> > > > > Now, let's formulate what are some of the problems with the current way.
> > > > >
> > > > >
> > > > >
> > > > > A- mergeable buffers is even more flexible, since a single packet
> > > > >   is built up of multiple buffers. And in theory device can
> > > > >   choose arbitrary set of buffers to store a packet.
> > > > >   So you could supply a small buffer for headers followed by a bigger
> > > > >   one for payload, in theory even without any changes.
> > > > >   Problem 1: However since this is not how devices currently operate,
> > > > >   a feature bit would be helpful.
> > > >
> > > > How do we know the bigger buffer is sufficient for the packet? If we
> > > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > > effort of the mergeable buffer:
> > > >
> > > > header buffer #1
> > > > payload buffer #1
> > > > header buffer #2
> > > > payload buffer #2
> > > >
> > > > Is the device expected to
> > > >
> > > > 1) fill payload in header buffer #2, this breaks the effort that we
> > > > want to make payload page aligned
> > > > 2) skip header buffer #2, in this case, the device assumes the framing
> > > > when it breaks any layout
> > > >
> > > > >
> > > > >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> > > > >   packet fits in a single buffer without looking at the header.
> > > > >   For this reason, we have this text:
> > > > >
> > > > >         If a receive packet is spread over multiple buffers, the device
> > > > >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > > >         1 buffers) completely up to the full length of each buffer
> > > > >         supplied by the driver.
> > > > >
> > > > >   if we want to keep this optimization and allow using a separate
> > > > >   buffer for headers, then I think we could rely on the feature bit
> > > > >   from Problem 1 and just make an exception for the first buffer.
> > > > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > > > >   confusion.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > B- without mergeable, there's no flexibility. In particular, there can
> > > > > not be uninitialized space between header and data.
> > > >
> > > > I had two questions
> > > >
> > > > 1) why is this not a problem of mergeable? There's no guarantee that
> > > > the header is just the length of what the driver allocates for header
> > > > buffer anyhow
> > > >
> > > > E.g the header length could be smaller than the header buffer, the
> > > > device still needs to skip part of the space in the header buffer.
> > > >
> > > > 2) it should be the responsibility of the driver to handle the
> > > > uninitialized space, it should do anything that is necessary for
> > > > security, more below
> > > >
> > > > > If we had flexibility here, this could be
> > > > > helpful for alignment, security, etc.
> > > > > Unfortunately, our hands are tied:
> > > > >
> > > > >
> > > > >         \field{len} is particularly useful
> > > > >         for drivers using untrusted buffers: if a driver does not know exactly
> > > > >         how much has been written by the device, the driver would have to zero
> > > > >         the buffer in advance to ensure no data leakage occurs.
> > > > >
> > > > >         For example, a network driver may hand a received buffer directly to
> > > > >         an unprivileged userspace application.  If the network device has not
> > > > >         overwritten the bytes which were in that buffer, this could leak the
> > > > >         contents of freed memory from other processes to the application.
> > > >
> > > > This should be a bug of the driver. For example, if the driver wants
> > > > to implement zerocopy, it must guarantee that every byte was written
> > > > by the device before mapping it to the userspace, if it can't it
> > > > should do the copy instead.
> > > >
> > > > >
> > > > >
> > > > > so all buffers have to be initialized completely up to the reported
> > > > > used length.
> > > > >
> > > > > It could be that the guarantee is not relevant in some use-cases.
> > > > > We would have to specify that this is an exception to the rule,
> > > > > explain that drivers must be careful about information leaks.
> > > > > Let's say there's a feature bit that adds uninitialized space
> > > > > somewhere. How much was added? We can find out by parsing the
> > > > > packet but once you start doing that just to assemble the skb
> > > > > you have already lost performance.
> > > >
> > > > I don't get here, for those uninitialized spaces, it looks just
> > > > tailroom for the skb.
> > > >
> > > > Thanks
> > > >
> > > > > So lots of spec work, some security risks, and unclear performance.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Is above a fair summary?
> > > > >
> > > > >
> > > > >
> > > > > If yes I would say let's address A, but make sure we ask drivers
> > > > > to clear the new feature bit if there's no mergeable
> > > > > (as opposed to e.g. failing probe) so we can add
> > > > > support for !mergeable down the road.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-12  3:17                         ` Jason Wang
@ 2022-10-12  5:05                           ` Michael S. Tsirkin
  2022-10-13  6:47                             ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-10-12  5:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Wed, Oct 12, 2022 at 11:17:30AM +0800, Jason Wang wrote:
> On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > > driver flexibility in arranging the packet in memory is beneficial.
> > > > > > >
> > > > > > >
> > > > > > > Yes, but I don't see how it conflicts with any_layout. The device can just
> > > > > > > choose not to split the header when the layout doesn't fit header splitting.
> > > > > > > (And this seems to be the case even if we're using buffers).
> > > > > >
> > > > > > Well spec says:
> > > > > >
> > > > > >         indicates to both the device and the driver that no
> > > > > >         assumptions were made about framing.
> > > > > >
> > > > > > if device assumes that descriptor boundaries are where
> > > > > > driver wants packet to be stored that is clearly
> > > > > > an assumption.
> > > > >
> > > > > Yes but what I want to say is, the device can choose to not split the
> > > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > > description?
> > > > >
> > > > > Thanks
> > > >
> > > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > > For example, if driver wants to split the header at some specific
> > > > offset this is already possible without extra functionality.
> > >
> > > I'm not sure how this would work without the support from the device.
> > > This probably can only work if:
> > >
> > > 1) the driver knows what kind of packet it can receive
> > > 2) the protocol has a fixed header length
> > >
> > > This is probably not true, considering:
> > >
> > > 1) TCP and UDP have different header lengths
> > > 2) IPv6 has a variable header length
> >
> > We currently say:
> >
> > If a receive packet is spread over multiple buffers, the device
> > MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > 1 buffers) completely up to the full length of each buffer
> > supplied by the driver.
> >
> > the idea is basically just to lift this requirement for specific
> > packets. Things certainly worked before, this is just an
> > optimization.
> 
> The problem is, the offset is not fixed and can be determined by the
> driver. It's variable so it requires the device to parse the packet to
> know.
> 
> Thanks

IIUC, having the device parse the packet and split the header is exactly what
the feature is supposed to do. It's beneficial if the device is a hardware one.
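
To make the "device parses the packet" step concrete, here is a minimal sketch of how a split offset could be computed for an untagged Ethernet frame carrying IPv4 with TCP or UDP. The function name split_offset and the "return 0 means do not split" convention are assumptions made for illustration; nothing here is taken from the patch or the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of header bytes (Ethernet + IPv4 + TCP/UDP)
 * at the start of an untagged Ethernet frame, or 0 if this sketch does
 * not know how to split the frame. */
size_t split_offset(const uint8_t *frame, size_t len)
{
    const size_t eth_len = 14;           /* dst MAC + src MAC + EtherType */

    assert(frame != NULL);
    if (len < eth_len + 20)              /* too short for a minimal IPv4 header */
        return 0;
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return 0;                        /* EtherType is not IPv4 */

    const uint8_t *ip = frame + eth_len;
    if ((ip[0] >> 4) != 4)
        return 0;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;   /* IHL counts 32-bit words */
    if (ihl < 20 || len < eth_len + ihl)
        return 0;

    /* Non-first fragments carry no transport header, so leave them alone. */
    uint16_t frag_off = (uint16_t)(((ip[6] & 0x1f) << 8) | ip[7]);
    if (frag_off != 0)
        return 0;

    const uint8_t *l4 = ip + ihl;
    size_t l4_avail = len - eth_len - ihl;
    size_t l4_len;

    switch (ip[9]) {                     /* IPv4 protocol field */
    case 6:                              /* TCP: data offset counts 32-bit words */
        if (l4_avail < 20)
            return 0;
        l4_len = (size_t)(l4[12] >> 4) * 4;
        if (l4_len < 20)
            return 0;
        break;
    case 17:                             /* UDP: fixed 8-byte header */
        l4_len = 8;
        break;
    default:
        return 0;
    }
    if (l4_avail < l4_len)
        return 0;
    return eth_len + ihl + l4_len;
}
```

The result depends on both the IPv4 options and the transport protocol, which is why the boundary cannot be fixed in advance by buffer sizes alone.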


> >
> >
> >
> > >
> > > >
> > > > Let's keep it that way.
> > > >
> > > > Now, let's formulate what are some of the problems with the current way.
> > > >
> > > >
> > > >
> > > > A- mergeable buffers is even more flexible, since a single packet
> > > >   is built up of multiple buffers. And in theory device can
> > > >   choose arbitrary set of buffers to store a packet.
> > > >   So you could supply a small buffer for headers followed by a bigger
> > > >   one for payload, in theory even without any changes.
> > > >   Problem 1: However since this is not how devices currently operate,
> > > >   a feature bit would be helpful.
> > >
> > > How do we know the bigger buffer is sufficient for the packet? If we
> > > try to allocate 64K (not sufficient for the future even) it breaks the
> > > effort of the mergeable buffer:
> > >
> > > header buffer #1
> > > payload buffer #1
> > > header buffer #2
> > > payload buffer #2
> > >
> > > Is the device expected to
> > >
> > > 1) fill payload in header buffer #2, this breaks the effort that we
> > > want to make payload page aligned
> > > 2) skip header buffer #2, in this case, the device assumes the framing
> > > when it breaks any layout
> > >
> > > >
> > > >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> > > >   packet fits in a single buffer without looking at the header.
> > > >   For this reason, we have this text:
> > > >
> > > >         If a receive packet is spread over multiple buffers, the device
> > > >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > >         1 buffers) completely up to the full length of each buffer
> > > >         supplied by the driver.
> > > >
> > > >   if we want to keep this optimization and allow using a separate
> > > >   buffer for headers, then I think we could rely on the feature bit
> > > >   from Problem 1 and just make an exception for the first buffer.
> > > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > > >   confusion.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > B- without mergeable, there's no flexibility. In particular, there can
> > > > not be uninitialized space between header and data.
> > >
> > > I had two questions
> > >
> > > 1) why is this not a problem of mergeable? There's no guarantee that
> > > the header is just the length of what the driver allocates for header
> > > buffer anyhow
> > >
> > > E.g the header length could be smaller than the header buffer, the
> > > device still needs to skip part of the space in the header buffer.
> > >
> > > 2) it should be the responsibility of the driver to handle the
> > > uninitialized space, it should do anything that is necessary for
> > > security, more below
> > >
> > > > If we had flexibility here, this could be
> > > > helpful for alignment, security, etc.
> > > > Unfortunately, our hands are tied:
> > > >
> > > >
> > > >         \field{len} is particularly useful
> > > >         for drivers using untrusted buffers: if a driver does not know exactly
> > > >         how much has been written by the device, the driver would have to zero
> > > >         the buffer in advance to ensure no data leakage occurs.
> > > >
> > > >         For example, a network driver may hand a received buffer directly to
> > > >         an unprivileged userspace application.  If the network device has not
> > > >         overwritten the bytes which were in that buffer, this could leak the
> > > >         contents of freed memory from other processes to the application.
> > >
> > > This should be a bug of the driver. For example, if the driver wants
> > > to implement zerocopy, it must guarantee that every byte was written
> > > by the device before mapping it to the userspace, if it can't it
> > > should do the copy instead.
> > >
> > > >
> > > >
> > > > so all buffers have to be initialized completely up to the reported
> > > > used length.
> > > >
> > > > It could be that the guarantee is not relevant in some use-cases.
> > > > We would have to specify that this is an exception to the rule,
> > > > explain that drivers must be careful about information leaks.
> > > > Let's say there's a feature bit that adds uninitialized space
> > > > somewhere. How much was added? We can find out by parsing the
> > > > packet but once you start doing that just to assemble the skb
> > > > you have already lost performance.
> > >
> > > I don't get here, for those uninitialized spaces, it looks just
> > > tailroom for the skb.
> > >
> > > Thanks
> > >
> > > > So lots of spec work, some security risks, and unclear performance.
> > > >
> > > >
> > > >
> > > >
> > > > Is above a fair summary?
> > > >
> > > >
> > > >
> > > > If yes I would say let's address A, but make sure we ask drivers
> > > > to clear the new feature bit if there's no mergeable
> > > > (as opposed to e.g. failing probe) so we can add
> > > > support for !mergeable down the road.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
> >




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-10 17:11                       ` Michael S. Tsirkin
@ 2022-10-12  3:17                         ` Jason Wang
  2022-10-12  5:05                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-10-12  3:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Tue, Oct 11, 2022 at 1:12 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> > On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > driver flexibility in arranging the packet in memory is beneficial.
> > > > > >
> > > > > >
> > > > > > Yes, but I don't see how it conflicts with any_layout. The device can just
> > > > > > choose not to split the header when the layout doesn't fit header splitting.
> > > > > > (And this seems to be the case even if we're using buffers).
> > > > >
> > > > > Well spec says:
> > > > >
> > > > >         indicates to both the device and the driver that no
> > > > >         assumptions were made about framing.
> > > > >
> > > > > if device assumes that descriptor boundaries are where
> > > > > driver wants packet to be stored that is clearly
> > > > > an assumption.
> > > >
> > > > Yes but what I want to say is, the device can choose to not split the
> > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > description?
> > > >
> > > > Thanks
> > >
> > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > For example, if driver wants to split the header at some specific
> > > offset this is already possible without extra functionality.
> >
> > I'm not sure how this would work without the support from the device.
> > This probably can only work if:
> >
> > 1) the driver knows what kind of packet it can receive
> > 2) the protocol has a fixed header length
> >
> > This is probably not true, considering:
> >
> > 1) TCP and UDP have different header lengths
> > 2) IPv6 has a variable header length
>
> We currently say:
>
> If a receive packet is spread over multiple buffers, the device
> MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> 1 buffers) completely up to the full length of each buffer
> supplied by the driver.
>
> the idea is basically just to lift this requirement for specific
> packets. Things certainly worked before, this is just an
> optimization.

The problem is, the offset is not fixed and can be determined by the
driver. It's variable so it requires the device to parse the packet to
know.

Thanks
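
As an illustration of why the offset is variable, here is a rough sketch of walking IPv6 extension headers to find where the transport header starts. The function name ipv6_l4_offset is made up for this example, and it only walks the Hop-by-Hop, Routing and Destination Options headers; anything else (including Fragment, AH and ESP) is reported back as the "protocol" so the caller can decide not to split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of the first header after the IPv6 fixed
 * header and any Hop-by-Hop / Routing / Destination Options extension
 * headers. Stores that header's protocol number in *proto.
 * Returns 0 if the packet is not IPv6 or is truncated. */
size_t ipv6_l4_offset(const uint8_t *pkt, size_t len, uint8_t *proto)
{
    assert(pkt != NULL && proto != NULL);
    if (len < 40 || (pkt[0] >> 4) != 6)   /* fixed IPv6 header is 40 bytes */
        return 0;

    uint8_t next = pkt[6];                /* Next Header field */
    size_t off = 40;

    for (;;) {
        size_t ext_len;

        switch (next) {
        case 0:                           /* Hop-by-Hop Options */
        case 43:                          /* Routing */
        case 60:                          /* Destination Options */
            if (len < off + 2)
                return 0;
            /* Hdr Ext Len is in 8-byte units, not counting the first 8. */
            ext_len = ((size_t)pkt[off + 1] + 1) * 8;
            break;
        default:
            *proto = next;
            return off;
        }
        if (len < off + ext_len)
            return 0;
        next = pkt[off];                  /* first byte of an ext header */
        off += ext_len;
    }
}
```

Each extension header moves the boundary by a multiple of 8 bytes, so two packets of the same flow type can still have different split points.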

>
>
>
> >
> > >
> > > Let's keep it that way.
> > >
> > > Now, let's formulate what are some of the problems with the current way.
> > >
> > >
> > >
> > > A- mergeable buffers is even more flexible, since a single packet
> > >   is built up of multiple buffers. And in theory device can
> > >   choose arbitrary set of buffers to store a packet.
> > >   So you could supply a small buffer for headers followed by a bigger
> > >   one for payload, in theory even without any changes.
> > >   Problem 1: However since this is not how devices currently operate,
> > >   a feature bit would be helpful.
> >
> > How do we know the bigger buffer is sufficient for the packet? If we
> > try to allocate 64K (not sufficient for the future even) it breaks the
> > effort of the mergeable buffer:
> >
> > header buffer #1
> > payload buffer #1
> > header buffer #2
> > payload buffer #2
> >
> > Is the device expected to
> >
> > 1) fill payload in header buffer #2, this breaks the effort that we
> > want to make payload page aligned
> > 2) skip header buffer #2, in this case, the device assumes the framing
> > when it breaks any layout
> >
> > >
> > >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> > >   packet fits in a single buffer without looking at the header.
> > >   For this reason, we have this text:
> > >
> > >         If a receive packet is spread over multiple buffers, the device
> > >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > >         1 buffers) completely up to the full length of each buffer
> > >         supplied by the driver.
> > >
> > >   if we want to keep this optimization and allow using a separate
> > >   buffer for headers, then I think we could rely on the feature bit
> > >   from Problem 1 and just make an exception for the first buffer.
> > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > >   confusion.
> > >
> > >
> > >
> > >
> > >
> > > B- without mergeable, there's no flexibility. In particular, there can
> > > not be uninitialized space between header and data.
> >
> > I had two questions
> >
> > 1) why is this not a problem of mergeable? There's no guarantee that
> > the header is just the length of what the driver allocates for header
> > buffer anyhow
> >
> > E.g the header length could be smaller than the header buffer, the
> > device still needs to skip part of the space in the header buffer.
> >
> > 2) it should be the responsibility of the driver to handle the
> > uninitialized space, it should do anything that is necessary for
> > security, more below
> >
> > > If we had flexibility here, this could be
> > > helpful for alignment, security, etc.
> > > Unfortunately, our hands are tied:
> > >
> > >
> > >         \field{len} is particularly useful
> > >         for drivers using untrusted buffers: if a driver does not know exactly
> > >         how much has been written by the device, the driver would have to zero
> > >         the buffer in advance to ensure no data leakage occurs.
> > >
> > >         For example, a network driver may hand a received buffer directly to
> > >         an unprivileged userspace application.  If the network device has not
> > >         overwritten the bytes which were in that buffer, this could leak the
> > >         contents of freed memory from other processes to the application.
> >
> > This should be a bug of the driver. For example, if the driver wants
> > to implement zerocopy, it must guarantee that every byte was written
> > by the device before mapping it to the userspace, if it can't it
> > should do the copy instead.
> >
> > >
> > >
> > > so all buffers have to be initialized completely up to the reported
> > > used length.
> > >
> > > It could be that the guarantee is not relevant in some use-cases.
> > > We would have to specify that this is an exception to the rule,
> > > explain that drivers must be careful about information leaks.
> > > Let's say there's a feature bit that adds uninitialized space
> > > somewhere. How much was added? We can find out by parsing the
> > > packet but once you start doing that just to assemble the skb
> > > you have already lost performance.
> >
> > I don't get here, for those uninitialized spaces, it looks just
> > tailroom for the skb.
> >
> > Thanks
> >
> > > So lots of spec work, some security risks, and unclear performance.
> > >
> > >
> > >
> > >
> > > Is above a fair summary?
> > >
> > >
> > >
> > > If yes I would say let's address A, but make sure we ask drivers
> > > to clear the new feature bit if there's no mergeable
> > > (as opposed to e.g. failing probe) so we can add
> > > support for !mergeable down the road.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > MST
> > >
>




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-08  4:37                     ` Jason Wang
  2022-10-09  1:49                       ` Xuan Zhuo
@ 2022-10-10 17:11                       ` Michael S. Tsirkin
  2022-10-12  3:17                         ` Jason Wang
  2022-10-18  3:07                       ` Xuan Zhuo
  2022-10-20  8:16                       ` Heng Qi
  3 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-10-10 17:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Sat, Oct 08, 2022 at 12:37:45PM +0800, Jason Wang wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is beneficial.
> > > > >
> > > > >
> > > > > Yes, but I don't see how it conflicts with any_layout. The device can just
> > > > > choose not to split the header when the layout doesn't fit header splitting.
> > > > > (And this seems to be the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > >         indicates to both the device and the driver that no
> > > >         assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> 
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
> 
> 1) the driver knows what kind of packet it can receive
> 2) the protocol has a fixed header length
> 
> This is probably not true, considering:
> 
> 1) TCP and UDP have different header lengths
> 2) IPv6 has a variable header length

We currently say:

If a receive packet is spread over multiple buffers, the device
MUST use all buffers but the last (i.e. the first \field{num_buffers} -
1 buffers) completely up to the full length of each buffer
supplied by the driver.

the idea is basically just to lift this requirement for specific
packets. Things certainly worked before, this is just an
optimization.
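
A small sketch of what the relaxed rule could look like from the driver's side, assuming the exception applies only to the first (header) buffer and that a split packet always uses at least two buffers, as suggested earlier in the thread. struct rx_buf, rx_layout_ok and the hdr_split flag are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer record: capacity posted by the driver and the
 * number of bytes the device reported as written. */
struct rx_buf {
    uint32_t cap;
    uint32_t used;
};

/* Checks the "all buffers but the last are completely used" rule.
 * With hdr_split set, the first buffer is exempt (it holds only the
 * headers) and the packet must span at least two buffers. */
bool rx_layout_ok(const struct rx_buf *bufs, size_t num_buffers, bool hdr_split)
{
    assert(bufs != NULL);
    if (num_buffers == 0)
        return false;
    if (hdr_split && num_buffers < 2)
        return false;

    for (size_t i = 0; i < num_buffers; i++) {
        bool is_last = (i + 1 == num_buffers);
        bool exempt = hdr_split && i == 0;

        if (bufs[i].used > bufs[i].cap)
            return false;
        if (!is_last && !exempt && bufs[i].used != bufs[i].cap)
            return false;
    }
    return true;
}
```

For a packet that is not split, hdr_split is false and the check reduces to the existing requirement.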



> 
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> 
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
> 
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
> 
> Is the device expected to
> 
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
> 
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> >         If a receive packet is spread over multiple buffers, the device
> >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> >         1 buffers) completely up to the full length of each buffer
> >         supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
> 
> I had two questions
> 
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
> 
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
> 
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
> 
> > If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> >         \field{len} is particularly useful
> >         for drivers using untrusted buffers: if a driver does not know exactly
> >         how much has been written by the device, the driver would have to zero
> >         the buffer in advance to ensure no data leakage occurs.
> >
> >         For example, a network driver may hand a received buffer directly to
> >         an unprivileged userspace application.  If the network device has not
> >         overwritten the bytes which were in that buffer, this could leak the
> >         contents of freed memory from other processes to the application.
> 
> This should be a bug of the driver. For example, if the driver wants
> to implement zerocopy, it must guarantee that every byte was written
> by the device before mapping it to the userspace, if it can't it
> should do the copy instead.
> 
> >
> >
> > so all buffers have to be initialized completely up to the reported
> > used length.
> >
> > It could be that the guarantee is not relevant in some use-cases.
> > We would have to specify that this is an exception to the rule,
> > explain that drivers must be careful about information leaks.
> > Let's say there's a feature bit that adds uninitialized space
> > somewhere. How much was added? We can find out by parsing the
> > packet but once you start doing that just to assemble the skb
> > you have already lost performance.
> 
> I don't get here, for those uninitialized spaces, it looks just
> tailroom for the skb.
> 
> Thanks
> 
> > So lots of spec work, some security risks, and unclear performance.
> >
> >
> >
> >
> > Is above a fair summary?
> >
> >
> >
> > If yes I would say let's address A, but make sure we ask drivers
> > to clear the new feature bit if there's no mergeable
> > (as opposed to e.g. failing probe) so we can add
> > support for !mergeable down the road.
> >
> >
> >
> >
> >
> > --
> > MST
> >




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-10-08  4:37                     ` Jason Wang
@ 2022-10-09  1:49                       ` Xuan Zhuo
  2022-10-10 17:11                       ` Michael S. Tsirkin
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2022-10-09  1:49 UTC (permalink / raw)
  To: Jason Wang; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Michael S. Tsirkin

On Sat, 8 Oct 2022 12:37:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is beneficial.
> > > > >
> > > > >
> > > > > Yes, but I don't see how it conflicts with any_layout. The device can just
> > > > > choose not to split the header when the layout doesn't fit header splitting.
> > > > > (And this seems to be the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > >         indicates to both the device and the driver that no
> > > >         assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
>
> I'm not sure how this would work without the support from the device.
> This probably can only work if:
>
> 1) the driver knows what kind of packet it can receive
> 2) the protocol has a fixed header length
>
> This is probably not true, considering:
>
> 1) TCP and UDP have different header lengths
> 2) IPv6 has a variable header length
>
>
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers. And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
>
> How do we know the bigger buffer is sufficient for the packet? If we
> try to allocate 64K (not sufficient for the future even) it breaks the
> effort of the mergeable buffer:
>
> header buffer #1
> payload buffer #1
> header buffer #2
> payload buffer #2
>
> Is the device expected to
>
> 1) fill payload in header buffer #2, this breaks the effort that we
> want to make payload page aligned
> 2) skip header buffer #2, in this case, the device assumes the framing
> when it breaks any layout
>
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> >         If a receive packet is spread over multiple buffers, the device
> >         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> >         1 buffers) completely up to the full length of each buffer
> >         supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data.
>
> I had two questions
>
> 1) why is this not a problem of mergeable? There's no guarantee that
> the header is just the length of what the driver allocates for header
> buffer anyhow
>
> E.g the header length could be smaller than the header buffer, the
> device still needs to skip part of the space in the header buffer.
>
> 2) it should be the responsibility of the driver to handle the
> uninitialized space, it should do anything that is necessary for
> security, more below
>
> > If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> >         \field{len} is particularly useful
> >         for drivers using untrusted buffers: if a driver does not know exactly
> >         how much has been written by the device, the driver would have to zero
> >         the buffer in advance to ensure no data leakage occurs.
> >
> >         For example, a network driver may hand a received buffer directly to
> >         an unprivileged userspace application.  If the network device has not
> >         overwritten the bytes which were in that buffer, this could leak the
> >         contents of freed memory from other processes to the application.
>
> This should be a bug of the driver. For example, if the driver wants
> to implement zerocopy, it must guarantee that every byte was written
> by the device before mapping it to the userspace, if it can't it
> should do the copy instead.
>
> >
> >
> > so all buffers have to be initialized completely up to the reported
> > used length.
> >
> > It could be that the guarantee is not relevant in some use-cases.
> > We would have to specify that this is an exception to the rule,
> > explain that drivers must be careful about information leaks.
> > Let's say there's a feature bit that adds uninitialized space
> > somewhere. How much was added? We can find out by parsing the
> > packet but once you start doing that just to assemble the skb
> > you have already lost performance.
>
> I don't get here, for those uninitialized spaces, it looks just
> tailroom for the skb.


I think the core of the problem here is that the uninitialized space falls
within the range specified by len.

Thanks.


>
> Thanks
>
> > So lots of spec work, some security risks, and unclear performance.
> >
> >
> >
> >
> > Is above a fair summary?
> >
> >
> >
> > If yes I would say let's address A, but make sure we ask drivers
> > to clear the new feature bit if there's no mergeable
> > (as opposed to e.g. failing probe) so we can add
> > support for !mergeable down the road.
> >
> >
> >
> >
> >
> > --
> > MST
> >
>


* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-29  7:04                   ` Michael S. Tsirkin
  2022-09-29  8:24                     ` Xuan Zhuo
@ 2022-10-08  4:37                     ` Jason Wang
  2022-10-09  1:49                       ` Xuan Zhuo
                                         ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Jason Wang @ 2022-10-08  4:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Thu, Sep 29, 2022 at 3:04 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > >
> > > >
> > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > to not split the header when the layout doesn't fit for header splitting.
> > > > (And this seems the case even if we're using buffers).
> > >
> > > Well spec says:
> > >
> > >         indicates to both the device and the driver that no
> > >         assumptions were made about framing.
> > >
> > > if device assumes that descriptor boundaries are where
> > > driver wants packet to be stored that is clearly
> > > an assumption.
> >
> > Yes but what I want to say is, the device can choose to not split the
> > packet if the framing doesn't fit. Does it still comply with the above
> > description?
> >
> > Thanks
>
> The point of ANY_LAYOUT is to give drivers maximum flexibility.
> For example, if driver wants to split the header at some specific
> offset this is already possible without extra functionality.

I'm not sure how this would work without support from the device.
It can probably only work if:

1) the driver knows what kind of packet it will receive
2) the protocol has a fixed header length

This is probably not true, considering that:

1) TCP and UDP have different header lengths
2) IPv6 has a variable header length (extension headers)
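
To make the point concrete, here is a rough sketch (not taken from the patch; `split_offset_ipv4` is a made-up name) of how the header/payload boundary would have to be computed for a received frame. It assumes untagged Ethernet carrying IPv4 and ignores IPv6 entirely. It only illustrates that the offset depends on the packet contents: the IPv4 IHL field, the transport protocol, and the TCP data offset all change the result.

```c
#include <stddef.h>
#include <stdint.h>

enum { ETH_HLEN = 14, IPV4_MIN_HLEN = 20, TCP_MIN_HLEN = 20, UDP_HLEN = 8 };

/*
 * Hypothetical helper: return the number of header bytes that precede the
 * transport payload in an untagged Ethernet + IPv4 frame, or 0 if the
 * frame is not IPv4 TCP/UDP or is too short to parse.
 */
size_t split_offset_ipv4(const uint8_t *frame, size_t len)
{
    if (len < ETH_HLEN + IPV4_MIN_HLEN)
        return 0;
    /* EtherType 0x0800 means IPv4. */
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return 0;

    const uint8_t *ip = frame + ETH_HLEN;
    if ((ip[0] >> 4) != 4)
        return 0;

    /* IHL is in 32-bit words, so IPv4 options change the offset. */
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;
    if (ihl < IPV4_MIN_HLEN || len < ETH_HLEN + ihl)
        return 0;

    uint8_t proto = ip[9];
    if (proto == 17) {                      /* UDP: fixed 8-byte header */
        if (len < ETH_HLEN + ihl + UDP_HLEN)
            return 0;
        return ETH_HLEN + ihl + UDP_HLEN;
    }
    if (proto == 6) {                       /* TCP: data offset varies */
        if (len < ETH_HLEN + ihl + TCP_MIN_HLEN)
            return 0;
        const uint8_t *tcp = ip + ihl;
        size_t doff = (size_t)(tcp[12] >> 4) * 4;
        if (doff < TCP_MIN_HLEN || len < ETH_HLEN + ihl + doff)
            return 0;
        return ETH_HLEN + ihl + doff;
    }
    return 0;
}
```

For a TCP segment without options this gives 54 bytes, and for UDP it gives 42, so no single offset chosen by the driver in advance fits both.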


>
> Let's keep it that way.
>
> Now, let's formulate what are some of the problems with the current way.
>
>
>
> A- mergeable buffers is even more flexible, since a single packet
>   is built up of multiple buffers. And in theory device can
>   choose arbitrary set of buffers to store a packet.
>   So you could supply a small buffer for headers followed by a bigger
>   one for payload, in theory even without any changes.
>   Problem 1: However since this is not how devices currently operate,
>   a feature bit would be helpful.

How do we know the bigger buffer is sufficient for the packet? If we
try to allocate 64K (which may not even be sufficient in the future), it
defeats the purpose of mergeable buffers. Consider this layout:

header buffer #1
payload buffer #1
header buffer #2
payload buffer #2

When a packet does not fit in payload buffer #1, is the device expected to

1) continue the payload in header buffer #2? This defeats the goal of
keeping the payload page aligned.
2) skip header buffer #2? In this case the device assumes a framing,
which breaks any_layout.

>
>   Problem 2: Also, in the past we found it useful to be able to figure out whether
>   packet fits in a single buffer without looking at the header.
>   For this reason, we have this text:
>
>         If a receive packet is spread over multiple buffers, the device
>         MUST use all buffers but the last (i.e. the first \field{num_buffers} -
>         1 buffers) completely up to the full length of each buffer
>         supplied by the driver.
>
>   if we want to keep this optimization and allow using a separate
>   buffer for headers, then I think we could rely on the feature bit
>   from Problem 1 and just make an exception for the first buffer.
>   Also num_buffers is then always >= 2, maybe state this to avoid
>   confusion.
>
>
>
>
>
> B- without mergeable, there's no flexibility. In particular, there can
> not be uninitialized space between header and data.

I have two questions:

1) Why is this not a problem for mergeable buffers? There's no guarantee
that the header is exactly as long as the header buffer the driver
allocates anyhow.

E.g. the header could be shorter than the header buffer, so the
device still needs to skip part of the space in the header buffer.

2) It should be the responsibility of the driver to handle the
uninitialized space; it should do whatever is necessary for
security. More below.

> If we had flexibility here, this could be
> helpful for alignment, security, etc.
> Unfortunately, our hands are tied:
>
>
>         \field{len} is particularly useful
>         for drivers using untrusted buffers: if a driver does not know exactly
>         how much has been written by the device, the driver would have to zero
>         the buffer in advance to ensure no data leakage occurs.
>
>         For example, a network driver may hand a received buffer directly to
>         an unprivileged userspace application.  If the network device has not
>         overwritten the bytes which were in that buffer, this could leak the
>         contents of freed memory from other processes to the application.

This would be a bug in the driver. For example, if the driver wants
to implement zerocopy, it must guarantee that every byte was written
by the device before mapping the buffer to userspace; if it can't, it
should copy instead.
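
As an illustration only, here is one way a driver could meet that guarantee without copying: zero whatever the device did not write before the buffer is exposed. This is a sketch, not something from the patch. `scrub_unwritten` is a hypothetical helper, and it assumes the driver already knows that the device wrote exactly the ranges [0, hdr_len) and [payload_off, payload_off + payload_len). How the driver would learn those ranges cheaply is the open question in this thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical driver-side helper: zero every byte of buf that lies
 * outside the two ranges the device is assumed to have written,
 * [0, hdr_len) and [payload_off, payload_off + payload_len).
 * Returns the number of bytes zeroed, or (size_t)-1 if the ranges are
 * inconsistent with buf_len.
 */
size_t scrub_unwritten(unsigned char *buf, size_t buf_len,
                       size_t hdr_len, size_t payload_off,
                       size_t payload_len)
{
    if (hdr_len > payload_off || payload_off > buf_len ||
        payload_len > buf_len - payload_off)
        return (size_t)-1;

    size_t hole = payload_off - hdr_len;    /* gap between header and data */
    size_t end = payload_off + payload_len;
    size_t tail = buf_len - end;            /* unused space after the data */

    memset(buf + hdr_len, 0, hole);
    memset(buf + end, 0, tail);
    return hole + tail;
}
```

The cost is proportional to the hole plus the tail, so it is small when only the gap after the headers has to be cleared and large when most of a page is unused.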

>
>
> so all buffers have to be initialized completely up to the reported
> used length.
>
> It could be that the guarantee is not relevant in some use-cases.
> We would have to specify that this is an exception to the rule,
> explain that drivers must be careful about information leaks.
> Let's say there's a feature bit that adds uninitialized space
> somewhere. How much was added? We can find out by parsing the
> packet but once you start doing that just to assemble the skb
> you have already lost performance.

I don't get this: those uninitialized spaces look just like
tailroom for the skb.

Thanks

> So lots of spec work, some security risks, and unclear performance.
>
>
>
>
> Is above a fair summary?
>
>
>
> If yes I would say let's address A, but make sure we ask drivers
> to clear the new feature bit if there's no mergeable
> (as opposed to e.g. failing probe) so we can add
> support for !mergeable down the road.
>
>
>
>
>
> --
> MST
>



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-29 10:06                       ` Michael S. Tsirkin
@ 2022-09-29 11:48                         ` Xuan Zhuo
  0 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-29 11:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Thu, 29 Sep 2022 06:06:41 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Sep 29, 2022 at 04:24:02PM +0800, Xuan Zhuo wrote:
> > On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > > >
> > > > > >
> > > > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > > > to not split the header when the layout doesn't fit for header splitting.
> > > > > > (And this seems the case even if we're using buffers).
> > > > >
> > > > > Well spec says:
> > > > >
> > > > >         indicates to both the device and the driver that no
> > > > >         assumptions were made about framing.
> > > > >
> > > > > if device assumes that descriptor boundaries are where
> > > > > driver wants packet to be stored that is clearly
> > > > > an assumption.
> > > >
> > > > Yes but what I want to say is, the device can choose to not split the
> > > > packet if the framing doesn't fit. Does it still comply with the above
> > > > description?
> > > >
> > > > Thanks
> > >
> > > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > > For example, if driver wants to split the header at some specific
> > > offset this is already possible without extra functionality.
> > >
> > > Let's keep it that way.
> > >
> > > Now, let's formulate what are some of the problems with the current way.
> > >
> > >
> > >
> > > A- mergeable buffers is even more flexible, since a single packet
> > >   is built up of multiple buffers.
> >
> > If I understand correctly, this is our v8.
>
> I think it is, or at least close.
>
> > > And in theory device can
> > >   choose arbitrary set of buffers to store a packet.
> > >   So you could supply a small buffer for headers followed by a bigger
> > >   one for payload, in theory even without any changes.
> >
> > This is very interesting, I did not think of this point.
> > This is helpful to reduce the waste of memory.
>
> Hmm good point. I should add: since we are no longer fully using the
> first buffer the feature has a memory cost and - in case of a cache
> pressure - can degrade performance rather than improve it.
> Thus, allowing flexibility for both devices and drivers at runtime
> rather than fixing things through features thus sounds like a good idea.
>
>
> > >   Problem 1: However since this is not how devices currently operate,
> > >   a feature bit would be helpful.
> > >
> > >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> > >   packet fits in a single buffer without looking at the header.
> > >   For this reason, we have this text:
> > >
> > > 	If a receive packet is spread over multiple buffers, the device
> > > 	MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > > 	1 buffers) completely up to the full length of each buffer
> > > 	supplied by the driver.
> > >
> > >   if we want to keep this optimization and allow using a separate
> > >   buffer for headers, then I think we could rely on the feature bit
> > >   from Problem 1 and just make an exception for the first buffer.
> > >   Also num_buffers is then always >= 2, maybe state this to avoid
> > >   confusion.
> > >
> >
> > Yes, I think this is feasible.
>
>
> Thinking more about this, a question is what to do about packets without
> split header. I can see several options
> A- add some buffers just for the non split case. without in-order they
>   can just stay available. with in-order they have to be recycled which
>   might be expensive.
>   They will also occupy space in the ring. more memory costs.
>   Don't much like it for above reasons.
>   OTOH then I wanted to work on partial-order anyway. Maybe it's time
>   to prioritize that work.

Yes, I also prefer in-order processing, although completing buffers out of
order is good in some cases.

>
> B- write all of the packet in the payload buffer and just skip header buffer (e.g. make it 0 size?)
>   payload within packet will be misaligned then. Do we care? maybe not -
>   this was supposed to be an exception. In this case:
>   Problem B: where should virtio net header
>   go then? we can put it in the header buffer still, or we can
>   store it linear with the packet.  or we can leave both options, up to device.
>   what is better might depend on
>   different factors.  any chance of performance testing?


avail ring:
| small buffer | page | small buffer | page | small buffer | page | small buffer | page |

In this case, I think it is a good idea to set the small buffer to 0 size for
packets whose header is not split.

This is an advantage over a desc chain, which can only put the virtio-net
header in the small buffer.



>
> > >
> > >
> > >
> > >
> > > B- without mergeable, there's no flexibility. In particular, there can
> > > not be uninitialized space between header and data. If we had flexibility here, this could be
> > > helpful for alignment, security, etc.
> > > Unfortunately, our hands are tied:
> > >
> > >
> > > 	\field{len} is particularly useful
> > > 	for drivers using untrusted buffers: if a driver does not know exactly
> > > 	how much has been written by the device, the driver would have to zero
> > > 	the buffer in advance to ensure no data leakage occurs.
> > >
> > > 	For example, a network driver may hand a received buffer directly to
> > > 	an unprivileged userspace application.  If the network device has not
> > > 	overwritten the bytes which were in that buffer, this could leak the
> > > 	contents of freed memory from other processes to the application.
> >
> >
> > I don't think this is very troublesome, the device can memset the hole by 0.
>
> Yes it can. But that has a performance cost. How large - depends on the
> hole size and a bunch of other factors.

The actual hole is not large: generally the size of the IP + TCP headers.

Thanks


>
> > >
> > >
> > > so all buffers have to be initialized completely up to the reported
> > > used length.
> > >
> > > It could be that the guarantee is not relevant in some use-cases.
> > > We would have to specify that this is an exception to the rule,
> > > explain that drivers must be careful about information leaks.
> > > Let's say there's a feature bit that adds uninitialized space
> > > somewhere. How much was added? We can find out by parsing the
> > > packet but once you start doing that just to assemble the skb
> > > you have already lost performance.
> > > So lots of spec work, some security risks, and unclear performance.
> > >
> > >
> > >
> > >
> > > Is above a fair summary?
> > >
> > >
> > >
> > > If yes I would say let's address A, but make sure we ask drivers
> > > to clear the new feature bit if there's no mergeable
> > > (as opposed to e.g. failing probe) so we can add
> > > support for !mergeable down the road.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > MST
> > >
>


* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-29  8:24                     ` Xuan Zhuo
@ 2022-09-29 10:06                       ` Michael S. Tsirkin
  2022-09-29 11:48                         ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-29 10:06 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Thu, Sep 29, 2022 at 04:24:02PM +0800, Xuan Zhuo wrote:
> On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > > >
> > > > >
> > > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > > to not split the header when the layout doesn't fit for header splitting.
> > > > > (And this seems the case even if we're using buffers).
> > > >
> > > > Well spec says:
> > > >
> > > >         indicates to both the device and the driver that no
> > > >         assumptions were made about framing.
> > > >
> > > > if device assumes that descriptor boundaries are where
> > > > driver wants packet to be stored that is clearly
> > > > an assumption.
> > >
> > > Yes but what I want to say is, the device can choose to not split the
> > > packet if the framing doesn't fit. Does it still comply with the above
> > > description?
> > >
> > > Thanks
> >
> > The point of ANY_LAYOUT is to give drivers maximum flexibility.
> > For example, if driver wants to split the header at some specific
> > offset this is already possible without extra functionality.
> >
> > Let's keep it that way.
> >
> > Now, let's formulate what are some of the problems with the current way.
> >
> >
> >
> > A- mergeable buffers is even more flexible, since a single packet
> >   is built up of multiple buffers.
> 
> If I understand correctly, this is our v8.

I think it is, or at least close.

> > And in theory device can
> >   choose arbitrary set of buffers to store a packet.
> >   So you could supply a small buffer for headers followed by a bigger
> >   one for payload, in theory even without any changes.
> 
> This is very interesting, I did not think of this point.
> This is helpful to reduce the waste of memory.

Hmm good point. I should add: since we are no longer fully using the
first buffer, the feature has a memory cost and, under cache
pressure, can degrade performance rather than improve it.
Thus, allowing flexibility for both devices and drivers at runtime,
rather than fixing things through features, sounds like a good idea.


> >   Problem 1: However since this is not how devices currently operate,
> >   a feature bit would be helpful.
> >
> >   Problem 2: Also, in the past we found it useful to be able to figure out whether
> >   packet fits in a single buffer without looking at the header.
> >   For this reason, we have this text:
> >
> > 	If a receive packet is spread over multiple buffers, the device
> > 	MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> > 	1 buffers) completely up to the full length of each buffer
> > 	supplied by the driver.
> >
> >   if we want to keep this optimization and allow using a separate
> >   buffer for headers, then I think we could rely on the feature bit
> >   from Problem 1 and just make an exception for the first buffer.
> >   Also num_buffers is then always >= 2, maybe state this to avoid
> >   confusion.
> >
> 
> Yes, I think this is feasible.


Thinking more about this, a question is what to do about packets without
a split header. I can see several options:
A- add some buffers just for the non-split case. Without in-order they
  can just stay available. With in-order they have to be recycled, which
  might be expensive.
  They will also occupy space in the ring, so more memory cost.
  I don't much like it for the above reasons.
  OTOH I wanted to work on partial-order anyway. Maybe it's time
  to prioritize that work.

B- write all of the packet in the payload buffer and just skip the header buffer (e.g. make it 0 size?).
  The payload within the packet will be misaligned then. Do we care? Maybe not:
  this was supposed to be an exception. In this case:
  Problem B: where should the virtio net header
  go then? We can still put it in the header buffer, or we can
  store it linearly with the packet. Or we can leave both options up to the device.
  Which is better might depend on
  different factors. Any chance of performance testing?

> >
> >
> >
> >
> > B- without mergeable, there's no flexibility. In particular, there can
> > not be uninitialized space between header and data. If we had flexibility here, this could be
> > helpful for alignment, security, etc.
> > Unfortunately, our hands are tied:
> >
> >
> > 	\field{len} is particularly useful
> > 	for drivers using untrusted buffers: if a driver does not know exactly
> > 	how much has been written by the device, the driver would have to zero
> > 	the buffer in advance to ensure no data leakage occurs.
> >
> > 	For example, a network driver may hand a received buffer directly to
> > 	an unprivileged userspace application.  If the network device has not
> > 	overwritten the bytes which were in that buffer, this could leak the
> > 	contents of freed memory from other processes to the application.
> 
> 
> I don't think this is very troublesome, the device can memset the hole by 0.

Yes it can. But that has a performance cost. How large - depends on the
hole size and a bunch of other factors.

> >
> >
> > so all buffers have to be initialized completely up to the reported
> > used length.
> >
> > It could be that the guarantee is not relevant in some use-cases.
> > We would have to specify that this is an exception to the rule,
> > explain that drivers must be careful about information leaks.
> > Let's say there's a feature bit that adds uninitialized space
> > somewhere. How much was added? We can find out by parsing the
> > packet but once you start doing that just to assemble the skb
> > you have already lost performance.
> > So lots of spec work, some security risks, and unclear performance.
> >
> >
> >
> >
> > Is above a fair summary?
> >
> >
> >
> > If yes I would say let's address A, but make sure we ask drivers
> > to clear the new feature bit if there's no mergeable
> > (as opposed to e.g. failing probe) so we can add
> > support for !mergeable down the road.
> >
> >
> >
> >
> >
> > --
> > MST
> >



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-29  7:04                   ` Michael S. Tsirkin
@ 2022-09-29  8:24                     ` Xuan Zhuo
  2022-09-29 10:06                       ` Michael S. Tsirkin
  2022-10-08  4:37                     ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-29  8:24 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Thu, 29 Sep 2022 03:04:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> > On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > > Jason I think the issue with previous proposals is that they conflict
> > > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > > driver flexibility in arranging the packet in memory is benefitial.
> > > >
> > > >
> > > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > > to not split the header when the layout doesn't fit for header splitting.
> > > > (And this seems the case even if we're using buffers).
> > >
> > > Well spec says:
> > >
> > >         indicates to both the device and the driver that no
> > >         assumptions were made about framing.
> > >
> > > if device assumes that descriptor boundaries are where
> > > driver wants packet to be stored that is clearly
> > > an assumption.
> >
> > Yes but what I want to say is, the device can choose to not split the
> > packet if the framing doesn't fit. Does it still comply with the above
> > description?
> >
> > Thanks
>
> The point of ANY_LAYOUT is to give drivers maximum flexibility.
> For example, if driver wants to split the header at some specific
> offset this is already possible without extra functionality.
>
> Let's keep it that way.
>
> Now, let's formulate what are some of the problems with the current way.
>
>
>
> A- mergeable buffers is even more flexible, since a single packet
>   is built up of multiple buffers.

If I understand correctly, this is our v8.

> And in theory device can
>   choose arbitrary set of buffers to store a packet.
>   So you could supply a small buffer for headers followed by a bigger
>   one for payload, in theory even without any changes.

This is very interesting, I had not thought of this point.
It helps reduce wasted memory.

>   Problem 1: However since this is not how devices currently operate,
>   a feature bit would be helpful.
>
>   Problem 2: Also, in the past we found it useful to be able to figure out whether
>   packet fits in a single buffer without looking at the header.
>   For this reason, we have this text:
>
> 	If a receive packet is spread over multiple buffers, the device
> 	MUST use all buffers but the last (i.e. the first \field{num_buffers} -
> 	1 buffers) completely up to the full length of each buffer
> 	supplied by the driver.
>
>   if we want to keep this optimization and allow using a separate
>   buffer for headers, then I think we could rely on the feature bit
>   from Problem 1 and just make an exception for the first buffer.
>   Also num_buffers is then always >= 2, maybe state this to avoid
>   confusion.
>

Yes, I think this is feasible.

>
>
>
>
> B- without mergeable, there's no flexibility. In particular, there can
> not be uninitialized space between header and data. If we had flexibility here, this could be
> helpful for alignment, security, etc.
> Unfortunately, our hands are tied:
>
>
> 	\field{len} is particularly useful
> 	for drivers using untrusted buffers: if a driver does not know exactly
> 	how much has been written by the device, the driver would have to zero
> 	the buffer in advance to ensure no data leakage occurs.
>
> 	For example, a network driver may hand a received buffer directly to
> 	an unprivileged userspace application.  If the network device has not
> 	overwritten the bytes which were in that buffer, this could leak the
> 	contents of freed memory from other processes to the application.


I don't think this is very troublesome: the device can memset the hole to 0.

>
>
> so all buffers have to be initialized completely up to the reported
> used length.
>
> It could be that the guarantee is not relevant in some use-cases.
> We would have to specify that this is an exception to the rule,
> explain that drivers must be careful about information leaks.
> Let's say there's a feature bit that adds uninitialized space
> somewhere. How much was added? We can find out by parsing the
> packet but once you start doing that just to assemble the skb
> you have already lost performance.
> So lots of spec work, some security risks, and unclear performance.
>
>
>
>
> Is above a fair summary?
>
>
>
> If yes I would say let's address A, but make sure we ask drivers
> to clear the new feature bit if there's no mergeable
> (as opposed to e.g. failing probe) so we can add
> support for !mergeable down the road.
>
>
>
>
>
> --
> MST
>


* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-29  1:48                 ` Jason Wang
@ 2022-09-29  7:04                   ` Michael S. Tsirkin
  2022-09-29  8:24                     ` Xuan Zhuo
  2022-10-08  4:37                     ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-29  7:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Thu, Sep 29, 2022 at 09:48:33AM +0800, Jason Wang wrote:
> On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > > Jason I think the issue with previous proposals is that they conflict
> > > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > > driver flexibility in arranging the packet in memory is benefitial.
> > >
> > >
> > > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > > to not split the header when the layout doesn't fit for header splitting.
> > > (And this seems the case even if we're using buffers).
> >
> > Well spec says:
> >
> >         indicates to both the device and the driver that no
> >         assumptions were made about framing.
> >
> > if device assumes that descriptor boundaries are where
> > driver wants packet to be stored that is clearly
> > an assumption.
> 
> Yes but what I want to say is, the device can choose to not split the
> packet if the framing doesn't fit. Does it still comply with the above
> description?
> 
> Thanks

The point of ANY_LAYOUT is to give drivers maximum flexibility.
For example, if driver wants to split the header at some specific
offset this is already possible without extra functionality.

Let's keep it that way.

Now, let's formulate what are some of the problems with the current way.



A- mergeable buffers are even more flexible, since a single packet
  is built up of multiple buffers. And in theory the device can
  choose an arbitrary set of buffers to store a packet.
  So you could supply a small buffer for headers followed by a bigger
  one for payload, in theory even without any changes.
  Problem 1: However, since this is not how devices currently operate,
  a feature bit would be helpful.

  Problem 2: Also, in the past we found it useful to be able to figure out whether
  packet fits in a single buffer without looking at the header.
  For this reason, we have this text:

	If a receive packet is spread over multiple buffers, the device
	MUST use all buffers but the last (i.e. the first \field{num_buffers} -
	1 buffers) completely up to the full length of each buffer
	supplied by the driver.

  If we want to keep this optimization and allow using a separate
  buffer for headers, then I think we could rely on the feature bit
  from Problem 1 and just make an exception for the first buffer.
  Also, num_buffers is then always >= 2; maybe state this to avoid
  confusion.
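To make the optimization in Problem 2 concrete, here is a minimal sketch
of the check a driver could do on used lengths alone. The structure and
function names are hypothetical (they are not from the spec or from any
driver), and the second helper assumes the proposed exception for the
first buffer, which is not part of the spec today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one used receive buffer; not a spec structure. */
struct rx_used_buf {
    uint32_t buf_len;   /* length the driver supplied */
    uint32_t used_len;  /* length the device reports as written */
};

/*
 * Current rule: every buffer except the last is filled completely.
 * So a partially filled first buffer proves num_buffers == 1 without
 * reading the header. A completely filled buffer proves nothing; the
 * driver then has to look at num_buffers.
 */
static bool packet_is_single_buffer(const struct rx_used_buf *first)
{
    assert(first->used_len <= first->buf_len);
    return first->used_len < first->buf_len;
}

/*
 * Assumed exception: the first buffer carries only headers and may be
 * short. The same test then moves to the second buffer: if it is
 * partially filled, it is the last one, so num_buffers == 2.
 */
static bool split_packet_is_two_buffers(const struct rx_used_buf *second)
{
    assert(second->used_len <= second->buf_len);
    return second->used_len < second->buf_len;
}
```

In both helpers a false result only means "unknown, read num_buffers",
never "definitely more buffers".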

  



B- without mergeable, there's no flexibility. In particular, there
cannot be uninitialized space between header and data. If we had
flexibility here, this could be helpful for alignment, security, etc.
Unfortunately, our hands are tied:


	\field{len} is particularly useful
	for drivers using untrusted buffers: if a driver does not know exactly
	how much has been written by the device, the driver would have to zero
	the buffer in advance to ensure no data leakage occurs.

	For example, a network driver may hand a received buffer directly to
	an unprivileged userspace application.  If the network device has not
	overwritten the bytes which were in that buffer, this could leak the
	contents of freed memory from other processes to the application.


so all buffers have to be initialized completely up to the reported
used length.

It could be that the guarantee is not relevant in some use-cases.
We would have to specify that this is an exception to the rule, and
explain that drivers must be careful about information leaks.
Let's say there's a feature bit that adds uninitialized space
somewhere. How much was added? We can find out by parsing the
packet, but once you start doing that just to assemble the skb
you have already lost performance.
So: lots of spec work, some security risks, and unclear performance.
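As an illustration of the information-leak concern, here is a sketch of
what a driver would have to do if a device were allowed to leave a gap
between header and data: zero the gap before the buffer reaches an
untrusted consumer. The function name and its parameters are
hypothetical, and it assumes the driver already knows where the header
ends and where the data starts, which is exactly the information that
is expensive to obtain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero the bytes in [hdr_end, data_start) that the device did not
 * write. Returns the number of bytes scrubbed. used_len is the length
 * the device reported for the buffer.
 */
static size_t scrub_gap(unsigned char *buf, size_t used_len,
                        size_t hdr_end, size_t data_start)
{
    assert(hdr_end <= data_start && data_start <= used_len);
    memset(buf + hdr_end, 0, data_start - hdr_end);
    return data_start - hdr_end;
}
```

The memset itself is cheap; the cost is in computing hdr_end and
data_start for every packet.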




Is the above a fair summary?



If yes I would say let's address A, but make sure we ask drivers
to clear the new feature bit if there's no mergeable
(as opposed to e.g. failing probe) so we can add
support for !mergeable down the road.
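A minimal sketch of that driver behaviour, assuming the feature bit
number 52 proposed in the v8 patch (VIRTIO_NET_F_MRG_RXBUF is bit 15 in
the spec). The function name is hypothetical and the sketch ignores the
other dependency the patch lists (VIRTIO_NET_F_CTRL_VQ).

```c
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF               15 /* from the virtio spec */
#define VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER  52 /* as proposed in v8 */

/*
 * Pick the features to accept. If mergeable buffers end up not being
 * accepted, quietly drop the split header bit instead of failing
 * probe, so a later spec revision can allow SPLIT && !MERGEABLE.
 */
static uint64_t driver_select_features(uint64_t offered, uint64_t supported)
{
    uint64_t accepted = offered & supported;

    if (!(accepted & (1ULL << VIRTIO_NET_F_MRG_RXBUF)))
        accepted &= ~(1ULL << VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER);

    return accepted;
}
```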





-- 
MST





* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-28 13:39               ` Michael S. Tsirkin
@ 2022-09-29  1:48                 ` Jason Wang
  2022-09-29  7:04                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-09-29  1:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Wed, Sep 28, 2022 at 9:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > > Jason I think the issue with previous proposals is that they conflict
> > > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > > driver flexibility in arranging the packet in memory is benefitial.
> >
> >
> > Yes, but I didn't found how it can conflict the any_layout. Device can just
> > to not split the header when the layout doesn't fit for header splitting.
> > (And this seems the case even if we're using buffers).
>
> Well spec says:
>
>         indicates to both the device and the driver that no
>         assumptions were made about framing.
>
> if device assumes that descriptor boundaries are where
> driver wants packet to be stored that is clearly
> an assumption.

Yes but what I want to say is, the device can choose to not split the
packet if the framing doesn't fit. Does it still comply with the above
description?

Thanks

>
>
> --
> MST
>





* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-26  8:06             ` Jason Wang
@ 2022-09-28 13:39               ` Michael S. Tsirkin
  2022-09-29  1:48                 ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-28 13:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Mon, Sep 26, 2022 at 04:06:17PM +0800, Jason Wang wrote:
> > Jason I think the issue with previous proposals is that they conflict
> > with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> > driver flexibility in arranging the packet in memory is benefitial.
> 
> 
> Yes, but I didn't found how it can conflict the any_layout. Device can just
> to not split the header when the layout doesn't fit for header splitting.
> (And this seems the case even if we're using buffers).

Well spec says:

	indicates to both the device and the driver that no
	assumptions were made about framing.

if device assumes that descriptor boundaries are where
driver wants packet to be stored that is clearly
an assumption.


-- 
MST





* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23  5:59           ` Michael S. Tsirkin
  2022-09-23  6:57             ` Xuan Zhuo
@ 2022-09-26  8:06             ` Jason Wang
  2022-09-28 13:39               ` Michael S. Tsirkin
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-09-26  8:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi


On 2022/9/23 13:59, Michael S. Tsirkin wrote:
> On Fri, Sep 23, 2022 at 12:04:28PM +0800, Jason Wang wrote:
>> On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>> On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>> On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
>>>>>> On 2022/9/16 10:56, hengqi wrote:
>>>>>>> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>>
>>>>>>> The purpose of this feature is to split the transport header and the payload
>>>>>>> of the packet.
>>>>>>>
>>>>>>> |                     receive buffer1(page)            | receive buffer2(page) |
>>>>>>> |<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
>>>>>>>               |<------------------------------->|
>>>>>>>                                ^
>>>>>>>                                |
>>>>>>>                             max_len
>>>>>>>
>>>>>>> We can use one page for every receive buffer. In this way, we can ensure that
>>>>>>> all payloads can be independently in a page, which is very beneficial for
>>>>>>> the zerocopy implemented by the upper layer.
>>>>>>>
>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>>> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
>>>>>>> ---
>>>>>>> v8:
>>>>>>>      1. Do not depend on descriptor chain. @Michael S. Tsirkin
>>>>>>>      2. Add \field{offset} and \field{max_len}.
>>>>>>>      3. Fix some presentation issues. @Jason Wang
>>>>>>>      4. Clarify some paragraphs.
>>>>>>>
>>>>>>> v7:
>>>>>>>      1. Fix some presentation issues.
>>>>>>>      2. Use "split transport header". @Jason Wang
>>>>>>>      3. Clarify some paragraphs. @Cornelia Huck
>>>>>>>      4. determine the device what to do if it does not perform header split on a packet.
>>>>>>>
>>>>>>> v6:
>>>>>>>      1. Fix some syntax issues. @Cornelia Huck
>>>>>>>      2. Clarify some paragraphs. @Cornelia Huck
>>>>>>>      3. Determine the device what to do if it does not perform header split on a packet.
>>>>>>>
>>>>>>> v5:
>>>>>>>      1. Determine when hdr_len is credible in the process of rx
>>>>>>>      2. Clean up the use of buffers and descriptors
>>>>>>>      3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>>>>>>
>>>>>>> v4:
>>>>>>>      1. fix typo @Cornelia Huck @Jason Wang
>>>>>>>      2. do not split header for IP fragmentation packet. @Jason Wang
>>>>>>>
>>>>>>>   conformance.tex |  2 ++
>>>>>>>   content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>   2 files changed, 93 insertions(+)
>>>>>>>
>>>>>>> diff --git a/conformance.tex b/conformance.tex
>>>>>>> index 2b86fc6..4e2b82e 100644
>>>>>>> --- a/conformance.tex
>>>>>>> +++ b/conformance.tex
>>>>>>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>>>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>>>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>>>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>>>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>>>>>   \end{itemize}
>>>>>>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>>>>>>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>>>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>>>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>>>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>>>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>>>>>   \end{itemize}
>>>>>>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>>>>>>> diff --git a/content.tex b/content.tex
>>>>>>> index e863709..fad9dea 100644
>>>>>>> --- a/content.tex
>>>>>>> +++ b/content.tex
>>>>>>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>>>>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>>>>>       channel.
>>>>>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>>>>>>> +    the transport header and the payload.
>>>>>>> +
>>>>>>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>>>>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>>>>>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>>>>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>>>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>>>>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
>>>>>>>   \end{description}
>>>>>>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>>>>>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>>>>>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>>>>>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>>>>>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>>>>>>> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>>>>>>>           u8 flags;
>>>>>>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>>>>>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>>>>>>> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>>>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>>>>>>   transport header size.
>>>>>>>   The driver MUST NOT rely on \field{hdr_len} to be correct.
>>>>>>> +
>>>>>>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>>>>>>> +the driver SHOULD treat the \field{hdr_len} as the length of the transport
>>>>>>> +header inside the first buffer.
>>>>>>> +
>>>>>>>   \begin{note}
>>>>>>>   This is due to various bugs in implementations.
>>>>>>>   \end{note}
>>>>>>> @@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>>>>   according to the native endian of the guest rather than
>>>>>>>   (necessarily when not using the legacy interface) little-endian.
>>>>>>> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>>>>> +
>>>>>>> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>>>>>>> +the device supports splitting the transport header and the payload.
>>>>>>> +The transport header and the payload will be separated into different
>>>>>>> +buffers.
>>>>>>> +
>>>>>>> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>>>>>>> +
>>>>>>> +To configure the split transport header, the following layout structure
>>>>>>> +and definitions are used:
>>>>>>> +
>>>>>>> +\begin{lstlisting}
>>>>>>> +struct virtio_net_split_transport_header_config {
>>>>>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>>>>>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>>>>>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>>>>>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>>>>>>> +    le64 type;
>>>>>>> +    le16 offset;
>>>>>>> +    le16 max_len;
>>>>>>> +};
>>>>>>> +
>>>>>>> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>>>>>>> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>>>>>>> +\end{lstlisting}
>>>>>>> +
>>>>>>> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>>>>>>> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
>>>>>>> +header configuration.
>>>>>>> +
>>>>>>> +The driver can enable or disable split transport header for different transport
>>>>>>> +protocols by setting or clearing corresponding bits in \field{type}.
>>>>>>> +
>>>>>>> +\begin{itemize}
>>>>>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>>>>>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>>>>>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>>>>>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>>>>>>> +\end{itemize}
>>>>>>> +
>>>>>>> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>>>>> +
>>>>>>> +A device MUST disable transport header splitting upon reset and initialization.
>>>>>>> +
>>>>>>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>>>>>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>>>>>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>>>>>>> +
>>>>>>> +A device MUST NOT split the transport header if it encounters any of the following cases:
>>>>>>> +\begin{itemize}
>>>>>>> +    \item The device does not recognize the transport protocol of the packet.
>>>>>>> +    \item The packet is an IP fragmentation.
>>>>>>> +    \item The splitting of the specific transport protocol is not enabled via
>>>>>>> +        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
>>>>>>> +    \item At most one buffer is available.
>>>>>>
>>>>>> So this means the feature is disabled for the device without
>>>>>> merge-able buffer? Note that, even in the case of mergeable buffer,
>>>>>> it doesn't mean a buffer that only contains a single descriptor.
>>>>>>
>>>>>>
>>>>> Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
>>>>> the buffer submitted to the receiveq can be thought of as containing only one descriptor.
>>>>> So this feature depends on the mergeable buffer.
>>>> To tell the truth, I'm not sure this is a good choice. We never had a
>>>> feature that depends solely on the mergeable rx buffer before.
>>>> Especially considering that using a descriptor chain is not hard. And
>>>> I'm not sure we should care too much on the overhead since the
>>>> splitting is enabled by the administrator when it needs e.g zerocopy.
>>>
>>> It's overwhelmed us, and we haven't been able to agree on this.
>> Sorry for this but let's make an agreement before posting a new version.
> Right, let's do that.
> Jason I think the issue with previous proposals is that they conflict
> with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> driver flexibility in arranging the packet in memory is benefitial.


Yes, but I don't see how it can conflict with any_layout. The device can
just not split the header when the layout doesn't fit header
splitting. (And this seems to be the case even if we're using buffers).


>
>
>
>>> Michael doesn't want to use desc chain, it's not just a performance issue. In an
>>> early email, he mentioned that desc chain may be abandoned in the future. So we
>>> have been trying not to rely on desc chain.
>> This seems to be a very large change which seems a little bit too
>> early to be considered.
> I'd like to put it in other terms. Fundamentally devices are not
> supposed to talk about descriptors at all. Descriptors are
> a way to describe buffers, and devices should all work in terms
> of buffers. I am working on cleaning up the spec from confusion
> and terminology mixups. We have several major sources all over the spec:
> - descriptor/buffer used inconsistently
> - feature negotiated/offered used inconsistently
> - field exists/valid used inconsistently


Ok.


>
> My way to address the first issue is to make sure devices all work
> with buffers. And buffers are described by descriptors (makes sense,
> right?) and made available to device by driver and used by device.
>
> The advantage of this is layering - we can change the way buffers
> are passed around without changing devices. And, it matches
> the virtio API nicely.


Yes, but this looks more like a spec tweak than a problem that exists
now: the driver doesn't care about descriptors, since the driver talks to
the virtio core with sg, so it has sufficient flexibility to organize the
layout the way it wants.

1) sg #1, vnet_headers + l2/l3/l4 header
2) sg #2, l4 payload

For direct/indirect descriptors, the core can simply use a direct/indirect
descriptor chain to describe the sg and place the payload in the second
buffer (instead of the descriptor).

For mergeable buffers, it might be more challenging since it's not easy to
know in advance whether a buffer will be used for header or payload. This
makes it hard to align the payload buffer at a page boundary, especially
considering that the packet header may need headroom. We can't simply
allocate pages and feed them to the device; that calls for some new
facilities:

1) an offset to write at, as suggested by this patch; some memory might be
wasted for the buffer that is used for the header.
2) or the device can assume header/payload buffers so the driver can post
buffers like:

2.1) header buffer #1
2.2) payload buffer #1
2.3) header buffer #2
2.4) payload buffer #2
...

There could be some transitional overhead (memory/descriptor waste)
when enabling and disabling header splitting, but it should be affordable.
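Option 2) can be sketched as a posting plan in which the driver
alternates small header buffers with page-sized payload buffers. All
names and both buffer sizes below are assumptions for illustration;
nothing here is defined by the spec or by the patch.

```c
#include <stddef.h>
#include <stdint.h>

enum rx_buf_kind { RX_BUF_HEADER, RX_BUF_PAYLOAD };

/* Hypothetical record of one buffer the driver intends to post. */
struct rx_post {
    enum rx_buf_kind kind;
    uint32_t len;
};

#define HDR_BUF_LEN      256u   /* assumed size for vnet hdr + l2/l3/l4 */
#define PAYLOAD_BUF_LEN  4096u  /* assumed: one page per payload buffer */

/*
 * Fill out[] with alternating header/payload entries:
 * header #1, payload #1, header #2, payload #2, ...
 * Only whole pairs are planned; returns the number of entries written.
 */
static size_t plan_rx_posts(struct rx_post *out, size_t n)
{
    size_t pairs = n / 2;

    for (size_t i = 0; i < pairs; i++) {
        out[2 * i]     = (struct rx_post){ RX_BUF_HEADER, HDR_BUF_LEN };
        out[2 * i + 1] = (struct rx_post){ RX_BUF_PAYLOAD, PAYLOAD_BUF_LEN };
    }
    return pairs * 2;
}
```

With an odd n the last slot is left unused, which is one form of the
transitional waste mentioned above.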


>
> Existing devices are all fine with this - they do not pass any
> information in the descriptor. Yes, this seems like an option to
> pass some information around, but I am not convinced it is worth
> the layering violation.
>
> By comparison, ability to write data at an offset seems generally
> useful, in particular we have a very old issue even without
> the split header feature where with mergeable buffers
> if we attempt to align the data in the 1st buffer at a cache line
> boundary by adding an offset before ETH header, then when it spills
> over to the second buffer it will be misaligned there. Wastes
> an extra cache line for such packets. Offsets can allow fixing this.


Probably, but as discussed, this seems to be an independent feature and I
don't see how it helps with splitting headers. And does this conflict
with any_layout somehow?


>
>
> I don't see architecturally what is wrong with making feature just
> depend on mergeable buffers for now.


Yes, but I don't see anything that prevents us from adding support for
the non-mergeable case. (Supporting mergeable buffers seems even more
complex.)

Thanks


>   We can always allow a combination
> down the road. Let's just make it clear that if drivers see SPLIT &&
> !MERGEABLE they should not fail probe, they should instead clear the
> split header feature.
>
>
>
>
>>> If we can't make a feature depend only on mergeable, should we use solution B?
>>>
>>>       2. Scheme B ( refer to your suggestion )
>>>
>>>       Our rethinking approach is no longer based on descriptor chain.
>>>
>>>       We refer to your proposed offset-based scheme as scheme B.
>> The offset seems to be the suggestion of Michael.
>>
>> I think I like the design of v7 for several reasons:
>>
>> 1) easy to reserve head/tailroom without any extension of the spec
>> 2) easy to work with mergeable rx buffer
>> 3) it is the model used by modern NIC like e810 [1]
>>
>> [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
>> it works which is similar to v7
>>
>> Thanks
>>
>>>       As you suggested, scheme B gives the device a buffer, using offset to
>>>       indicate where to place the payload. Like this:
>>>
>>>       <header>...<padding>... <beginning of page><data>
>>>
>>>       But how to apply for this buffer?
>>>       Since we want the payload to be placed on a separate page, the method
>>>       we consider is to directly alloc two pages from driver of contiguous memory.
>>>
>>>       Then the beginning of this contiguous memory is used to store the headroom,
>>>       and the contiguous memory after the headroom is directly handed over to the device.
>>>       Similar to the following:
>>>
>>>       [------------------ receive buffer(2 pages) ------------------------------]
>>>       [<------------first page -------------------><------ second page -------->]
>>>       [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
>>>          ^    ^
>>>          |    |
>>>          |    pointer to device
>>>          |
>>>          |
>>>          Driver reserved, the later part is filled
>>>
>>> We have already entered v8, but we have not been able to reach an agreement on
>>> the basic capabilities. I want to solve this problem first.
>>>
>>> @Jason @Michael
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>>
>>>>>>> +    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
>>>>>>
>>>>>> I don't get the reason why we need max_len. Can't it implied in the
>>>>>> length of the first descriptor?
>>>>>>
>>>>> Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
>>>>> Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \filed{max_len}) $
>>>>> in the first buffer to build the non-linear data area of the socket buffer.
>>>>>
>>>>>>> +\end{itemize}
>>>>>>> +
>>>>>>> +If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
>>>>>>> +bit in \field{flags} MUST be set. The transport header MUST be on the first
>>>>>>> +buffer, following the virtio net header. The payload MUST start from the
>>>>>>> +second buffer. The device MUST set \field{hdr_len} of structure
>>>>>>> +virtio_net_hdr to the length of the transport header.
>>>>>>> +The used length still reports the number of bytes it has written to memory.
>>>>>>> +
>>>>>>> +\field{offset} and \field{max_len} are valid when device uses the first buffer.
>>>>>>> +The device MUST reserve space in the first buffer using \filed{offset}.
>>>>>>> +If \field{offset} exceeds the length of the buffer, the device MUST drop
>>>>>>> +the receive packets.
>>>>>>
>>>>>> Can the device simply don't split the packet in this case? Anyhow we
>>>>>> need synchronize the driver with the device in the case (e.g when
>>>>>> driver is try to having a new max_len).
>>>>>>
>>>>> We think that \field{offset} is actively set by the driver, so the driver
>>>>> will also receive packets according to this offset.
>>>>> But if the case is considered to be caused by driver error settings,
>>>>> the device can do not split the packet.
>>>> Note that protocol like ipv6 allows variable length of the header,
>>>> falling back to not split the header seems better to me.
>>>>
>>>> Thanks
>>>>
>>>>>> (I wonder if the offset deserves a independent feature (but depends
>>>>>> on the merge able) in this case).
>>>>>>
>>>>> Okay, we can consider later.
>>>>>
>>>>>>>   The maximum available length of the first buffer
>>>>>>> +used by the device is specified by \field{max_len}.
>>>>>>
>>>>>> Similarly the max length seems to be implied by length - offset?
>>>>>>
>>>>> You can refer to the above answer about \field{max_len} similarly.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>   If \field{max_len} is 0 or
>>>>>>> +$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
>>>>>>> +the device can use the entire buffer starting at \field{offset}.
>>>>>>> +
>>>>>>> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>>>>> +
>>>>>>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
>>>>>>> +SHOULD treat the contents of \field{hdr_len} as the length of the transport header
>>>>>>> +inside the first buffer.
>>>>>>> +
>>>>>>> +If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
>>>>>>>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>





* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23 11:04                   ` Michael S. Tsirkin
@ 2022-09-23 12:40                     ` Xuan Zhuo
  0 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-23 12:40 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Fri, 23 Sep 2022 07:04:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 23, 2022 at 06:48:56PM +0800, Xuan Zhuo wrote:
> > On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > > > > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > > > > > have been trying not to rely on desc chain.
> > > > > >
> > > > > > This seems to be a very large change which seems a little bit too
> > > > > > early to be considered.
> > > > >
> > > > > I'd like to put it in other terms. Fundamentally devices are not
> > > > > supposed to talk about descriptors at all. Descriptors are
> > > > > a way to describe buffers, and devices should all work in terms
> > > > > of buffers. I am working on cleaning up the spec from confusion
> > > > > and terminology mixups. We have several major sources all over the spec:
> > > > > - descriptor/buffer used inconsistently
> > > > > - feature negotiated/offered used inconsistently
> > > > > - field exists/valid used inconsistently
> > > > >
> > > > > My way to address the first issue is to make sure devices all work
> > > > > with buffers. And buffers are described by descriptors (makes sense,
> > > > > right?) and made available to device by driver and used by device.
> > > >
> > > > Can we try to keep desc info? I think it's a very important feature for
> > > > virtio-net, and many NICs are designed based on this.
> > > >
> > > > Our current solutions B and C will waste a lot of memory. If the page occupied
> > > > by the header can be quickly reclaimed, then we have to do a copy in the driver.
> > >
> > > Not sure I understand. Can you explain?
> >
> >
> > For example B, we allocate two consecutive pages, the second page is used for
> > payload. The first page is used to reserve the header. The first page is too
> > wasteful, 4k of space, only the header is saved. In this way, we can copy away
> > the data of the first page, so that the first page can be quickly reclaimed.
> >
> > Why two consecutive pages? The payload achieves page alignment, and there is
> > still some space in front of the page, so we must allocate two connected pages.
>
> Are we talking without mergeable here?

I think B (offset) does not depend on mergeable.

C must depend on mergeable.

Thanks.

>
>
> > >
> > > > >
> > > > > The advantage of this is layering - we can change the way buffers
> > > > > are passed around without changing devices. And, it matches
> > > > > the virtio API nicely.
> > > > >
> > > > > Existing devices are all fine with this - they do not pass any
> > > > > information in the descriptor. Yes, this seems like an option to
> > > > > pass some information around, but I am not convinced it is worth
> > > > > the layering violation.
> > > > >
> > > > > By comparison, ability to write data at an offset seems generally
> > > > > useful, in particular we have a very old issue even without
> > > > > the split header feature where with mergeable buffers
> > > > > if we attempt to align the data in the 1st buffer at a cache line
> > > > > boundary by adding an offset before ETH header, then when it spills
> > > > > over to the second buffer it will be misaligned there. Wastes
> > > > > an extra cache line for such packets. Offsets can allow fixing this.
> > > > >
> > > >
> > > > Scheme B In addition to the memory problem, under this scheme, if we want to
> > > > implement tcp zerocopy, then we must apply for two consecutive pages for a
> > > > buffer. The second page is used to place the payload. The first is used to place
> > > > the header.
> > >
> > >
> > > I just don't understand why there would be a difference. Explanation?
> >
> > I'm concerned that allocating two contiguous pages in large numbers will cause
> > unpredictable problems (such as performance problems).
> >
> > Thanks.
> >
> >
> > >
> > > >
> > > > >
> > > > > I don't see architecturally what is wrong with making feature just
> > > > > depend on mergeable buffers for now. We can always allow a combination
> > > > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > > > !MERGEABLE they should not fail probe, they should instead clear the
> > > > > split header feature.
> > > > >
> > > >
> > > > According to my understanding, for B, it does not depend on mergeable.
> > > >
> > > > If we give up desc info completely, then we prefer B.
> > > >
> > > > Hi Jason, what are your thoughts?
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > If we can't make a feature depend only on mergeable, should we use solution B?
> > > > > > >
> > > > > > >      2. Scheme B ( refer to your suggestion )
> > > > > > >
> > > > > > >      Our rethinking approach is no longer based on descriptor chain.
> > > > > > >
> > > > > > >      We refer to your proposed offset-based scheme as scheme B.
> > > > > >
> > > > > > The offset seems to be the suggestion of Michael.
> > > > > >
> > > > > > I think I like the design of v7 for several reasons:
> > > > > >
> > > > > > 1) easy to reserve head/tailroom without any extension of the spec
> > > > > > 2) easy to work with mergeable rx buffer
> > > > > > 3) it is the model used by modern NIC like e810 [1]
> > > > > >
> > > > > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > > > > it works which is similar to v7
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >      As you suggested, scheme B gives the device a buffer, using offset to
> > > > > > >      indicate where to place the payload. Like this:
> > > > > > >
> > > > > > >      <header>...<padding>... <beginning of page><data>
> > > > > > >
> > > > > > >      But how to apply for this buffer?
> > > > > > >      Since we want the payload to be placed on a separate page, the method
> > > > > > >      we consider is to directly alloc two pages from driver of contiguous memory.
> > > > > > >
> > > > > > >      Then the beginning of this contiguous memory is used to store the headroom,
> > > > > > >      and the contiguous memory after the headroom is directly handed over to the device.
> > > > > > >      Similar to the following:
> > > > > > >
> > > > > > >      [------------------ receive buffer(2 pages) ------------------------------]
> > > > > > >      [<------------first page -------------------><------ second page -------->]
> > > > > > >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> > > > > > >         ^    ^
> > > > > > >         |    |
> > > > > > >         |    pointer to device
> > > > > > >         |
> > > > > > >         |
> > > > > > >         Driver reserved, the later part is filled
> > > > > > >
> > > > > > > We have already entered v8, but we have not been able to reach an agreement on
> > > > > > > the basic capabilities. I want to solve this problem first.
> > > > > > >
> > > > > > > @Jason @Michael
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I don't get the reason why we need max_len. Can't it implied in the
> > > > > > > > > > length of the first descriptor?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > > > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > > > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >+\end{itemize}
> > > > > > > > > > >+
> > > > > > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > > > > > >+
> > > > > > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > > > > > >+the receive packets.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Can the device simply don't split the packet in this case? Anyhow we
> > > > > > > > > > need synchronize the driver with the device in the case (e.g when
> > > > > > > > > > driver is try to having a new max_len).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > > > > > will also receive packets according to this offset.
> > > > > > > > > But if the case is considered to be caused by driver error settings,
> > > > > > > > > the device can do not split the packet.
> > > > > > > >
> > > > > > > > Note that protocol like ipv6 allows variable length of the header,
> > > > > > > > falling back to not split the header seems better to me.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > (I wonder if the offset deserves a independent feature (but depends
> > > > > > > > > > on the merge able) in this case).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Okay, we can consider later.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >  The maximum available length of the first buffer
> > > > > > > > > > >+used by the device is specified by \field{max_len}.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >  If \field{max_len} is 0 or
> > > > > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > > > > > >+
> > > > > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > > > > > >+
> > > > > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > > > > > >+inside the first buffer.
> > > > > > > > > > >+
> > > > > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23 10:48                 ` Xuan Zhuo
@ 2022-09-23 11:04                   ` Michael S. Tsirkin
  2022-09-23 12:40                     ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-23 11:04 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Fri, Sep 23, 2022 at 06:48:56PM +0800, Xuan Zhuo wrote:
> On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > > > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > > > > have been trying not to rely on desc chain.
> > > > >
> > > > > This seems to be a very large change which seems a little bit too
> > > > > early to be considered.
> > > >
> > > > I'd like to put it in other terms. Fundamentally devices are not
> > > > supposed to talk about descriptors at all. Descriptors are
> > > > a way to describe buffers, and devices should all work in terms
> > > > of buffers. I am working on cleaning up the spec from confusion
> > > > and terminology mixups. We have several major sources all over the spec:
> > > > - descriptor/buffer used inconsistently
> > > > - feature negotiated/offered used inconsistently
> > > > - field exists/valid used inconsistently
> > > >
> > > > My way to address the first issue is to make sure devices all work
> > > > with buffers. And buffers are described by descriptors (makes sense,
> > > > right?) and made available to device by driver and used by device.
> > >
> > > Can we try to keep desc info? I think it's a very important feature for
> > > virtio-net, and many NICs are designed based on this.
> > >
> > > Our current solutions B and C will waste a lot of memory. If the page occupied
> > > by the header can be quickly reclaimed, then we have to do a copy in the driver.
> >
> > Not sure I understand. Can you explain?
> 
> 
> Take B as an example: we allocate two consecutive pages. The second page is used
> for the payload and the first page is reserved for the header. The first page is
> very wasteful: 4k of space holds only the header. To avoid that, we can copy the
> data out of the first page, so that the first page can be quickly reclaimed.
> 
> Why two consecutive pages? The payload must be page aligned, and there must
> still be some space in front of that page, so we must allocate two contiguous pages.

Are we talking about the case without mergeable buffers here?


> >
> > > >
> > > > The advantage of this is layering - we can change the way buffers
> > > > are passed around without changing devices. And, it matches
> > > > the virtio API nicely.
> > > >
> > > > Existing devices are all fine with this - they do not pass any
> > > > information in the descriptor. Yes, this seems like an option to
> > > > pass some information around, but I am not convinced it is worth
> > > > the layering violation.
> > > >
> > > > By comparison, ability to write data at an offset seems generally
> > > > useful, in particular we have a very old issue even without
> > > > the split header feature where with mergeable buffers
> > > > if we attempt to align the data in the 1st buffer at a cache line
> > > > boundary by adding an offset before ETH header, then when it spills
> > > > over to the second buffer it will be misaligned there. Wastes
> > > > an extra cache line for such packets. Offsets can allow fixing this.
> > > >
> > >
> > > Scheme B In addition to the memory problem, under this scheme, if we want to
> > > implement tcp zerocopy, then we must apply for two consecutive pages for a
> > > buffer. The second page is used to place the payload. The first is used to place
> > > the header.
> >
> >
> > I just don't understand why there would be a difference. Explanation?
> 
> I'm concerned that allocating two contiguous pages in large numbers will cause
> unpredictable problems (such as performance problems).
> 
> Thanks.
> 
> 
> >
> > >
> > > >
> > > > I don't see architecturally what is wrong with making feature just
> > > > depend on mergeable buffers for now. We can always allow a combination
> > > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > > !MERGEABLE they should not fail probe, they should instead clear the
> > > > split header feature.
> > > >
> > >
> > > According to my understanding, for B, it does not depend on mergeable.
> > >
> > > If we give up desc info completely, then we prefer B.
> > >
> > > Hi Jason, what are your thoughts?
> > >
> > > Thanks.
> > >
> > > >
> > > >
> > > >
> > > > > >
> > > > > > If we can't make a feature depend only on mergeable, should we use solution B?
> > > > > >
> > > > > >      2. Scheme B ( refer to your suggestion )
> > > > > >
> > > > > >      Our rethinking approach is no longer based on descriptor chain.
> > > > > >
> > > > > >      We refer to your proposed offset-based scheme as scheme B.
> > > > >
> > > > > The offset seems to be the suggestion of Michael.
> > > > >
> > > > > I think I like the design of v7 for several reasons:
> > > > >
> > > > > 1) easy to reserve head/tailroom without any extension of the spec
> > > > > 2) easy to work with mergeable rx buffer
> > > > > 3) it is the model used by modern NIC like e810 [1]
> > > > >
> > > > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > > > it works which is similar to v7
> > > > >
> > > > > Thanks
> > > > >
> > > > > >      As you suggested, scheme B gives the device a buffer, using offset to
> > > > > >      indicate where to place the payload. Like this:
> > > > > >
> > > > > >      <header>...<padding>... <beginning of page><data>
> > > > > >
> > > > > >      But how to apply for this buffer?
> > > > > >      Since we want the payload to be placed on a separate page, the method
> > > > > >      we consider is to directly alloc two pages from driver of contiguous memory.
> > > > > >
> > > > > >      Then the beginning of this contiguous memory is used to store the headroom,
> > > > > >      and the contiguous memory after the headroom is directly handed over to the device.
> > > > > >      Similar to the following:
> > > > > >
> > > > > >      [------------------ receive buffer(2 pages) ------------------------------]
> > > > > >      [<------------first page -------------------><------ second page -------->]
> > > > > >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> > > > > >         ^    ^
> > > > > >         |    |
> > > > > >         |    pointer to device
> > > > > >         |
> > > > > >         |
> > > > > >         Driver reserved, the later part is filled
> > > > > >
> > > > > > We have already entered v8, but we have not been able to reach an agreement on
> > > > > > the basic capabilities. I want to solve this problem first.
> > > > > >
> > > > > > @Jason @Michael
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I don't get the reason why we need max_len. Can't it implied in the
> > > > > > > > > length of the first descriptor?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >+\end{itemize}
> > > > > > > > > >+
> > > > > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > > > > >+
> > > > > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > > > > >+the receive packets.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Can the device simply don't split the packet in this case? Anyhow we
> > > > > > > > > need synchronize the driver with the device in the case (e.g when
> > > > > > > > > driver is try to having a new max_len).
> > > > > > > > >
> > > > > > > >
> > > > > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > > > > will also receive packets according to this offset.
> > > > > > > > But if the case is considered to be caused by driver error settings,
> > > > > > > > the device can do not split the packet.
> > > > > > >
> > > > > > > Note that protocol like ipv6 allows variable length of the header,
> > > > > > > falling back to not split the header seems better to me.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > > (I wonder if the offset deserves a independent feature (but depends
> > > > > > > > > on the merge able) in this case).
> > > > > > > > >
> > > > > > > >
> > > > > > > > Okay, we can consider later.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >  The maximum available length of the first buffer
> > > > > > > > > >+used by the device is specified by \field{max_len}.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > > > > >
> > > > > > > >
> > > > > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >  If \field{max_len} is 0 or
> > > > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > > > > >+
> > > > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > > > > >+
> > > > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > > > > >+inside the first buffer.
> > > > > > > > > >+
> > > > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > >
> > > >
> > > >
> >




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23 10:44               ` Michael S. Tsirkin
@ 2022-09-23 10:48                 ` Xuan Zhuo
  2022-09-23 11:04                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-23 10:48 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Fri, 23 Sep 2022 06:44:54 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > > > have been trying not to rely on desc chain.
> > > >
> > > > This seems to be a very large change which seems a little bit too
> > > > early to be considered.
> > >
> > > I'd like to put it in other terms. Fundamentally devices are not
> > > supposed to talk about descriptors at all. Descriptors are
> > > a way to describe buffers, and devices should all work in terms
> > > of buffers. I am working on cleaning up the spec from confusion
> > > and terminology mixups. We have several major sources all over the spec:
> > > - descriptor/buffer used inconsistently
> > > - feature negotiated/offered used inconsistently
> > > - field exists/valid used inconsistently
> > >
> > > My way to address the first issue is to make sure devices all work
> > > with buffers. And buffers are described by descriptors (makes sense,
> > > right?) and made available to device by driver and used by device.
> >
> > Can we try to keep desc info? I think it's a very important feature for
> > virtio-net, and many NICs are designed based on this.
> >
> > Our current solutions B and C will waste a lot of memory. If the page occupied
> > by the header can be quickly reclaimed, then we have to do a copy in the driver.
>
> Not sure I understand. Can you explain?


Take B as an example: we allocate two consecutive pages. The second page is used
for the payload and the first page is reserved for the header. The first page is
very wasteful: 4k of space holds only the header. To avoid that, we can copy the
data out of the first page, so that the first page can be quickly reclaimed.

Why two consecutive pages? The payload must be page aligned, and there must
still be some space in front of that page, so we must allocate two contiguous pages.
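
To put rough numbers on the two-page layout described above, here is a small
arithmetic sketch. It is only an illustration under stated assumptions: the
function name, the 256-byte driver headroom, the 12-byte virtio net header and
the 54 bytes of Ethernet + IPv4 + TCP headers are hypothetical example values,
not something fixed by the proposal.

```python
PAGE_SIZE = 4096  # assumed page size, matching the "4k" mentioned above


def scheme_b_layout(headroom, vnet_hdr_len, hdrs_len, page_size=PAGE_SIZE):
    """Offsets for a scheme B receive buffer built from two contiguous pages.

    All names and parameters are hypothetical, chosen for illustration only.
    """
    used = headroom + vnet_hdr_len + hdrs_len
    if used > page_size:
        raise ValueError("headroom and headers do not fit in the first page")
    return {
        # Where the buffer handed to the device starts, relative to page 0.
        "device_ptr": headroom,
        # Length of the buffer the device sees (rest of page 0 plus page 1).
        "buffer_len": 2 * page_size - headroom,
        # Offset, relative to device_ptr, at which the payload must be
        # written so that it starts exactly at the second page.
        "payload_offset": page_size - headroom,
        # Bytes of the first page left unused between headers and payload.
        "padding": page_size - used,
    }


layout = scheme_b_layout(headroom=256, vnet_hdr_len=12, hdrs_len=54)
print(layout)
```

With these example values most of the first page is padding, which is the
waste being discussed, and each buffer needs a two-page contiguous allocation.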

>
> > >
> > > The advantage of this is layering - we can change the way buffers
> > > are passed around without changing devices. And, it matches
> > > the virtio API nicely.
> > >
> > > Existing devices are all fine with this - they do not pass any
> > > information in the descriptor. Yes, this seems like an option to
> > > pass some information around, but I am not convinced it is worth
> > > the layering violation.
> > >
> > > By comparison, ability to write data at an offset seems generally
> > > useful, in particular we have a very old issue even without
> > > the split header feature where with mergeable buffers
> > > if we attempt to align the data in the 1st buffer at a cache line
> > > boundary by adding an offset before ETH header, then when it spills
> > > over to the second buffer it will be misaligned there. Wastes
> > > an extra cache line for such packets. Offsets can allow fixing this.
> > >
> >
> > Scheme B In addition to the memory problem, under this scheme, if we want to
> > implement tcp zerocopy, then we must apply for two consecutive pages for a
> > buffer. The second page is used to place the payload. The first is used to place
> > the header.
>
>
> I just don't understand why there would be a difference. Explanation?

I'm concerned that allocating two contiguous pages in large numbers will cause
unpredictable problems (such as performance problems).

Thanks.


>
> >
> > >
> > > I don't see architecturally what is wrong with making feature just
> > > depend on mergeable buffers for now. We can always allow a combination
> > > down the road. Let's just make it clear that if drivers see SPLIT &&
> > > !MERGEABLE they should not fail probe, they should instead clear the
> > > split header feature.
> > >
> >
> > According to my understanding, for B, it does not depend on mergeable.
> >
> > If we give up desc info completely, then we prefer B.
> >
> > Hi Jason, what are your thoughts?
> >
> > Thanks.
> >
> > >
> > >
> > >
> > > > >
> > > > > If we can't make a feature depend only on mergeable, should we use solution B?
> > > > >
> > > > >      2. Scheme B ( refer to your suggestion )
> > > > >
> > > > >      Our rethinking approach is no longer based on descriptor chain.
> > > > >
> > > > >      We refer to your proposed offset-based scheme as scheme B.
> > > >
> > > > The offset seems to be the suggestion of Michael.
> > > >
> > > > I think I like the design of v7 for several reasons:
> > > >
> > > > 1) easy to reserve head/tailroom without any extension of the spec
> > > > 2) easy to work with mergeable rx buffer
> > > > 3) it is the model used by modern NIC like e810 [1]
> > > >
> > > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > > it works which is similar to v7
> > > >
> > > > Thanks
> > > >
> > > > >      As you suggested, scheme B gives the device a buffer, using offset to
> > > > >      indicate where to place the payload. Like this:
> > > > >
> > > > >      <header>...<padding>... <beginning of page><data>
> > > > >
> > > > >      But how to apply for this buffer?
> > > > >      Since we want the payload to be placed on a separate page, the method
> > > > >      we consider is to directly alloc two pages from driver of contiguous memory.
> > > > >
> > > > >      Then the beginning of this contiguous memory is used to store the headroom,
> > > > >      and the contiguous memory after the headroom is directly handed over to the device.
> > > > >      Similar to the following:
> > > > >
> > > > >      [------------------ receive buffer(2 pages) ------------------------------]
> > > > >      [<------------first page -------------------><------ second page -------->]
> > > > >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> > > > >         ^    ^
> > > > >         |    |
> > > > >         |    pointer to device
> > > > >         |
> > > > >         |
> > > > >         Driver reserved, the later part is filled
> > > > >
> > > > > We have already entered v8, but we have not been able to reach an agreement on
> > > > > the basic capabilities. I want to solve this problem first.
> > > > >
> > > > > @Jason @Michael
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > > > >
> > > > > > > >
> > > > > > > > I don't get the reason why we need max_len. Can't it implied in the
> > > > > > > > length of the first descriptor?
> > > > > > > >
> > > > > > >
> > > > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > > > >
> > > > > > > >
> > > > > > > > >+\end{itemize}
> > > > > > > > >+
> > > > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > > > >+
> > > > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > > > >+the receive packets.
> > > > > > > >
> > > > > > > >
> > > > > > > > Can the device simply don't split the packet in this case? Anyhow we
> > > > > > > > need synchronize the driver with the device in the case (e.g when
> > > > > > > > driver is try to having a new max_len).
> > > > > > > >
> > > > > > >
> > > > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > > > will also receive packets according to this offset.
> > > > > > > But if the case is considered to be caused by driver error settings,
> > > > > > > the device can do not split the packet.
> > > > > >
> > > > > > Note that protocol like ipv6 allows variable length of the header,
> > > > > > falling back to not split the header seems better to me.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > > (I wonder if the offset deserves a independent feature (but depends
> > > > > > > > on the merge able) in this case).
> > > > > > > >
> > > > > > >
> > > > > > > Okay, we can consider later.
> > > > > > >
> > > > > > > >
> > > > > > > > >  The maximum available length of the first buffer
> > > > > > > > >+used by the device is specified by \field{max_len}.
> > > > > > > >
> > > > > > > >
> > > > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > > > >
> > > > > > >
> > > > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > > >  If \field{max_len} is 0 or
> > > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > > > >+
> > > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > > > >+
> > > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > > > >+inside the first buffer.
> > > > > > > > >+
> > > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > >
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
> > > > >
> > >
> > >
> > >
>
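
As an aside on the \field{max_len} wording quoted above: a minimal sketch in C
of how the writable window of the first buffer could be derived from
\field{offset} and \field{max_len}. The struct and function names are
hypothetical and are not part of the patch. Returning false when offset
exceeds the buffer length leaves the choice between dropping the packet (the
v8 text) and not splitting (suggested in review) to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the part of the first buffer the device may use. */
struct first_buf_window {
    uint32_t start;    /* first writable byte, equal to offset */
    uint32_t len;      /* writable bytes starting at start */
    uint32_t tailroom; /* bytes left untouched after the window */
};

/* Returns false when offset exceeds the buffer length. */
static bool compute_first_buf_window(uint32_t buf_len, uint16_t offset,
                                     uint16_t max_len,
                                     struct first_buf_window *w)
{
    assert(w != NULL);
    if (offset > buf_len)
        return false;
    w->start = offset;
    /* max_len == 0, or offset + max_len past the end of the buffer:
     * the device can use everything from offset to the end. */
    if (max_len == 0 || (uint32_t)offset + max_len > buf_len)
        w->len = buf_len - offset;
    else
        w->len = max_len;
    w->tailroom = buf_len - offset - w->len;
    return true;
}
```

For a 4096-byte buffer with offset 64 and max_len 256 this gives a 256-byte
window and 3776 bytes of tailroom, i.e. (length of the buffer) - (offset +
max_len) as in the discussion above.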



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23  6:57             ` Xuan Zhuo
@ 2022-09-23 10:44               ` Michael S. Tsirkin
  2022-09-23 10:48                 ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-23 10:44 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Fri, Sep 23, 2022 at 02:57:27PM +0800, Xuan Zhuo wrote:
> > > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > > have been trying not to rely on desc chain.
> > >
> > > This seems to be a very large change which seems a little bit too
> > > early to be considered.
> >
> > I'd like to put it in other terms. Fundamentally devices are not
> > supposed to talk about descriptors at all. Descriptors are
> > a way to describe buffers, and devices should all work in terms
> > of buffers. I am working on cleaning up the spec from confusion
> > and terminology mixups. We have several major sources all over the spec:
> > - descriptor/buffer used inconsistently
> > - feature negotiated/offered used inconsistently
> > - field exists/valid used inconsistently
> >
> > My way to address the first issue is to make sure devices all work
> > with buffers. And buffers are described by descriptors (makes sense,
> > right?) and made available to device by driver and used by device.
> 
> Can we try to keep desc info? I think it's a very important feature for
> virtio-net, and many NICs are designed based on this.
> 
> Our current solutions B and C will waste a lot of memory. If the page occupied
> by the header can be quickly reclaimed, then we have to do a copy in the driver.

Not sure I understand. Can you explain?

> >
> > The advantage of this is layering - we can change the way buffers
> > are passed around without changing devices. And, it matches
> > the virtio API nicely.
> >
> > Existing devices are all fine with this - they do not pass any
> > information in the descriptor. Yes, this seems like an option to
> > pass some information around, but I am not convinced it is worth
> > the layering violation.
> >
> > By comparison, ability to write data at an offset seems generally
> > useful, in particular we have a very old issue even without
> > the split header feature where with mergeable buffers
> > if we attempt to align the data in the 1st buffer at a cache line
> > boundary by adding an offset before ETH header, then when it spills
> > over to the second buffer it will be misaligned there. Wastes
> > an extra cache line for such packets. Offsets can allow fixing this.
> >
> 
> Scheme B In addition to the memory problem, under this scheme, if we want to
> implement tcp zerocopy, then we must apply for two consecutive pages for a
> buffer. The second page is used to place the payload. The first is used to place
> the header.


I just don't understand why there would be a difference. Explanation?

> 
> >
> > I don't see architecturally what is wrong with making feature just
> > depend on mergeable buffers for now. We can always allow a combination
> > down the road. Let's just make it clear that if drivers see SPLIT &&
> > !MERGEABLE they should not fail probe, they should instead clear the
> > split header feature.
> >
> 
> According to my understanding, for B, it does not depend on mergeable.
> 
> If we give up desc info completely, then we prefer B.
> 
> Hi Jason, what are your thoughts?
> 
> Thanks.
> 
> >
> >
> >
> > > >
> > > > If we can't make a feature depend only on mergeable, should we use solution B?
> > > >
> > > >      2. Scheme B ( refer to your suggestion )
> > > >
> > > >      Our rethinking approach is no longer based on descriptor chain.
> > > >
> > > >      We refer to your proposed offset-based scheme as scheme B.
> > >
> > > The offset seems to be the suggestion of Michael.
> > >
> > > I think I like the design of v7 for several reasons:
> > >
> > > 1) easy to reserve head/tailroom without any extension of the spec
> > > 2) easy to work with mergeable rx buffer
> > > 3) it is the model used by modern NIC like e810 [1]
> > >
> > > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > > it works which is similar to v7
> > >
> > > Thanks
> > >
> > > >      As you suggested, scheme B gives the device a buffer, using offset to
> > > >      indicate where to place the payload. Like this:
> > > >
> > > >      <header>...<padding>... <beginning of page><data>
> > > >
> > > >      But how do we allocate this buffer?
> > > >      Since we want the payload to be placed on a separate page, the method
> > > >      we are considering is for the driver to allocate two pages of contiguous memory.
> > > >
> > > >      Then the beginning of this contiguous memory is used to store the headroom,
> > > >      and the contiguous memory after the headroom is directly handed over to the device.
> > > >      Similar to the following:
> > > >
> > > >      [------------------ receive buffer(2 pages) ------------------------------]
> > > >      [<------------first page -------------------><------ second page -------->]
> > > >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> > > >         ^    ^
> > > >         |    |
> > > >         |    pointer to device
> > > >         |
> > > >         |
> > > >         Driver reserved, the later part is filled
> > > >
> > > > We have already entered v8, but we have not been able to reach an agreement on
> > > > the basic capabilities. I want to solve this problem first.
> > > >
> > > > @Jason @Michael
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > > >
> > > > > > >
> > > > > > > I don't get the reason why we need max_len. Can't it be implied by the
> > > > > > > length of the first descriptor?
> > > > > > >
> > > > > >
> > > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > > >
> > > > > > >
> > > > > > > >+\end{itemize}
> > > > > > > >+
> > > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > > >+
> > > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > > >+the receive packets.
> > > > > > >
> > > > > > >
> > > > > > > Can the device simply not split the packet in this case? Anyhow, we
> > > > > > > need to synchronize the driver with the device in this case (e.g. when
> > > > > > > the driver is trying to set a new max_len).
> > > > > > >
> > > > > >
> > > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > > will also receive packets according to this offset.
> > > > > > But if this case is considered to be caused by a driver misconfiguration,
> > > > > > the device can simply not split the packet.
> > > > >
> > > > > Note that protocols like IPv6 allow variable-length headers, so
> > > > > falling back to not splitting the header seems better to me.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > (I wonder if the offset deserves an independent feature (one that depends
> > > > > > > on mergeable buffers) in this case).
> > > > > > >
> > > > > >
> > > > > > Okay, we can consider later.
> > > > > >
> > > > > > >
> > > > > > > >  The maximum available length of the first buffer
> > > > > > > >+used by the device is specified by \field{max_len}.
> > > > > > >
> > > > > > >
> > > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > > >
> > > > > >
> > > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > >  If \field{max_len} is 0 or
> > > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > > >+
> > > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > > >+
> > > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > > >+inside the first buffer.
> > > > > > > >+
> > > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> >
> >
> >
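
The suggestion quoted above (a driver that sees SPLIT && !MERGEABLE should
clear the split header feature instead of failing probe) could look roughly
like the sketch below. fixup_split_header_feature() is a hypothetical helper,
not code from any driver. Bit 52 is the value proposed in the v8 patch, and
the CTRL_VQ check follows the patch's feature requirement list.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF              15
#define VIRTIO_NET_F_CTRL_VQ                17
#define VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER 52 /* proposed in v8, not final */

/* Drop the split header bit when its prerequisites were not offered,
 * rather than treating the combination as a fatal error. */
static uint64_t fixup_split_header_feature(uint64_t features)
{
    const uint64_t split = 1ULL << VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER;
    const uint64_t deps = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
                          (1ULL << VIRTIO_NET_F_CTRL_VQ);

    if ((features & split) && (features & deps) != deps)
        features &= ~split;
    return features;
}
```

All other feature bits pass through unchanged, so the device keeps working
without header splitting.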




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23  5:59           ` Michael S. Tsirkin
@ 2022-09-23  6:57             ` Xuan Zhuo
  2022-09-23 10:44               ` Michael S. Tsirkin
  2022-09-26  8:06             ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-23  6:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Virtio-Dev, Kangjie Xu, Heng Qi, Jason Wang

On Fri, 23 Sep 2022 01:59:31 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 23, 2022 at 12:04:28PM +0800, Jason Wang wrote:
> > On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > > > > >
> > > > > > On 2022/9/16 10:56, hengqi wrote:
> > > > > > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > >
> > > > > > >The purpose of this feature is to split the transport header and the payload
> > > > > > >of the packet.
> > > > > > >
> > > > > > >|                     receive buffer1(page)            | receive buffer2(page) |
> > > > > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > > > > > >              |<------------------------------->|
> > > > > > >                               ^
> > > > > > >                               |
> > > > > > >                            max_len
> > > > > > >
> > > > > > >We can use one page for every receive buffer. In this way, we can ensure that
> > > > > > >all payloads can be independently in a page, which is very beneficial for
> > > > > > >the zerocopy implemented by the upper layer.
> > > > > > >
> > > > > > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > > > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > > > > > >---
> > > > > > >v8:
> > > > > > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > > > >     2. Add \field{offset} and \field{max_len}.
> > > > > > >     3. Fix some presentation issues. @Jason Wang
> > > > > > >     4. Clarify some paragraphs.
> > > > > > >
> > > > > > >v7:
> > > > > > >     1. Fix some presentation issues.
> > > > > > >     2. Use "split transport header". @Jason Wang
> > > > > > >     3. Clarify some paragraphs. @Cornelia Huck
> > > > > > >     4. determine the device what to do if it does not perform header split on a packet.
> > > > > > >
> > > > > > >v6:
> > > > > > >     1. Fix some syntax issues. @Cornelia Huck
> > > > > > >     2. Clarify some paragraphs. @Cornelia Huck
> > > > > > >     3. Determine the device what to do if it does not perform header split on a packet.
> > > > > > >
> > > > > > >v5:
> > > > > > >     1. Determine when hdr_len is credible in the process of rx
> > > > > > >     2. Clean up the use of buffers and descriptors
> > > > > > >     3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> > > > > > >
> > > > > > >v4:
> > > > > > >     1. fix typo @Cornelia Huck @Jason Wang
> > > > > > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > > > > > >
> > > > > > >  conformance.tex |  2 ++
> > > > > > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  2 files changed, 93 insertions(+)
> > > > > > >
> > > > > > >diff --git a/conformance.tex b/conformance.tex
> > > > > > >index 2b86fc6..4e2b82e 100644
> > > > > > >--- a/conformance.tex
> > > > > > >+++ b/conformance.tex
> > > > > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > >  \end{itemize}
> > > > > > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > > > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > >  \end{itemize}
> > > > > > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > > > > >diff --git a/content.tex b/content.tex
> > > > > > >index e863709..fad9dea 100644
> > > > > > >--- a/content.tex
> > > > > > >+++ b/content.tex
> > > > > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > > > >      channel.
> > > > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > > > > > >+    the transport header and the payload.
> > > > > > >+
> > > > > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > > > > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > > > > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > > > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > > > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > > > > > >  \end{description}
> > > > > > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > > > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > > > > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > > > > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > > > > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > > > > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > > > > > >          u8 flags;
> > > > > > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > > > > > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > > > > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > > > > > >  transport header size.
> > > > > > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > > > > > >+
> > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > > > > > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > > > > > >+header inside the first buffer.
> > > > > > >+
> > > > > > >  \begin{note}
> > > > > > >  This is due to various bugs in implementations.
> > > > > > >  \end{note}
> > > > > > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > > >  according to the native endian of the guest rather than
> > > > > > >  (necessarily when not using the legacy interface) little-endian.
> > > > > > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > >+
> > > > > > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > > > > > >+the device supports splitting the transport header and the payload.
> > > > > > >+The transport header and the payload will be separated into different
> > > > > > >+buffers.
> > > > > > >+
> > > > > > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > > > > > >+
> > > > > > >+To configure the split transport header, the following layout structure
> > > > > > >+and definitions are used:
> > > > > > >+
> > > > > > >+\begin{lstlisting}
> > > > > > >+struct virtio_net_split_transport_header_config {
> > > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > > > > > >+    le64 type;
> > > > > > >+    le16 offset;
> > > > > > >+    le16 max_len;
> > > > > > >+};
> > > > > > >+
> > > > > > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > > > > > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > > > > > >+\end{lstlisting}
> > > > > > >+
> > > > > > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > > > > > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > > > > > >+header configuration.
> > > > > > >+
> > > > > > >+The driver can enable or disable split transport header for different transport
> > > > > > >+protocols by setting or clearing corresponding bits in \field{type}.
> > > > > > >+
> > > > > > >+\begin{itemize}
> > > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > > > > >+\end{itemize}
> > > > > > >+
> > > > > > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > >+
> > > > > > >+A device MUST disable transport header splitting upon reset and initialization.
> > > > > > >+
> > > > > > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > > > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > > > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > > > > > >+
> > > > > > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > > > > > >+\begin{itemize}
> > > > > > >+    \item The device does not recognize the transport protocol of the packet.
> > > > > > >+    \item The packet is an IP fragment.
> > > > > > >+    \item The splitting of the specific transport protocol is not enabled via
> > > > > > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > > > > > >+    \item At most one buffer is available.
> > > > > >
> > > > > >
> > > > > > So this means the feature is disabled for the device without
> > > > > > merge-able buffer? Note that, even in the case of mergeable buffer,
> > > > > > it doesn't mean a buffer that only contains a single descriptor.
> > > > > >
> > > > > >
> > > > >
> > > > > Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> > > > > the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> > > > > So this feature depends on the mergeable buffer.
> > > >
> > > > To tell the truth, I'm not sure this is a good choice. We never had a
> > > > feature that depends solely on the mergeable rx buffer before.
> > > > Especially considering that using a descriptor chain is not hard. And
> > > > I'm not sure we should care too much on the overhead since the
> > > > splitting is enabled by the administrator when it needs e.g zerocopy.
> > >
> > >
> > > It's overwhelmed us, and we haven't been able to agree on this.
> >
> > Sorry for this but let's make an agreement before posting a new version.
>
> Right, let's do that.
> Jason I think the issue with previous proposals is that they conflict
> with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
> driver flexibility in arranging the packet in memory is beneficial.
>
>
>
> > >
> > > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > > early email, he mentioned that desc chain may be abandoned in the future. So we
> > > have been trying not to rely on desc chain.
> >
> > This seems to be a very large change which seems a little bit too
> > early to be considered.
>
> I'd like to put it in other terms. Fundamentally devices are not
> supposed to talk about descriptors at all. Descriptors are
> a way to describe buffers, and devices should all work in terms
> of buffers. I am working on cleaning up the spec from confusion
> and terminology mixups. We have several major sources all over the spec:
> - descriptor/buffer used inconsistently
> - feature negotiated/offered used inconsistently
> - field exists/valid used inconsistently
>
> My way to address the first issue is to make sure devices all work
> with buffers. And buffers are described by descriptors (makes sense,
> right?) and made available to device by driver and used by device.

Can we try to keep desc info? I think it's a very important feature for
virtio-net, and many NICs are designed based on this.

Our current solutions B and C waste a lot of memory. If we want the page occupied
by the header to be reclaimed quickly, the driver has to copy the header out of it.

>
> The advantage of this is layering - we can change the way buffers
> are passed around without changing devices. And, it matches
> the virtio API nicely.
>
> Existing devices are all fine with this - they do not pass any
> information in the descriptor. Yes, this seems like an option to
> pass some information around, but I am not convinced it is worth
> the layering violation.
>
> By comparison, ability to write data at an offset seems generally
> useful, in particular we have a very old issue even without
> the split header feature where with mergeable buffers
> if we attempt to align the data in the 1st buffer at a cache line
> boundary by adding an offset before ETH header, then when it spills
> over to the second buffer it will be misaligned there. Wastes
> an extra cache line for such packets. Offsets can allow fixing this.
>

Scheme B: in addition to the memory problem, if we want to implement TCP
zerocopy under this scheme, we must allocate two consecutive pages for each
buffer. The first page holds the header and the second page holds the
payload.
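
A rough sketch of the arithmetic for this two-page layout, assuming a
4096-byte page. All names below are made up for illustration and nothing here
is taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Offsets for one scheme B receive buffer built from two contiguous pages. */
struct scheme_b_layout {
    size_t dev_start;   /* start of the region handed to the device,
                           relative to the start of the allocation */
    size_t dev_len;     /* length of the region handed to the device */
    size_t payload_off; /* offset inside the device region at which the
                           payload must land to start on the second page */
};

static struct scheme_b_layout scheme_b_compute(size_t headroom)
{
    struct scheme_b_layout l;

    /* The driver-reserved headroom must leave room on the first page. */
    assert(headroom < SKETCH_PAGE_SIZE);
    l.dev_start = headroom;
    l.dev_len = 2 * SKETCH_PAGE_SIZE - headroom;
    l.payload_off = SKETCH_PAGE_SIZE - headroom;
    return l;
}
```

With 256 bytes of headroom the device gets 7936 bytes, and the virtio net
header, the mac/ip/tcp headers and the padding together must fill exactly
3840 bytes so that the payload starts on the second page.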


>
> I don't see architecturally what is wrong with making feature just
> depend on mergeable buffers for now. We can always allow a combination
> down the road. Let's just make it clear that if drivers see SPLIT &&
> !MERGEABLE they should not fail probe, they should instead clear the
> split header feature.
>

As I understand it, scheme B does not depend on mergeable buffers.

If we give up desc info completely, then we prefer B.

Hi Jason, what are your thoughts?

Thanks.

>
>
>
> > >
> > > If we can't make a feature depend only on mergeable, should we use solution B?
> > >
> > >      2. Scheme B ( refer to your suggestion )
> > >
> > >      Our rethinking approach is no longer based on descriptor chain.
> > >
> > >      We refer to your proposed offset-based scheme as scheme B.
> >
> > The offset seems to be the suggestion of Michael.
> >
> > I think I like the design of v7 for several reasons:
> >
> > 1) easy to reserve head/tailroom without any extension of the spec
> > 2) easy to work with mergeable rx buffer
> > 3) it is the model used by modern NIC like e810 [1]
> >
> > [1] e810 manual 2.4 Figure 10-4 have a nic diagram to demonstrate how
> > it works which is similar to v7
> >
> > Thanks
> >
> > >      As you suggested, scheme B gives the device a buffer, using offset to
> > >      indicate where to place the payload. Like this:
> > >
> > >      <header>...<padding>... <beginning of page><data>
> > >
> > >      But how do we allocate this buffer?
> > >      Since we want the payload to be placed on a separate page, the method
> > >      we are considering is for the driver to allocate two pages of contiguous memory.
> > >
> > >      Then the beginning of this contiguous memory is used to store the headroom,
> > >      and the contiguous memory after the headroom is directly handed over to the device.
> > >      Similar to the following:
> > >
> > >      [------------------ receive buffer(2 pages) ------------------------------]
> > >      [<------------first page -------------------><------ second page -------->]
> > >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> > >         ^    ^
> > >         |    |
> > >         |    pointer to device
> > >         |
> > >         |
> > >         Driver reserved, the later part is filled
> > >
> > > We have already entered v8, but we have not been able to reach an agreement on
> > > the basic capabilities. I want to solve this problem first.
> > >
> > > @Jason @Michael
> > >
> > > Thanks.
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > >
> > > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > > >
> > > > > >
> > > > > > I don't get the reason why we need max_len. Can't it be implied by the
> > > > > > length of the first descriptor?
> > > > > >
> > > > >
> > > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > > >
> > > > > >
> > > > > > >+\end{itemize}
> > > > > > >+
> > > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > > >+
> > > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > > >+the receive packets.
> > > > > >
> > > > > >
> > > > > > Can the device simply not split the packet in this case? Anyhow, we
> > > > > > need to synchronize the driver with the device in this case (e.g. when
> > > > > > the driver is trying to set a new max_len).
> > > > > >
> > > > >
> > > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > > will also receive packets according to this offset.
> > > > > But if this case is considered to be caused by a driver misconfiguration,
> > > > > the device can simply not split the packet.
> > > >
> > > > Note that protocols like IPv6 allow variable-length headers, so
> > > > falling back to not splitting the header seems better to me.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > > (I wonder if the offset deserves an independent feature (one that depends
> > > > > > on mergeable buffers) in this case).
> > > > > >
> > > > >
> > > > > Okay, we can consider later.
> > > > >
> > > > > >
> > > > > > >  The maximum available length of the first buffer
> > > > > > >+used by the device is specified by \field{max_len}.
> > > > > >
> > > > > >
> > > > > > Similarly the max length seems to be implied by length - offset?
> > > > > >
> > > > >
> > > > > You can refer to the above answer about \field{max_len} similarly.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > >  If \field{max_len} is 0 or
> > > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > > >+
> > > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > > >+
> > > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > > >+inside the first buffer.
> > > > > > >+
> > > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > >
> > > >
> > > >
> > > >
> > >
>
>
>




* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23  4:04         ` Jason Wang
@ 2022-09-23  5:59           ` Michael S. Tsirkin
  2022-09-23  6:57             ` Xuan Zhuo
  2022-09-26  8:06             ` Jason Wang
  0 siblings, 2 replies; 31+ messages in thread
From: Michael S. Tsirkin @ 2022-09-23  5:59 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, Virtio-Dev, Kangjie Xu, Heng Qi

On Fri, Sep 23, 2022 at 12:04:28PM +0800, Jason Wang wrote:
> On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > > > >
> > > > > > On 2022/9/16 10:56, hengqi wrote:
> > > > > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > >
> > > > > >The purpose of this feature is to split the transport header and the payload
> > > > > >of the packet.
> > > > > >
> > > > > >|                     receive buffer1(page)            | receive buffer2(page) |
> > > > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > > > > >              |<------------------------------->|
> > > > > >                               ^
> > > > > >                               |
> > > > > >                            max_len
> > > > > >
> > > > > >We can use one page for every receive buffer. In this way, we can ensure that
> > > > > >all payloads can be independently in a page, which is very beneficial for
> > > > > >the zerocopy implemented by the upper layer.
> > > > > >
> > > > > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > > > > >---
> > > > > >v8:
> > > > > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > > >     2. Add \field{offset} and \field{max_len}.
> > > > > >     3. Fix some presentation issues. @Jason Wang
> > > > > >     4. Clarify some paragraphs.
> > > > > >
> > > > > >v7:
> > > > > >     1. Fix some presentation issues.
> > > > > >     2. Use "split transport header". @Jason Wang
> > > > > >     3. Clarify some paragraphs. @Cornelia Huck
> > > > > > >     4. Determine what the device does if it does not perform header split on a packet.
> > > > > >
> > > > > >v6:
> > > > > >     1. Fix some syntax issues. @Cornelia Huck
> > > > > >     2. Clarify some paragraphs. @Cornelia Huck
> > > > > > >     3. Determine what the device does if it does not perform header split on a packet.
> > > > > >
> > > > > >v5:
> > > > > >     1. Determine when hdr_len is credible in the process of rx
> > > > > >     2. Clean up the use of buffers and descriptors
> > > > > > >     3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> > > > > >
> > > > > >v4:
> > > > > >     1. fix typo @Cornelia Huck @Jason Wang
> > > > > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > > > > >
> > > > > >  conformance.tex |  2 ++
> > > > > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  2 files changed, 93 insertions(+)
> > > > > >
> > > > > >diff --git a/conformance.tex b/conformance.tex
> > > > > >index 2b86fc6..4e2b82e 100644
> > > > > >--- a/conformance.tex
> > > > > >+++ b/conformance.tex
> > > > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > >  \end{itemize}
> > > > > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > >  \end{itemize}
> > > > > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > > > >diff --git a/content.tex b/content.tex
> > > > > >index e863709..fad9dea 100644
> > > > > >--- a/content.tex
> > > > > >+++ b/content.tex
> > > > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > > >      channel.
> > > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > > > > >+    the transport header and the payload.
> > > > > >+
> > > > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > > > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > > > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > > > > >  \end{description}
> > > > > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > > > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > > > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > > > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > > > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > > > > >          u8 flags;
> > > > > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > > > > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > > > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > > > > >  transport header size.
> > > > > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > > > > >+
> > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > > > > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > > > > >+header inside the first buffer.
> > > > > >+
> > > > > >  \begin{note}
> > > > > >  This is due to various bugs in implementations.
> > > > > >  \end{note}
> > > > > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > >  according to the native endian of the guest rather than
> > > > > >  (necessarily when not using the legacy interface) little-endian.
> > > > > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > >+
> > > > > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > > > > >+the device supports splitting the transport header and the payload.
> > > > > >+The transport header and the payload will be separated into different
> > > > > >+buffers.
> > > > > >+
> > > > > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > > > > >+
> > > > > >+To configure the split transport header, the following layout structure
> > > > > >+and definitions are used:
> > > > > >+
> > > > > >+\begin{lstlisting}
> > > > > >+struct virtio_net_split_transport_header_config {
> > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > > > > >+    le64 type;
> > > > > >+    le16 offset;
> > > > > >+    le16 max_len;
> > > > > >+};
> > > > > >+
> > > > > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > > > > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > > > > >+\end{lstlisting}
> > > > > >+
> > > > > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > > > > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > > > > >+header configuration.
> > > > > >+
> > > > > >+The driver can enable or disable split transport header for different transport
> > > > > >+protocols by setting or clearing corresponding bits in \field{type}.
> > > > > >+
> > > > > >+\begin{itemize}
> > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > > > >+\end{itemize}
> > > > > >+
> > > > > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > >+
> > > > > >+A device MUST disable transport header splitting upon reset and initialization.
> > > > > >+
> > > > > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > > > > >+
> > > > > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > > > > >+\begin{itemize}
> > > > > >+    \item The device does not recognize the transport protocol of the packet.
> > > > > > >+    \item The packet is an IP fragment.
> > > > > >+    \item The splitting of the specific transport protocol is not enabled via
> > > > > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > > > > >+    \item At most one buffer is available.
> > > > >
> > > > >
> > > > > So this means the feature is disabled for the device without
> > > > > merge-able buffer? Note that, even in the case of mergeable buffer,
> > > > > it doesn't mean a buffer that only contains a single descriptor.
> > > > >
> > > > >
> > > >
> > > > Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> > > > the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> > > > So this feature depends on the mergeable buffer.
> > >
> > > To tell the truth, I'm not sure this is a good choice. We never had a
> > > feature that depends solely on the mergeable rx buffer before.
> > > Especially considering that using a descriptor chain is not hard. And
> > > I'm not sure we should care too much on the overhead since the
> > > splitting is enabled by the administrator when it needs e.g zerocopy.
> >
> >
> > It's overwhelmed us, and we haven't been able to agree on this.
> 
> Sorry for this but let's make an agreement before posting a new version.

Right, let's do that.
Jason, I think the issue with previous proposals is that they conflict
with VIRTIO_F_ANY_LAYOUT. We have repeatedly found that giving the
driver flexibility in arranging the packet in memory is beneficial.



> >
> > Michael doesn't want to use desc chain, it's not just a performance issue. In an
> > early email, he mentioned that desc chain may be abandoned in the future. So we
> > have been trying not to rely on desc chain.
> 
> This seems to be a very large change which seems a little bit too
> early to be considered.

I'd like to put it in other terms. Fundamentally devices are not
supposed to talk about descriptors at all. Descriptors are
a way to describe buffers, and devices should all work in terms
of buffers. I am working on cleaning up the spec to remove confusion
and terminology mixups. There are several major sources all over the spec:
- descriptor/buffer used inconsistently
- feature negotiated/offered used inconsistently
- field exists/valid used inconsistently

My way to address the first issue is to make sure devices all work
with buffers. And buffers are described by descriptors (makes sense,
right?) and made available to device by driver and used by device.

The advantage of this is layering - we can change the way buffers
are passed around without changing devices. And, it matches
the virtio API nicely.

Existing devices are all fine with this - they do not pass any
information in the descriptor. Yes, this seems like an option to
pass some information around, but I am not convinced it is worth
the layering violation.

By comparison, the ability to write data at an offset seems generally
useful. In particular, we have a very old issue, even without
the split header feature: with mergeable buffers,
if we attempt to align the data in the 1st buffer at a cache line
boundary by adding an offset before the ETH header, then when the packet
spills over to the second buffer it will be misaligned there. This wastes
an extra cache line for such packets. Offsets can allow fixing this.
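
To make the offset semantics concrete, here is a rough sketch of how a
device might compute the region of the first buffer it may use, following
the \field{offset} / \field{max_len} rules in the quoted v8 draft. The
struct and function names are made up for illustration and appear nowhere
in the patch. Treating max_len == 0 as "compare against the rest of the
buffer" in can_split() is my reading of the draft, not something it states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, not part of the draft. */
struct first_buf_window {
    bool     valid;  /* false: offset lies beyond the end of the buffer */
    uint32_t start;  /* byte position where the device starts writing */
    uint32_t len;    /* bytes the device may use, counted from start */
};

/* Apply the v8 draft rules for offset and max_len to one receive buffer. */
struct first_buf_window compute_window(uint32_t buf_len, uint16_t offset,
                                       uint16_t max_len)
{
    struct first_buf_window w = { false, 0, 0 };

    /* Draft: device MUST drop the packet; reviewers suggest not splitting. */
    if (offset > buf_len)
        return w;

    w.valid = true;
    w.start = offset;
    /* max_len == 0, or a window running past the end: use the whole rest. */
    if (max_len == 0 || (uint32_t)offset + max_len > buf_len)
        w.len = buf_len - offset;
    else
        w.len = max_len;
    return w;
}

/* Split only if the virtio net header plus the packet headers fit. */
bool can_split(const struct first_buf_window *w, uint32_t vnet_hdr_len,
               uint32_t pkt_hdrs_len)
{
    return w->valid && vnet_hdr_len + pkt_hdrs_len <= w->len;
}
```

With a 4096-byte buffer, offset 64 and max_len 256, the device writes
headers into bytes 64..319 and the remaining bytes of that buffer stay
with the driver as tailroom.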


I don't see what is architecturally wrong with making the feature depend
only on mergeable buffers for now. We can always allow a combination
down the road. Let's just make it clear that if drivers see SPLIT &&
!MERGEABLE they should not fail probe; they should instead clear the
split header feature.
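
A minimal sketch of that driver-side rule, assuming the feature set is
held as a plain 64-bit mask. The function name is hypothetical. Bit 52 is
the number proposed in the v8 draft and is not an assigned value. The
draft also lists VIRTIO_NET_F_CTRL_VQ as a requirement, so the sketch
checks both dependencies.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF               15
#define VIRTIO_NET_F_CTRL_VQ                 17
#define VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER  52  /* proposed in v8, not final */

/*
 * Hypothetical helper: instead of failing probe when the split header
 * feature is offered without its dependencies, clear the split bit and
 * carry on with the remaining features.
 */
uint64_t fixup_split_header_feature(uint64_t features)
{
    const uint64_t split = 1ULL << VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER;
    const uint64_t deps  = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
                           (1ULL << VIRTIO_NET_F_CTRL_VQ);

    if ((features & split) && (features & deps) != deps)
        features &= ~split;

    return features;
}
```

A driver would run this kind of check on the offered features before
acknowledging them, so a device that offers SPLIT without MERGEABLE still
probes successfully, just without header splitting.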




> >
> > If we can't make a feature depend only on mergeable, should we use solution B?
> >
> >      2. Scheme B ( refer to your suggestion )
> >
> >      Our rethinking approach is no longer based on descriptor chain.
> >
> >      We refer to your proposed offset-based scheme as scheme B.
> 
> The offset seems to be the suggestion of Michael.
> 
> I think I like the design of v7 for several reasons:
> 
> 1) easy to reserve head/tailroom without any extension of the spec
> 2) easy to work with mergeable rx buffer
> 3) it is the model used by modern NICs like the e810 [1]
> 
> [1] e810 manual 2.4, Figure 10-4 has a NIC diagram demonstrating how
> it works, which is similar to v7
> 
> Thanks
> 
> >      As you suggested, scheme B gives the device a buffer, using offset to
> >      indicate where to place the payload. Like this:
> >
> >      <header>...<padding>... <beginning of page><data>
> >
> >      But how do we allocate this buffer?
> >      Since we want the payload to be placed on a separate page, the method
> >      we consider is to have the driver directly allocate two pages of contiguous memory.
> >
> >      Then the beginning of this contiguous memory is used to store the headroom,
> >      and the contiguous memory after the headroom is directly handed over to the device.
> >      Similar to the following:
> >
> >      [------------------ receive buffer(2 pages) ------------------------------]
> >      [<------------first page -------------------><------ second page -------->]
> >      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
> >         ^    ^
> >         |    |
> >         |    pointer to device
> >         |
> >         |
> >         Driver reserved, the later part is filled
> >
> > We have already entered v8, but we have not been able to reach an agreement on
> > the basic capabilities. I want to solve this problem first.
> >
> > @Jason @Michael
> >
> > Thanks.
> >
> >
> >
> >
> >
> > >
> > > >
> > > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > > >
> > > > >
> > > > > > I don't get why we need max_len. Can't it be implied by the
> > > > > > length of the first descriptor?
> > > > >
> > > >
> > > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > > in the first buffer to build the non-linear data area of the socket buffer.
> > > >
> > > > >
> > > > > >+\end{itemize}
> > > > > >+
> > > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > > >+virtio_net_hdr to the length of the transport header.
> > > > > >+The used length still reports the number of bytes it has written to memory.
> > > > > >+
> > > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > > >+the receive packets.
> > > > >
> > > > >
> > > > > > Can the device simply not split the packet in this case? Anyhow, we
> > > > > > need to synchronize the driver with the device in this case (e.g. when
> > > > > > the driver is trying to set a new max_len).
> > > > >
> > > >
> > > > We think that \field{offset} is actively set by the driver, so the driver
> > > > will also receive packets according to this offset.
> > > > > But if this case is considered to be caused by a driver configuration error,
> > > > > the device can simply not split the packet.
> > >
> > > > Note that protocols like IPv6 allow variable-length headers, so
> > > > falling back to not splitting the header seems better to me.
> > >
> > > Thanks
> > >
> > > >
> > > > > > (I wonder if the offset deserves an independent feature (one that depends
> > > > > > on mergeable buffers) in this case).
> > > > >
> > > >
> > > > Okay, we can consider later.
> > > >
> > > > >
> > > > > >  The maximum available length of the first buffer
> > > > > >+used by the device is specified by \field{max_len}.
> > > > >
> > > > >
> > > > > Similarly the max length seems to be implied by length - offset?
> > > > >
> > > >
> > > > You can refer to the above answer about \field{max_len} similarly.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >  If \field{max_len} is 0 or
> > > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > > >+
> > > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > > >+
> > > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > > >+inside the first buffer.
> > > > > >+
> > > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > >
> > >
> > >
> > >
> >





* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-23  3:23       ` Xuan Zhuo
@ 2022-09-23  4:04         ` Jason Wang
  2022-09-23  5:59           ` Michael S. Tsirkin
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-09-23  4:04 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Michael S. Tsirkin, Kangjie Xu, Heng Qi

On Fri, Sep 23, 2022 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > >
> > > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > > >
> > > > > On 2022/9/16 10:56, hengqi wrote:
> > > > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >
> > > > >The purpose of this feature is to split the transport header and the payload
> > > > >of the packet.
> > > > >
> > > > >|                     receive buffer1(page)            | receive buffer2(page) |
> > > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > > > >              |<------------------------------->|
> > > > >                               ^
> > > > >                               |
> > > > >                            max_len
> > > > >
> > > > >We can use one page for every receive buffer. In this way, we can ensure that
> > > > >all payloads can be independently in a page, which is very beneficial for
> > > > >the zerocopy implemented by the upper layer.
> > > > >
> > > > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > > > >---
> > > > >v8:
> > > > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > > >     2. Add \field{offset} and \field{max_len}.
> > > > >     3. Fix some presentation issues. @Jason Wang
> > > > >     4. Clarify some paragraphs.
> > > > >
> > > > >v7:
> > > > >     1. Fix some presentation issues.
> > > > >     2. Use "split transport header". @Jason Wang
> > > > >     3. Clarify some paragraphs. @Cornelia Huck
> > > > > >     4. Determine what the device does if it does not perform header split on a packet.
> > > > >
> > > > >v6:
> > > > >     1. Fix some syntax issues. @Cornelia Huck
> > > > >     2. Clarify some paragraphs. @Cornelia Huck
> > > > > >     3. Determine what the device does if it does not perform header split on a packet.
> > > > >
> > > > >v5:
> > > > >     1. Determine when hdr_len is credible in the process of rx
> > > > >     2. Clean up the use of buffers and descriptors
> > > > > >     3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> > > > >
> > > > >v4:
> > > > >     1. fix typo @Cornelia Huck @Jason Wang
> > > > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > > > >
> > > > >  conformance.tex |  2 ++
> > > > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 93 insertions(+)
> > > > >
> > > > >diff --git a/conformance.tex b/conformance.tex
> > > > >index 2b86fc6..4e2b82e 100644
> > > > >--- a/conformance.tex
> > > > >+++ b/conformance.tex
> > > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > >  \end{itemize}
> > > > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > >  \end{itemize}
> > > > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > > >diff --git a/content.tex b/content.tex
> > > > >index e863709..fad9dea 100644
> > > > >--- a/content.tex
> > > > >+++ b/content.tex
> > > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > >      channel.
> > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > > > >+    the transport header and the payload.
> > > > >+
> > > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > > > >  \end{description}
> > > > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > > > >          u8 flags;
> > > > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > > > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > > > >  transport header size.
> > > > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > > > >+
> > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > > > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > > > >+header inside the first buffer.
> > > > >+
> > > > >  \begin{note}
> > > > >  This is due to various bugs in implementations.
> > > > >  \end{note}
> > > > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > >  according to the native endian of the guest rather than
> > > > >  (necessarily when not using the legacy interface) little-endian.
> > > > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > >+
> > > > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > > > >+the device supports splitting the transport header and the payload.
> > > > >+The transport header and the payload will be separated into different
> > > > >+buffers.
> > > > >+
> > > > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > > > >+
> > > > >+To configure the split transport header, the following layout structure
> > > > >+and definitions are used:
> > > > >+
> > > > >+\begin{lstlisting}
> > > > >+struct virtio_net_split_transport_header_config {
> > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > > > >+    le64 type;
> > > > >+    le16 offset;
> > > > >+    le16 max_len;
> > > > >+};
> > > > >+
> > > > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > > > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > > > >+\end{lstlisting}
> > > > >+
> > > > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > > > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > > > >+header configuration.
> > > > >+
> > > > >+The driver can enable or disable split transport header for different transport
> > > > >+protocols by setting or clearing corresponding bits in \field{type}.
> > > > >+
> > > > >+\begin{itemize}
> > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > > >+\end{itemize}
> > > > >+
> > > > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > >+
> > > > >+A device MUST disable transport header splitting upon reset and initialization.
> > > > >+
> > > > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > > > >+
> > > > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > > > >+\begin{itemize}
> > > > >+    \item The device does not recognize the transport protocol of the packet.
> > > > >+    \item The packet is an IP fragmentation.
> > > > >+    \item The splitting of the specific transport protocol is not enabled via
> > > > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > > > >+    \item At most one buffer is available.
> > > >
> > > >
> > > > So this means the feature is disabled for the device without
> > > > merge-able buffer? Note that, even in the case of mergeable buffer,
> > > > it doesn't mean a buffer that only contains a single descriptor.
> > > >
> > > >
> > >
> > > Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> > > the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> > > So this feature depends on the mergeable buffer.
> >
> > To tell the truth, I'm not sure this is a good choice. We never had a
> > feature that depends solely on the mergeable rx buffer before.
> > Especially considering that using a descriptor chain is not hard. And
> > I'm not sure we should care too much on the overhead since the
> > splitting is enabled by the administrator when it needs e.g zerocopy.
>
>
> This has overwhelmed us, and we haven't been able to agree on it.

Sorry about this, but let's reach an agreement before posting a new version.

>
> Michael doesn't want to use desc chains, and it's not just a performance issue. In an
> earlier email, he mentioned that desc chains may be abandoned in the future. So we
> have been trying not to rely on desc chains.

This seems to be a very large change which seems a little bit too
early to be considered.

>
> If we can't make a feature depend only on mergeable, should we use solution B?
>
>      2. Scheme B ( refer to your suggestion )
>
>      Our reworked approach is no longer based on descriptor chains.
>
>      We refer to your proposed offset-based scheme as scheme B.

The offset seems to be the suggestion of Michael.

I think I like the design of v7 for several reasons:

1) easy to reserve head/tailroom without any extension of the spec
2) easy to work with mergeable rx buffer
3) it is the model used by modern NICs such as the e810 [1]

[1] The e810 manual 2.4, Figure 10-4, has a NIC diagram demonstrating how
it works, which is similar to v7

Thanks

>      As you suggested, scheme B gives the device a buffer, using offset to
>      indicate where to place the payload. Like this:
>
>      <header>...<padding>... <beginning of page><data>
>
>      But how should this buffer be allocated?
>      Since we want the payload to be placed on a separate page, the method
>      we are considering is for the driver to allocate two contiguous pages of memory directly.
>
>      Then the beginning of this contiguous memory is used to store the headroom,
>      and the contiguous memory after the headroom is directly handed over to the device.
>      Similar to the following:
>
>      [------------------ receive buffer(2 pages) ------------------------------]
>      [<------------first page -------------------><------ second page -------->]
>      [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
>         ^    ^
>         |    |
>         |    pointer to device
>         |
>         |
>         Driver reserved, the later part is filled
>
> We have already entered v8, but we have not been able to reach an agreement on
> the basic capabilities. I want to solve this problem first.
>
> @Jason @Michael
>
> Thanks.
>
>
>
>
>
> >
> > >
> > > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > > >
> > > >
> > > > I don't get the reason why we need max_len. Can't it be implied by the
> > > > length of the first descriptor?
> > > >
> > >
> > > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > > in the first buffer to build the non-linear data area of the socket buffer.
> > >
> > > >
> > > > >+\end{itemize}
> > > > >+
> > > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > > >+buffer, following the virtio net header. The payload MUST start from the
> > > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > > >+virtio_net_hdr to the length of the transport header.
> > > > >+The used length still reports the number of bytes it has written to memory.
> > > > >+
> > > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > > >+the receive packets.
> > > >
> > > >
> > > > Can the device simply not split the packet in this case? In any case, we
> > > > need to synchronize the driver with the device here (e.g. when the
> > > > driver is trying to set a new max_len).
> > > >
> > >
> > > We think that \field{offset} is actively set by the driver, so the driver
> > > will also receive packets according to this offset.
> > > But if this case is considered to be caused by incorrect driver settings,
> > > the device can choose not to split the packet.
> >
> > Note that protocols like IPv6 allow variable-length headers, so
> > falling back to not splitting the header seems better to me.
> >
> > Thanks
> >
> > >
> > > > (I wonder if the offset deserves an independent feature (one that depends
> > > > on mergeable buffers) in this case).
> > > >
> > >
> > > Okay, we can consider later.
> > >
> > > >
> > > > >  The maximum available length of the first buffer
> > > > >+used by the device is specified by \field{max_len}.
> > > >
> > > >
> > > > Similarly the max length seems to be implied by length - offset?
> > > >
> > >
> > > You can refer to the above answer about \field{max_len} similarly.
> > >
> > > Thanks.
> > >
> > >
> > > > Thanks
> > > >
> > > >
> > > > >  If \field{max_len} is 0 or
> > > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > > >+the device can use the entire buffer starting at \field{offset}.
> > > > >+
> > > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > > >+
> > > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > > >+inside the first buffer.
> > > > >+
> > > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > >
> >
> >
> >
>




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-21  6:20     ` Jason Wang
  2022-09-21  6:23       ` Jason Wang
@ 2022-09-23  3:23       ` Xuan Zhuo
  2022-09-23  4:04         ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2022-09-23  3:23 UTC (permalink / raw)
  To: Jason Wang; +Cc: Virtio-Dev, Michael S. Tsirkin, Kangjie Xu, Heng Qi

On Wed, 21 Sep 2022 14:20:19 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > >
> > > > On 2022/9/16 10:56, hengqi wrote:
> > > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > >The purpose of this feature is to split the transport header and the payload
> > > >of the packet.
> > > >
> > > >|                     receive buffer1(page)            | receive buffer2(page) |
> > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > > >              |<------------------------------->|
> > > >                               ^
> > > >                               |
> > > >                            max_len
> > > >
> > > > >We can use one page for every receive buffer. In this way, we can ensure that
> > > > >each payload sits in a page of its own, which is very beneficial for
> > > > >the zerocopy implemented by the upper layer.
> > > >
> > > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > > >---
> > > >v8:
> > > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > >     2. Add \field{offset} and \field{max_len}.
> > > >     3. Fix some presentation issues. @Jason Wang
> > > >     4. Clarify some paragraphs.
> > > >
> > > >v7:
> > > >     1. Fix some presentation issues.
> > > >     2. Use "split transport header". @Jason Wang
> > > >     3. Clarify some paragraphs. @Cornelia Huck
> > > > >     4. Specify what the device does if it does not perform header split on a packet.
> > > >
> > > >v6:
> > > >     1. Fix some syntax issues. @Cornelia Huck
> > > >     2. Clarify some paragraphs. @Cornelia Huck
> > > > >     3. Specify what the device does if it does not perform header split on a packet.
> > > >
> > > >v5:
> > > >     1. Determine when hdr_len is credible in the process of rx
> > > >     2. Clean up the use of buffers and descriptors
> > > > >     3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> > > >
> > > >v4:
> > > >     1. fix typo @Cornelia Huck @Jason Wang
> > > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > > >
> > > >  conformance.tex |  2 ++
> > > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 93 insertions(+)
> > > >
> > > >diff --git a/conformance.tex b/conformance.tex
> > > >index 2b86fc6..4e2b82e 100644
> > > >--- a/conformance.tex
> > > >+++ b/conformance.tex
> > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > >diff --git a/content.tex b/content.tex
> > > >index e863709..fad9dea 100644
> > > >--- a/content.tex
> > > >+++ b/content.tex
> > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > >      channel.
> > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > > >+    the transport header and the payload.
> > > >+
> > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > > >  \end{description}
> > > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > > >          u8 flags;
> > > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > > >  transport header size.
> > > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > > >+
> > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > > >+header inside the first buffer.
> > > >+
> > > >  \begin{note}
> > > >  This is due to various bugs in implementations.
> > > >  \end{note}
> > > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > >  according to the native endian of the guest rather than
> > > >  (necessarily when not using the legacy interface) little-endian.
> > > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > > >+the device supports splitting the transport header and the payload.
> > > >+The transport header and the payload will be separated into different
> > > >+buffers.
> > > >+
> > > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > > >+
> > > >+To configure the split transport header, the following layout structure
> > > >+and definitions are used:
> > > >+
> > > >+\begin{lstlisting}
> > > >+struct virtio_net_split_transport_header_config {
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > > >+    le64 type;
> > > >+    le16 offset;
> > > >+    le16 max_len;
> > > >+};
> > > >+
> > > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > > >+\end{lstlisting}
> > > >+
> > > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > > >+header configuration.
> > > >+
> > > >+The driver can enable or disable split transport header for different transport
> > > >+protocols by setting or clearing corresponding bits in \field{type}.
> > > >+
> > > >+\begin{itemize}
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > >+\end{itemize}
> > > >+
> > > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+A device MUST disable transport header splitting upon reset and initialization.
> > > >+
> > > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > > >+
> > > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > > >+\begin{itemize}
> > > >+    \item The device does not recognize the transport protocol of the packet.
> > > >+    \item The packet is an IP fragmentation.
> > > >+    \item The splitting of the specific transport protocol is not enabled via
> > > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > > >+    \item At most one buffer is available.
> > >
> > >
> > > So this means the feature is disabled for the device without
> > > merge-able buffer? Note that, even in the case of mergeable buffer,
> > > it doesn't mean a buffer that only contains a single descriptor.
> > >
> > >
> >
> > Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> > the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> > So this feature depends on the mergeable buffer.
>
> To tell the truth, I'm not sure this is a good choice. We never had a
> feature that depends solely on the mergeable rx buffer before.
> Especially considering that using a descriptor chain is not hard. And
> I'm not sure we should care too much on the overhead since the
> splitting is enabled by the administrator when it needs e.g zerocopy.


This has overwhelmed us, and we haven't been able to agree on it.

Michael doesn't want to use desc chains, and it's not just a performance issue. In an
earlier email, he mentioned that desc chains may be abandoned in the future. So we
have been trying not to rely on desc chains.

If we can't make a feature depend only on mergeable, should we use solution B?

     2. Scheme B ( refer to your suggestion )

     Our reworked approach is no longer based on descriptor chains.

     We refer to your proposed offset-based scheme as scheme B.
     As you suggested, scheme B gives the device a buffer, using offset to
     indicate where to place the payload. Like this:

     <header>...<padding>... <beginning of page><data>

     But how should this buffer be allocated?
     Since we want the payload to be placed on a separate page, the method
     we are considering is for the driver to allocate two contiguous pages of memory directly.

     Then the beginning of this contiguous memory is used to store the headroom,
     and the contiguous memory after the headroom is directly handed over to the device.
     Similar to the following:

     [------------------ receive buffer(2 pages) ------------------------------]
     [<------------first page -------------------><------ second page -------->]
     [<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >]
        ^    ^
        |    |
        |    pointer to device
        |
        |
        Driver reserved, the later part is filled
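
The layout in the diagram above can be written down as a few lines of arithmetic. This is only a rough sketch under stated assumptions: a 4096-byte page, and the names `scheme_b_layout`, `scheme_b_plan` and `SKETCH_PAGE_SIZE` are hypothetical and do not come from the patch or from any driver. It shows where the device-visible region starts and at which offset inside that region the payload would have to begin so that it lands at the start of the second page.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, not from the patch */

/* Hypothetical description of one scheme-B receive buffer (2 pages). */
struct scheme_b_layout {
    size_t dev_offset;      /* start of the region handed to the device */
    size_t dev_len;         /* number of bytes handed to the device */
    size_t payload_offset;  /* payload start, relative to dev_offset */
};

/*
 * Split two contiguous pages into driver-reserved headroom followed by
 * the region given to the device. The payload is meant to start at the
 * beginning of the second page.
 */
static struct scheme_b_layout scheme_b_plan(size_t headroom)
{
    struct scheme_b_layout l;

    /* Assumption: the headroom must fit inside the first page. */
    assert(headroom < SKETCH_PAGE_SIZE);

    l.dev_offset = headroom;
    l.dev_len = 2u * SKETCH_PAGE_SIZE - headroom;
    l.payload_offset = SKETCH_PAGE_SIZE - headroom;
    return l;
}
```

With 256 bytes of headroom, for example, the device would receive 7936 bytes and would have to place the payload 3840 bytes into its region.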

We have already entered v8, but we have not been able to reach an agreement on
the basic capabilities. I want to solve this problem first.

@Jason @Michael

Thanks.





>
> >
> > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > >
> > >
> > > I don't get the reason why we need max_len. Can't it be implied by the
> > > length of the first descriptor?
> > >
> >
> > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > in the first buffer to build the non-linear data area of the socket buffer.
> >
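
To make the tailroom arithmetic quoted above concrete, here is a minimal sketch. The function names are hypothetical, and treating an offset larger than the buffer as a precondition violation is an assumption of this sketch (the v8 text instead has the device drop the packet in that case). The rule that a max_len of 0, or an offset + max_len beyond the buffer, lets the device use the whole buffer after the offset is taken from the v8 text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of the first buffer the device may write, starting at offset. */
static size_t first_buf_usable(size_t buf_len, uint16_t offset,
                               uint16_t max_len)
{
    /* Assumption of this sketch: the caller already rejected
     * offset > buf_len. */
    assert(offset <= buf_len);

    /* max_len == 0, or a region that overruns the buffer, means the
     * device can use everything after offset. */
    if (max_len == 0 || (size_t)offset + max_len > buf_len)
        return buf_len - offset;
    return max_len;
}

/* Tailroom left for the driver behind the device-writable region:
 * buf_len - (offset + max_len) when max_len is in range, else 0. */
static size_t first_buf_tailroom(size_t buf_len, uint16_t offset,
                                 uint16_t max_len)
{
    return buf_len - offset - first_buf_usable(buf_len, offset, max_len);
}
```

For a 4096-byte buffer with offset 64 and max_len 256, this leaves 3776 bytes of tailroom; with max_len 0 there is no tailroom at all.
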
> > >
> > > >+\end{itemize}
> > > >+
> > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > >+buffer, following the virtio net header. The payload MUST start from the
> > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > >+virtio_net_hdr to the length of the transport header.
> > > >+The used length still reports the number of bytes it has written to memory.
> > > >+
> > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > >+the receive packets.
> > >
> > >
> > > Can the device simply not split the packet in this case? In any case, we
> > > need to synchronize the driver with the device here (e.g. when the
> > > driver is trying to set a new max_len).
> > >
> >
> > We think that \field{offset} is actively set by the driver, so the driver
> > will also receive packets according to this offset.
> > But if this case is considered to be caused by incorrect driver settings,
> > the device can choose not to split the packet.
>
> Note that protocols like IPv6 allow variable-length headers, so
> falling back to not splitting the header seems better to me.
>
> Thanks
>
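
The "fall back to not splitting" behaviour discussed here, together with the MUST NOT cases listed in the patch, could be folded into one device-side check. The sketch below is illustrative only: `rx_pkt_info` and `should_split` are hypothetical names, the per-packet facts are assumed to have been parsed already, and treating a max_len of 0 as "no limit" for this check is an assumption. The TYPE_* bit values are the ones from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4 (1ull << 0)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6 (1ull << 1)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4 (1ull << 2)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6 (1ull << 3)

/* Hypothetical per-packet facts a device has already parsed. */
struct rx_pkt_info {
    uint64_t proto_bit;   /* one TYPE_* bit, or 0 if not recognized */
    bool ip_fragment;     /* packet is an IP fragment */
    size_t hdrs_len;      /* virtio net header plus headers up to and
                           * including the transport header */
    unsigned avail_bufs;  /* receive buffers currently available */
};

/* Return true only if none of the "MUST NOT split" cases applies;
 * otherwise the packet is delivered unsplit instead of being dropped. */
static bool should_split(uint64_t enabled_types, uint16_t max_len,
                         const struct rx_pkt_info *p)
{
    if (p->proto_bit == 0)                  /* unknown transport protocol */
        return false;
    if (p->ip_fragment)                     /* IP fragment */
        return false;
    if (!(enabled_types & p->proto_bit))    /* type not enabled via SET */
        return false;
    if (p->avail_bufs <= 1)                 /* at most one buffer */
        return false;
    if (max_len != 0 && p->hdrs_len > max_len)  /* headers do not fit */
        return false;
    return true;
}
```

A variable-length IPv6 header chain that pushes hdrs_len past max_len then simply takes the unsplit path.
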
> >
> > > (I wonder if the offset deserves an independent feature (one that depends
> > > on mergeable buffers) in this case).
> > >
> >
> > Okay, we can consider later.
> >
> > >
> > > >  The maximum available length of the first buffer
> > > >+used by the device is specified by \field{max_len}.
> > >
> > >
> > > Similarly the max length seems to be implied by length - offset?
> > >
> >
> > You can refer to the above answer about \field{max_len} similarly.
> >
> > Thanks.
> >
> >
> > > Thanks
> > >
> > >
> > > >  If \field{max_len} is 0 or
> > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > >+the device can use the entire buffer starting at \field{offset}.
> > > >+
> > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > >+inside the first buffer.
> > > >+
> > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >
>
>
>



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-21  6:20     ` Jason Wang
@ 2022-09-21  6:23       ` Jason Wang
  2022-09-23  3:23       ` Xuan Zhuo
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Wang @ 2022-09-21  6:23 UTC (permalink / raw)
  To: Heng Qi; +Cc: Virtio-Dev, Michael S. Tsirkin, Xuan Zhuo, Kangjie Xu

On Wed, Sep 21, 2022 at 2:20 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> > On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> > >
> > > > On 2022/9/16 10:56, hengqi wrote:
> > > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > >The purpose of this feature is to split the transport header and the payload
> > > >of the packet.
> > > >
> > > >|                     receive buffer1(page)            | receive buffer2(page) |
> > > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > > >              |<------------------------------->|
> > > >                               ^
> > > >                               |
> > > >                            max_len
> > > >
> > > > >We can use one page for every receive buffer. In this way, we can ensure that
> > > > >each payload sits in a page of its own, which is very beneficial for
> > > > >the zerocopy implemented by the upper layer.
> > > >
> > > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > > >---
> > > >v8:
> > > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > > >     2. Add \field{offset} and \field{max_len}.
> > > >     3. Fix some presentation issues. @Jason Wang
> > > >     4. Clarify some paragraphs.
> > > >
> > > >v7:
> > > >     1. Fix some presentation issues.
> > > >     2. Use "split transport header". @Jason Wang
> > > >     3. Clarify some paragraphs. @Cornelia Huck
> > > > >     4. Specify what the device does if it does not perform header split on a packet.
> > > >
> > > >v6:
> > > >     1. Fix some syntax issues. @Cornelia Huck
> > > >     2. Clarify some paragraphs. @Cornelia Huck
> > > > >     3. Specify what the device does if it does not perform header split on a packet.
> > > >
> > > >v5:
> > > >     1. Determine when hdr_len is credible in the process of rx
> > > >     2. Clean up the use of buffers and descriptors
> > > > >     3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> > > >
> > > >v4:
> > > >     1. fix typo @Cornelia Huck @Jason Wang
> > > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > > >
> > > >  conformance.tex |  2 ++
> > > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 93 insertions(+)
> > > >
> > > >diff --git a/conformance.tex b/conformance.tex
> > > >index 2b86fc6..4e2b82e 100644
> > > >--- a/conformance.tex
> > > >+++ b/conformance.tex
> > > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >  \end{itemize}
> > > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > >diff --git a/content.tex b/content.tex
> > > >index e863709..fad9dea 100644
> > > >--- a/content.tex
> > > >+++ b/content.tex
> > > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > >      channel.
> > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > > >+    the transport header and the payload.
> > > >+
> > > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > > >  \end{description}
> > > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > > >          u8 flags;
> > > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > > >  transport header size.
> > > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > > >+
> > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > > >+header inside the first buffer.
> > > >+
> > > >  \begin{note}
> > > >  This is due to various bugs in implementations.
> > > >  \end{note}
> > > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > >  according to the native endian of the guest rather than
> > > >  (necessarily when not using the legacy interface) little-endian.
> > > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > > >+the device supports splitting the transport header and the payload.
> > > >+The transport header and the payload will be separated into different
> > > >+buffers.
> > > >+
> > > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > > >+
> > > >+To configure the split transport header, the following layout structure
> > > >+and definitions are used:
> > > >+
> > > >+\begin{lstlisting}
> > > >+struct virtio_net_split_transport_header_config {
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > > >+    le64 type;
> > > >+    le16 offset;
> > > >+    le16 max_len;
> > > >+};
> > > >+
> > > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > > >+\end{lstlisting}
> > > >+
> > > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > > >+header configuration.
> > > >+
> > > >+The driver can enable or disable split transport header for different transport
> > > >+protocols by setting or clearing corresponding bits in \field{type}.
> > > >+
> > > >+\begin{itemize}
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > > >+\end{itemize}
> > > >+
> > > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+A device MUST disable transport header splitting upon reset and initialization.
> > > >+
> > > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > > >+
> > > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > > >+\begin{itemize}
> > > >+    \item The device does not recognize the transport protocol of the packet.
> > > >+    \item The packet is an IP fragmentation.
> > > >+    \item The splitting of the specific transport protocol is not enabled via
> > > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > > >+    \item At most one buffer is available.
> > >
> > >
> > > So this means the feature is disabled for devices without
> > > mergeable buffers? Note that even with mergeable buffers, a buffer
> > > does not necessarily contain only a single descriptor.
> > >
> > >
> >
> > Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> > the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> > So this feature depends on the mergeable buffer.
>
> To tell the truth, I'm not sure this is a good choice. We never had a
> feature that depends solely on the mergeable rx buffer before.
> Especially considering that using a descriptor chain is not hard. And
> I'm not sure we should care too much about the overhead, since the
> splitting is enabled by the administrator when it is needed, e.g. for zerocopy.
>
> >
> > > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> > >
> > >
> > > I don't get the reason why we need max_len. Can't it be implied by the
> > > length of the first descriptor?
> > >
> >
> > Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> > Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> > in the first buffer to build the non-linear data area of the socket buffer.

But this doesn't answer the question: without max_len, if tailroom is
needed, can't we simply allocate a larger buffer?

Thanks

> >
> > >
> > > >+\end{itemize}
> > > >+
> > > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > > >+buffer, following the virtio net header. The payload MUST start from the
> > > >+second buffer. The device MUST set \field{hdr_len} of structure
> > > >+virtio_net_hdr to the length of the transport header.
> > > >+The used length still reports the number of bytes it has written to memory.
> > > >+
> > > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > > >+The device MUST reserve space in the first buffer using \field{offset}.
> > > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > > >+the receive packets.
> > >
> > >
> > > Can the device simply not split the packet in this case? Anyhow, we
> > > need to synchronize the driver with the device in this case (e.g. when
> > > the driver is trying to set a new max_len).
> > >
> >
> > We think that \field{offset} is actively set by the driver, so the driver
> > will also receive packets according to this offset.
> > But if the case is considered to be caused by driver error settings,
> > the device can simply not split the packet.
>
> Note that protocols like IPv6 allow variable-length headers, so
> falling back to not splitting the header seems better to me.
>
> Thanks
>
> >
> > > (I wonder if the offset deserves an independent feature (but one that
> > > depends on mergeable buffers) in this case).
> > >
> >
> > Okay, we can consider later.
> >
> > >
> > > >  The maximum available length of the first buffer
> > > >+used by the device is specified by \field{max_len}.
> > >
> > >
> > > Similarly the max length seems to be implied by length - offset?
> > >
> >
> > You can refer to the above answer about \field{max_len} similarly.
> >
> > Thanks.
> >
> >
> > > Thanks
> > >
> > >
> > > >  If \field{max_len} is 0 or
> > > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > > >+the device can use the entire buffer starting at \field{offset}.
> > > >+
> > > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > > >+
> > > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > > >+inside the first buffer.
> > > >+
> > > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-20  3:28   ` Heng Qi
@ 2022-09-21  6:20     ` Jason Wang
  2022-09-21  6:23       ` Jason Wang
  2022-09-23  3:23       ` Xuan Zhuo
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Wang @ 2022-09-21  6:20 UTC (permalink / raw)
  To: Heng Qi; +Cc: Virtio-Dev, Michael S. Tsirkin, Xuan Zhuo, Kangjie Xu

On Tue, Sep 20, 2022 at 11:28 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> >
> > > On 2022/9/16 10:56, hengqi wrote:
> > >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >
> > >The purpose of this feature is to split the transport header and the payload
> > >of the packet.
> > >
> > >|                     receive buffer1(page)            | receive buffer2(page) |
> > >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> > >              |<------------------------------->|
> > >                               ^
> > >                               |
> > >                            max_len
> > >
> > >We can use one page for every receive buffer. In this way, we can ensure that
> > >all payloads can be independently in a page, which is very beneficial for
> > >the zerocopy implemented by the upper layer.
> > >
> > >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > >---
> > >v8:
> > >     1. Do not depend on descriptor chain. @Michael S. Tsirkin
> > >     2. Add \field{offset} and \field{max_len}.
> > >     3. Fix some presentation issues. @Jason Wang
> > >     4. Clarify some paragraphs.
> > >
> > >v7:
> > >     1. Fix some presentation issues.
> > >     2. Use "split transport header". @Jason Wang
> > >     3. Clarify some paragraphs. @Cornelia Huck
> > >     4. determine the device what to do if it does not perform header split on a packet.
> > >
> > >v6:
> > >     1. Fix some syntax issues. @Cornelia Huck
> > >     2. Clarify some paragraphs. @Cornelia Huck
> > >     3. Determine the device what to do if it does not perform header split on a packet.
> > >
> > >v5:
> > >     1. Determine when hdr_len is credible in the process of rx
> > >     2. Clean up the use of buffers and descriptors
> > >     3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
> > >
> > >v4:
> > >     1. fix typo @Cornelia Huck @Jason Wang
> > >     2. do not split header for IP fragmentation packet. @Jason Wang
> > >
> > >  conformance.tex |  2 ++
> > >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 93 insertions(+)
> > >
> > >diff --git a/conformance.tex b/conformance.tex
> > >index 2b86fc6..4e2b82e 100644
> > >--- a/conformance.tex
> > >+++ b/conformance.tex
> > >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > >  \end{itemize}
> > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > >  \end{itemize}
> > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > >diff --git a/content.tex b/content.tex
> > >index e863709..fad9dea 100644
> > >--- a/content.tex
> > >+++ b/content.tex
> > >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >      channel.
> > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > >+    the transport header and the payload.
> > >+
> > >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> > >  \end{description}
> > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> > >          u8 flags;
> > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> > >  transport header size.
> > >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > >+
> > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> > >+header inside the first buffer.
> > >+
> > >  \begin{note}
> > >  This is due to various bugs in implementations.
> > >  \end{note}
> > >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >  according to the native endian of the guest rather than
> > >  (necessarily when not using the legacy interface) little-endian.
> > >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > >+
> > >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > >+the device supports splitting the transport header and the payload.
> > >+The transport header and the payload will be separated into different
> > >+buffers.
> > >+
> > >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > >+
> > >+To configure the split transport header, the following layout structure
> > >+and definitions are used:
> > >+
> > >+\begin{lstlisting}
> > >+struct virtio_net_split_transport_header_config {
> > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > >+    le64 type;
> > >+    le16 offset;
> > >+    le16 max_len;
> > >+};
> > >+
> > >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > >+\end{lstlisting}
> > >+
> > >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> > >+header configuration.
> > >+
> > >+The driver can enable or disable split transport header for different transport
> > >+protocols by setting or clearing corresponding bits in \field{type}.
> > >+
> > >+\begin{itemize}
> > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > >+\end{itemize}
> > >+
> > >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > >+
> > >+A device MUST disable transport header splitting upon reset and initialization.
> > >+
> > >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > >+
> > >+A device MUST NOT split the transport header if it encounters any of the following cases:
> > >+\begin{itemize}
> > >+    \item The device does not recognize the transport protocol of the packet.
> > >+    \item The packet is an IP fragmentation.
> > >+    \item The splitting of the specific transport protocol is not enabled via
> > >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> > >+    \item At most one buffer is available.
> >
> >
> > So this means the feature is disabled for devices without
> > mergeable buffers? Note that even with mergeable buffers, a buffer
> > does not necessarily contain only a single descriptor.
> >
> >
>
> Yes, since the purpose of this scheme is to no longer depend on descriptor chains,
> the buffer submitted to the receiveq can be thought of as containing only one descriptor.
> So this feature depends on the mergeable buffer.

To tell the truth, I'm not sure this is a good choice. We never had a
feature that depends solely on the mergeable rx buffer before.
Especially considering that using a descriptor chain is not hard. And
I'm not sure we should care too much about the overhead, since the
splitting is enabled by the administrator when it is needed, e.g. for zerocopy.

>
> > >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> >
> >
> > I don't get the reason why we need max_len. Can't it be implied by the
> > length of the first descriptor?
> >
>
> Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
> Therefore, it is best to reserve tailroom with $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
> in the first buffer to build the non-linear data area of the socket buffer.
>
> >
> > >+\end{itemize}
> > >+
> > >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> > >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> > >+buffer, following the virtio net header. The payload MUST start from the
> > >+second buffer. The device MUST set \field{hdr_len} of structure
> > >+virtio_net_hdr to the length of the transport header.
> > >+The used length still reports the number of bytes it has written to memory.
> > >+
> > >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> > >+The device MUST reserve space in the first buffer using \field{offset}.
> > >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> > >+the receive packets.
> >
> >
> > Can the device simply not split the packet in this case? Anyhow, we
> > need to synchronize the driver with the device in this case (e.g. when
> > the driver is trying to set a new max_len).
> >
>
> We think that \field{offset} is actively set by the driver, so the driver
> will also receive packets according to this offset.
> But if the case is considered to be caused by driver error settings,
> the device can simply not split the packet.

Note that protocols like IPv6 allow variable-length headers, so
falling back to not splitting the header seems better to me.
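
As a rough sketch of that fallback (not text from the patch), the device-side
decision could look like the following. The struct rx_pkt_info, the function
device_should_split() and the parameters first_buf_len and avail_bufs are
hypothetical names introduced only for illustration. The sketch assumes that
hdrs_len counts the virtio net header plus all packet headers up to and
including the transport header, and it treats an out-of-range offset as
"do not split" instead of the packet drop that the v8 text mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Type bits as proposed in the v8 patch. */
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4 (1ULL << 0)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6 (1ULL << 1)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4 (1ULL << 2)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6 (1ULL << 3)

/* Hypothetical per-packet summary produced by the device's parser. */
struct rx_pkt_info {
    uint64_t proto_bit;   /* one TYPE_* bit, or 0 if not recognized */
    bool     ip_fragment; /* packet is an IP fragment */
    uint32_t hdrs_len;    /* assumed: virtio net hdr + headers through L4 */
};

/* Returns true if the headers may be split into the first buffer. */
static bool device_should_split(const struct rx_pkt_info *pkt,
                                uint64_t enabled_types,
                                uint16_t offset, uint16_t max_len,
                                uint32_t first_buf_len,
                                unsigned int avail_bufs)
{
    assert(pkt != NULL);

    if (pkt->proto_bit == 0)                 /* unknown transport protocol */
        return false;
    if (pkt->ip_fragment)                    /* IP fragments are never split */
        return false;
    if (!(enabled_types & pkt->proto_bit))   /* not enabled via ..._SET */
        return false;
    if (avail_bufs <= 1)                     /* at most one buffer available */
        return false;
    if (offset > first_buf_len)              /* fallback instead of a drop */
        return false;

    /* max_len == 0 or an overlong max_len means "rest of the buffer". */
    uint32_t limit = first_buf_len - offset;
    if (max_len != 0 && (uint32_t)offset + max_len <= first_buf_len)
        limit = max_len;

    /* Variable-length headers (e.g. IPv6 extensions) may not fit. */
    return pkt->hdrs_len <= limit;
}
```

With a 4096-byte first buffer and max_len of 256, a TCP/IPv6 packet whose
headers total 100 bytes would be split, while one whose extension headers push
the total to 300 bytes would be delivered unsplit.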

Thanks

>
> > (I wonder if the offset deserves an independent feature (but one that
> > depends on mergeable buffers) in this case).
> >
>
> Okay, we can consider later.
>
> >
> > >  The maximum available length of the first buffer
> > >+used by the device is specified by \field{max_len}.
> >
> >
> > Similarly the max length seems to be implied by length - offset?
> >
>
> You can refer to the above answer about \field{max_len} similarly.
>
> Thanks.
>
>
> > Thanks
> >
> >
> > >  If \field{max_len} is 0 or
> > >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> > >+the device can use the entire buffer starting at \field{offset}.
> > >+
> > >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > >+
> > >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> > >+inside the first buffer.
> > >+
> > >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> > >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>



* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-20  1:59 ` Jason Wang
@ 2022-09-20  3:28   ` Heng Qi
  2022-09-21  6:20     ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Heng Qi @ 2022-09-20  3:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin, Xuan Zhuo, kangjie.xu

On Tue, Sep 20, 2022 at 09:59:22AM +0800, Jason Wang wrote:
> 
> On 2022/9/16 10:56, hengqi wrote:
> >From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> >The purpose of this feature is to split the transport header and the payload
> >of the packet.
> >
> >|                     receive buffer1(page)            | receive buffer2(page) |
> >|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
> >              |<------------------------------->|
> >                               ^
> >                               |
> >                            max_len
> >
> >We can use one page for every receive buffer. In this way, we can ensure that
> >all payloads can be independently in a page, which is very beneficial for
> >the zerocopy implemented by the upper layer.
> >
> >Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> >---
> >v8:
> >	1. Do not depend on descriptor chain. @Michael S. Tsirkin
> >	2. Add \field{offset} and \field{max_len}.
> >	3. Fix some presentation issues. @Jason Wang
> >	4. Clarify some paragraphs.
> >
> >v7:
> >	1. Fix some presentation issues.
> >	2. Use "split transport header". @Jason Wang
> >	3. Clarify some paragraphs. @Cornelia Huck
> >	4. determine the device what to do if it does not perform header split on a packet.
> >
> >v6:
> >	1. Fix some syntax issues. @Cornelia Huck
> >	2. Clarify some paragraphs. @Cornelia Huck
> >	3. Determine the device what to do if it does not perform header split on a packet.
> >
> >v5:
> >	1. Determine when hdr_len is credible in the process of rx
> >	2. Clean up the use of buffers and descriptors
> >	3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
> >
> >v4:
> >	1. fix typo @Cornelia Huck @Jason Wang
> >	2. do not split header for IP fragmentation packet. @Jason Wang
> >
> >  conformance.tex |  2 ++
> >  content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 93 insertions(+)
> >
> >diff --git a/conformance.tex b/conformance.tex
> >index 2b86fc6..4e2b82e 100644
> >--- a/conformance.tex
> >+++ b/conformance.tex
> >@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >  \end{itemize}
> >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> >@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >  \end{itemize}
> >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> >diff --git a/content.tex b/content.tex
> >index e863709..fad9dea 100644
> >--- a/content.tex
> >+++ b/content.tex
> >@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >      channel.
> >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> >+    the transport header and the payload.
> >+
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> >@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> >+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
> >  \end{description}
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> >@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> >+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> >          u8 flags;
> >  #define VIRTIO_NET_HDR_GSO_NONE        0
> >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> >@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> >  transport header size.
> >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> >+
> >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> >+the driver SHOULD treat the \field{hdr_len} as the length of the transport
> >+header inside the first buffer.
> >+
> >  \begin{note}
> >  This is due to various bugs in implementations.
> >  \end{note}
> >@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  according to the native endian of the guest rather than
> >  (necessarily when not using the legacy interface) little-endian.
> >+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >+
> >+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> >+the device supports splitting the transport header and the payload.
> >+The transport header and the payload will be separated into different
> >+buffers.
> >+
> >+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> >+
> >+To configure the split transport header, the following layout structure
> >+and definitions are used:
> >+
> >+\begin{lstlisting}
> >+struct virtio_net_split_transport_header_config {
> >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> >+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> >+    le64 type;
> >+    le16 offset;
> >+    le16 max_len;
> >+};
> >+
> >+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> >+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> >+\end{lstlisting}
> >+
> >+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> >+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> >+header configuration.
> >+
> >+The driver can enable or disable split transport header for different transport
> >+protocols by setting or clearing corresponding bits in \field{type}.
> >+
> >+\begin{itemize}
> >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> >+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> >+\end{itemize}
> >+
> >+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >+
> >+A device MUST disable transport header splitting upon reset and initialization.
> >+
> >+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> >+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> >+
> >+A device MUST NOT split the transport header if it encounters any of the following cases:
> >+\begin{itemize}
> >+    \item The device does not recognize the transport protocol of the packet.
> >+    \item The packet is an IP fragmentation.
> >+    \item The splitting of the specific transport protocol is not enabled via
> >+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> >+    \item At most one buffer is available.
> 
> 
> So this means the feature is disabled for a device without
> mergeable buffers? Note that even with mergeable buffers, a buffer
> does not necessarily contain only a single descriptor.
> 
> 

Yes. Since the purpose of this scheme is to no longer depend on descriptor chains,
each buffer submitted to the receiveq can be thought of as containing only one descriptor.
So this feature depends on mergeable buffers (VIRTIO_NET_F_MRG_RXBUF).
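
To illustrate the dependency (a sketch only, not part of the patch): the check
below assumes the feature bit numbers 15 (VIRTIO_NET_F_MRG_RXBUF) and
17 (VIRTIO_NET_F_CTRL_VQ) from the existing spec and bit 52 as proposed in this
patch. The helper name split_hdr_features_ok() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 15 and 17 come from the existing virtio-net feature list;
 * bit 52 is the value proposed in this patch and may still change. */
#define VIRTIO_NET_F_MRG_RXBUF              15
#define VIRTIO_NET_F_CTRL_VQ                17
#define VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER 52

/* Hypothetical helper: returns true if the negotiated feature set
 * satisfies "SPLIT_TRANSPORT_HEADER requires CTRL_VQ and MRG_RXBUF". */
static bool split_hdr_features_ok(uint64_t features)
{
    const uint64_t split = 1ULL << VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER;
    const uint64_t deps  = (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
                           (1ULL << VIRTIO_NET_F_CTRL_VQ);

    if (!(features & split))
        return true;                 /* feature not offered: nothing to check */
    return (features & deps) == deps; /* both dependencies must be present */
}
```

A feature set with bit 52 but without bit 15 or bit 17 would be rejected by this check.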

> >+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
> 
> 
> I don't see why we need max_len. Can't it be implied by the
> length of the first descriptor?
> 

Split transport header is usually used in high-throughput scenarios, such as GSO-enabled cases.
Therefore, it is best to reserve tailroom of $ (the length of the buffer) - (\field{offset} + \field{max_len}) $
in the first buffer, which the driver can use to build the non-linear data area of the socket buffer.
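
As a rough sketch of that arithmetic (illustrative only; the helper name
first_buf_tailroom() is hypothetical), following the v8 rule that the device
may use the whole buffer from \field{offset} when \field{max_len} is 0 or
\field{offset} + \field{max_len} exceeds the buffer length:

```c
#include <stdint.h>

/* Hypothetical helper: bytes left untouched by the device at the end of
 * the first buffer. If max_len is 0, or offset + max_len does not fit,
 * the device may use the entire buffer from offset on, so no tailroom
 * is guaranteed and 0 is returned. */
static uint32_t first_buf_tailroom(uint32_t buf_len, uint16_t offset,
                                   uint16_t max_len)
{
    uint32_t end = (uint32_t)offset + max_len;

    if (max_len == 0 || end > buf_len)
        return 0;
    return buf_len - end;
}
```

For a 4096-byte buffer with offset 64 and max_len 256, this leaves 3776 bytes of tailroom.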

> 
> >+\end{itemize}
> >+
> >+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> >+bit in \field{flags} MUST be set. The transport header MUST be on the first
> >+buffer, following the virtio net header. The payload MUST start from the
> >+second buffer. The device MUST set \field{hdr_len} of structure
> >+virtio_net_hdr to the length of the transport header.
> >+The used length still reports the number of bytes it has written to memory.
> >+
> >+\field{offset} and \field{max_len} are valid when device uses the first buffer.
> >+The device MUST reserve space in the first buffer using \field{offset}.
> >+If \field{offset} exceeds the length of the buffer, the device MUST drop
> >+the receive packets.
> 
> 
> Can't the device simply not split the packet in this case? In any case,
> we need to synchronize the driver with the device here (e.g. when the
> driver is trying to set a new max_len).
> 

We think that \field{offset} is actively set by the driver, so the driver
will also receive packets according to this offset.
But if this case is considered to be caused by a driver configuration error,
the device can simply not split the packet.
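
For illustration, the device-side decision could look like the sketch below.
It follows the "MUST NOT split" list in v8, but treats an out-of-range
\field{offset} as "do not split" (the alternative discussed here) rather than
dropping the packet. struct rx_pkt_info, its fields and device_may_split()
are hypothetical names, and the handling of \field{max_len} == 0 is our
reading of the v8 text, not settled spec wording.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4 (1 << 0)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6 (1 << 1)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4 (1 << 2)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6 (1 << 3)

/* Hypothetical per-packet summary produced by the device's parser. */
struct rx_pkt_info {
    uint64_t proto_bit;      /* one TYPE_* bit, or 0 if not recognized   */
    bool     is_ip_fragment; /* packet is an IP fragment                 */
    uint32_t hdrs_len;       /* virtio net header + headers up to and
                                including the transport header, in bytes */
};

/* Returns true if the device may split this packet. */
static bool device_may_split(const struct rx_pkt_info *pkt,
                             uint64_t enabled_types,
                             uint16_t offset, uint16_t max_len,
                             uint32_t first_buf_len, unsigned avail_bufs)
{
    uint32_t limit;

    if (pkt->proto_bit == 0)                /* unknown transport protocol */
        return false;
    if (pkt->is_ip_fragment)                /* IP fragment                */
        return false;
    if (!(enabled_types & pkt->proto_bit))  /* not enabled via ..._SET    */
        return false;
    if (avail_bufs <= 1)                    /* at most one buffer         */
        return false;
    if (offset > first_buf_len)             /* bad offset: do not split   */
        return false;

    /* Usable region: max_len, or the rest of the buffer when max_len is 0
     * or offset + max_len does not fit. */
    limit = first_buf_len - offset;
    if (max_len != 0 && max_len < limit)
        limit = max_len;

    return pkt->hdrs_len <= limit;
}
```

With this approach a misconfigured offset degrades to unsplit reception instead of packet loss.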

> (I wonder whether the offset deserves an independent feature (one that
> depends on mergeable buffers) in this case.)
> 

Okay, we can consider that later.

> 
> >  The maximum available length of the first buffer
> >+used by the device is specified by \field{max_len}.
> 
> 
> Similarly the max length seems to be implied by length - offset?
> 

The answer above about \field{max_len} applies here as well.

Thanks.


> Thanks
> 
> 
> >  If \field{max_len} is 0 or
> >+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> >+the device can use the entire buffer starting at \field{offset}.
> >+
> >+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >+
> >+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> >+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> >+inside the first buffer.
> >+
> >+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
> >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [virtio-dev] [PATCH v8] virtio_net: support for split transport header
  2022-09-16  2:56 hengqi
@ 2022-09-20  1:59 ` Jason Wang
  2022-09-20  3:28   ` Heng Qi
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2022-09-20  1:59 UTC (permalink / raw)
  To: hengqi, virtio-dev; +Cc: Michael S. Tsirkin, Xuan Zhuo, kangjie.xu


On 2022/9/16 10:56, hengqi wrote:
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the transport header and the payload
> of the packet.
>
> |                     receive buffer1(page)            | receive buffer2(page) |
> |<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
>               |<------------------------------->|
>                                ^
>                                |
>                             max_len
>
> We can use one page for every receive buffer. In this way, we can ensure that
> all payloads can be independently in a page, which is very beneficial for
> the zerocopy implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> ---
> v8:
> 	1. Do not depend on descriptor chain. @Michael S. Tsirkin
> 	2. Add \field{offset} and \field{max_len}.
> 	3. Fix some presentation issues. @Jason Wang
> 	4. Clarify some paragraphs.
>
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge
>
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
>
>   conformance.tex |  2 ++
>   content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 93 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..fad9dea 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>           u8 flags;
>   #define VIRTIO_NET_HDR_GSO_NONE        0
>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>   transport header size.
>   The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport
> +header inside the first buffer.
> +
>   \begin{note}
>   This is due to various bugs in implementations.
>   \end{note}
> @@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   according to the native endian of the guest rather than
>   (necessarily when not using the legacy interface) little-endian.
>   
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +buffers.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure
> +and definitions are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +    le16 offset;
> +    le16 max_len;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
> +header configuration.
> +
> +The driver can enable or disable split transport header for different transport
> +protocols by setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST disable transport header splitting upon reset and initialization.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item The device does not recognize the transport protocol of the packet.
> +    \item The packet is an IP fragmentation.
> +    \item The splitting of the specific transport protocol is not enabled via
> +        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
> +    \item At most one buffer is available.


So this means the feature is disabled for a device without mergeable
buffers? Note that even with mergeable buffers, a buffer does not
necessarily contain only a single descriptor.


> +    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.


I don't see why we need max_len. Can't it be implied by the
length of the first descriptor?


> +\end{itemize}
> +
> +If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
> +bit in \field{flags} MUST be set. The transport header MUST be on the first
> +buffer, following the virtio net header. The payload MUST start from the
> +second buffer. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +\field{offset} and \field{max_len} are valid when device uses the first buffer.
> +The device MUST reserve space in the first buffer using \field{offset}.
> +If \field{offset} exceeds the length of the buffer, the device MUST drop
> +the receive packets.


Can't the device simply not split the packet in this case? In any case, we
need to synchronize the driver with the device here (e.g. when the driver
is trying to set a new max_len).

(I wonder whether the offset deserves an independent feature (one that
depends on mergeable buffers) in this case.)


>   The maximum available length of the first buffer
> +used by the device is specified by \field{max_len}.


Similarly the max length seems to be implied by length - offset?

Thanks


>   If \field{max_len} is 0 or
> +$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
> +the device can use the entire buffer starting at \field{offset}.
> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +SHOULD treat the contents of \field{hdr_len} as the length of the transport header
> +inside the first buffer.
> +
> +If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
>   
>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>   




^ permalink raw reply	[flat|nested] 31+ messages in thread

* [virtio-dev] [PATCH v8] virtio_net: support for split transport header
@ 2022-09-16  2:56 hengqi
  2022-09-20  1:59 ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: hengqi @ 2022-09-16  2:56 UTC (permalink / raw)
  To: virtio-dev; +Cc: Jason Wang, Michael S. Tsirkin, Xuan Zhuo, kangjie.xu, Heng Qi

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

The purpose of this feature is to split the transport header and the payload
of the packet.

|                     receive buffer1(page)            | receive buffer2(page) |
|<- offset ->| virtnet hdr | mac | ip | tcp |<- hold ->|        payload        |
             |<------------------------------->|
                              ^
                              |
                           max_len

We can use one page for every receive buffer. In this way, we can ensure that
all payloads can be independently in a page, which is very beneficial for
the zerocopy implemented by the upper layer.
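
For illustration only (not part of the patch), a driver could encode the
control command payload for the structure defined below as in this sketch.
The helpers put_le16(), put_le64() and split_hdr_cfg_encode() are
hypothetical names. The 12-byte size assumes the fields are laid out back to
back with no padding (le64 type, le16 offset, le16 max_len); a plain C struct
with these members would typically be padded to 16 bytes.

```c
#include <stdint.h>

#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4 (1 << 0)
#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6 (1 << 1)

/* Assumed wire size: le64 type + le16 offset + le16 max_len. */
#define SPLIT_HDR_CFG_SIZE 12

/* Store a 16-bit value in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Store a 64-bit value in little-endian byte order. */
static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Serialize the configuration independently of host endianness. */
static void split_hdr_cfg_encode(uint8_t out[SPLIT_HDR_CFG_SIZE],
                                 uint64_t type, uint16_t offset,
                                 uint16_t max_len)
{
    put_le64(out, type);         /* bytes 0..7:   enabled protocol bits */
    put_le16(out + 8, offset);   /* bytes 8..9:   reserved headroom     */
    put_le16(out + 10, max_len); /* bytes 10..11: usable header region  */
}
```

For page-sized receive buffers, a driver might for example enable TCP4 and TCP6 with offset 64 and max_len 256, leaving the rest of the first page as tailroom.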

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
---
v8:
	1. Do not depend on descriptor chain. @Michael S. Tsirkin
	2. Add \field{offset} and \field{max_len}.
	3. Fix some presentation issues. @Jason Wang
	4. Clarify some paragraphs.

v7:
	1. Fix some presentation issues.
	2. Use "split transport header". @Jason Wang
	3. Clarify some paragraphs. @Cornelia Huck
	4. determine the device what to do if it does not perform header split on a packet.

v6:
	1. Fix some syntax issues. @Cornelia Huck
	2. Clarify some paragraphs. @Cornelia Huck
	3. Determine the device what to do if it does not perform header split on a packet.

v5:
	1. Determine when hdr_len is credible in the process of rx
	2. Clean up the use of buffers and descriptors
	3. Clarify the meaning of used length if the first descriptor is skipped in the case of merge

v4:
	1. fix typo @Cornelia Huck @Jason Wang
	2. do not split header for IP fragmentation packet. @Jason Wang

 conformance.tex |  2 ++
 content.tex     | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..fad9dea 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
 
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+    the transport header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_MRG_RXBUF.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
 #define VIRTIO_NET_HDR_F_DATA_VALID    2
 #define VIRTIO_NET_HDR_F_RSC_INFO      4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
         u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE        0
 #define VIRTIO_NET_HDR_GSO_TCPV4       1
@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
 transport header size.
 The driver MUST NOT rely on \field{hdr_len} to be correct.
+
+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
+the driver SHOULD treat the \field{hdr_len} as the length of the transport
+header inside the first buffer.
+
 \begin{note}
 This is due to various bugs in implementations.
 \end{note}
@@ -4483,6 +4493,87 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 according to the native endian of the guest rather than
 (necessarily when not using the legacy interface) little-endian.
 
+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
+the device supports splitting the transport header and the payload.
+The transport header and the payload will be separated into different
+buffers.
+
+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
+
+To configure the split transport header, the following layout structure
+and definitions are used:
+
+\begin{lstlisting}
+struct virtio_net_split_transport_header_config {
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
+    le64 type;
+    le16 offset;
+    le16 max_len;
+};
+
+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
+\end{lstlisting}
+
+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split
+header configuration.
+
+The driver can enable or disable split transport header for different transport
+protocols by setting or clearing corresponding bits in \field{type}.
+
+\begin{itemize}
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
+\end{itemize}
+
+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+A device MUST disable transport header splitting upon reset and initialization.
+
+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
+
+A device MUST NOT split the transport header if it encounters any of the following cases:
+\begin{itemize}
+    \item The device does not recognize the transport protocol of the packet.
+    \item The packet is an IP fragmentation.
+    \item The splitting of the specific transport protocol is not enabled via
+        VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET.
+    \item At most one buffer is available.
+    \item The total size of the virtio net header and the transport header exceeds \field{max_len}.
+\end{itemize}
+
+If the transport header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER
+bit in \field{flags} MUST be set. The transport header MUST be on the first
+buffer, following the virtio net header. The payload MUST start from the
+second buffer. The device MUST set \field{hdr_len} of structure
+virtio_net_hdr to the length of the transport header.
+The used length still reports the number of bytes it has written to memory.
+
+\field{offset} and \field{max_len} are valid when device uses the first buffer.
+The device MUST reserve space in the first buffer using \field{offset}.
+If \field{offset} exceeds the length of the buffer, the device MUST drop
+the receive packets. The maximum available length of the first buffer
+used by the device is specified by \field{max_len}. If \field{max_len} is 0 or
+$ \field{offset} + \field{max_len} $ is greater than the length of the buffer,
+the device can use the entire buffer starting at \field{offset}.
+
+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
+SHOULD treat the contents of \field{hdr_len} as the length of the transport header
+inside the first buffer.
+
+If \field{max_len} is not equal to 0, it MUST be greater than the size of the virtio net header.
 
 \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
 
-- 
1.8.3.1




^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-10-20  8:16 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-28  1:43 [virtio-dev] [PATCH v8] virtio_net: support for split transport header Xuan Zhuo
2022-09-28  4:05 ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2022-09-16  2:56 hengqi
2022-09-20  1:59 ` Jason Wang
2022-09-20  3:28   ` Heng Qi
2022-09-21  6:20     ` Jason Wang
2022-09-21  6:23       ` Jason Wang
2022-09-23  3:23       ` Xuan Zhuo
2022-09-23  4:04         ` Jason Wang
2022-09-23  5:59           ` Michael S. Tsirkin
2022-09-23  6:57             ` Xuan Zhuo
2022-09-23 10:44               ` Michael S. Tsirkin
2022-09-23 10:48                 ` Xuan Zhuo
2022-09-23 11:04                   ` Michael S. Tsirkin
2022-09-23 12:40                     ` Xuan Zhuo
2022-09-26  8:06             ` Jason Wang
2022-09-28 13:39               ` Michael S. Tsirkin
2022-09-29  1:48                 ` Jason Wang
2022-09-29  7:04                   ` Michael S. Tsirkin
2022-09-29  8:24                     ` Xuan Zhuo
2022-09-29 10:06                       ` Michael S. Tsirkin
2022-09-29 11:48                         ` Xuan Zhuo
2022-10-08  4:37                     ` Jason Wang
2022-10-09  1:49                       ` Xuan Zhuo
2022-10-10 17:11                       ` Michael S. Tsirkin
2022-10-12  3:17                         ` Jason Wang
2022-10-12  5:05                           ` Michael S. Tsirkin
2022-10-13  6:47                             ` Jason Wang
2022-10-13 14:33                               ` Michael S. Tsirkin
2022-10-18  3:07                       ` Xuan Zhuo
2022-10-20  8:16                       ` Heng Qi
