All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-dev] [PATCH v7] virtio_net: support split header
@ 2022-08-16  9:34 Heng Qi
  2022-08-25 14:22 ` Cornelia Huck
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Heng Qi @ 2022-08-16  9:34 UTC (permalink / raw)
  To: virtio-dev; +Cc: jasowang, mst, xuanzhuo, kangjie.xu

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

The purpose of this feature is to split the header and the payload of
the packet.

|                    receive buffer                                    |
|                       0th descriptor             | 1th descriptor    |
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
---
v7:
	1. Fix some presentation issues.
	2. Use "split transport header". @Jason Wang
	3. Clarify some paragraphs. @Cornelia Huck
	4. determine the device what to do if it does not perform header split on a packet.

v6:
	1. Fix some syntax issues. @Cornelia Huck
	2. Clarify some paragraphs. @Cornelia Huck
	3. Determine the device what to do if it does not perform header split on a packet.

v5:
	1. Determine when hdr_len is credible in the process of rx
	2. Clean up the use of buffers and descriptors
	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge

v4:
	1. fix typo @Cornelia Huck @Jason Wang
	2. do not split header for IP fragmentation packet. @Jason Wang

v3:
	1. Fix some syntax issues
	2. Fix some terminology issues
	3. It is not unified with ip alignment, so ip alignment is not included
	4. Make it clear that the device must support four types, in the case of successful negotiation.

 conformance.tex |   2 ++
 content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 104 insertions(+)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..4e2b82e 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..5676da9 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
 
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
+    the transport header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
 #define VIRTIO_NET_HDR_F_DATA_VALID    2
 #define VIRTIO_NET_HDR_F_RSC_INFO      4
+#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
         u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE        0
 #define VIRTIO_NET_HDR_GSO_TCPV4       1
@@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
 transport header size.
 The driver MUST NOT rely on \field{hdr_len} to be correct.
+
+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
+the driver SHOULD treat the \field{hdr_len} as the length of the transport header
+inside the first descriptor.
+
 \begin{note}
 This is due to various bugs in implementations.
 \end{note}
@@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 according to the native endian of the guest rather than
 (necessarily when not using the legacy interface) little-endian.
 
+\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
+the device supports splitting the transport header and the payload.
+The transport header and the payload will be separated into different
+descriptors.
+
+\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
+
+To configure the split transport header, the following layout structure and definitions
+are used:
+
+\begin{lstlisting}
+struct virtio_net_split_transport_header_config {
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
+#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
+    le64 type;
+};
+
+#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
+ #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
+\end{lstlisting}
+
+The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
+VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
+
+The driver can enable or disable split transport header for different protocols by
+setting or clearing corresponding bits in \field{type}.
+
+\begin{itemize}
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
+    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
+\end{itemize}
+
+\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+A device MUST initialize \field{type} to 0, and MUST set it to 0
+upon device reset.
+
+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
+VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
+
+A device MUST NOT split the transport header if it encounters any of the following cases:
+\begin{itemize}
+    \item the device does not recognize the protocol of the packet.
+    \item the packet is an IP fragmentation.
+    \item \field{type} does not include the protocol of the packet.
+    \item the buffer consists of only one descriptor.
+    \item the total size of the virtio net header and the transport header exceeds
+        the size of the first descriptor.
+    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
+        payload exceeds the size of the descriptor chain starting from the 2nd
+        descriptor.
+\end{itemize}
+
+If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
+in \field{flags} MUST be set. The transport header MUST be on the first
+descriptor, following the virtio net header. The payload MUST start from the
+second descriptor. The device MUST set \field{hdr_len} of structure
+virtio_net_hdr to the length of the transport header.
+
+If all of the following applies:
+\begin{itemize}
+    \item the header is split by the device.
+    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
+    \item the received packet is spread over multiple buffers.
+\end {itemize}
+then if the device uses the buffers after the 1st buffer, and the buffer
+consists of multiple descriptors, the device MUST skip the first descriptor,
+because the first descriptor is used to carry the transport header.
+The used length still reports the number of bytes it has written to memory.
+
+If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
+virtio net header and the transport header, and the buffer consists of at least two
+descriptors, the device MUST start with the first descriptor to store the packet, and
+MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
+
+\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
+
+If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
+MUST treat the contents of \field{hdr_len} as the length of the transport header
+inside the first descriptor.
+
+If the split transport header is enabled, the buffers submitted to receiveq by the
+driver MUST be composed of at least two descriptors.
+When the buffer consists of two descriptors, the length of the first
+descriptor MUST be greater than the one of the virtio net header.
 
 \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
 
-- 
1.8.3.1


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
@ 2022-08-25 14:22 ` Cornelia Huck
  2022-08-30 11:23   ` Heng Qi
  2022-08-30 11:26 ` Heng Qi
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Cornelia Huck @ 2022-08-25 14:22 UTC (permalink / raw)
  To: Heng Qi, virtio-dev; +Cc: jasowang, mst, xuanzhuo, kangjie.xu

On Tue, Aug 16 2022, Heng Qi <hengqi@linux.alibaba.com> wrote:

> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> ---
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
> 	1. Fix some syntax issues
> 	2. Fix some terminology issues
> 	3. It is not unified with ip alignment, so ip alignment is not included
> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
>
>  conformance.tex |   2 ++
>  content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 104 insertions(+)

I do not have any further comments on the change, let's see what the
networking folks think.

[Do we require patches to be posted to virtio-comment, or is virtio-dev
enough? I'm a bit unsure right now.]


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-08-25 14:22 ` Cornelia Huck
@ 2022-08-30 11:23   ` Heng Qi
  0 siblings, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-08-30 11:23 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: virtio-dev, Jason Wang, mst, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]

在 2022/8/25 下午10:22, Cornelia Huck 写道:
> On Tue, Aug 16 2022, Heng Qi<hengqi@linux.alibaba.com>  wrote:
>
>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>
>> The purpose of this feature is to split the header and the payload of
>> the packet.
>>
>> |                    receive buffer                                    |
>> |                       0th descriptor             | 1th descriptor    |
>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>
>> We can use a buffer plus a separate page when allocating the receive
>> buffer. In this way, we can ensure that all payloads can be
>> independently in a page, which is very beneficial for the zerocopy
>> implemented by the upper layer.
>>
>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>> ---
>> v7:
>> 	1. Fix some presentation issues.
>> 	2. Use "split transport header". @Jason Wang
>> 	3. Clarify some paragraphs. @Cornelia Huck
>> 	4. determine the device what to do if it does not perform header split on a packet.
>>
>> v6:
>> 	1. Fix some syntax issues. @Cornelia Huck
>> 	2. Clarify some paragraphs. @Cornelia Huck
>> 	3. Determine the device what to do if it does not perform header split on a packet.
>>
>> v5:
>> 	1. Determine when hdr_len is credible in the process of rx
>> 	2. Clean up the use of buffers and descriptors
>> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>
>> v4:
>> 	1. fix typo @Cornelia Huck @Jason Wang
>> 	2. do not split header for IP fragmentation packet. @Jason Wang
>>
>> v3:
>> 	1. Fix some syntax issues
>> 	2. Fix some terminology issues
>> 	3. It is not unified with ip alignment, so ip alignment is not included
>> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
>>
>>   conformance.tex |   2 ++
>>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 104 insertions(+)
> I do not have any further comments on the change, let's see what the
> networking folks think.

Okay. Thanks for your review.

>
> [Do we require patches to be posted to virtio-comment, or is virtio-dev
> enough? I'm a bit unsure right now.]

We can then consider posting patches to virtio-comment.

[-- Attachment #2: Type: text/html, Size: 3309 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
  2022-08-25 14:22 ` Cornelia Huck
@ 2022-08-30 11:26 ` Heng Qi
  2022-09-02  4:12 ` Heng Qi
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-08-30 11:26 UTC (permalink / raw)
  To: Jason Wang, mst; +Cc: virtio-dev, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 11521 bytes --]


在 2022/8/16 下午5:34, Heng Qi 写道:
> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
> ---
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
> 	1. Fix some syntax issues
> 	2. Fix some terminology issues
> 	3. It is not unified with ip alignment, so ip alignment is not included
> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
>
>   conformance.tex |   2 ++
>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 104 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..5676da9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>           u8 flags;
>   #define VIRTIO_NET_HDR_GSO_NONE        0
>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>   transport header size.
>   The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
>   \begin{note}
>   This is due to various bugs in implementations.
>   \end{note}
> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   according to the native endian of the guest rather than
>   (necessarily when not using the legacy interface) little-endian.
>   
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> +
> +The driver can enable or disable split transport header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the transport header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> +in \field{flags} MUST be set. The transport header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> +virtio net header and the transport header, and the buffer consists of at least two
> +descriptors, the device MUST start with the first descriptor to store the packet, and
> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
> +If the split transport header is enabled, the buffers submitted to receiveq by the
> +driver MUST be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.
>   
>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>   

Do you have any comments for this v7 version?

[-- Attachment #2: Type: text/html, Size: 11848 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
  2022-08-25 14:22 ` Cornelia Huck
  2022-08-30 11:26 ` Heng Qi
@ 2022-09-02  4:12 ` Heng Qi
  2022-09-08 21:18   ` Michael S. Tsirkin
  2022-09-02  6:21 ` [virtio-dev] " Jason Wang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-02  4:12 UTC (permalink / raw)
  To: virtio-dev; +Cc: Jason Wang, mst, Cornelia Huck, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 11769 bytes --]

在 2022/8/16 下午5:34, Heng Qi 写道:
> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
> ---
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
> 	1. Fix some syntax issues
> 	2. Fix some terminology issues
> 	3. It is not unified with ip alignment, so ip alignment is not included
> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
>
>   conformance.tex |   2 ++
>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 104 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..5676da9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>           u8 flags;
>   #define VIRTIO_NET_HDR_GSO_NONE        0
>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>   transport header size.
>   The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
>   \begin{note}
>   This is due to various bugs in implementations.
>   \end{note}
> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   according to the native endian of the guest rather than
>   (necessarily when not using the legacy interface) little-endian.
>   
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> +
> +The driver can enable or disable split transport header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the transport header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> +in \field{flags} MUST be set. The transport header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> +virtio net header and the transport header, and the buffer consists of at least two
> +descriptors, the device MUST start with the first descriptor to store the packet, and
> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
> +If the split transport header is enabled, the buffers submitted to receiveq by the
> +driver MUST be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.
>   
>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>   

If the reviewers do not have more comments on "split header v7",
then I submit an issue in the virtio-spec repository.
Please vote on whether the new "split header" feature is added
to the virtio specification.


The link ishttps://github.com/oasis-tcs/virtio-spec/issues/144.


Thanks.

[-- Attachment #2: Type: text/html, Size: 12216 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
                   ` (2 preceding siblings ...)
  2022-09-02  4:12 ` Heng Qi
@ 2022-09-02  6:21 ` Jason Wang
  2022-09-02  6:41   ` Michael S. Tsirkin
                     ` (2 more replies)
  2022-09-02  6:48 ` Michael S. Tsirkin
  2022-09-07 11:16 ` [virtio-dev] " Heng Qi
  5 siblings, 3 replies; 32+ messages in thread
From: Jason Wang @ 2022-09-02  6:21 UTC (permalink / raw)
  To: Heng Qi; +Cc: Virtio-Dev, mst, Xuan Zhuo, Kangjie Xu

On Tue, Aug 16, 2022 at 5:35 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> ---
> v7:
>         1. Fix some presentation issues.
>         2. Use "split transport header". @Jason Wang
>         3. Clarify some paragraphs. @Cornelia Huck
>         4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
>         1. Fix some syntax issues. @Cornelia Huck
>         2. Clarify some paragraphs. @Cornelia Huck
>         3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
>         1. Determine when hdr_len is credible in the process of rx
>         2. Clean up the use of buffers and descriptors
>         3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
>         1. fix typo @Cornelia Huck @Jason Wang
>         2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
>         1. Fix some syntax issues
>         2. Fix some terminology issues
>         3. It is not unified with ip alignment, so ip alignment is not included
>         4. Make it clear that the device must support four types, in the case of successful negotiation.
>
>  conformance.tex |   2 ++
>  content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 104 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..5676da9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>  \end{description}
>
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>          u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE        0
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>  transport header size.
>  The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
>  \begin{note}
>  This is due to various bugs in implementations.
>  \end{note}
> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  according to the native endian of the guest rather than
>  (necessarily when not using the legacy interface) little-endian.
>
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> +
> +The driver can enable or disable split transport header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.

Nit: type is a field of the command, not something belongs to the
device. (though an implementation should record the types internally).
maybe it's better to say

A device MUST disable header splitting upon reset and initialization?

> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.

"The splitting of the specific transport protocol is not enabled via
VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"

> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the transport header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> +in \field{flags} MUST be set. The transport header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer

"the following buffers"?

> +consists of multiple descriptors, the device MUST skip the first descriptor,

"skip the first descriptor of the following buffers"?

> +because the first descriptor is used to carry the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> +virtio net header and the transport header, and the buffer consists of at least two
> +descriptors, the device MUST start with the first descriptor to store the packet, and
> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.

Is this intended to describe what happens if the device doesn't split
the header? If yes, I wonder if it's better to just drop this.
(Duplicated with message framing part?)

> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the transport header

I'd change "MUST" to "SHOULD" as the driver needs to validate the
hdr_len before trying to use it.

> +inside the first descriptor.
> +
> +If the split transport header is enabled, the buffers submitted to receiveq by the
> +driver MUST be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.

Similarly, should we use "SHOULD" here. Or maybe we can see, if the
driver want the device to split the header, it MUST ....

Thanks

>
>  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>
> --
> 1.8.3.1
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-02  6:21 ` [virtio-dev] " Jason Wang
@ 2022-09-02  6:41   ` Michael S. Tsirkin
  2022-09-02  8:58     ` Heng Qi
  2022-09-02  7:36   ` Heng Qi
  2022-09-02  8:26   ` Heng Qi
  2 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-02  6:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: Heng Qi, Virtio-Dev, Xuan Zhuo, Kangjie Xu

On Fri, Sep 02, 2022 at 02:21:04PM +0800, Jason Wang wrote:
> On Tue, Aug 16, 2022 at 5:35 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> > From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > The purpose of this feature is to split the header and the payload of
> > the packet.
> >
> > |                    receive buffer                                    |
> > |                       0th descriptor             | 1th descriptor    |
> > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
> >
> > We can use a buffer plus a separate page when allocating the receive
> > buffer. In this way, we can ensure that all payloads can be
> > independently in a page, which is very beneficial for the zerocopy
> > implemented by the upper layer.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> > ---
> > v7:
> >         1. Fix some presentation issues.
> >         2. Use "split transport header". @Jason Wang
> >         3. Clarify some paragraphs. @Cornelia Huck
> >         4. determine the device what to do if it does not perform header split on a packet.
> >
> > v6:
> >         1. Fix some syntax issues. @Cornelia Huck
> >         2. Clarify some paragraphs. @Cornelia Huck
> >         3. Determine the device what to do if it does not perform header split on a packet.
> >
> > v5:
> >         1. Determine when hdr_len is credible in the process of rx
> >         2. Clean up the use of buffers and descriptors
> >         3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
> >
> > v4:
> >         1. fix typo @Cornelia Huck @Jason Wang
> >         2. do not split header for IP fragmentation packet. @Jason Wang
> >
> > v3:
> >         1. Fix some syntax issues
> >         2. Fix some terminology issues
> >         3. It is not unified with ip alignment, so ip alignment is not included
> >         4. Make it clear that the device must support four types, in the case of successful negotiation.
> >
> >  conformance.tex |   2 ++
> >  content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 104 insertions(+)
> >
> > diff --git a/conformance.tex b/conformance.tex
> > index 2b86fc6..4e2b82e 100644
> > --- a/conformance.tex
> > +++ b/conformance.tex
> > @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >  \end{itemize}
> >
> >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> >  \end{itemize}
> >
> >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > diff --git a/content.tex b/content.tex
> > index e863709..5676da9 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >      channel.
> >
> > +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> > +    the transport header and the payload.
> > +
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \end{description}
> >
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
> >          u8 flags;
> >  #define VIRTIO_NET_HDR_GSO_NONE        0
> >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> >  transport header size.
> >  The driver MUST NOT rely on \field{hdr_len} to be correct.
> > +
> > +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> > +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> > +inside the first descriptor.
> > +
> >  \begin{note}
> >  This is due to various bugs in implementations.
> >  \end{note}
> > @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  according to the native endian of the guest rather than
> >  (necessarily when not using the legacy interface) little-endian.
> >
> > +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > +
> > +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> > +the device supports splitting the transport header and the payload.
> > +The transport header and the payload will be separated into different
> > +descriptors.
> > +
> > +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> > +
> > +To configure the split transport header, the following layout structure and definitions
> > +are used:
> > +
> > +\begin{lstlisting}
> > +struct virtio_net_split_transport_header_config {
> > +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> > +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> > +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> > +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> > +    le64 type;
> > +};
> > +
> > +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> > + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> > +\end{lstlisting}
> > +
> > +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> > +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> > +
> > +The driver can enable or disable split transport header for different protocols by
> > +setting or clearing corresponding bits in \field{type}.
> > +
> > +\begin{itemize}
> > +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> > +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> > +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> > +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> > +\end{itemize}
> > +
> > +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > +
> > +A device MUST initialize \field{type} to 0, and MUST set it to 0
> > +upon device reset.
> 
> Nit: type is a field of the command, not something belongs to the
> device. (though an implementation should record the types internally).
> maybe it's better to say
> 
> A device MUST disable header splitting upon reset and initialization?

I don't know what that will mean, either.

just what are you trying to say here?

> > +
> > +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> > +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> > +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> > +
> > +A device MUST NOT split the transport header if it encounters any of the following cases:
> > +\begin{itemize}
> > +    \item the device does not recognize the protocol of the packet.
> > +    \item the packet is an IP fragmentation.
> > +    \item \field{type} does not include the protocol of the packet.
> 
> "The splitting of the specific transport protocol is not enabled via
> VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"
> 
> > +    \item the buffer consists of only one descriptor.
> > +    \item the total size of the virtio net header and the transport header exceeds
> > +        the size of the first descriptor.
> > +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> > +        payload exceeds the size of the descriptor chain starting from the 2nd
> > +        descriptor.
> > +\end{itemize}
> > +
> > +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> > +in \field{flags} MUST be set. The transport header MUST be on the first
> > +descriptor, following the virtio net header. The payload MUST start from the
> > +second descriptor. The device MUST set \field{hdr_len} of structure
> > +virtio_net_hdr to the length of the transport header.
> > +
> > +If all of the following applies:
> > +\begin{itemize}
> > +    \item the header is split by the device.
> > +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> > +    \item the received packet is spread over multiple buffers.
> > +\end {itemize}
> > +then if the device uses the buffers after the 1st buffer, and the buffer
> 
> "the following buffers"?
> 
> > +consists of multiple descriptors, the device MUST skip the first descriptor,
> 
> "skip the first descriptor of the following buffers"?
> 
> > +because the first descriptor is used to carry the transport header.
> > +The used length still reports the number of bytes it has written to memory.
> > +
> > +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> > +virtio net header and the transport header, and the buffer consists of at least two
> > +descriptors, the device MUST start with the first descriptor to store the packet, and
> > +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> 
> Is this intended to describe what happens if the device doesn't split
> the header? If yes, I wonder if it's better to just drop this.
> (Duplicated with message framing part?)

Jason maybe you understand what is going on here?
The use of descriptors here in the device description just confuses
me to the point where I don't understand what is going on.
Devices should ever only deal with buffers and offsets
within buffers.



> 
> > +
> > +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> > +
> > +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> > +MUST treat the contents of \field{hdr_len} as the length of the transport header
> 
> I'd change "MUST" to "SHOULD" as the driver needs to validate the
> hdr_len before trying to use it.
> 
> > +inside the first descriptor.
> > +
> > +If the split transport header is enabled, the buffers submitted to receiveq by the
> > +driver MUST be composed of at least two descriptors.
> > +When the buffer consists of two descriptors, the length of the first
> > +descriptor MUST be greater than the one of the virtio net header.
> 
> Similarly, should we use "SHOULD" here. Or maybe we can see, if the
> driver want the device to split the header, it MUST ....
> 
> Thanks
> 
> >
> >  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >
> > --
> > 1.8.3.1
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
                   ` (3 preceding siblings ...)
  2022-09-02  6:21 ` [virtio-dev] " Jason Wang
@ 2022-09-02  6:48 ` Michael S. Tsirkin
  2022-09-07 11:16 ` [virtio-dev] " Heng Qi
  5 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-02  6:48 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, jasowang, xuanzhuo, kangjie.xu

On Tue, Aug 16, 2022 at 05:34:55PM +0800, Heng Qi wrote:
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> 
> The purpose of this feature is to split the header and the payload of
> the packet.
> 
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
> 
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>

OK so really VIRTIO_NET_F_MRG_RXBUF and !VIRTIO_NET_F_MRG_RXBUF
are completely different in their behaviour.

Which is really important for you? I would suggest we first
try to get that one case right, later the other one.





> ---
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
> 
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
> 
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
> 
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
> 
> v3:
> 	1. Fix some syntax issues
> 	2. Fix some terminology issues
> 	3. It is not unified with ip alignment, so ip alignment is not included
> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
> 
>  conformance.tex |   2 ++
>  content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 104 insertions(+)
> 
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>  \end{itemize}
>  
>  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>  \end{itemize}
>  
>  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..5676da9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>  
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>  
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>  \end{description}
>  
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>          u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE        0
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>  transport header size.
>  The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
>  \begin{note}
>  This is due to various bugs in implementations.
>  \end{note}
> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  according to the native endian of the guest rather than
>  (necessarily when not using the legacy interface) little-endian.
>  
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> +
> +The driver can enable or disable split transport header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the transport header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> +in \field{flags} MUST be set. The transport header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> +virtio net header and the transport header, and the buffer consists of at least two
> +descriptors, the device MUST start with the first descriptor to store the packet, and
> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
> +If the split transport header is enabled, the buffers submitted to receiveq by the
> +driver MUST be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.
>  
>  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>  
> -- 
> 1.8.3.1


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7] virtio_net: support split header
  2022-09-02  6:21 ` [virtio-dev] " Jason Wang
  2022-09-02  6:41   ` Michael S. Tsirkin
@ 2022-09-02  7:36   ` Heng Qi
  2022-09-04 20:27     ` Michael S. Tsirkin
  2022-09-02  8:26   ` Heng Qi
  2 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-02  7:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-dev, mst, Cornelia Huck, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 13558 bytes --]

在 2022/9/2 下午2:21, Jason Wang 写道:
> On Tue, Aug 16, 2022 at 5:35 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>
>> The purpose of this feature is to split the header and the payload of
>> the packet.
>>
>> |                    receive buffer                                    |
>> |                       0th descriptor             | 1th descriptor    |
>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>
>> We can use a buffer plus a separate page when allocating the receive
>> buffer. In this way, we can ensure that all payloads can be
>> independently in a page, which is very beneficial for the zerocopy
>> implemented by the upper layer.
>>
>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>> ---
>> v7:
>>          1. Fix some presentation issues.
>>          2. Use "split transport header". @Jason Wang
>>          3. Clarify some paragraphs. @Cornelia Huck
>>          4. determine the device what to do if it does not perform header split on a packet.
>>
>> v6:
>>          1. Fix some syntax issues. @Cornelia Huck
>>          2. Clarify some paragraphs. @Cornelia Huck
>>          3. Determine the device what to do if it does not perform header split on a packet.
>>
>> v5:
>>          1. Determine when hdr_len is credible in the process of rx
>>          2. Clean up the use of buffers and descriptors
>>          3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>
>> v4:
>>          1. fix typo @Cornelia Huck @Jason Wang
>>          2. do not split header for IP fragmentation packet. @Jason Wang
>>
>> v3:
>>          1. Fix some syntax issues
>>          2. Fix some terminology issues
>>          3. It is not unified with ip alignment, so ip alignment is not included
>>          4. Make it clear that the device must support four types, in the case of successful negotiation.
>>
>>   conformance.tex |   2 ++
>>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 104 insertions(+)
>>
>> diff --git a/conformance.tex b/conformance.tex
>> index 2b86fc6..4e2b82e 100644
>> --- a/conformance.tex
>> +++ b/conformance.tex
>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>> diff --git a/content.tex b/content.tex
>> index e863709..5676da9 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>
>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>> +    the transport header and the payload.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>>           u8 flags;
>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>   transport header size.
>>   The driver MUST NOT rely on \field{hdr_len} to be correct.
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
>> +inside the first descriptor.
>> +
>>   \begin{note}
>>   This is due to various bugs in implementations.
>>   \end{note}
>> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   according to the native endian of the guest rather than
>>   (necessarily when not using the legacy interface) little-endian.
>>
>> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>> +
>> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>> +the device supports splitting the transport header and the payload.
>> +The transport header and the payload will be separated into different
>> +descriptors.
>> +
>> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>> +
>> +To configure the split transport header, the following layout structure and definitions
>> +are used:
>> +
>> +\begin{lstlisting}
>> +struct virtio_net_split_transport_header_config {
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>> +    le64 type;
>> +};
>> +
>> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>> +\end{lstlisting}
>> +
>> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
>> +
>> +The driver can enable or disable split transport header for different protocols by
>> +setting or clearing corresponding bits in \field{type}.
>> +
>> +\begin{itemize}
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>> +\end{itemize}
>> +
>> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>> +
>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>> +upon device reset.
> Nit: type is a field of the command, not something belongs to the
> device. (though an implementation should record the types internally).
> maybe it's better to say
>
> A device MUST disable header splitting upon reset and initialization?

The previous description is indeed a bit inappropriate.

I will fix it in the next version.

>
>> +
>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>> +
>> +A device MUST NOT split the transport header if it encounters any of the following cases:
>> +\begin{itemize}
>> +    \item the device does not recognize the protocol of the packet.
>> +    \item the packet is an IP fragmentation.
>> +    \item \field{type} does not include the protocol of the packet.
> "The splitting of the specific transport protocol is not enabled via
> VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"

Okay, I'll use this description in the next version.

>
>> +    \item the buffer consists of only one descriptor.
>> +    \item the total size of the virtio net header and the transport header exceeds
>> +        the size of the first descriptor.
>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>> +        descriptor.
>> +\end{itemize}
>> +
>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
>> +in \field{flags} MUST be set. The transport header MUST be on the first
>> +descriptor, following the virtio net header. The payload MUST start from the
>> +second descriptor. The device MUST set \field{hdr_len} of structure
>> +virtio_net_hdr to the length of the transport header.
>> +
>> +If all of the following applies:
>> +\begin{itemize}
>> +    \item the header is split by the device.
>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>> +    \item the received packet is spread over multiple buffers.
>> +\end {itemize}
>> +then if the device uses the buffers after the 1st buffer, and the buffer
> "the following buffers"?


If all of the following cases apply, we describe how the device uses buffers
to store headers and payloads.

So the above description should be: "If all of the following cases apply:".

>
>> +consists of multiple descriptors, the device MUST skip the first descriptor,
> "skip the first descriptor of the following buffers"?

Yes. The device MUST skip the first descriptor of all following buffers
starting with the second buffer.

>
>> +because the first descriptor is used to carry the transport header.
>> +The used length still reports the number of bytes it has written to memory.
>> +
>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
>> +virtio net header and the transport header, and the buffer consists of at least two
>> +descriptors, the device MUST start with the first descriptor to store the packet, and
>> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> Is this intended to describe what happens if the device doesn't split
> the header? If yes, I wonder if it's better to just drop this.
> (Duplicated with message framing part?)

This describes what happens if the device does not split the header. I think it's better
to keep this to indicate what the device needs to do if VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER
is negotiated, but the header is not split.

>> +
>> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
>> +MUST treat the contents of \field{hdr_len} as the length of the transport header
> I'd change "MUST" to "SHOULD" as the driver needs to validate the
> hdr_len before trying to use it.

Okay, I'll use "SHOULD" instead of "MUST" in the next version.

>
>> +inside the first descriptor.
>> +
>> +If the split transport header is enabled, the buffers submitted to receiveq by the
>> +driver MUST be composed of at least two descriptors.
>> +When the buffer consists of two descriptors, the length of the first
>> +descriptor MUST be greater than the one of the virtio net header.
> Similarly, should we use "SHOULD" here. Or maybe we can see, if the
> driver want the device to split the header, it MUST ....
>
> Thanks

We need to clarify that the purpose of header splitting is to make all payloads
can be independently in a page, which is beneficial for the zerocopy
implemented by the upper layer.

If the driver does not enforce that the buffers submitted to the receiveq MUST
be composed of at least two descriptors, then header splitting will become meaningless,
or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.


Thanks.


[-- Attachment #2: Type: text/html, Size: 15896 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-02  6:21 ` [virtio-dev] " Jason Wang
  2022-09-02  6:41   ` Michael S. Tsirkin
  2022-09-02  7:36   ` Heng Qi
@ 2022-09-02  8:26   ` Heng Qi
  2022-09-06  5:53     ` Jason Wang
  2 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-02  8:26 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-dev, mst, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 12142 bytes --]


在 2022/9/2 下午2:21, Jason Wang 写道:
> On Tue, Aug 16, 2022 at 5:35 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>
>> The purpose of this feature is to split the header and the payload of
>> the packet.
>>
>> |                    receive buffer                                    |
>> |                       0th descriptor             | 1th descriptor    |
>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>
>> We can use a buffer plus a separate page when allocating the receive
>> buffer. In this way, we can ensure that all payloads can be
>> independently in a page, which is very beneficial for the zerocopy
>> implemented by the upper layer.
>>
>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>> ---
>> v7:
>>          1. Fix some presentation issues.
>>          2. Use "split transport header". @Jason Wang
>>          3. Clarify some paragraphs. @Cornelia Huck
>>          4. determine the device what to do if it does not perform header split on a packet.
>>
>> v6:
>>          1. Fix some syntax issues. @Cornelia Huck
>>          2. Clarify some paragraphs. @Cornelia Huck
>>          3. Determine the device what to do if it does not perform header split on a packet.
>>
>> v5:
>>          1. Determine when hdr_len is credible in the process of rx
>>          2. Clean up the use of buffers and descriptors
>>          3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>
>> v4:
>>          1. fix typo @Cornelia Huck @Jason Wang
>>          2. do not split header for IP fragmentation packet. @Jason Wang
>>
>> v3:
>>          1. Fix some syntax issues
>>          2. Fix some terminology issues
>>          3. It is not unified with ip alignment, so ip alignment is not included
>>          4. Make it clear that the device must support four types, in the case of successful negotiation.
>>
>>   conformance.tex |   2 ++
>>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 104 insertions(+)
>>
>> diff --git a/conformance.tex b/conformance.tex
>> index 2b86fc6..4e2b82e 100644
>> --- a/conformance.tex
>> +++ b/conformance.tex
>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>> diff --git a/content.tex b/content.tex
>> index e863709..5676da9 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>
>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>> +    the transport header and the payload.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>>           u8 flags;
>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>   transport header size.
>>   The driver MUST NOT rely on \field{hdr_len} to be correct.
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
>> +inside the first descriptor.
>> +
>>   \begin{note}
>>   This is due to various bugs in implementations.
>>   \end{note}
>> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   according to the native endian of the guest rather than
>>   (necessarily when not using the legacy interface) little-endian.
>>
>> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>> +
>> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>> +the device supports splitting the transport header and the payload.
>> +The transport header and the payload will be separated into different
>> +descriptors.
>> +
>> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>> +
>> +To configure the split transport header, the following layout structure and definitions
>> +are used:
>> +
>> +\begin{lstlisting}
>> +struct virtio_net_split_transport_header_config {
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>> +    le64 type;
>> +};
>> +
>> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>> +\end{lstlisting}
>> +
>> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
>> +
>> +The driver can enable or disable split transport header for different protocols by
>> +setting or clearing corresponding bits in \field{type}.
>> +
>> +\begin{itemize}
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>> +\end{itemize}
>> +
>> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>> +
>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>> +upon device reset.
> Nit: type is a field of the command, not something belongs to the
> device. (though an implementation should record the types internally).
> maybe it's better to say
>
> A device MUST disable header splitting upon reset and initialization?
>
>> +
>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>> +
>> +A device MUST NOT split the transport header if it encounters any of the following cases:
>> +\begin{itemize}
>> +    \item the device does not recognize the protocol of the packet.
>> +    \item the packet is an IP fragmentation.
>> +    \item \field{type} does not include the protocol of the packet.
> "The splitting of the specific transport protocol is not enabled via
> VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"
>
>> +    \item the buffer consists of only one descriptor.
>> +    \item the total size of the virtio net header and the transport header exceeds
>> +        the size of the first descriptor.
>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>> +        descriptor.
>> +\end{itemize}
>> +
>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
>> +in \field{flags} MUST be set. The transport header MUST be on the first
>> +descriptor, following the virtio net header. The payload MUST start from the
>> +second descriptor. The device MUST set \field{hdr_len} of structure
>> +virtio_net_hdr to the length of the transport header.
>> +
>> +If all of the following applies:
>> +\begin{itemize}
>> +    \item the header is split by the device.
>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>> +    \item the received packet is spread over multiple buffers.
>> +\end {itemize}
>> +then if the device uses the buffers after the 1st buffer, and the buffer
> "the following buffers"?
>
>> +consists of multiple descriptors, the device MUST skip the first descriptor,
> "skip the first descriptor of the following buffers"?
>
>> +because the first descriptor is used to carry the transport header.
>> +The used length still reports the number of bytes it has written to memory.
>> +
>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
>> +virtio net header and the transport header, and the buffer consists of at least two
>> +descriptors, the device MUST start with the first descriptor to store the packet, and
>> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> Is this intended to describe what happens if the device doesn't split
> the header? If yes, I wonder if it's better to just drop this.
> (Duplicated with message framing part?)
>
Imagine a case, when the driver tells the device to split the header
only for the TCP protocol through the VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER
command, then all the UDP protocol packets will not be split headers,
and these UDP packets cannot be dropped.

When receiving one of these UDP packets, the device MUST start with
the first descriptor to store the packet, and MUST NOT set
the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.

Thanks.


[-- Attachment #2: Type: text/html, Size: 13138 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-02  6:41   ` Michael S. Tsirkin
@ 2022-09-02  8:58     ` Heng Qi
  2022-09-04 20:31       ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-02  8:58 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, xuanzhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 14367 bytes --]


在 2022/9/2 下午2:41, Michael S. Tsirkin 写道:
> On Fri, Sep 02, 2022 at 02:21:04PM +0800, Jason Wang wrote:
>> On Tue, Aug 16, 2022 at 5:35 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>>
>>> The purpose of this feature is to split the header and the payload of
>>> the packet.
>>>
>>> |                    receive buffer                                    |
>>> |                       0th descriptor             | 1th descriptor    |
>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>>
>>> We can use a buffer plus a separate page when allocating the receive
>>> buffer. In this way, we can ensure that all payloads can be
>>> independently in a page, which is very beneficial for the zerocopy
>>> implemented by the upper layer.
>>>
>>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>>> ---
>>> v7:
>>>          1. Fix some presentation issues.
>>>          2. Use "split transport header". @Jason Wang
>>>          3. Clarify some paragraphs. @Cornelia Huck
>>>          4. determine the device what to do if it does not perform header split on a packet.
>>>
>>> v6:
>>>          1. Fix some syntax issues. @Cornelia Huck
>>>          2. Clarify some paragraphs. @Cornelia Huck
>>>          3. Determine the device what to do if it does not perform header split on a packet.
>>>
>>> v5:
>>>          1. Determine when hdr_len is credible in the process of rx
>>>          2. Clean up the use of buffers and descriptors
>>>          3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>>
>>> v4:
>>>          1. fix typo @Cornelia Huck @Jason Wang
>>>          2. do not split header for IP fragmentation packet. @Jason Wang
>>>
>>> v3:
>>>          1. Fix some syntax issues
>>>          2. Fix some terminology issues
>>>          3. It is not unified with ip alignment, so ip alignment is not included
>>>          4. Make it clear that the device must support four types, in the case of successful negotiation.
>>>
>>>   conformance.tex |   2 ++
>>>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 104 insertions(+)
>>>
>>> diff --git a/conformance.tex b/conformance.tex
>>> index 2b86fc6..4e2b82e 100644
>>> --- a/conformance.tex
>>> +++ b/conformance.tex
>>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>   \end{itemize}
>>>
>>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>   \end{itemize}
>>>
>>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>>> diff --git a/content.tex b/content.tex
>>> index e863709..5676da9 100644
>>> --- a/content.tex
>>> +++ b/content.tex
>>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>       channel.
>>>
>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>>> +    the transport header and the payload.
>>> +
>>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>
>>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \end{description}
>>>
>>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>>> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>>>           u8 flags;
>>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>>> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>>   transport header size.
>>>   The driver MUST NOT rely on \field{hdr_len} to be correct.
>>> +
>>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>>> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
>>> +inside the first descriptor.
>>> +
>>>   \begin{note}
>>>   This is due to various bugs in implementations.
>>>   \end{note}
>>> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>   according to the native endian of the guest rather than
>>>   (necessarily when not using the legacy interface) little-endian.
>>>
>>> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>> +
>>> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>>> +the device supports splitting the transport header and the payload.
>>> +The transport header and the payload will be separated into different
>>> +descriptors.
>>> +
>>> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>>> +
>>> +To configure the split transport header, the following layout structure and definitions
>>> +are used:
>>> +
>>> +\begin{lstlisting}
>>> +struct virtio_net_split_transport_header_config {
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>>> +    le64 type;
>>> +};
>>> +
>>> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>>> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>>> +\end{lstlisting}
>>> +
>>> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>>> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
>>> +
>>> +The driver can enable or disable split transport header for different protocols by
>>> +setting or clearing corresponding bits in \field{type}.
>>> +
>>> +\begin{itemize}
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>>> +\end{itemize}
>>> +
>>> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>> +
>>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>>> +upon device reset.
>> Nit: type is a field of the command, not something belongs to the
>> device. (though an implementation should record the types internally).
>> maybe it's better to say
>>
>> A device MUST disable header splitting upon reset and initialization?
> I don't know what that will mean, either.
>
> just what are you trying to say here?

This states that if the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated
and the driver does not use the VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER command to
tell the device which protocol types to split, the device must disable header
splitting upon reset and initialization.

I will take the Jason' suggestion in the next version.

>
>>> +
>>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>>> +
>>> +A device MUST NOT split the transport header if it encounters any of the following cases:
>>> +\begin{itemize}
>>> +    \item the device does not recognize the protocol of the packet.
>>> +    \item the packet is an IP fragmentation.
>>> +    \item \field{type} does not include the protocol of the packet.
>> "The splitting of the specific transport protocol is not enabled via
>> VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"
>>
>>> +    \item the buffer consists of only one descriptor.
>>> +    \item the total size of the virtio net header and the transport header exceeds
>>> +        the size of the first descriptor.
>>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>>> +        descriptor.
>>> +\end{itemize}
>>> +
>>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
>>> +in \field{flags} MUST be set. The transport header MUST be on the first
>>> +descriptor, following the virtio net header. The payload MUST start from the
>>> +second descriptor. The device MUST set \field{hdr_len} of structure
>>> +virtio_net_hdr to the length of the transport header.
>>> +
>>> +If all of the following applies:
>>> +\begin{itemize}
>>> +    \item the header is split by the device.
>>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>>> +    \item the received packet is spread over multiple buffers.
>>> +\end {itemize}
>>> +then if the device uses the buffers after the 1st buffer, and the buffer
>> "the following buffers"?
>>
>>> +consists of multiple descriptors, the device MUST skip the first descriptor,
>> "skip the first descriptor of the following buffers"?
>>
>>> +because the first descriptor is used to carry the transport header.
>>> +The used length still reports the number of bytes it has written to memory.
>>> +
>>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
>>> +virtio net header and the transport header, and the buffer consists of at least two
>>> +descriptors, the device MUST start with the first descriptor to store the packet, and
>>> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
>> Is this intended to describe what happens if the device doesn't split
>> the header? If yes, I wonder if it's better to just drop this.
>> (Duplicated with message framing part?)
> Jason maybe you understand what is going on here?
> The use of descriptors here in the device description just confuses
> me to the point where I don't understand what is going on.
> Devices should ever only deal with buffers and offsets
> within buffers.
>
>
When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
the driver requires that the buffers submitted to receiveq
MUST be composed of at least two descriptors,
which means that each buffer the device gets is a descriptor chain,
even if the device does not split the header for some packets.

To store packet in the descriptor chain without header splitting
by the device, the device MUST start with the first descriptor of
the descriptor chain to store the packet, and MUST NOT set the
VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.

Thanks.

>>> +
>>> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>> +
>>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
>>> +MUST treat the contents of \field{hdr_len} as the length of the transport header
>> I'd change "MUST" to "SHOULD" as the driver needs to validate the
>> hdr_len before trying to use it.
>>
>>> +inside the first descriptor.
>>> +
>>> +If the split transport header is enabled, the buffers submitted to receiveq by the
>>> +driver MUST be composed of at least two descriptors.
>>> +When the buffer consists of two descriptors, the length of the first
>>> +descriptor MUST be greater than the one of the virtio net header.
>> Similarly, should we use "SHOULD" here. Or maybe we can see, if the
>> driver want the device to split the header, it MUST ....
>>
>> Thanks
>>
>>>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>
>>> --
>>> 1.8.3.1
>>>

[-- Attachment #2: Type: text/html, Size: 16227 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7] virtio_net: support split header
  2022-09-02  7:36   ` Heng Qi
@ 2022-09-04 20:27     ` Michael S. Tsirkin
  2022-09-06  5:56       ` Jason Wang
                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-04 20:27 UTC (permalink / raw)
  To: Heng Qi; +Cc: Jason Wang, virtio-dev, Cornelia Huck, xuanzhuo, kangjie.xu

On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> We need to clarify that the purpose of header splitting is to make all payloads
> can be independently in a page, which is beneficial for the zerocopy
> implemented by the upper layer.

absolutely, pls add motivation.

> If the driver does not enforce that the buffers submitted to the receiveq MUST
> be composed of at least two descriptors, then header splitting will become meaningless,
> or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> 
> 
> Thanks.
> 
> 


This seems very narrow and unecessarily wasteful of descriptors.
What is wrong in this:

<header>...<padding>... <beginning of page><data>

seems to achieve the goal of data in a separate page without
using extra descriptors.

thus my proposal to replace the requirement of a separate
descriptor with an offset of data from beginning of
buffer that driver sets.


-- 
MST


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-02  8:58     ` Heng Qi
@ 2022-09-04 20:31       ` Michael S. Tsirkin
  2022-09-05  7:52         ` Xuan Zhuo
  0 siblings, 1 reply; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-04 20:31 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, Jason Wang, xuanzhuo, kangjie.xu

On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
> When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
> the driver requires that the buffers submitted to receiveq
> MUST be composed of at least two descriptors,
> which means that each buffer the device gets is a descriptor chain,
> even if the device does not split the header for some packets.
> 
> To store packet in the descriptor chain without header splitting
> by the device, the device MUST start with the first descriptor of
> the descriptor chain to store the packet, and MUST NOT set the
> VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> 
> Thanks.

Descriptor chains will hurt performance badly.
How about simply making this feature depend on mergeable buffers?
Then we have a separate buffer for the header and
this works cleanly.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-04 20:31       ` Michael S. Tsirkin
@ 2022-09-05  7:52         ` Xuan Zhuo
  2022-09-05  8:37           ` Heng Qi
  0 siblings, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2022-09-05  7:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, kangjie.xu, Heng Qi

On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
> > When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
> > the driver requires that the buffers submitted to receiveq
> > MUST be composed of at least two descriptors,
> > which means that each buffer the device gets is a descriptor chain,
> > even if the device does not split the header for some packets.
> >
> > To store packet in the descriptor chain without header splitting
> > by the device, the device MUST start with the first descriptor of
> > the descriptor chain to store the packet, and MUST NOT set the
> > VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> >
> > Thanks.
>
> Descriptor chains will hurt performance badly.

I understand the reasons for the performance impact here are:
1. Two buffers are used
2. One buffer occupies two descs

This is the same as my understanding in the case of mergeable. We also need to
pack the packets into two buffers, and a packet will eventually occupy two
descs.


> How about simply making this feature depend on mergeable buffers?
> Then we have a separate buffer for the header and
> this works cleanly.


Under mergeable, each buffer is independent, and the split header requires two
unequal descs.

If we implement it based on mergeable, then consider the scenario of tcp
zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an
separate buffer to save the header, then this is a waste, we may
have to copy at the driver layer.

@Qi Do you think there will be other problems with this approach?

Thanks.



>
> --
> MST
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-05  7:52         ` Xuan Zhuo
@ 2022-09-05  8:37           ` Heng Qi
  2022-09-05  9:43             ` Xuan Zhuo
  0 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-05  8:37 UTC (permalink / raw)
  To: Xuan Zhuo, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 2547 bytes --]


在 2022/9/5 下午3:52, Xuan Zhuo 写道:
> On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>> On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
>>> When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
>>> the driver requires that the buffers submitted to receiveq
>>> MUST be composed of at least two descriptors,
>>> which means that each buffer the device gets is a descriptor chain,
>>> even if the device does not split the header for some packets.
>>>
>>> To store packet in the descriptor chain without header splitting
>>> by the device, the device MUST start with the first descriptor of
>>> the descriptor chain to store the packet, and MUST NOT set the
>>> VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
>>>
>>> Thanks.
>> Descriptor chains will hurt performance badly.
> I understand the reasons for the performance impact here are:
> 1. Two buffers are used
> 2. One buffer occupies two descs
>
> This is the same as my understanding in the case of mergeable. We also need to
> pack the packets into two buffers, and a packet will eventually occupy two
> descs.
>
>
>> How about simply making this feature depend on mergeable buffers?
>> Then we have a separate buffer for the header and
>> this works cleanly.
>
> Under mergeable, each buffer is independent, and the split header requires two
> unequal descs.
>
> If we implement it based on mergeable, then consider the scenario of tcp
> zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an
> separate buffer to save the header, then this is a waste, we may
> have to copy at the driver layer.
>
> @Qi Do you think there will be other problems with this approach?
>
> Thanks.
>
When we think about specs, we shouldn't be too distracted by the implementation.

But when we did think about this, suppose the driver fills by page based on
mergeable mode. in order to use the xdp program, the driver usually takes
the beginning of a single page as the headroom, and fills the rest of the page
into the virtqueue. Therefore, the empty buffer obtained by the
device is always smaller than a page when we implement split header
based on this mode, that is, the data load finally obtained by the driver
is offset from the beginning of the page. This does not enjoy the benefits of zero copy.

At the same time, since the header is always only more than 100 bytes,
the page occupied by the header is a waste of the buffer.

Thanks.







[-- Attachment #2: Type: text/html, Size: 3396 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-05  8:37           ` Heng Qi
@ 2022-09-05  9:43             ` Xuan Zhuo
  2022-09-06  5:47               ` Jason Wang
  2022-09-08 21:18               ` Michael S. Tsirkin
  0 siblings, 2 replies; 32+ messages in thread
From: Xuan Zhuo @ 2022-09-05  9:43 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, Jason Wang, kangjie.xu, Michael S. Tsirkin

On Mon, 5 Sep 2022 16:37:57 +0800, Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> 在 2022/9/5 下午3:52, Xuan Zhuo 写道:
> > On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> >> On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
> >>> When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
> >>> the driver requires that the buffers submitted to receiveq
> >>> MUST be composed of at least two descriptors,
> >>> which means that each buffer the device gets is a descriptor chain,
> >>> even if the device does not split the header for some packets.
> >>>
> >>> To store packet in the descriptor chain without header splitting
> >>> by the device, the device MUST start with the first descriptor of
> >>> the descriptor chain to store the packet, and MUST NOT set the
> >>> VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> >>>
> >>> Thanks.
> >> Descriptor chains will hurt performance badly.
> > I understand the reasons for the performance impact here are:
> > 1. Two buffers are used
> > 2. One buffer occupies two descs
> >
> > This is the same as my understanding in the case of mergeable. We also need to
> > pack the packets into two buffers, and a packet will eventually occupy two
> > descs.
> >
> >
> >> How about simply making this feature depend on mergeable buffers?
> >> Then we have a separate buffer for the header and
> >> this works cleanly.
> >
> > Under mergeable, each buffer is independent, and the split header requires two
> > unequal descs.
> >
> > If we implement it based on mergeable, then consider the scenario of tcp
> > zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an
> > separate buffer to save the header, then this is a waste, we may
> > have to copy at the driver layer.
> >
> > @Qi Do you think there will be other problems with this approach?
> >
> > Thanks.
> >
> When we think about specs, we shouldn't be too distracted by the implementation.
>
> But when we did think about this, suppose the driver fills by page based on
> mergeable mode. in order to use the xdp program, the driver usually takes
> the beginning of a single page as the headroom, and fills the rest of the page
> into the virtqueue. Therefore, the empty buffer obtained by the
> device is always smaller than a page when we implement split header
> based on this mode, that is, the data load finally obtained by the driver
> is offset from the beginning of the page. This does not enjoy the benefits of zero copy.
>
> At the same time, since the header is always only more than 100 bytes,
> the page occupied by the header is a waste of the buffer.


Yeah that reminds me that merge doesn't feel like it handles this very well.

The essence is that the two buffers used by the split header are different.

Desc Chain is used to bind a small buffer desc and a page desc. I didn't think
of a better way to deal with this problem.

Thanks.


>
> Thanks.
>
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-05  9:43             ` Xuan Zhuo
@ 2022-09-06  5:47               ` Jason Wang
  2022-09-08 21:18               ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Jason Wang @ 2022-09-06  5:47 UTC (permalink / raw)
  To: virtio-dev


在 2022/9/5 17:43, Xuan Zhuo 写道:
> On Mon, 5 Sep 2022 16:37:57 +0800, Heng Qi <hengqi@linux.alibaba.com> wrote:
>> 在 2022/9/5 下午3:52, Xuan Zhuo 写道:
>>> On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"<mst@redhat.com>  wrote:
>>>> On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
>>>>> When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
>>>>> the driver requires that the buffers submitted to receiveq
>>>>> MUST be composed of at least two descriptors,
>>>>> which means that each buffer the device gets is a descriptor chain,
>>>>> even if the device does not split the header for some packets.
>>>>>
>>>>> To store packet in the descriptor chain without header splitting
>>>>> by the device, the device MUST start with the first descriptor of
>>>>> the descriptor chain to store the packet, and MUST NOT set the
>>>>> VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
>>>>>
>>>>> Thanks.
>>>> Descriptor chains will hurt performance badly.
>>> I understand the reasons for the performance impact here are:
>>> 1. Two buffers are used
>>> 2. One buffer occupies two descs
>>>
>>> This is the same as my understanding in the case of mergeable. We also need to
>>> pack the packets into two buffers, and a packet will eventually occupy two
>>> descs.
>>>
>>>
>>>> How about simply making this feature depend on mergeable buffers?
>>>> Then we have a separate buffer for the header and
>>>> this works cleanly.
>>> Under mergeable, each buffer is independent, and the split header requires two
>>> unequal descs.
>>>
>>> If we implement it based on mergeable, then consider the scenario of tcp
>>> zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an
>>> separate buffer to save the header, then this is a waste, we may
>>> have to copy at the driver layer.
>>>
>>> @Qi Do you think there will be other problems with this approach?
>>>
>>> Thanks.
>>>
>> When we think about specs, we shouldn't be too distracted by the implementation.
>>
>> But when we did think about this, suppose the driver fills by page based on
>> mergeable mode. in order to use the xdp program, the driver usually takes
>> the beginning of a single page as the headroom, and fills the rest of the page
>> into the virtqueue. Therefore, the empty buffer obtained by the
>> device is always smaller than a page when we implement split header
>> based on this mode, that is, the data load finally obtained by the driver
>> is offset from the beginning of the page. This does not enjoy the benefits of zero copy.


I don't understand this issue fully, but if it's a driver behavior, we 
can change.


>>
>> At the same time, since the header is always only more than 100 bytes,
>> the page occupied by the header is a waste of the buffer.
>
> Yeah that reminds me that merge doesn't feel like it handles this very well.
>
> The essence is that the two buffers used by the split header are different.
>
> Desc Chain is used to bind a small buffer desc and a page desc. I didn't think
> of a better way to deal with this problem.


I remember that XDP supports multi-buffer now, can it be used here?

Thanks


>
> Thanks.
>
>
>> Thanks.
>>
>>
>>
>>
>>
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-02  8:26   ` Heng Qi
@ 2022-09-06  5:53     ` Jason Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Wang @ 2022-09-06  5:53 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, mst, xuanzhuo, kangjie.xu


在 2022/9/2 16:26, Heng Qi 写道:
>
>
> 在 2022/9/2 下午2:21, Jason Wang 写道:
>> On Tue, Aug 16, 2022 at 5:35 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>>
>>> The purpose of this feature is to split the header and the payload of
>>> the packet.
>>>
>>> |                    receive buffer                                    |
>>> |                       0th descriptor             | 1th descriptor    |
>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>>
>>> We can use a buffer plus a separate page when allocating the receive
>>> buffer. In this way, we can ensure that all payloads can be
>>> independently in a page, which is very beneficial for the zerocopy
>>> implemented by the upper layer.
>>>
>>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>>> ---
>>> v7:
>>>          1. Fix some presentation issues.
>>>          2. Use "split transport header". @Jason Wang
>>>          3. Clarify some paragraphs. @Cornelia Huck
>>>          4. determine the device what to do if it does not perform header split on a packet.
>>>
>>> v6:
>>>          1. Fix some syntax issues. @Cornelia Huck
>>>          2. Clarify some paragraphs. @Cornelia Huck
>>>          3. Determine the device what to do if it does not perform header split on a packet.
>>>
>>> v5:
>>>          1. Determine when hdr_len is credible in the process of rx
>>>          2. Clean up the use of buffers and descriptors
>>>          3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>>
>>> v4:
>>>          1. fix typo @Cornelia Huck @Jason Wang
>>>          2. do not split header for IP fragmentation packet. @Jason Wang
>>>
>>> v3:
>>>          1. Fix some syntax issues
>>>          2. Fix some terminology issues
>>>          3. It is not unified with ip alignment, so ip alignment is not included
>>>          4. Make it clear that the device must support four types, in the case of successful negotiation.
>>>
>>>   conformance.tex |   2 ++
>>>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 104 insertions(+)
>>>
>>> diff --git a/conformance.tex b/conformance.tex
>>> index 2b86fc6..4e2b82e 100644
>>> --- a/conformance.tex
>>> +++ b/conformance.tex
>>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>   \end{itemize}
>>>
>>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>>   \end{itemize}
>>>
>>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>>> diff --git a/content.tex b/content.tex
>>> index e863709..5676da9 100644
>>> --- a/content.tex
>>> +++ b/content.tex
>>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>       channel.
>>>
>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>>> +    the transport header and the payload.
>>> +
>>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>
>>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \end{description}
>>>
>>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>>> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>>>           u8 flags;
>>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>>> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>>   transport header size.
>>>   The driver MUST NOT rely on \field{hdr_len} to be correct.
>>> +
>>> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>>> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
>>> +inside the first descriptor.
>>> +
>>>   \begin{note}
>>>   This is due to various bugs in implementations.
>>>   \end{note}
>>> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>   according to the native endian of the guest rather than
>>>   (necessarily when not using the legacy interface) little-endian.
>>>
>>> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>> +
>>> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>>> +the device supports splitting the transport header and the payload.
>>> +The transport header and the payload will be separated into different
>>> +descriptors.
>>> +
>>> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>>> +
>>> +To configure the split transport header, the following layout structure and definitions
>>> +are used:
>>> +
>>> +\begin{lstlisting}
>>> +struct virtio_net_split_transport_header_config {
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>>> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>>> +    le64 type;
>>> +};
>>> +
>>> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>>> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>>> +\end{lstlisting}
>>> +
>>> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>>> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
>>> +
>>> +The driver can enable or disable split transport header for different protocols by
>>> +setting or clearing corresponding bits in \field{type}.
>>> +
>>> +\begin{itemize}
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>>> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>>> +\end{itemize}
>>> +
>>> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>>> +
>>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>>> +upon device reset.
>> Nit: type is a field of the command, not something belongs to the
>> device. (though an implementation should record the types internally).
>> maybe it's better to say
>>
>> A device MUST disable header splitting upon reset and initialization?
>>
>>> +
>>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>>> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>>> +
>>> +A device MUST NOT split the transport header if it encounters any of the following cases:
>>> +\begin{itemize}
>>> +    \item the device does not recognize the protocol of the packet.
>>> +    \item the packet is an IP fragmentation.
>>> +    \item \field{type} does not include the protocol of the packet.
>> "The splitting of the specific transport protocol is not enabled via
>> VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET?"
>>
>>> +    \item the buffer consists of only one descriptor.
>>> +    \item the total size of the virtio net header and the transport header exceeds
>>> +        the size of the first descriptor.
>>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>>> +        descriptor.
>>> +\end{itemize}
>>> +
>>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
>>> +in \field{flags} MUST be set. The transport header MUST be on the first
>>> +descriptor, following the virtio net header. The payload MUST start from the
>>> +second descriptor. The device MUST set \field{hdr_len} of structure
>>> +virtio_net_hdr to the length of the transport header.
>>> +
>>> +If all of the following applies:
>>> +\begin{itemize}
>>> +    \item the header is split by the device.
>>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>>> +    \item the received packet is spread over multiple buffers.
>>> +\end {itemize}
>>> +then if the device uses the buffers after the 1st buffer, and the buffer
>> "the following buffers"?
>>
>>> +consists of multiple descriptors, the device MUST skip the first descriptor,
>> "skip the first descriptor of the following buffers"?
>>
>>> +because the first descriptor is used to carry the transport header.
>>> +The used length still reports the number of bytes it has written to memory.
>>> +
>>> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
>>> +virtio net header and the transport header, and the buffer consists of at least two
>>> +descriptors, the device MUST start with the first descriptor to store the packet, and
>>> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
>> Is this intended to describe what happens if the device doesn't split
>> the header? If yes, I wonder if it's better to just drop this.
>> (Duplicated with message framing part?)
>>
> Imagine a case, when the driver tells the device to split the header
> only for the TCP protocol through the VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER
> command, then all the UDP protocol packets will not be split headers,
> and these UDP packets cannot be dropped.
>
> When receiving one of these UDP packets, the device MUST start with
> the first descriptor to store the packet, and MUST NOT set
> the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.


Yes, so this works as in the past. I meant there's probably no need to 
describe the behavior (in detail) when device doesn't split the header 
since it has been described in "Message Framing" subsection.

Thanks


>
> Thanks.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7] virtio_net: support split header
  2022-09-04 20:27     ` Michael S. Tsirkin
@ 2022-09-06  5:56       ` Jason Wang
  2022-09-09  7:41       ` [virtio-dev] " Heng Qi
  2022-09-09 10:22       ` Heng Qi
  2 siblings, 0 replies; 32+ messages in thread
From: Jason Wang @ 2022-09-06  5:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Heng Qi
  Cc: virtio-dev, Cornelia Huck, xuanzhuo, kangjie.xu


在 2022/9/5 04:27, Michael S. Tsirkin 写道:
> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
>> We need to clarify that the purpose of header splitting is to make all payloads
>> can be independently in a page, which is beneficial for the zerocopy
>> implemented by the upper layer.
> absolutely, pls add motivation.
>
>> If the driver does not enforce that the buffers submitted to the receiveq MUST
>> be composed of at least two descriptors, then header splitting will become meaningless,
>> or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
>>
>>
>> Thanks.
>>
>>
>
> This seems very narrow and unecessarily wasteful of descriptors.
> What is wrong in this:
>
> <header>...<padding>... <beginning of page><data>
>
> seems to achieve the goal of data in a separate page without
> using extra descriptors.
>
> thus my proposal to replace the requirement of a separate
> descriptor with an offset of data from beginning of
> buffer that driver sets.


Not sure I get this, does it mean the header and padding are placed at 
the tail of the previous adjacent page?

Thanks


>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
                   ` (4 preceding siblings ...)
  2022-09-02  6:48 ` Michael S. Tsirkin
@ 2022-09-07 11:16 ` Heng Qi
  5 siblings, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-09-07 11:16 UTC (permalink / raw)
  To: virtio-dev; +Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, kangjie.xu

[-- Attachment #1: Type: text/plain, Size: 13444 bytes --]


在 2022/8/16 下午5:34, Heng Qi 写道:
> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
> ---
> v7:
> 	1. Fix some presentation issues.
> 	2. Use "split transport header". @Jason Wang
> 	3. Clarify some paragraphs. @Cornelia Huck
> 	4. determine the device what to do if it does not perform header split on a packet.
>
> v6:
> 	1. Fix some syntax issues. @Cornelia Huck
> 	2. Clarify some paragraphs. @Cornelia Huck
> 	3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
> 	1. Determine when hdr_len is credible in the process of rx
> 	2. Clean up the use of buffers and descriptors
> 	3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
> 	1. fix typo @Cornelia Huck @Jason Wang
> 	2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
> 	1. Fix some syntax issues
> 	2. Fix some terminology issues
> 	3. It is not unified with ip alignment, so ip alignment is not included
> 	4. Make it clear that the device must support four types, in the case of successful negotiation.
>
>   conformance.tex |   2 ++
>   content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 104 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..4e2b82e 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..5676da9 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
> +    the transport header and the payload.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>           u8 flags;
>   #define VIRTIO_NET_HDR_GSO_NONE        0
>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>   transport header size.
>   The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
> +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
>   \begin{note}
>   This is due to various bugs in implementations.
>   \end{note}
> @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   according to the native endian of the guest rather than
>   (necessarily when not using the legacy interface) little-endian.
>   
> +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
> +the device supports splitting the transport header and the payload.
> +The transport header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
> +
> +To configure the split transport header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_transport_header_config {
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
> +
> +The driver can enable or disable split transport header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the transport header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the transport header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
> +in \field{flags} MUST be set. The transport header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the transport header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the transport header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
> +virtio net header and the transport header, and the buffer consists of at least two
> +descriptors, the device MUST start with the first descriptor to store the packet, and
> +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> +
> +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the transport header
> +inside the first descriptor.
> +
> +If the split transport header is enabled, the buffers submitted to receiveq by the
> +driver MUST be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.
>   
>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>   


Then we reconsider an idea that implements split header based on 
mergeable buffers instead of desciptor chains.

1.

Instead of filling the receiveq with a descriptor chain consisting of a 
small buffer and a separate page, we fill a separate page each time like 
this:(In this scenario, the split header relies on 
VIRTIO_NET_F_MRG_RXBUF)If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is 
negotiated and the packet can be successfully split by the device, the 
device needs to find at least two buffers, namely two pages, one for the 
virtio-net header and the transport header, and the other for the 
payload. Like the following: | receive buffer(page) | receive 
buffer(page) | | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| 
payload |

Compared with the original solution, this method wastes a lot of memory 
in the first buffer, which may affect the performance of receiving 
packets. But we can copy the header in the first buffer through a small 
buffer to release pages quickly .

2.

At the same time, if XDP is considered, then the device needs to add 
headroom at the beginning of the first buffer when receiving packets, so 
that the driver can process programs similar to XDP. To solve this 
problem, can we introduce an offset that requires the device to write 
data from the offset position for the first buffer, like the following: 
| receive buffer(page) | receive buffer(page) | | <-- offset --> | 
virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | 3. Based on 
the above headroom, can we consider introducing a tailroom, which 
requires the device to start writing data with a length of max_len from 
the offset position of the first buffer, thereby leaving space for 
structures such as shared_info. Split header is suitable for 
high-throughput scenarios, tailroom can directly and intuitively 
organize multi-buffer data together, and XDP is also based on tailroom 
to support multi-buffer. Thanks.

[-- Attachment #2: Type: text/html, Size: 15294 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] [PATCH v7] virtio_net: support split header
  2022-09-02  4:12 ` Heng Qi
@ 2022-09-08 21:18   ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-08 21:18 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, Jason Wang, Cornelia Huck, xuanzhuo, kangjie.xu

On Fri, Sep 02, 2022 at 12:12:50PM +0800, Heng Qi wrote:
> 
> 在 2022/8/16 下午5:34, Heng Qi 写道:
> 
>     From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> 
>     The purpose of this feature is to split the header and the payload of
>     the packet.
> 
>     |                    receive buffer                                    |
>     |                       0th descriptor             | 1th descriptor    |
>     | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
> 
>     We can use a buffer plus a separate page when allocating the receive
>     buffer. In this way, we can ensure that all payloads can be
>     independently in a page, which is very beneficial for the zerocopy
>     implemented by the upper layer.
> 
>     Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>     Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>     Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
>     ---
>     v7:
>             1. Fix some presentation issues.
>             2. Use "split transport header". @Jason Wang
>             3. Clarify some paragraphs. @Cornelia Huck
>             4. determine the device what to do if it does not perform header split on a packet.
> 
>     v6:
>             1. Fix some syntax issues. @Cornelia Huck
>             2. Clarify some paragraphs. @Cornelia Huck
>             3. Determine the device what to do if it does not perform header split on a packet.
> 
>     v5:
>             1. Determine when hdr_len is credible in the process of rx
>             2. Clean up the use of buffers and descriptors
>             3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
> 
>     v4:
>             1. fix typo @Cornelia Huck @Jason Wang
>             2. do not split header for IP fragmentation packet. @Jason Wang
> 
>     v3:
>             1. Fix some syntax issues
>             2. Fix some terminology issues
>             3. It is not unified with ip alignment, so ip alignment is not included
>             4. Make it clear that the device must support four types, in the case of successful negotiation.
> 
>      conformance.tex |   2 ++
>      content.tex     | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>      2 files changed, 104 insertions(+)
> 
>     diff --git a/conformance.tex b/conformance.tex
>     index 2b86fc6..4e2b82e 100644
>     --- a/conformance.tex
>     +++ b/conformance.tex
>     @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>      \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>      \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>      \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>     +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>      \end{itemize}
> 
>      \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>     @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>      \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>      \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>      \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>     +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>      \end{itemize}
> 
>      \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>     diff --git a/content.tex b/content.tex
>     index e863709..5676da9 100644
>     --- a/content.tex
>     +++ b/content.tex
>     @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>      \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>          channel.
> 
>     +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER (52)] Device supports splitting
>     +    the transport header and the payload.
>     +
>      \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> 
>      \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>     @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>      \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>      \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>      \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>     +\item[VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>      \end{description}
> 
>      \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>     @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>      #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>      #define VIRTIO_NET_HDR_F_DATA_VALID    2
>      #define VIRTIO_NET_HDR_F_RSC_INFO      4
>     +#define VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER  8
>              u8 flags;
>      #define VIRTIO_NET_HDR_GSO_NONE        0
>      #define VIRTIO_NET_HDR_GSO_TCPV4       1
>     @@ -3823,6 +3828,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>      been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>      transport header size.
>      The driver MUST NOT rely on \field{hdr_len} to be correct.
>     +
>     +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set,
>     +the driver SHOULD treat the \field{hdr_len} as the length of the transport header
>     +inside the first descriptor.
>     +
>      \begin{note}
>      This is due to various bugs in implementations.
>      \end{note}
>     @@ -4483,6 +4493,98 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>      according to the native endian of the guest rather than
>      (necessarily when not using the legacy interface) little-endian.
> 
>     +\paragraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>     +
>     +If the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature is negotiated,
>     +the device supports splitting the transport header and the payload.
>     +The transport header and the payload will be separated into different
>     +descriptors.
>     +
>     +\subparagraph{Split Transport Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header / Setting Split Transport Header}
>     +
>     +To configure the split transport header, the following layout structure and definitions
>     +are used:
>     +
>     +\begin{lstlisting}
>     +struct virtio_net_split_transport_header_config {
>     +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4     (1 << 0)
>     +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6     (1 << 1)
>     +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4     (1 << 2)
>     +#define VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6     (1 << 3)
>     +    le64 type;
>     +};
>     +
>     +#define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER       6
>     + #define VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET   0
>     +\end{lstlisting}
>     +
>     +The class VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER has one command:
>     +VIRTIO_NET_CTRL_SPLIT_TRANSPORT_HEADER_SET applies the new split transport header configuration.
>     +
>     +The driver can enable or disable split transport header for different protocols by
>     +setting or clearing corresponding bits in \field{type}.
>     +
>     +\begin{itemize}
>     +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>     +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>     +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4: split after ipv4 udp header
>     +    \item VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6: split after ipv6 udp header
>     +\end{itemize}
>     +
>     +\devicenormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>     +
>     +A device MUST initialize \field{type} to 0, and MUST set it to 0
>     +upon device reset.
>     +
>     +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, the device MUST support
>     +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_TCP6,
>     +VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_TRANSPORT_HEADER_TYPE_UDP6.
>     +
>     +A device MUST NOT split the transport header if it encounters any of the following cases:
>     +\begin{itemize}
>     +    \item the device does not recognize the protocol of the packet.
>     +    \item the packet is an IP fragmentation.
>     +    \item \field{type} does not include the protocol of the packet.
>     +    \item the buffer consists of only one descriptor.
>     +    \item the total size of the virtio net header and the transport header exceeds
>     +        the size of the first descriptor.
>     +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>     +        payload exceeds the size of the descriptor chain starting from the 2nd
>     +        descriptor.
>     +\end{itemize}
>     +
>     +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit
>     +in \field{flags} MUST be set. The transport header MUST be on the first
>     +descriptor, following the virtio net header. The payload MUST start from the
>     +second descriptor. The device MUST set \field{hdr_len} of structure
>     +virtio_net_hdr to the length of the transport header.
>     +
>     +If all of the following applies:
>     +\begin{itemize}
>     +    \item the header is split by the device.
>     +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>     +    \item the received packet is spread over multiple buffers.
>     +\end {itemize}
>     +then if the device uses the buffers after the 1st buffer, and the buffer
>     +consists of multiple descriptors, the device MUST skip the first descriptor,
>     +because the first descriptor is used to carry the transport header.
>     +The used length still reports the number of bytes it has written to memory.
>     +
>     +If VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated, but the device does not split the
>     +virtio net header and the transport header, and the buffer consists of at least two
>     +descriptors, the device MUST start with the first descriptor to store the packet, and
>     +MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
>     +
>     +\drivernormative{\subparagraph}{Setting Split Transport Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Transport Header}
>     +
>     +If the VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags} is set, the driver
>     +MUST treat the contents of \field{hdr_len} as the length of the transport header
>     +inside the first descriptor.
>     +
>     +If the split transport header is enabled, the buffers submitted to receiveq by the
>     +driver MUST be composed of at least two descriptors.
>     +When the buffer consists of two descriptors, the length of the first
>     +descriptor MUST be greater than the one of the virtio net header.
> 
>      \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> 
> 
> If the reviewers do not have more comments on "split header v7",
> then I submit an issue in the virtio-spec repository.
> Please vote on whether the new "split header" feature is added
> to the virtio specification.
> 
> 
> The link is https://github.com/oasis-tcs/virtio-spec/issues/144.
> 
> 
> Thanks.


I can do that if you like but I sent a bunch of comments Sept 4th.
Pls take a look.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-05  9:43             ` Xuan Zhuo
  2022-09-06  5:47               ` Jason Wang
@ 2022-09-08 21:18               ` Michael S. Tsirkin
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-08 21:18 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Heng Qi, virtio-dev, Jason Wang, kangjie.xu

On Mon, Sep 05, 2022 at 05:43:27PM +0800, Xuan Zhuo wrote:
> On Mon, 5 Sep 2022 16:37:57 +0800, Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> > 在 2022/9/5 下午3:52, Xuan Zhuo 写道:
> > > On Sun, 4 Sep 2022 16:31:59 -0400, "Michael S. Tsirkin"<mst@redhat.com>  wrote:
> > >> On Fri, Sep 02, 2022 at 04:58:16PM +0800, Heng Qi wrote:
> > >>> When VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER is negotiated,
> > >>> the driver requires that the buffers submitted to receiveq
> > >>> MUST be composed of at least two descriptors,
> > >>> which means that each buffer the device gets is a descriptor chain,
> > >>> even if the device does not split the header for some packets.
> > >>>
> > >>> To store packet in the descriptor chain without header splitting
> > >>> by the device, the device MUST start with the first descriptor of
> > >>> the descriptor chain to store the packet, and MUST NOT set the
> > >>> VIRTIO_NET_HDR_F_SPLIT_TRANSPORT_HEADER bit in \field{flags}.
> > >>>
> > >>> Thanks.
> > >> Descriptor chains will hurt performance badly.
> > > I understand the reasons for the performance impact here are:
> > > 1. Two buffers are used
> > > 2. One buffer occupies two descs
> > >
> > > This is the same as my understanding in the case of mergeable. We also need to
> > > pack the packets into two buffers, and a packet will eventually occupy two
> > > descs.
> > >
> > >
> > >> How about simply making this feature depend on mergeable buffers?
> > >> Then we have a separate buffer for the header and
> > >> this works cleanly.
> > >
> > > Under mergeable, each buffer is independent, and the split header requires two
> > > unequal descs.
> > >
> > > If we implement it based on mergeable, then consider the scenario of tcp
> > > zerocopy, when we fill receive vq, each buffer is an separate page, and if we use an
> > > separate buffer to save the header, then this is a waste, we may
> > > have to copy at the driver layer.
> > >
> > > @Qi Do you think there will be other problems with this approach?
> > >
> > > Thanks.
> > >
> > When we think about specs, we shouldn't be too distracted by the implementation.
> >
> > But when we did think about this, suppose the driver fills by page based on
> > mergeable mode. in order to use the xdp program, the driver usually takes
> > the beginning of a single page as the headroom, and fills the rest of the page
> > into the virtqueue. Therefore, the empty buffer obtained by the
> > device is always smaller than a page when we implement split header
> > based on this mode, that is, the data load finally obtained by the driver
> > is offset from the beginning of the page. This does not enjoy the benefits of zero copy.
> >
> > At the same time, since the header is always only more than 100 bytes,
> > the page occupied by the header is a waste of the buffer.
> 
> 
> Yeah that reminds me that merge doesn't feel like it handles this very well.
> 
> The essence is that the two buffers used by the split header are different.
> 
> Desc Chain is used to bind a small buffer desc and a page desc. I didn't think
> of a better way to deal with this problem.
> 
> Thanks.
> 

I sent some suggestions avoiding use of descriptors completely,
using offsets instead. take a look.

> >
> > Thanks.
> >
> >
> >
> >
> >
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-04 20:27     ` Michael S. Tsirkin
  2022-09-06  5:56       ` Jason Wang
@ 2022-09-09  7:41       ` Heng Qi
  2022-09-09 11:15         ` Michael S. Tsirkin
  2022-09-09 10:22       ` Heng Qi
  2 siblings, 1 reply; 32+ messages in thread
From: Heng Qi @ 2022-09-09  7:41 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, Xuan Zhuo, kangjie.xu



在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
>> We need to clarify that the purpose of header splitting is to make all payloads
>> can be independently in a page, which is beneficial for the zerocopy
>> implemented by the upper layer.
> absolutely, pls add motivation.
>
>> If the driver does not enforce that the buffers submitted to the receiveq MUST
>> be composed of at least two descriptors, then header splitting will become meaningless,
>> or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
>>
>>
>> Thanks.
>>
>>
>
> This seems very narrow and unecessarily wasteful of descriptors.
> What is wrong in this:
>
> <header>...<padding>... <beginning of page><data>
>
> seems to achieve the goal of data in a separate page without
> using extra descriptors.
>
> thus my proposal to replace the requirement of a separate
> descriptor with an offset of data from beginning of
> buffer that driver sets.
>
>
We have carefully considered your suggestion.

We refer to spec v7 and earlier as scheme A for short. Review scheme A 
below:

| receive buffer |

| 0th descriptor | 1th descriptor |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

We use a buffer plus a separate page when allocating the receive

buffer. In this way, we can ensure that all payloads can be

independently in a page, which is very beneficial for the zerocopy

implemented by the upper layer.

scheme A better solves the problem of headroom, tailroom and memory 
waste, but as you said, this solution relies on descriptor chain.

Our rethinking approach is no longer based on or using descriptor chain.

We refer to your proposed offset-based scheme as scheme B:

As you suggested, scheme B gives the device a buffer, using offset to 
indicate where to place the payload like this:

<header>...<padding>... <beginning of page><data>

But how to apply for this buffer? Since we want the payload to be placed 
on a separate page, the method we consider is to directly apply to the 
driver for two pages of contiguous memory.

Then the beginning of this contiguous memory is used to store the 
headroom, and the contiguous memory after the headroom is directly 
handed over to the device. similar to the following:

<------------------------------------------ receive buffer(2 pages) 
----------------------------------------->

<<---------------------------------- first page 
-----------------------------------><---- second page ------>>

<<Driver reserved, the later part is filled><vheader><transport 
header>..<padding>..<beginning of page><data>>

Based on your previous suggestion, we also considered another new scheme C.

This scheme is implemented based on mergeable buffer, filling a separate 
page each time.

If the split header is negotiated and the packet can be successfully 
split by the device, the device needs to find at least two buffers, 
namely two pages, one for the virtio-net header and transport header, 
and the other for the data payload. Like the following:

| receive buffer1(page) | receive buffer2 (page) |

| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |

At the same time, if XDP is considered, then the device needs to add 
headroom at the beginning of receive buffer1 when receiving packets, so 
that the driver can process programs similar to XDP. In order to solve 
this problem, can scheme C introduce an offset, which requires the 
device to write data from the offset position to receive buffer1, like 
the following:

| receive buffer (page) | receive buffer (page) |

| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold 
-->| payload |

Then we simply compare the advantages and disadvantages of scheme A(spec 
v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable 
buffer):

1. desc chain:

- A depends on desciptor chain; - B, C do not depend on desciptor chain.

2. page alloc

- B fills two consecutive pages, which causes a great waste of memory 
for small packages such as arp; - C fills a single page, slightly better 
than B.

3. Memory waste:

- The memory waste of scheme A is mainly the 0th descriptor that is 
skipped by the device; - When scheme B and scheme C successfully split 
the header, there is a huge waste of the first page, but the first page 
can be quickly released by copying.

4. headroom

- The headrooms of plan A and plan B are reserved; - Scheme C requires 
the driver to set off to let the device skip off when using receive buffer1.

5. tailroom

- When splitting the header, skb usually needs to store each independent 
page in the non-linear data area based on shinfo. - The tailroom of 
scheme A is reserved by itself; - Scheme B requires the driver to set 
the reserved padding area for the first receive buffer(2 pages) to use 
shinfo when the split header is not successfully executed; - Scheme C 
requires the driver to set max_len for the first receive buffer(page).


Which plan do you prefer?

---

Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-04 20:27     ` Michael S. Tsirkin
  2022-09-06  5:56       ` Jason Wang
  2022-09-09  7:41       ` [virtio-dev] " Heng Qi
@ 2022-09-09 10:22       ` Heng Qi
  2 siblings, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-09-09 10:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, jasowang, xuanzhuo, kangjie.xu

On Sun, Sep 04, 2022 at 04:27:38PM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > We need to clarify that the purpose of header splitting is to make all payloads
> > can be independently in a page, which is beneficial for the zerocopy
> > implemented by the upper layer.
> 
> absolutely, pls add motivation.
> 
> > If the driver does not enforce that the buffers submitted to the receiveq MUST
> > be composed of at least two descriptors, then header splitting will become meaningless,
> > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> > 
> > 
> > Thanks.
> > 
> > 
> 
> 
> This seems very narrow and unecessarily wasteful of descriptors.
> What is wrong in this:
> 
> <header>...<padding>... <beginning of page><data>
> 
> seems to achieve the goal of data in a separate page without
> using extra descriptors.
> 
> thus my proposal to replace the requirement of a separate
> descriptor with an offset of data from beginning of
> buffer that driver sets.
>


We have carefully considered your suggestion. 

Let's summarize the schemes we've considered before and now.

1. Scheme A ( refer to spec v7 )

We refer to spec v7 and earlier as scheme A for short. Review scheme A below: 
|                         receive buffer                            | 
|              0th descriptor                      | 1th descriptor | 
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|      payload   | 

We use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be put
independently in a page, which is very beneficial for the zerocopy 
implemented by the upper layer. 

Scheme A better solves the problem of headroom, tailroom and
memory waste, but as you said, this solution relies on descriptor chain. 

2. Scheme B ( refer to your suggestion )

Our rethinking approach is no longer based on descriptor chain.
 
We refer to your proposed offset-based scheme as scheme B.
As you suggested, scheme B gives the device a buffer, using offset to
indicate where to place the payload. Like this: 

<header>...<padding>... <beginning of page><data> 

But how to apply for this buffer?
Since we want the payload to be placed on a separate page, the method
we consider is to directly alloc two pages from driver of contiguous memory. 

Then the beginning of this contiguous memory is used to store the headroom,
and the contiguous memory after the headroom is directly handed over to the device.
Similar to the following: 

[------------------ receive buffer(2 pages) ------------------------------] 
[<------------first page -------------------><------ second page -------->] 
[<-----><virtnet hdr> <mac,ip,tcp>..<padding><       payload             >] 
   ^    ^
   |    |
   |    pointer to device
   |
   |
   Driver reserved, the later part is filled

3. Scheme C (this sheme we have sent to you on September 7th, maybe you miss it.)

Based on your previous suggestion, we also considered another new scheme C. 
This scheme is implemented based on mergeable buffer, filling a separate page each time. 

If the split header is negotiated and the packet can be successfully split by the device,
the device needs to find at least two buffers, namely two pages, one for the virtio-net header
and transport header, and the other for the payload. Like the following: 

|                       receive buffer1(page)      |     receive buffer2 (page)   | 
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|         payload              | 

At the same time, if XDP is considered, then the device needs to add headroom at the
beginning of receive buffer1 when receiving packets, so that the driver can process
programs similar to XDP. 

In order to solve this problem, scheme C introduce an offset which requires
the device to write data from the offset to receive buffer1, like the following: 

|                   receive buffer (page)                                 | receive buffer (page) | 
| <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|         payload       | 
^
|
pointer to device
					   


4. Summarize

Then we simply compare the advantages and disadvantages of scheme A(spec v7),
scheme B (offset buffer(2 pages)) and scheme C (based on mergeable buffer): 

1). descriptor chain: 
         i)  A depends on desciptor chain;
         ii) B, C do not depend on desciptor chain. 

2). page alloc 
         i) B fills with two consecutive pages, which causes a great waste of memory
                for small packages such as arp;
         ii) C fills with a single page, slightly better than B. 

3). Memory waste: 
         i) The memory waste of scheme A is mainly the 0th descriptor
                that is skipped by the device;
         ii) When scheme B and scheme C successfully split the header,
                there is a huge waste of the first page, but the first page
                can be quickly released by copying in driver. 

4). headroom 
         i) The headrooms of plan A and plan B are reserved;
         ii) Scheme C requires the driver to set offset to let the device skip
                offset when using receive buffer1. 

Which plan do you prefer? 

Thanks. 



> 
> -- 
> MST
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-09  7:41       ` [virtio-dev] " Heng Qi
@ 2022-09-09 11:15         ` Michael S. Tsirkin
  2022-09-09 12:38           ` Xuan Zhuo
  2022-09-09 12:47           ` Xuan Zhuo
  0 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-09 11:15 UTC (permalink / raw)
  To: Heng Qi; +Cc: virtio-dev, Jason Wang, Xuan Zhuo, kangjie.xu

On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
> 
> 
> 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
> > On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > > We need to clarify that the purpose of header splitting is to make all payloads
> > > can be independently in a page, which is beneficial for the zerocopy
> > > implemented by the upper layer.
> > absolutely, pls add motivation.
> > 
> > > If the driver does not enforce that the buffers submitted to the receiveq MUST
> > > be composed of at least two descriptors, then header splitting will become meaningless,
> > > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> > > 
> > > 
> > > Thanks.
> > > 
> > > 
> > 
> > This seems very narrow and unecessarily wasteful of descriptors.
> > What is wrong in this:
> > 
> > <header>...<padding>... <beginning of page><data>
> > 
> > seems to achieve the goal of data in a separate page without
> > using extra descriptors.
> > 
> > thus my proposal to replace the requirement of a separate
> > descriptor with an offset of data from beginning of
> > buffer that driver sets.
> > 
> > 
> We have carefully considered your suggestion.
> 
> We refer to spec v7 and earlier as scheme A for short. Review scheme A
> below:
> 
> | receive buffer |
> 
> | 0th descriptor | 1th descriptor |
> 
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> 
> We use a buffer plus a separate page when allocating the receive
> 
> buffer. In this way, we can ensure that all payloads can be
> 
> independently in a page, which is very beneficial for the zerocopy
> 
> implemented by the upper layer.
> 
> scheme A better solves the problem of headroom, tailroom and memory waste,
> but as you said, this solution relies on descriptor chain.
> 
> Our rethinking approach is no longer based on or using descriptor chain.
> 
> We refer to your proposed offset-based scheme as scheme B:
> 
> As you suggested, scheme B gives the device a buffer, using offset to
> indicate where to place the payload like this:
> 
> <header>...<padding>... <beginning of page><data>
> 
> But how to apply for this buffer? Since we want the payload to be placed on
> a separate page, the method we consider is to directly apply to the driver
> for two pages of contiguous memory.
> 
> Then the beginning of this contiguous memory is used to store the headroom,
> and the contiguous memory after the headroom is directly handed over to the
> device. similar to the following:
> 
> <------------------------------------------ receive buffer(2 pages)
> ----------------------------------------->
> 
> <<---------------------------------- first page
> -----------------------------------><---- second page ------>>
> 
> <<Driver reserved, the later part is filled><vheader><transport
> header>..<padding>..<beginning of page><data>>
> 
> Based on your previous suggestion, we also considered another new scheme C.
> 
> This scheme is implemented based on mergeable buffer, filling a separate
> page each time.
> 
> If the split header is negotiated and the packet can be successfully split
> by the device, the device needs to find at least two buffers, namely two
> pages, one for the virtio-net header and transport header, and the other for
> the data payload. Like the following:
> 
> | receive buffer1(page) | receive buffer2 (page) |
> 
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> 
> At the same time, if XDP is considered, then the device needs to add
> headroom at the beginning of receive buffer1 when receiving packets, so that
> the driver can process programs similar to XDP. In order to solve this
> problem, can scheme C introduce an offset, which requires the device to
> write data from the offset position to receive buffer1, like the following:
> 
> | receive buffer (page) | receive buffer (page) |
> 
> | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
> payload |

And in fact, B and C both use an offset now, right?

> Then we simply compare the advantages and disadvantages of scheme A(spec
> v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
> buffer):
> 
> 1. desc chain:
> 
> - A depends on desciptor chain; - B, C do not depend on desciptor chain.
> 
> 2. page alloc
> 
> - B fills two consecutive pages, which causes a great waste of memory for
> small packages such as arp; - C fills a single page, slightly better than B.
> 
> 3. Memory waste:
> 
> - The memory waste of scheme A is mainly the 0th descriptor that is skipped
> by the device;

there's also the cost of the indirect buffer since that is used when
there is a chain.

> - When scheme B and scheme C successfully split the header,
> there is a huge waste of the first page, but the first page can be quickly
> released by copying.
> 
> 4. headroom
> 
> - The headrooms of plan A and plan B are reserved; - Scheme C requires the
> driver to set off to let the device skip off when using receive buffer1.
> 
> 5. tailroom
> 
> - When splitting the header, skb usually needs to store each independent
> page in the non-linear data area based on shinfo. - The tailroom of scheme A
> is reserved by itself; - Scheme B requires the driver to set the reserved
> padding area for the first receive buffer(2 pages) to use shinfo when the
> split header is not successfully executed; - Scheme C requires the driver to
> set max_len for the first receive buffer(page).
> 
> 
> Which plan do you prefer?

I think either both B and C depending on the mergeable buffers flag,
or just one of these two.

> ---
> 
> Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-09 11:15         ` Michael S. Tsirkin
@ 2022-09-09 12:38           ` Xuan Zhuo
  2022-09-14  3:34             ` Jason Wang
  2022-09-09 12:47           ` Xuan Zhuo
  1 sibling, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2022-09-09 12:38 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, kangjie.xu, Heng Qi

On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
> >
> >
> > 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
> > > On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > > > We need to clarify that the purpose of header splitting is to make all payloads
> > > > can be independently in a page, which is beneficial for the zerocopy
> > > > implemented by the upper layer.
> > > absolutely, pls add motivation.
> > >
> > > > If the driver does not enforce that the buffers submitted to the receiveq MUST
> > > > be composed of at least two descriptors, then header splitting will become meaningless,
> > > > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > >
> > > This seems very narrow and unecessarily wasteful of descriptors.
> > > What is wrong in this:
> > >
> > > <header>...<padding>... <beginning of page><data>
> > >
> > > seems to achieve the goal of data in a separate page without
> > > using extra descriptors.
> > >
> > > thus my proposal to replace the requirement of a separate
> > > descriptor with an offset of data from beginning of
> > > buffer that driver sets.
> > >
> > >
> > We have carefully considered your suggestion.
> >
> > We refer to spec v7 and earlier as scheme A for short. Review scheme A
> > below:
> >
> > | receive buffer |
> >
> > | 0th descriptor | 1th descriptor |
> >
> > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> >
> > We use a buffer plus a separate page when allocating the receive
> >
> > buffer. In this way, we can ensure that all payloads can be
> >
> > independently in a page, which is very beneficial for the zerocopy
> >
> > implemented by the upper layer.
> >
> > scheme A better solves the problem of headroom, tailroom and memory waste,
> > but as you said, this solution relies on descriptor chain.
> >
> > Our rethinking approach is no longer based on or using descriptor chain.
> >
> > We refer to your proposed offset-based scheme as scheme B:
> >
> > As you suggested, scheme B gives the device a buffer, using offset to
> > indicate where to place the payload like this:
> >
> > <header>...<padding>... <beginning of page><data>
> >
> > But how to apply for this buffer? Since we want the payload to be placed on
> > a separate page, the method we consider is to directly apply to the driver
> > for two pages of contiguous memory.
> >
> > Then the beginning of this contiguous memory is used to store the headroom,
> > and the contiguous memory after the headroom is directly handed over to the
> > device. similar to the following:
> >
> > <------------------------------------------ receive buffer(2 pages)
> > ----------------------------------------->
> >
> > <<---------------------------------- first page
> > -----------------------------------><---- second page ------>>
> >
> > <<Driver reserved, the later part is filled><vheader><transport
> > header>..<padding>..<beginning of page><data>>
> >
> > Based on your previous suggestion, we also considered another new scheme C.
> >
> > This scheme is implemented based on mergeable buffer, filling a separate
> > page each time.
> >
> > If the split header is negotiated and the packet can be successfully split
> > by the device, the device needs to find at least two buffers, namely two
> > pages, one for the virtio-net header and transport header, and the other for
> > the data payload. Like the following:
> >
> > | receive buffer1(page) | receive buffer2 (page) |
> >
> > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> >
> > At the same time, if XDP is considered, then the device needs to add
> > headroom at the beginning of receive buffer1 when receiving packets, so that
> > the driver can process programs similar to XDP. In order to solve this
> > problem, can scheme C introduce an offset, which requires the device to
> > write data from the offset position to receive buffer1, like the following:
> >
> > | receive buffer (page) | receive buffer (page) |
> >
> > | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
> > payload |
>
> And in fact, B and C both use an offset now, right?


B: offset is used to get the position to place the payload.
C: The offset is used to reserve some space for the device, which the driver can
   use as headroom.

   In order to make the payload page-aligned, we can only hand over the entire
   page to the device, so we cannot reserve some headroom in advance.

>
> > Then we simply compare the advantages and disadvantages of scheme A(spec
> > v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
> > buffer):
> >
> > 1. desc chain:
> >
> > - A depends on desciptor chain; - B, C do not depend on desciptor chain.
> >
> > 2. page alloc
> >
> > - B fills two consecutive pages, which causes a great waste of memory for
> > small packages such as arp; - C fills a single page, slightly better than B.
> >
> > 3. Memory waste:
> >
> > - The memory waste of scheme A is mainly the 0th descriptor that is skipped
> > by the device;
>
> there's also the cost of the indirect buffer since that is used when
> there is a chain.

Yes


>
> > - When scheme B and scheme C successfully split the header,
> > there is a huge waste of the first page, but the first page can be quickly
> > released by copying.
> >
> > 4. headroom
> >
> > - The headrooms of plan A and plan B are reserved; - Scheme C requires the
> > driver to set off to let the device skip off when using receive buffer1.
> >
> > 5. tailroom
> >
> > - When splitting the header, skb usually needs to store each independent
> > page in the non-linear data area based on shinfo. - The tailroom of scheme A
> > is reserved by itself; - Scheme B requires the driver to set the reserved
> > padding area for the first receive buffer(2 pages) to use shinfo when the
> > split header is not successfully executed; - Scheme C requires the driver to
> > set max_len for the first receive buffer(page).
> >
> >
> > Which plan do you prefer?
>
> I think either both B and C depending on the mergeable buffers flag,
> or just one of these two.

If I understand correctly, B does not depend on mergeable, while C must depend
on mergeable.

Thanks.


>
> > ---
> >
> > Thanks.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-09 11:15         ` Michael S. Tsirkin
  2022-09-09 12:38           ` Xuan Zhuo
@ 2022-09-09 12:47           ` Xuan Zhuo
  2022-09-13  7:20             ` Heng Qi
  1 sibling, 1 reply; 32+ messages in thread
From: Xuan Zhuo @ 2022-09-09 12:47 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, kangjie.xu, Heng Qi


hi
   Qi also sent another same email today. Due to some email client problems,
   this email has some confusion in the format, so we can discuss
   under another one.

   https://lists.oasis-open.org/archives/virtio-dev/202209/msg00066.html

Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-09 12:47           ` Xuan Zhuo
@ 2022-09-13  7:20             ` Heng Qi
  0 siblings, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-09-13  7:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, Xuan Zhuo, kangjie.xu

On Fri, Sep 09, 2022 at 08:47:57PM +0800, Xuan Zhuo wrote:
> 
> hi
>    Qi also sent another same email today. Due to some email client problems,
>    this email has some confusion in the format, so we can discuss
>    under another one.
> 
>    https://lists.oasis-open.org/archives/virtio-dev/202209/msg00066.html
> 

Yes. Due to some formatting issues with the mail client, I resent this new email
which may be in a clear style for your review.

Do you have more questions about the contents of this new email?

Looking forward to your comments.

Thanks.

> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-09 12:38           ` Xuan Zhuo
@ 2022-09-14  3:34             ` Jason Wang
  2022-09-27 21:35               ` Michael S. Tsirkin
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2022-09-14  3:34 UTC (permalink / raw)
  To: Xuan Zhuo, Michael S. Tsirkin; +Cc: virtio-dev, kangjie.xu, Heng Qi


在 2022/9/9 20:38, Xuan Zhuo 写道:
> On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
>>>
>>> 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
>>>> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
>>>>> We need to clarify that the purpose of header splitting is to make all payloads
>>>>> can be independently in a page, which is beneficial for the zerocopy
>>>>> implemented by the upper layer.
>>>> absolutely, pls add motivation.
>>>>
>>>>> If the driver does not enforce that the buffers submitted to the receiveq MUST
>>>>> be composed of at least two descriptors, then header splitting will become meaningless,
>>>>> or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>> This seems very narrow and unecessarily wasteful of descriptors.
>>>> What is wrong in this:
>>>>
>>>> <header>...<padding>... <beginning of page><data>
>>>>
>>>> seems to achieve the goal of data in a separate page without
>>>> using extra descriptors.
>>>>
>>>> thus my proposal to replace the requirement of a separate
>>>> descriptor with an offset of data from beginning of
>>>> buffer that driver sets.
>>>>
>>>>
>>> We have carefully considered your suggestion.
>>>
>>> We refer to spec v7 and earlier as scheme A for short. Review scheme A
>>> below:
>>>
>>> | receive buffer |
>>>
>>> | 0th descriptor | 1th descriptor |
>>>
>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
>>>
>>> We use a buffer plus a separate page when allocating the receive
>>>
>>> buffer. In this way, we can ensure that all payloads can be
>>>
>>> independently in a page, which is very beneficial for the zerocopy
>>>
>>> implemented by the upper layer.
>>>
>>> scheme A better solves the problem of headroom, tailroom and memory waste,
>>> but as you said, this solution relies on descriptor chain.
>>>
>>> Our rethinking approach is no longer based on or using descriptor chain.
>>>
>>> We refer to your proposed offset-based scheme as scheme B:
>>>
>>> As you suggested, scheme B gives the device a buffer, using offset to
>>> indicate where to place the payload like this:
>>>
>>> <header>...<padding>... <beginning of page><data>
>>>
>>> But how to apply for this buffer? Since we want the payload to be placed on
>>> a separate page, the method we consider is to directly apply to the driver
>>> for two pages of contiguous memory.
>>>
>>> Then the beginning of this contiguous memory is used to store the headroom,
>>> and the contiguous memory after the headroom is directly handed over to the
>>> device. similar to the following:
>>>
>>> <------------------------------------------ receive buffer(2 pages)
>>> ----------------------------------------->
>>>
>>> <<---------------------------------- first page
>>> -----------------------------------><---- second page ------>>
>>>
>>> <<Driver reserved, the later part is filled><vheader><transport
>>> header>..<padding>..<beginning of page><data>>
>>>
>>> Based on your previous suggestion, we also considered another new scheme C.
>>>
>>> This scheme is implemented based on mergeable buffer, filling a separate
>>> page each time.
>>>
>>> If the split header is negotiated and the packet can be successfully split
>>> by the device, the device needs to find at least two buffers, namely two
>>> pages, one for the virtio-net header and transport header, and the other for
>>> the data payload. Like the following:
>>>
>>> | receive buffer1(page) | receive buffer2 (page) |
>>>
>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
>>>
>>> At the same time, if XDP is considered, then the device needs to add
>>> headroom at the beginning of receive buffer1 when receiving packets, so that
>>> the driver can process programs similar to XDP. In order to solve this
>>> problem, can scheme C introduce an offset, which requires the device to
>>> write data from the offset position to receive buffer1, like the following:
>>>
>>> | receive buffer (page) | receive buffer (page) |
>>>
>>> | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
>>> payload |
>> And in fact, B and C both use an offset now, right?
>
> B: offset is used to get the position to place the payload.
> C: The offset is used to reserve some space for the device, which the driver can
>     use as headroom.
>
>     In order to make the payload page-aligned, we can only hand over the entire
>     page to the device, so we cannot reserve some headroom in advance.


For C, it might be better to do some tweak since mergeable buffer 
doesn't forbid using a descriptor chain as a single buffer.

So if it's a descriptor chain we got back the method A by placing the 
payload in a dedicated buffer. If it's not placing the payload in an 
adjacent buffer.

Thanks


>
>>> Then we simply compare the advantages and disadvantages of scheme A(spec
>>> v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
>>> buffer):
>>>
>>> 1. desc chain:
>>>
>>> - A depends on desciptor chain; - B, C do not depend on desciptor chain.
>>>
>>> 2. page alloc
>>>
>>> - B fills two consecutive pages, which causes a great waste of memory for
>>> small packages such as arp; - C fills a single page, slightly better than B.
>>>
>>> 3. Memory waste:
>>>
>>> - The memory waste of scheme A is mainly the 0th descriptor that is skipped
>>> by the device;
>> there's also the cost of the indirect buffer since that is used when
>> there is a chain.
> Yes
>
>
>>> - When scheme B and scheme C successfully split the header,
>>> there is a huge waste of the first page, but the first page can be quickly
>>> released by copying.
>>>
>>> 4. headroom
>>>
>>> - The headrooms of plan A and plan B are reserved; - Scheme C requires the
>>> driver to set off to let the device skip off when using receive buffer1.
>>>
>>> 5. tailroom
>>>
>>> - When splitting the header, skb usually needs to store each independent
>>> page in the non-linear data area based on shinfo. - The tailroom of scheme A
>>> is reserved by itself; - Scheme B requires the driver to set the reserved
>>> padding area for the first receive buffer(2 pages) to use shinfo when the
>>> split header is not successfully executed; - Scheme C requires the driver to
>>> set max_len for the first receive buffer(page).
>>>
>>>
>>> Which plan do you prefer?
>> I think either both B and C depending on the mergeable buffers flag,
>> or just one of these two.
> If I understand correctly, B does not depend on mergeable, while C must depend
> on mergeable.
>
> Thanks.
>
>
>>> ---
>>>
>>> Thanks.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-14  3:34             ` Jason Wang
@ 2022-09-27 21:35               ` Michael S. Tsirkin
  2022-09-28  2:15                 ` Heng Qi
  2022-09-28  8:01                 ` Xuan Zhuo
  0 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2022-09-27 21:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: Xuan Zhuo, virtio-dev, kangjie.xu, Heng Qi

On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote:
> 
> 在 2022/9/9 20:38, Xuan Zhuo 写道:
> > On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
> > > > 
> > > > 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
> > > > > On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > > > > > We need to clarify that the purpose of header splitting is to make all payloads
> > > > > > can be independently in a page, which is beneficial for the zerocopy
> > > > > > implemented by the upper layer.
> > > > > absolutely, pls add motivation.
> > > > > 
> > > > > > If the driver does not enforce that the buffers submitted to the receiveq MUST
> > > > > > be composed of at least two descriptors, then header splitting will become meaningless,
> > > > > > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> > > > > > 
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > 
> > > > > This seems very narrow and unecessarily wasteful of descriptors.
> > > > > What is wrong in this:
> > > > > 
> > > > > <header>...<padding>... <beginning of page><data>
> > > > > 
> > > > > seems to achieve the goal of data in a separate page without
> > > > > using extra descriptors.
> > > > > 
> > > > > thus my proposal to replace the requirement of a separate
> > > > > descriptor with an offset of data from beginning of
> > > > > buffer that driver sets.
> > > > > 
> > > > > 
> > > > We have carefully considered your suggestion.
> > > > 
> > > > We refer to spec v7 and earlier as scheme A for short. Review scheme A
> > > > below:
> > > > 
> > > > | receive buffer |
> > > > 
> > > > | 0th descriptor | 1th descriptor |
> > > > 
> > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> > > > 
> > > > We use a buffer plus a separate page when allocating the receive
> > > > 
> > > > buffer. In this way, we can ensure that all payloads can be
> > > > 
> > > > independently in a page, which is very beneficial for the zerocopy
> > > > 
> > > > implemented by the upper layer.
> > > > 
> > > > scheme A better solves the problem of headroom, tailroom and memory waste,
> > > > but as you said, this solution relies on descriptor chain.
> > > > 
> > > > Our rethinking approach is no longer based on or using descriptor chain.
> > > > 
> > > > We refer to your proposed offset-based scheme as scheme B:
> > > > 
> > > > As you suggested, scheme B gives the device a buffer, using offset to
> > > > indicate where to place the payload like this:
> > > > 
> > > > <header>...<padding>... <beginning of page><data>
> > > > 
> > > > But how to apply for this buffer? Since we want the payload to be placed on
> > > > a separate page, the method we consider is to directly apply to the driver
> > > > for two pages of contiguous memory.
> > > > 
> > > > Then the beginning of this contiguous memory is used to store the headroom,
> > > > and the contiguous memory after the headroom is directly handed over to the
> > > > device. similar to the following:
> > > > 
> > > > <------------------------------------------ receive buffer(2 pages)
> > > > ----------------------------------------->
> > > > 
> > > > <<---------------------------------- first page
> > > > -----------------------------------><---- second page ------>>
> > > > 
> > > > <<Driver reserved, the later part is filled><vheader><transport
> > > > header>..<padding>..<beginning of page><data>>
> > > > 
> > > > Based on your previous suggestion, we also considered another new scheme C.
> > > > 
> > > > This scheme is implemented based on mergeable buffer, filling a separate
> > > > page each time.
> > > > 
> > > > If the split header is negotiated and the packet can be successfully split
> > > > by the device, the device needs to find at least two buffers, namely two
> > > > pages, one for the virtio-net header and transport header, and the other for
> > > > the data payload. Like the following:
> > > > 
> > > > | receive buffer1(page) | receive buffer2 (page) |
> > > > 
> > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> > > > 
> > > > At the same time, if XDP is considered, then the device needs to add
> > > > headroom at the beginning of receive buffer1 when receiving packets, so that
> > > > the driver can process programs similar to XDP. In order to solve this
> > > > problem, can scheme C introduce an offset, which requires the device to
> > > > write data from the offset position to receive buffer1, like the following:
> > > > 
> > > > | receive buffer (page) | receive buffer (page) |
> > > > 
> > > > | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
> > > > payload |
> > > And in fact, B and C both use an offset now, right?
> > 
> > B: offset is used to get the position to place the payload.
> > C: The offset is used to reserve some space for the device, which the driver can
> >     use as headroom.
> > 
> >     In order to make the payload page-aligned, we can only hand over the entire
> >     page to the device, so we cannot reserve some headroom in advance.
> 
> 
> For C, it might be better to do some tweak since mergeable buffer doesn't
> forbid using a descriptor chain as a single buffer.
> 
> So if it's a descriptor chain we got back the method A by placing the
> payload in a dedicated buffer. If it's not placing the payload in an
> adjacent buffer.
> 
> Thanks

Let's find a way so devices do not care how descriptors are laid out.

> 
> > 
> > > > Then we simply compare the advantages and disadvantages of scheme A(spec
> > > > v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
> > > > buffer):
> > > > 
> > > > 1. desc chain:
> > > > 
> > > > - A depends on desciptor chain; - B, C do not depend on desciptor chain.
> > > > 
> > > > 2. page alloc
> > > > 
> > > > - B fills two consecutive pages, which causes a great waste of memory for
> > > > small packages such as arp; - C fills a single page, slightly better than B.
> > > > 
> > > > 3. Memory waste:
> > > > 
> > > > - The memory waste of scheme A is mainly the 0th descriptor that is skipped
> > > > by the device;
> > > there's also the cost of the indirect buffer since that is used when
> > > there is a chain.
> > Yes
> > 
> > 
> > > > - When scheme B and scheme C successfully split the header,
> > > > there is a huge waste of the first page, but the first page can be quickly
> > > > released by copying.
> > > > 
> > > > 4. headroom
> > > > 
> > > > - The headrooms of plan A and plan B are reserved; - Scheme C requires the
> > > > driver to set off to let the device skip off when using receive buffer1.
> > > > 
> > > > 5. tailroom
> > > > 
> > > > - When splitting the header, skb usually needs to store each independent
> > > > page in the non-linear data area based on shinfo. - The tailroom of scheme A
> > > > is reserved by itself; - Scheme B requires the driver to set the reserved
> > > > padding area for the first receive buffer(2 pages) to use shinfo when the
> > > > split header is not successfully executed; - Scheme C requires the driver to
> > > > set max_len for the first receive buffer(page).
> > > > 
> > > > 
> > > > Which plan do you prefer?
> > > I think either both B and C depending on the mergeable buffers flag,
> > > or just one of these two.
> > If I understand correctly, B does not depend on mergeable, while C must depend
> > on mergeable.
> > 
> > Thanks.
> > 
> > 
> > > > ---
> > > > 
> > > > Thanks.
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-27 21:35               ` Michael S. Tsirkin
@ 2022-09-28  2:15                 ` Heng Qi
  2022-09-28  8:01                 ` Xuan Zhuo
  1 sibling, 0 replies; 32+ messages in thread
From: Heng Qi @ 2022-09-28  2:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang, Xuan Zhuo, kangjie.xu



在 2022/9/28 上午5:35, Michael S. Tsirkin 写道:
> On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote:
>> 在 2022/9/9 20:38, Xuan Zhuo 写道:
>>> On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>> On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
>>>>> 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
>>>>>> On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
>>>>>>> We need to clarify that the purpose of header splitting is to make all payloads
>>>>>>> can be independently in a page, which is beneficial for the zerocopy
>>>>>>> implemented by the upper layer.
>>>>>> absolutely, pls add motivation.
>>>>>>
>>>>>>> If the driver does not enforce that the buffers submitted to the receiveq MUST
>>>>>>> be composed of at least two descriptors, then header splitting will become meaningless,
>>>>>>> or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>> This seems very narrow and unecessarily wasteful of descriptors.
>>>>>> What is wrong in this:
>>>>>>
>>>>>> <header>...<padding>... <beginning of page><data>
>>>>>>
>>>>>> seems to achieve the goal of data in a separate page without
>>>>>> using extra descriptors.
>>>>>>
>>>>>> thus my proposal to replace the requirement of a separate
>>>>>> descriptor with an offset of data from beginning of
>>>>>> buffer that driver sets.
>>>>>>
>>>>>>
>>>>> We have carefully considered your suggestion.
>>>>>
>>>>> We refer to spec v7 and earlier as scheme A for short. Review scheme A
>>>>> below:
>>>>>
>>>>> | receive buffer |
>>>>>
>>>>> | 0th descriptor | 1th descriptor |
>>>>>
>>>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
>>>>>
>>>>> We use a buffer plus a separate page when allocating the receive
>>>>>
>>>>> buffer. In this way, we can ensure that all payloads can be
>>>>>
>>>>> independently in a page, which is very beneficial for the zerocopy
>>>>>
>>>>> implemented by the upper layer.
>>>>>
>>>>> scheme A better solves the problem of headroom, tailroom and memory waste,
>>>>> but as you said, this solution relies on descriptor chain.
>>>>>
>>>>> Our rethinking approach is no longer based on or using descriptor chain.
>>>>>
>>>>> We refer to your proposed offset-based scheme as scheme B:
>>>>>
>>>>> As you suggested, scheme B gives the device a buffer, using offset to
>>>>> indicate where to place the payload like this:
>>>>>
>>>>> <header>...<padding>... <beginning of page><data>
>>>>>
>>>>> But how to apply for this buffer? Since we want the payload to be placed on
>>>>> a separate page, the method we consider is to directly apply to the driver
>>>>> for two pages of contiguous memory.
>>>>>
>>>>> Then the beginning of this contiguous memory is used to store the headroom,
>>>>> and the contiguous memory after the headroom is directly handed over to the
>>>>> device. similar to the following:
>>>>>
>>>>> <------------------------------------------ receive buffer(2 pages)
>>>>> ----------------------------------------->
>>>>>
>>>>> <<---------------------------------- first page
>>>>> -----------------------------------><---- second page ------>>
>>>>>
>>>>> <<Driver reserved, the later part is filled><vheader><transport
>>>>> header>..<padding>..<beginning of page><data>>
>>>>>
>>>>> Based on your previous suggestion, we also considered another new scheme C.
>>>>>
>>>>> This scheme is implemented based on mergeable buffer, filling a separate
>>>>> page each time.
>>>>>
>>>>> If the split header is negotiated and the packet can be successfully split
>>>>> by the device, the device needs to find at least two buffers, namely two
>>>>> pages, one for the virtio-net header and transport header, and the other for
>>>>> the data payload. Like the following:
>>>>>
>>>>> | receive buffer1(page) | receive buffer2 (page) |
>>>>>
>>>>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
>>>>>
>>>>> At the same time, if XDP is considered, then the device needs to add
>>>>> headroom at the beginning of receive buffer1 when receiving packets, so that
>>>>> the driver can process programs similar to XDP. In order to solve this
>>>>> problem, can scheme C introduce an offset, which requires the device to
>>>>> write data from the offset position to receive buffer1, like the following:
>>>>>
>>>>> | receive buffer (page) | receive buffer (page) |
>>>>>
>>>>> | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
>>>>> payload |
>>>> And in fact, B and C both use an offset now, right?
>>> B: offset is used to get the position to place the payload.
>>> C: The offset is used to reserve some space for the device, which the driver can
>>>      use as headroom.
>>>
>>>      In order to make the payload page-aligned, we can only hand over the entire
>>>      page to the device, so we cannot reserve some headroom in advance.
>>
>> For C, it might be better to do some tweak since mergeable buffer doesn't
>> forbid using a descriptor chain as a single buffer.
>>
>> So if it's a descriptor chain we got back the method A by placing the
>> payload in a dedicated buffer. If it's not placing the payload in an
>> adjacent buffer.
>>
>> Thanks
> Let's find a way so devices do not care how descriptors are laid out.

Hi, I'm not sure if you're replying to the wrong email (now in v7 thread),
if not, I'm guessing you mean we continue to discuss a way for devices
to not care how descriptors are laid out based on split header v7,
and no longer consider the new options B and C?


Thanks.

>
>>>>> Then we simply compare the advantages and disadvantages of scheme A(spec
>>>>> v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
>>>>> buffer):
>>>>>
>>>>> 1. desc chain:
>>>>>
>>>>> - A depends on desciptor chain; - B, C do not depend on desciptor chain.
>>>>>
>>>>> 2. page alloc
>>>>>
>>>>> - B fills two consecutive pages, which causes a great waste of memory for
>>>>> small packages such as arp; - C fills a single page, slightly better than B.
>>>>>
>>>>> 3. Memory waste:
>>>>>
>>>>> - The memory waste of scheme A is mainly the 0th descriptor that is skipped
>>>>> by the device;
>>>> there's also the cost of the indirect buffer since that is used when
>>>> there is a chain.
>>> Yes
>>>
>>>
>>>>> - When scheme B and scheme C successfully split the header,
>>>>> there is a huge waste of the first page, but the first page can be quickly
>>>>> released by copying.
>>>>>
>>>>> 4. headroom
>>>>>
>>>>> - The headrooms of plan A and plan B are reserved; - Scheme C requires the
>>>>> driver to set off to let the device skip off when using receive buffer1.
>>>>>
>>>>> 5. tailroom
>>>>>
>>>>> - When splitting the header, skb usually needs to store each independent
>>>>> page in the non-linear data area based on shinfo. - The tailroom of scheme A
>>>>> is reserved by itself; - Scheme B requires the driver to set the reserved
>>>>> padding area for the first receive buffer(2 pages) to use shinfo when the
>>>>> split header is not successfully executed; - Scheme C requires the driver to
>>>>> set max_len for the first receive buffer(page).
>>>>>
>>>>>
>>>>> Which plan do you prefer?
>>>> I think either both B and C depending on the mergeable buffers flag,
>>>> or just one of these two.
>>> If I understand correctly, B does not depend on mergeable, while C must depend
>>> on mergeable.
>>>
>>> Thanks.
>>>
>>>
>>>>> ---
>>>>>
>>>>> Thanks.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [PATCH v7] virtio_net: support split header
  2022-09-27 21:35               ` Michael S. Tsirkin
  2022-09-28  2:15                 ` Heng Qi
@ 2022-09-28  8:01                 ` Xuan Zhuo
  1 sibling, 0 replies; 32+ messages in thread
From: Xuan Zhuo @ 2022-09-28  8:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: virtio-dev, kangjie.xu, Heng Qi, Jason Wang

On Tue, 27 Sep 2022 17:35:09 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 14, 2022 at 11:34:43AM +0800, Jason Wang wrote:
> >
> > 在 2022/9/9 20:38, Xuan Zhuo 写道:
> > > On Fri, 9 Sep 2022 07:15:02 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Sep 09, 2022 at 03:41:54PM +0800, Heng Qi wrote:
> > > > >
> > > > > 在 2022/9/5 上午4:27, Michael S. Tsirkin 写道:
> > > > > > On Fri, Sep 02, 2022 at 03:36:25PM +0800, Heng Qi wrote:
> > > > > > > We need to clarify that the purpose of header splitting is to make all payloads
> > > > > > > can be independently in a page, which is beneficial for the zerocopy
> > > > > > > implemented by the upper layer.
> > > > > > absolutely, pls add motivation.
> > > > > >
> > > > > > > If the driver does not enforce that the buffers submitted to the receiveq MUST
> > > > > > > be composed of at least two descriptors, then header splitting will become meaningless,
> > > > > > > or the VIRTIO_NET_F_SPLIT_TRANSPORT_HEADER feature should not be negotiated at this time.
> > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > This seems very narrow and unecessarily wasteful of descriptors.
> > > > > > What is wrong in this:
> > > > > >
> > > > > > <header>...<padding>... <beginning of page><data>
> > > > > >
> > > > > > seems to achieve the goal of data in a separate page without
> > > > > > using extra descriptors.
> > > > > >
> > > > > > thus my proposal to replace the requirement of a separate
> > > > > > descriptor with an offset of data from beginning of
> > > > > > buffer that driver sets.
> > > > > >
> > > > > >
> > > > > We have carefully considered your suggestion.
> > > > >
> > > > > We refer to spec v7 and earlier as scheme A for short. Review scheme A
> > > > > below:
> > > > >
> > > > > | receive buffer |
> > > > >
> > > > > | 0th descriptor | 1th descriptor |
> > > > >
> > > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> > > > >
> > > > > We use a buffer plus a separate page when allocating the receive
> > > > >
> > > > > buffer. In this way, we can ensure that all payloads can be
> > > > >
> > > > > independently in a page, which is very beneficial for the zerocopy
> > > > >
> > > > > implemented by the upper layer.
> > > > >
> > > > > scheme A better solves the problem of headroom, tailroom and memory waste,
> > > > > but as you said, this solution relies on descriptor chain.
> > > > >
> > > > > Our rethinking approach is no longer based on or using descriptor chain.
> > > > >
> > > > > We refer to your proposed offset-based scheme as scheme B:
> > > > >
> > > > > As you suggested, scheme B gives the device a buffer, using offset to
> > > > > indicate where to place the payload like this:
> > > > >
> > > > > <header>...<padding>... <beginning of page><data>
> > > > >
> > > > > But how to apply for this buffer? Since we want the payload to be placed on
> > > > > a separate page, the method we consider is to directly apply to the driver
> > > > > for two pages of contiguous memory.
> > > > >
> > > > > Then the beginning of this contiguous memory is used to store the headroom,
> > > > > and the contiguous memory after the headroom is directly handed over to the
> > > > > device. similar to the following:
> > > > >
> > > > > <------------------------------------------ receive buffer(2 pages)
> > > > > ----------------------------------------->
> > > > >
> > > > > <<---------------------------------- first page
> > > > > -----------------------------------><---- second page ------>>
> > > > >
> > > > > <<Driver reserved, the later part is filled><vheader><transport
> > > > > header>..<padding>..<beginning of page><data>>
> > > > >
> > > > > Based on your previous suggestion, we also considered another new scheme C.
> > > > >
> > > > > This scheme is implemented based on mergeable buffer, filling a separate
> > > > > page each time.
> > > > >
> > > > > If the split header is negotiated and the packet can be successfully split
> > > > > by the device, the device needs to find at least two buffers, namely two
> > > > > pages, one for the virtio-net header and transport header, and the other for
> > > > > the data payload. Like the following:
> > > > >
> > > > > | receive buffer1(page) | receive buffer2 (page) |
> > > > >
> > > > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload |
> > > > >
> > > > > At the same time, if XDP is considered, then the device needs to add
> > > > > headroom at the beginning of receive buffer1 when receiving packets, so that
> > > > > the driver can process programs similar to XDP. In order to solve this
> > > > > problem, can scheme C introduce an offset, which requires the device to
> > > > > write data from the offset position to receive buffer1, like the following:
> > > > >
> > > > > | receive buffer (page) | receive buffer (page) |
> > > > >
> > > > > | <-- offset(hold) --> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|
> > > > > payload |
> > > > And in fact, B and C both use an offset now, right?
> > >
> > > B: offset is used to get the position to place the payload.
> > > C: The offset is used to reserve some space for the device, which the driver can
> > >     use as headroom.
> > >
> > >     In order to make the payload page-aligned, we can only hand over the entire
> > >     page to the device, so we cannot reserve some headroom in advance.
> >
> >
> > For C, it might be better to do some tweak since mergeable buffer doesn't
> > forbid using a descriptor chain as a single buffer.
> >
> > So if it's a descriptor chain we got back the method A by placing the
> > payload in a dedicated buffer. If it's not placing the payload in an
> > adjacent buffer.
> >
> > Thanks
>
> Let's find a way so devices do not care how descriptors are laid out.

Can we try to make a desc describe a similar but unconnected piece of memory?

struct vring_desc {
	__virtio64 addr;
	__virtio32 len;
	__virtio16 flags;
	__virtio16 next;
};

Note that len in desc is currently 32 bits, but generally 16 bits is enough, so
we have 16bits free space, and because we do not use desc chain, next is also
free. We can use this two 16-bit spaces describe a hold.

|<-- buf1 --><-- hold --><--- page --->|
|<- offset-->|
             |<- size ->|

offset and size are less 65535.

Thanks.


>
> >
> > >
> > > > > Then we simply compare the advantages and disadvantages of scheme A(spec
> > > > > v7), scheme B (offset buffer(2 pages)) and scheme C (based on mergeable
> > > > > buffer):
> > > > >
> > > > > 1. desc chain:
> > > > >
> > > > > - A depends on desciptor chain; - B, C do not depend on desciptor chain.
> > > > >
> > > > > 2. page alloc
> > > > >
> > > > > - B fills two consecutive pages, which causes a great waste of memory for
> > > > > small packages such as arp; - C fills a single page, slightly better than B.
> > > > >
> > > > > 3. Memory waste:
> > > > >
> > > > > - The memory waste of scheme A is mainly the 0th descriptor that is skipped
> > > > > by the device;
> > > > there's also the cost of the indirect buffer since that is used when
> > > > there is a chain.
> > > Yes
> > >
> > >
> > > > > - When scheme B and scheme C successfully split the header,
> > > > > there is a huge waste of the first page, but the first page can be quickly
> > > > > released by copying.
> > > > >
> > > > > 4. headroom
> > > > >
> > > > > - The headrooms of plan A and plan B are reserved; - Scheme C requires the
> > > > > driver to set off to let the device skip off when using receive buffer1.
> > > > >
> > > > > 5. tailroom
> > > > >
> > > > > - When splitting the header, skb usually needs to store each independent
> > > > > page in the non-linear data area based on shinfo. - The tailroom of scheme A
> > > > > is reserved by itself; - Scheme B requires the driver to set the reserved
> > > > > padding area for the first receive buffer(2 pages) to use shinfo when the
> > > > > split header is not successfully executed; - Scheme C requires the driver to
> > > > > set max_len for the first receive buffer(page).
> > > > >
> > > > >
> > > > > Which plan do you prefer?
> > > > I think either both B and C depending on the mergeable buffers flag,
> > > > or just one of these two.
> > > If I understand correctly, B does not depend on mergeable, while C must depend
> > > on mergeable.
> > >
> > > Thanks.
> > >
> > >
> > > > > ---
> > > > >
> > > > > Thanks.
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-09-28  8:08 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-16  9:34 [virtio-dev] [PATCH v7] virtio_net: support split header Heng Qi
2022-08-25 14:22 ` Cornelia Huck
2022-08-30 11:23   ` Heng Qi
2022-08-30 11:26 ` Heng Qi
2022-09-02  4:12 ` Heng Qi
2022-09-08 21:18   ` Michael S. Tsirkin
2022-09-02  6:21 ` [virtio-dev] " Jason Wang
2022-09-02  6:41   ` Michael S. Tsirkin
2022-09-02  8:58     ` Heng Qi
2022-09-04 20:31       ` Michael S. Tsirkin
2022-09-05  7:52         ` Xuan Zhuo
2022-09-05  8:37           ` Heng Qi
2022-09-05  9:43             ` Xuan Zhuo
2022-09-06  5:47               ` Jason Wang
2022-09-08 21:18               ` Michael S. Tsirkin
2022-09-02  7:36   ` Heng Qi
2022-09-04 20:27     ` Michael S. Tsirkin
2022-09-06  5:56       ` Jason Wang
2022-09-09  7:41       ` [virtio-dev] " Heng Qi
2022-09-09 11:15         ` Michael S. Tsirkin
2022-09-09 12:38           ` Xuan Zhuo
2022-09-14  3:34             ` Jason Wang
2022-09-27 21:35               ` Michael S. Tsirkin
2022-09-28  2:15                 ` Heng Qi
2022-09-28  8:01                 ` Xuan Zhuo
2022-09-09 12:47           ` Xuan Zhuo
2022-09-13  7:20             ` Heng Qi
2022-09-09 10:22       ` Heng Qi
2022-09-02  8:26   ` Heng Qi
2022-09-06  5:53     ` Jason Wang
2022-09-02  6:48 ` Michael S. Tsirkin
2022-09-07 11:16 ` [virtio-dev] " Heng Qi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.