virtio-dev.lists.oasis-open.org archive mirror
* [virtio-dev] [PATCH v11] virtio-net: support inner header hash
@ 2023-03-20 11:18 Heng Qi
From: Heng Qi @ 2023-03-20 11:18 UTC (permalink / raw)
  To: virtio-dev, virtio-comment
  Cc: Michael S . Tsirkin, Parav Pandit, Jason Wang, Yuri Benditovich,
	Xuan Zhuo

1. Currently, a received encapsulated packet has an outer and an inner header, but
the virtio device is unable to calculate the hash for the inner header. Multiple
flows with the same outer header but different inner headers are steered to the
same receive queue. This results in poor receive performance.

To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
introduced, which enables the device to advertise the capability to calculate the
hash for the inner packet header. Compared with the outer header hash, it achieves
better receive performance.

2. The same flow can traverse through different tunnels, resulting in the encapsulated
packets being spread across multiple receive queues (refer to the figure below).
However, in certain scenarios, it becomes necessary to direct these encapsulated
packets of the same flow to a single receive queue. This facilitates the processing
of the flow by the same CPU to improve performance (warm caches, less locking, etc.).

               client1                    client2
                  |                          |
                  |        +-------+         |
                  +------->|tunnels|<--------+
                           +-------+
                              |  |
                              |  |
                              v  v
                      +-----------------+
                      | processing host |
                      +-----------------+

To achieve this, the device can calculate a symmetric hash based on the inner packet
headers of the flow. The symmetric hash disregards the order of the 5-tuple when
computing the hash.

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v10->v11:
	1. Revise commit log for clarity for readers.
	2. Some modifications to avoid undefined terms. @Parav Pandit
	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
	4. Add the normative statements. @Parav Pandit

v9->v10:
	1. Removed hash_report_tunnel related information. @Parav Pandit
	2. Re-describe the limitations of QoS for tunneling.
	3. Some clarification.

v8->v9:
	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
	2. Add tunnel security section. @Michael S . Tsirkin
	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
	4. Fix some typos.
	5. Add more tunnel types. @Michael S . Tsirkin

v7->v8:
	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
	3. Removed re-definition for inner packet hashing. @Parav Pandit
	4. Fix some typos. @Michael S . Tsirkin
	5. Clarify some sentences. @Michael S . Tsirkin

v6->v7:
	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
	2. Fix some syntax issues. @Michael S. Tsirkin

v5->v6:
	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
	3. Move the links to introduction section. @Michael S. Tsirkin
	4. Clarify some sentences. @Michael S. Tsirkin

v4->v5:
	1. Clarify some paragraphs. @Cornelia Huck
	2. Fix the u8 type. @Cornelia Huck

v3->v4:
	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin

v2->v3:
	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin

v1->v2:
	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
	2. Clarify some paragraphs. @Jason Wang
	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich

 device-types/net/description.tex        | 119 +++++++++++++++++++++++-
 device-types/net/device-conformance.tex |   1 +
 device-types/net/driver-conformance.tex |   1 +
 introduction.tex                        |  24 +++++
 4 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 0500bb6..49dee2f 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
 
+\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
+    for tunnel-encapsulated packets.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
         u8 rss_max_key_size;
         le16 rss_max_indirection_table_length;
         le32 supported_hash_types;
+        le32 supported_tunnel_hash_types;
 };
 \end{lstlisting}
 The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
@@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
 Field \field{supported_hash_types} contains the bitmask of supported hash types.
 See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
 
+The next field, \field{supported_tunnel_hash_types} only exists if the device
+supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
+
+Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
+See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
+
 \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
 
 The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
@@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 If the feature VIRTIO_NET_F_RSS was negotiated:
 \begin{itemize}
 \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
+\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
 \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
 \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
 \end{itemize}
@@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 If the feature VIRTIO_NET_F_RSS was not negotiated:
 \begin{itemize}
 \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
+\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
 \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
 \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
 \end{itemize}
@@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 
 \subparagraph{Supported/enabled hash types}
 \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
+This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
+\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
 Hash types applicable for IPv4 packets:
 \begin{lstlisting}
 #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
@@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
 \end{itemize}
 
+\paragraph{Inner Packet Header Hash}
+If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
+hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
+through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
+If multiple commands are sent, the device configuration will be defined by the last command received.
+
+If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
+hash on the inner packet header of the encapsulated packet (see \ref{sec:Device Types
+/ Network Device / Device Operation / Processing of Incoming Packets /
+Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
+type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
+is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
+
+\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
+
+\subparagraph{Tunnel/Encapsulated packet}
+\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
+A tunnel packet is encapsulated from the original packet based on the tunneling
+protocol (only a single level of encapsulation is currently supported). The
+encapsulated packet contains an outer header and an inner header, and the device
+calculates the hash over either the inner header or the outer header.
+
+When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
+packet's outer header matches one of the supported \field{hash_tunnel_types},
+the hash of the inner header is calculated. Supported encapsulation types are listed
+in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
+Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
+
+Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
+\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
+
+\subparagraph{Supported/enabled tunnel hash types}
+\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
+If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
+is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
+outer header of the encapsulated packet.
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
+\end{lstlisting}
+
+The encapsulation hash types below indicate that the hash is calculated over the
+inner packet header.
+Hash type applicable for the inner payload of a GRE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
+\end{lstlisting}
+Hash type applicable for the inner payload of a VXLAN-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
+\end{lstlisting}
+Hash type applicable for the inner payload of a GENEVE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
+\end{lstlisting}
+Hash type applicable for the inner payload of an IP-in-IP-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
+\end{lstlisting}
+Hash type applicable for the inner payload of an NVGRE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
+\end{lstlisting}
+
+\subparagraph{Tunnel QoS limitation}
+When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
+there is no quality of service (QoS) for these packets. For example, when the packets of certain
+tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
+amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
+
+Possible mitigations:
+\begin{itemize}
+\item Use a tool with good forwarding performance to keep the receive queue from filling up.
+\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
+      to disable inner packet hash for encapsulated packets.
+\item Choose a hash key that can avoid queue collisions.
+\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
+\end{itemize}
+
+The limitations mentioned above exist with or without the inner packet header hash.
+
+\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
+
+The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
+
+The device MUST drop the encapsulated packet if the destination receive queue is being reset.
+
+\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
+
+If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
+to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
+
+The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
+
 \paragraph{Hash reporting for incoming packets}
 \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
 
@@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
     le16 reserved[4];
     u8 hash_key_length;
     u8 hash_key_data[hash_key_length];
+    le32 hash_tunnel_types;
 };
 \end{lstlisting}
 Field \field{hash_types} contains a bitmask of allowed hash types as
 defined in
 \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
-Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
+
+Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
+
+Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
 
 Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
 defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
@@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
     le16 max_tx_vq;
     u8 hash_key_length;
     u8 hash_key_data[hash_key_length];
+    le32 hash_tunnel_types;
 };
 \end{lstlisting}
 Field \field{hash_types} contains a bitmask of allowed hash types as
@@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 
 Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
 
+Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
+
 \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
 
 A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
index 54f6783..0ff5944 100644
--- a/device-types/net/device-conformance.tex
+++ b/device-types/net/device-conformance.tex
@@ -14,4 +14,5 @@
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
 \end{itemize}
diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
index 97d0cc1..951be89 100644
--- a/device-types/net/driver-conformance.tex
+++ b/device-types/net/driver-conformance.tex
@@ -14,4 +14,5 @@
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
 \end{itemize}
diff --git a/introduction.tex b/introduction.tex
index 287c5fc..25c9d48 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
     Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
 	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
 
+	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
+    Generic Routing Encapsulation
+	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
+	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
+    Virtual eXtensible Local Area Network
+	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
+	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
+    Generic Network Virtualization Encapsulation
+	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
+	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
+    IP Encapsulation within IP
+	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
+	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
+    NVGRE: Network Virtualization Using Generic Routing Encapsulation
+	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
+	\phantomsection\label{intro:IP}\textbf{[IP]} &
+    INTERNET PROTOCOL
+	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
+	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
+    User Datagram Protocol
+	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
+	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
+    TRANSMISSION CONTROL PROTOCOL
+	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
 \end{longtable}
 
 \section{Non-Normative References}
-- 
2.19.1.6.gb485710b
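
For readers following the thread, the inner/outer selection rule described in the
"Inner Packet Header Hash" paragraph can be sketched as below. This is only an
illustration under stated assumptions, not text from the patch: the helper
select_hash_scope(), the enum hash_scope and the parameter pkt_tunnel_type are
hypothetical names, and the device is assumed to have already classified the
packet into at most one tunnel type bit (0 meaning "not encapsulated"). The bit
values are the ones from the patch, written as unsigned constants.

```c
#include <stdint.h>

/* Bit values as defined in the patch (written unsigned here). */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE    (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE     (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1u << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP    (1u << 4)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1u << 5)

/* Hypothetical helper type, not part of the patch. */
enum hash_scope { HASH_OUTER_HEADER, HASH_INNER_HEADER };

/*
 * hash_tunnel_types: bitmask configured by the driver.
 * pkt_tunnel_type:   assumed classification result for one received packet,
 *                    either 0 (not encapsulated) or a single TYPE_* bit.
 */
static enum hash_scope select_hash_scope(uint32_t hash_tunnel_types,
                                         uint32_t pkt_tunnel_type)
{
    /* The NONE bit never selects the inner header. */
    uint32_t inner_types =
        hash_tunnel_types & ~VIRTIO_NET_HASH_TUNNEL_TYPE_NONE;

    /* Inner hash only when the packet's tunnel type is enabled. */
    if ((inner_types & pkt_tunnel_type) != 0)
        return HASH_INNER_HEADER;

    /* Non-encapsulated packets and non-enabled tunnel types: outer header. */
    return HASH_OUTER_HEADER;
}
```

With only VXLAN enabled, a VXLAN packet would be hashed on its inner header while
a GRE packet would fall back to the outer header.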



* [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-20 11:18 [virtio-dev] [PATCH v11] virtio-net: support inner header hash Heng Qi
@ 2023-03-20 19:43 ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2023-03-20 19:43 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-dev, virtio-comment, Parav Pandit, Jason Wang,
	Yuri Benditovich, Xuan Zhuo

On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> 1. Currently, a received encapsulated packet has an outer and an inner header, but
> the virtio device is unable to calculate the hash for the inner header. Multiple
> flows with the same outer header but different inner headers are steered to the
> same receive queue. This results in poor receive performance.
> 
> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> introduced, which enables the device to advertise the capability to calculate the
> hash for the inner packet header. Compared with the outer header hash, it achieves
> better receive performance.

So this would be a very good argument; however, the cost is that we would
seemingly have to keep extending this indefinitely as new tunneling
protocols come to light.
But I believe in fact we don't, at least for this argument:
the standard way to address this is actually by propagating entropy
from the inner to the outer header.

So I'd maybe reorder the commit log and give explanation 2 below,
then say "for some legacy systems, including entropy in the IP header
as done in modern protocols is not practical, resulting in
bad performance under RSS".


> 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> packets being spread across multiple receive queues (refer to the figure below).
> However, in certain scenarios, it becomes necessary to direct these encapsulated
> packets of the same flow to a single receive queue. This facilitates the processing
> of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> 
>                client1                    client2
>                   |                          |
>                   |        +-------+         |
>                   +------->|tunnels|<--------+
>                            +-------+
>                               |  |
>                               |  |
>                               v  v
>                       +-----------------+
>                       | processing host |
>                       +-----------------+

"necessary" is too strong a word, I feel.
All this is an optimization; we don't even really know how strong
it is.

Here's how I understand this:

Imagine two clients client1 and client2 talking to each other.
A copy of all packets is sent to a processing host over a virtio device.
Two directions of the same flow between two clients might be
encapsulated in two different tunnels, with current RSS
strategies they would land on two arbitrary, unrelated queues.
As an optimization, some hosts might wish to make sure both directions
of the encapsulated flow land on the same queue.


Is this a good summary?


Now that things begin to be clearer, I kind of begin to agree with
Jason's suggestion that this is extremely narrow.  And what if I want
one direction on queue1 and the other on queue2, e.g. adjacent numbers for
the same flow?  If enough people agree this is needed we can accept this,
but did you at all consider using something programmable like BPF for
this?  Considering we are putting a not insignificant amount of work into
this, making this widely useful would be better than a narrow
optimization for a very specific use case.


> To achieve this, the device can calculate a symmetric hash based on the inner packet
> headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> computing the hash.

when you say symmetric hash you really mean symmetric key for toeplitz, yes?
It's not that it disregards order, it just gives the same result if
you reverse source and destination, no?
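
To make the distinction concrete, here is a minimal sketch of a Toeplitz hash
over an IPv4 4-tuple, used with a key made of a repeating 16-bit pattern. This is
an illustration only, under stated assumptions: the function names, the 40-byte
key length and the 0x6d5a pattern are choices made for the example, and neither
the patch nor the spec mandates any of them. With such a key, every 32-bit key
window depends only on the bit offset modulo 16, so swapping the two addresses
(a 32-bit shift) or the two ports (a 16-bit shift) leaves the result unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position.
 * Assumes key_len >= 4; key bits past the end of the key are read as 0.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next_key_bit = 32;

    for (size_t i = 0; i < data_len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;

            /* Slide the window one bit further along the key. */
            uint32_t bit = 0;
            if (next_key_bit / 8 < key_len)
                bit = (key[next_key_bit / 8] >> (7 - next_key_bit % 8)) & 1u;
            window = (window << 1) | bit;
            next_key_bit++;
        }
    }
    return hash;
}

/* Illustrative key: the 16-bit pattern 0x6d5a repeated (an assumption). */
static void fill_symmetric_key(uint8_t key[40])
{
    for (size_t i = 0; i < 40; i++)
        key[i] = (i & 1) ? 0x5a : 0x6d;
}

/* Hash input layout: source address, destination address, source port,
 * destination port, all in network byte order. */
static uint32_t hash_ipv4_tuple(const uint8_t *key, size_t key_len,
                                uint32_t saddr, uint32_t daddr,
                                uint16_t sport, uint16_t dport)
{
    uint8_t in[12] = {
        (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
        (uint8_t)(saddr >> 8),  (uint8_t)saddr,
        (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
        (uint8_t)(daddr >> 8),  (uint8_t)daddr,
        (uint8_t)(sport >> 8),  (uint8_t)sport,
        (uint8_t)(dport >> 8),  (uint8_t)dport,
    };
    return toeplitz_hash(key, key_len, in, sizeof(in));
}
```

So the hash function itself is unchanged and still order-sensitive in general;
it is the periodic key that makes the two directions of a flow collide.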


> Reviewed-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> v10->v11:
> 	1. Revise commit log for clarity for readers.
> 	2. Some modifications to avoid undefined terms. @Parav Pandit
> 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> 	4. Add the normative statements. @Parav Pandit
> 
> v9->v10:
> 	1. Removed hash_report_tunnel related information. @Parav Pandit
> 	2. Re-describe the limitations of QoS for tunneling.
> 	3. Some clarification.
> 
> v8->v9:
> 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> 	2. Add tunnel security section. @Michael S . Tsirkin
> 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> 	4. Fix some typos.
> 	5. Add more tunnel types. @Michael S . Tsirkin
> 
> v7->v8:
> 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> 	4. Fix some typos. @Michael S . Tsirkin
> 	5. Clarify some sentences. @Michael S . Tsirkin
> 
> v6->v7:
> 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> 	2. Fix some syntax issues. @Michael S. Tsirkin
> 
> v5->v6:
> 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> 	3. Move the links to introduction section. @Michael S. Tsirkin
> 	4. Clarify some sentences. @Michael S. Tsirkin
> 
> v4->v5:
> 	1. Clarify some paragraphs. @Cornelia Huck
> 	2. Fix the u8 type. @Cornelia Huck
> 
> v3->v4:
> 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> 
> v2->v3:
> 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> 
> v1->v2:
> 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> 	2. Clarify some paragraphs. @Jason Wang
> 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> 
>  device-types/net/description.tex        | 119 +++++++++++++++++++++++-
>  device-types/net/device-conformance.tex |   1 +
>  device-types/net/driver-conformance.tex |   1 +
>  introduction.tex                        |  24 +++++
>  4 files changed, 144 insertions(+), 1 deletion(-)
> 
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index 0500bb6..49dee2f 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>  
> +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> +    for tunnel-encapsulated packets.
> +
>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>  
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
>  \end{description}
>  
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>          u8 rss_max_key_size;
>          le16 rss_max_indirection_table_length;
>          le32 supported_hash_types;
> +        le32 supported_tunnel_hash_types;
>  };
>  \end{lstlisting}
>  The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>  Field \field{supported_hash_types} contains the bitmask of supported hash types.
>  See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>  
> +The next field, \field{supported_tunnel_hash_types} only exists if the device
> +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> +
> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> +
>  \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>  
>  The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  If the feature VIRTIO_NET_F_RSS was negotiated:
>  \begin{itemize}
>  \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>  \end{itemize}
> @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  If the feature VIRTIO_NET_F_RSS was not negotiated:
>  \begin{itemize}
>  \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>  \end{itemize}
> @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  
>  \subparagraph{Supported/enabled hash types}
>  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>  Hash types applicable for IPv4 packets:
>  \begin{lstlisting}
>  #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>  \end{itemize}
>  
> +\paragraph{Inner Packet Header Hash}
> +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> +If multiple commands are sent, the device configuration will be defined by the last command received.
> +
> +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> +/ Network Device / Device Operation / Processing of Incoming Packets /
> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> +
> +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
> +
> +\subparagraph{Tunnel/Encapsulated packet}
> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> +A tunnel packet is encapsulated from the original packet based on the tunneling
> +protocol (only a single level of encapsulation is currently supported). The
> +encapsulated packet contains an outer header and an inner header, and the device
> +calculates the hash over either the inner header or the outer header.
> +
> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> +packet's outer header matches one of the supported \field{hash_tunnel_types},
> +the hash of the inner header is calculated. Supported encapsulation types are listed
> +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> +
> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> +
> +\subparagraph{Supported/enabled tunnel hash types}
> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> +outer header of the encapsulated packet.
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> +\end{lstlisting}
> +
> +The encapsulation hash type below indicates that the hash is calculated over the
> +inner packet header:
> +Hash type applicable for inner payload of the gre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the vxlan-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the geneve-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the ip-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the nvgre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> +\end{lstlisting}
> +
> +\subparagraph{Tunnel QoS limitation}
> +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> +
> +Possible mitigations:
> +\begin{itemize}
> +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> +      to disable inner packet hash for encapsulated packets.
> +\item Choose a hash key that can avoid queue collisions.
> +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> +\end{itemize}
> +
> +The limitations mentioned above exist with or without the inner packet header hash.
> +
> +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> +
> +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> +
> +The device MUST drop the encapsulated packet if the destination receive queue is being reset.

I'm not sure how this last one got here. It seems to have nothing to do
with encapsulation: if we want this requirement, it should apply to all
packets or to none at all.


> +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> +
> +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> +
> +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.

Unclear. This seems to mean that all supported types must be enabled,
where you really mean "only those types". The original for non-tunnel
hash types is:

A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.

which is clear, though a bit verbose with its two negations.

Also, here it says "supported" but below it says "allowed".
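
The "only those types" reading amounts to a subset check on the bitmask. A minimal sketch, assuming the bit values proposed in the patch; `tunnel_types_valid` is a hypothetical helper name, not something the patch or the spec defines:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the patch under review. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE    (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1u << 3)

/* Hypothetical helper: true when every requested bit is also advertised
 * by the device, i.e. `requested` is a subset of `supported`. */
static bool tunnel_types_valid(uint32_t requested, uint32_t supported)
{
    return (requested & ~supported) == 0;
}
```

Under this reading a driver may enable any subset of the advertised types, including none of them.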



>  \paragraph{Hash reporting for incoming packets}
>  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
>  
> @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>      le16 reserved[4];
>      u8 hash_key_length;
>      u8 hash_key_data[hash_key_length];
> +    le32 hash_tunnel_types;
>  };

Hmm, this fixed-size field after a variable-length field is problematic:
it might become unaligned. We could use some of reserved[4]
for this ...
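
The alignment concern can be made concrete with a little offset arithmetic. This is a minimal sketch, assuming the structure starts with a le32 hash_types field as in the published layout (the hunk only shows the fields from reserved[4] onward); the helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes that precede hash_key_data in virtio_net_hash_config:
 * le32 hash_types (assumed), le16 reserved[4], u8 hash_key_length. */
#define HASH_CONFIG_FIXED_LEN (4u + 4u * 2u + 1u)

/* Byte offset of the proposed trailing hash_tunnel_types field. */
static size_t hash_tunnel_types_offset(uint8_t hash_key_length)
{
    return HASH_CONFIG_FIXED_LEN + hash_key_length;
}

/* Nonzero when a le32 at that offset is naturally (4-byte) aligned. */
static int hash_tunnel_types_aligned(uint8_t hash_key_length)
{
    return hash_tunnel_types_offset(hash_key_length) % 4 == 0;
}
```

With a 40-byte key the field lands at offset 53, which is not a multiple of 4; under these assumptions only key lengths congruent to 3 modulo 4 would align it.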



>  \end{lstlisting}
>  Field \field{hash_types} contains a bitmask of allowed hash types as
>  defined in
>  \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> +
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> +
> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>  
>  Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>  defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>      le16 max_tx_vq;
>      u8 hash_key_length;
>      u8 hash_key_data[hash_key_length];
> +    le32 hash_tunnel_types;


Same alignment problem here but I'm not sure how to solve it.
Suggestions?
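
For reference, the same arithmetic for the RSS structure depends on two variable-length arrays. A minimal sketch, assuming the published virtio_net_rss_config layout (le32 hash_types, le16 indirection_table_mask, le16 unclassified_queue, le16 indirection_table[], le16 max_tx_vq, u8 hash_key_length, u8 hash_key_data[]), of which the hunk shows only the last three fields; the helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the proposed trailing hash_tunnel_types field in
 * virtio_net_rss_config, under the layout assumed in the lead-in. */
static size_t rss_tunnel_types_offset(uint16_t indirection_table_length,
                                      uint8_t hash_key_length)
{
    size_t off = 4 + 2 + 2;                      /* hash_types, mask, unclassified_queue */
    off += 2 * (size_t)indirection_table_length; /* le16 indirection_table[] */
    off += 2 + 1;                                /* max_tx_vq, hash_key_length */
    off += hash_key_length;                      /* u8 hash_key_data[] */
    return off;
}
```

For a 128-entry indirection table and a 40-byte key this gives offset 307, so the le32 would again be unaligned.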

>  };
>  \end{lstlisting}
>  Field \field{hash_types} contains a bitmask of allowed hash types as
> @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  
>  Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
>  
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> +
>  \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>  
>  A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> index 54f6783..0ff5944 100644
> --- a/device-types/net/device-conformance.tex
> +++ b/device-types/net/device-conformance.tex
> @@ -14,4 +14,5 @@
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>  \end{itemize}
> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> index 97d0cc1..951be89 100644
> --- a/device-types/net/driver-conformance.tex
> +++ b/device-types/net/driver-conformance.tex
> @@ -14,4 +14,5 @@
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>  \end{itemize}
> diff --git a/introduction.tex b/introduction.tex
> index 287c5fc..25c9d48 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
>      Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Curve Cryptography'', Version 1.0, September 2000.
>  	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
>  
> +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> +    Generic Routing Encapsulation
> +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\

This is GRE over IPv4.
So we are not supporting GRE over IPv6?

And we do not support optional keys?
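
Whether optional fields are covered matters for locating the inner header, because each one shifts its offset. A minimal sketch, assuming the flag bits of RFC 2784 (checksum) and RFC 2890 (key and sequence number); `gre_header_len` is a hypothetical helper, and the patch itself says nothing about how the device parses these fields:

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits in the first 16 bits of the GRE header (host byte order). */
#define GRE_FLAG_CSUM 0x8000u /* checksum + reserved1 present (RFC 2784) */
#define GRE_FLAG_KEY  0x2000u /* key present (RFC 2890) */
#define GRE_FLAG_SEQ  0x1000u /* sequence number present (RFC 2890) */

/* Length of the GRE header, i.e. the offset of the inner packet
 * relative to the start of the GRE header. */
static size_t gre_header_len(uint16_t flags_and_version)
{
    size_t len = 4; /* flags/version + protocol type */

    if (flags_and_version & GRE_FLAG_CSUM)
        len += 4;
    if (flags_and_version & GRE_FLAG_KEY)
        len += 4;
    if (flags_and_version & GRE_FLAG_SEQ)
        len += 4;
    return len;
}
```

A device that ignores the key flag would compute the hash over the wrong bytes for keyed GRE traffic.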



> +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> +    Virtual eXtensible Local Area Network
> +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> +    Generic Network Virtualization Encapsulation
> +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> +    IP Encapsulation within IP
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
> +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> +    INTERNET PROTOCOL
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> +    User Datagram Protocol
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> +    TRANSMISSION CONTROL PROTOCOL
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
>  \end{longtable}
>  
>  \section{Non-Normative References}
> -- 
> 2.19.1.6.gb485710b





* [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-20 19:43 ` [virtio-dev] " Michael S. Tsirkin
@ 2023-03-20 21:07   ` Michael S. Tsirkin
  2023-03-21  3:35   ` Jason Wang
  2023-03-21  3:56   ` Heng Qi
  2 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-20 21:07 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-dev, virtio-comment, Parav Pandit, Jason Wang,
	Yuri Benditovich, Xuan Zhuo

On Mon, Mar 20, 2023 at 03:43:51PM -0400, Michael S. Tsirkin wrote:
> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> > 1. Currently, a received encapsulated packet has an outer and an inner header, but
> > the virtio device is unable to calculate the hash for the inner header. Multiple
> > flows with the same outer header but different inner headers are steered to the
> > same receive queue. This results in poor receive performance.
> > 
> > To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> > introduced, which enables the device to advertise the capability to calculate the
> > hash for the inner packet header. Compared with the outer header hash, it achieves
> > better receive performance.
> 
> So this would be a very good argument; however, the cost is that we
> would seemingly have to keep extending this indefinitely as new
> tunneling protocols come to light.
> But I believe that in fact we don't, at least for this argument:
> the standard way to address this is by propagating entropy
> from the inner to the outer header.
> 
> So I'd maybe reorder the commit log and give explanation 2 below first,
> then say "for some legacy systems, including entropy in the IP header
> as done in modern protocols is not practical, resulting in
> bad performance under RSS".
> 
> 
> > 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> > packets being spread across multiple receive queues (refer to the figure below).
> > However, in certain scenarios, it becomes necessary to direct these encapsulated
> > packets of the same flow to a single receive queue. This facilitates the processing
> > of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> > 
> >                client1                    client2
> >                   |                          |
> >                   |        +-------+         |
> >                   +------->|tunnels|<--------+
> >                            +-------+
> >                               |  |
> >                               |  |
> >                               v  v
> >                       +-----------------+
> >                       | processing host |
> >                       +-----------------+
> 
> "Necessary" is too strong a word, I feel.
> All this is is an optimization, and we don't even know how significant
> it is.
> 
> Here's how I understand this:
> 
> Imagine two clients, client1 and client2, talking to each other.
> A copy of all packets is sent to a processing host over a virtio device.
> The two directions of the same flow between the two clients might be
> encapsulated in two different tunnels; with current RSS
> strategies they would land on two arbitrary, unrelated queues.
> As an optimization, some hosts might wish to make sure both directions
> of the encapsulated flow land on the same queue.
> 
> 
> Is this a good summary?
> 
> 
> Now that things begin to be clearer, I kind of begin to agree with
> Jason's suggestion that this is extremely narrow.  And what if I want
> one direction on queue1 and the other on queue2, e.g. adjacent numbers
> for the same flow?  If enough people agree this is needed we can accept
> this, but did you at all consider using something programmable like BPF
> for this?  Considering we are putting a not insignificant amount of work
> into this, making this widely useful would be better than a narrow
> optimization for a very specific use case.
> 
> 
> > To achieve this, the device can calculate a symmetric hash based on the inner packet
> > headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> > computing the hash.
> 
> When you say symmetric hash you really mean a symmetric key for Toeplitz, yes?
> It's not that it disregards order; it just gives the same result if
> you reverse source and destination, no?

And to follow up on this, assuming this is right, we really would need
to specify in much more detail exactly how the hash is
calculated. Are we taking the leftmost bits from the key?
How many? Which keys produce a symmetric hash?
Are there keys that will do this for both IPv4 and IPv6?
If not, why isn't this a problem?
It did not matter much as long as all we wanted
was to spread packets around, but here we suddenly have
a whole new feature based on attempts to hash flows to exactly
the same queue.
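
To make these questions concrete, here is a minimal sketch of a Toeplitz hash in its conventional RSS formulation (assumed here, since the patch does not restate it): each set input bit XORs in the 32-bit key window that starts at the same bit index, counting from the leftmost key bit. One property follows directly from that definition: if the key repeats with a 16-bit period, windows that start a multiple of 16 bits apart are identical, so swapping source and destination addresses or ports leaves the hash unchanged. The repeating 0x6d5a key and the sample tuple below are illustrative assumptions, not values taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash over `data`: for every set input bit (most significant
 * bit first), XOR in the 32-bit key window starting at that bit index.
 * Key bits past the end of the key are treated as zero. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window = 0;

    /* Preload the window with the first 32 key bits. */
    for (size_t i = 0; i < 4; i++)
        window = (window << 8) | (i < key_len ? key[i] : 0);

    for (size_t i = 0; i < data_len; i++) {
        /* Key byte whose bits slide into the window during this input byte. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((next >> bit) & 1u);
        }
    }
    return hash;
}

/* Illustrative 40-byte key with a 16-bit period (an assumption). */
static const uint8_t sym_key[40] = {
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
};

/* Sample IPv4/TCP tuple: src IP, dst IP, src port, dst port. */
static const uint8_t fwd[12] = {
    66, 9, 149, 187, 161, 142, 100, 80, 0x0a, 0xea, 0x06, 0xe6,
};
/* Same flow in the reverse direction (source and destination swapped). */
static const uint8_t rev[12] = {
    161, 142, 100, 80, 66, 9, 149, 187, 0x06, 0xe6, 0x0a, 0xea,
};
```

Hashing `fwd` and `rev` with `sym_key` gives the same value. Because IPv6 addresses are 128 bits wide, also a multiple of 16, the same periodicity argument carries over to IPv6 tuples. Whether the spec should mandate or merely permit such keys is exactly the open question above.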


> 
> > Reviewed-by: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > v10->v11:
> > 	1. Revise commit log for clarity for readers.
> > 	2. Some modifications to avoid undefined terms. @Parav Pandit
> > 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > 	4. Add the normative statements. @Parav Pandit
> > 
> > v9->v10:
> > 	1. Removed hash_report_tunnel related information. @Parav Pandit
> > 	2. Re-describe the limitations of QoS for tunneling.
> > 	3. Some clarification.
> > 
> > v8->v9:
> > 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> > 	2. Add tunnel security section. @Michael S . Tsirkin
> > 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> > 	4. Fix some typos.
> > 	5. Add more tunnel types. @Michael S . Tsirkin
> > 
> > v7->v8:
> > 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> > 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> > 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> > 	4. Fix some typos. @Michael S . Tsirkin
> > 	5. Clarify some sentences. @Michael S . Tsirkin
> > 
> > v6->v7:
> > 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> > 	2. Fix some syntax issues. @Michael S. Tsirkin
> > 
> > v5->v6:
> > 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> > 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> > 	3. Move the links to introduction section. @Michael S. Tsirkin
> > 	4. Clarify some sentences. @Michael S. Tsirkin
> > 
> > v4->v5:
> > 	1. Clarify some paragraphs. @Cornelia Huck
> > 	2. Fix the u8 type. @Cornelia Huck
> > 
> > v3->v4:
> > 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> > 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> > 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> > 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> > 
> > v2->v3:
> > 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> > 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> > 
> > v1->v2:
> > 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> > 	2. Clarify some paragraphs. @Jason Wang
> > 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> > 
> >  device-types/net/description.tex        | 119 +++++++++++++++++++++++-
> >  device-types/net/device-conformance.tex |   1 +
> >  device-types/net/driver-conformance.tex |   1 +
> >  introduction.tex                        |  24 +++++
> >  4 files changed, 144 insertions(+), 1 deletion(-)
> > 
> > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > index 0500bb6..49dee2f 100644
> > --- a/device-types/net/description.tex
> > +++ b/device-types/net/description.tex
> > @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >      channel.
> >  
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> > +    for tunnel-encapsulated packets.
> > +
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >  
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
> >  \end{description}
> >  
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >          u8 rss_max_key_size;
> >          le16 rss_max_indirection_table_length;
> >          le32 supported_hash_types;
> > +        le32 supported_tunnel_hash_types;
> >  };
> >  \end{lstlisting}
> >  The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> > @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >  Field \field{supported_hash_types} contains the bitmask of supported hash types.
> >  See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
> >  
> > +The next field, \field{supported_tunnel_hash_types} only exists if the device
> > +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> > +
> > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> > +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> > +
> >  \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> >  
> >  The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  If the feature VIRTIO_NET_F_RSS was negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> > +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> >  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
> >  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
> >  \end{itemize}
> > @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  If the feature VIRTIO_NET_F_RSS was not negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> > +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> >  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
> >  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
> >  \end{itemize}
> > @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  
> >  \subparagraph{Supported/enabled hash types}
> >  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> >  Hash types applicable for IPv4 packets:
> >  \begin{lstlisting}
> >  #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
> >  \end{itemize}
> >  
> > +\paragraph{Inner Packet Header Hash}
> > +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> > +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> > +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> > +If multiple commands are sent, the device configuration will be defined by the last command received.
> > +
> > +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> > +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> > +/ Network Device / Device Operation / Processing of Incoming Packets /
> > +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> > +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> > +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> > +
> > +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
> > +
> > +\subparagraph{Tunnel/Encapsulated packet}
> > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> > +A tunnel packet is encapsulated from the original packet based on the tunneling
> > +protocol (only a single level of encapsulation is currently supported). The
> > +encapsulated packet contains an outer header and an inner header, and the device
> > +calculates the hash over either the inner header or the outer header.
> > +
> > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> > +packet's outer header matches one of the supported \field{hash_tunnel_types},
> > +the hash of the inner header is calculated. Supported encapsulation types are listed
> > +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> > +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> > +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > +
> > +\subparagraph{Supported/enabled tunnel hash types}
> > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> > +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> > +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> > +outer header of the encapsulated packet.
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> > +\end{lstlisting}
> > +
> > +The encapsulation hash type below indicates that the hash is calculated over the
> > +inner packet header:
> > +Hash type applicable for inner payload of the gre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the geneve-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the ip-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> > +\end{lstlisting}
> > +
> > +\subparagraph{Tunnel QoS limitation}
> > +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> > +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> > +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> > +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> > +
> > +Possible mitigations:
> > +\begin{itemize}
> > +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> > +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> > +      to disable inner packet hash for encapsulated packets.
> > +\item Choose a hash key that can avoid queue collisions.
> > +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> > +\end{itemize}
> > +
> > +The limitations mentioned above exist with or without the inner packet header hash.
> > +
> > +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > +
> > +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> > +
> > +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
> 
> I'm not sure how this last one got here. It seems to have nothing to do
> with encapsulation - if we want to we should require this for all
> packets or none at all.
> 
> 
> > +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > +
> > +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> > +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> > +
> > +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
> 
> unclear. seems to mean all types must be approved
> where you really mean "only those types". original for non tunnel is:
> 
> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
> 
> which is clear though a bit verbose with two negations.
> 
> Also here it says "supported" but below it says "allowed".
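
For illustration only, the "only those types" reading could be checked as in the minimal C sketch below. The helper name tunnel_types_valid() is hypothetical and appears in neither the patch nor the spec. It assumes the driver compares the value it intends to write into hash_tunnel_types against the device's supported_tunnel_hash_types bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true when every bit the
 * driver wants to enable in hash_tunnel_types is also set in the device's
 * supported_tunnel_hash_types, i.e. no unsupported flag is requested.
 */
static bool tunnel_types_valid(uint32_t requested, uint32_t supported)
{
    return (requested & ~supported) == 0;
}
```

Under this reading a driver may enable any subset of the supported types, which matches the wording of the existing VIRTIO_NET_HASH_TYPE_ rule quoted above.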
> 
> 
> 
> >  \paragraph{Hash reporting for incoming packets}
> >  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> >  
> > @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >      le16 reserved[4];
> >      u8 hash_key_length;
> >      u8 hash_key_data[hash_key_length];
> > +    le32 hash_tunnel_types;
> >  };
> 
> Hmm this fixed type after variable type is problematic - might
> become unaligned. We could use some of reserved[4]
> for this ...
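
To make the alignment concern concrete, here is a small C sketch that computes where a trailing le32 would land. It assumes the fixed prefix of virtio_net_hash_config is le32 hash_types, le16 reserved[4] and u8 hash_key_length, as in the published spec (the hunk above shows only the tail of the structure). The helper names and the macro are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed fixed-size prefix of virtio_net_hash_config:
 *   le32 hash_types; le16 reserved[4]; u8 hash_key_length;
 */
#define HASH_CONFIG_FIXED_PREFIX (4u + 4u * 2u + 1u) /* 13 bytes */

/* Byte offset at which a trailing le32 hash_tunnel_types would start. */
static size_t hash_tunnel_types_offset(uint8_t hash_key_length)
{
    return HASH_CONFIG_FIXED_PREFIX + hash_key_length;
}

/* Nonzero if that offset is naturally aligned for a 32-bit access. */
static int hash_tunnel_types_aligned(uint8_t hash_key_length)
{
    return hash_tunnel_types_offset(hash_key_length) % 4 == 0;
}
```

With a 40-byte key (a common Toeplitz key length, used here only as an example) the field would start at byte 53. Under these assumptions it is 4-byte aligned only when the key length is congruent to 3 modulo 4.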
> 
> 
> 
> >  \end{lstlisting}
> >  Field \field{hash_types} contains a bitmask of allowed hash types as
> >  defined in
> >  \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> > -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > +
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> > +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> >  
> >  Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
> >  defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> > @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >      le16 max_tx_vq;
> >      u8 hash_key_length;
> >      u8 hash_key_data[hash_key_length];
> > +    le32 hash_tunnel_types;
> 
> 
> Same alignment problem here but I'm not sure how to solve it.
> Suggestions?
> 
> >  };
> >  \end{lstlisting}
> >  Field \field{hash_types} contains a bitmask of allowed hash types as
> > @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  
> >  Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
> >  
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> >  \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >  
> >  A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> > diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> > index 54f6783..0ff5944 100644
> > --- a/device-types/net/device-conformance.tex
> > +++ b/device-types/net/device-conformance.tex
> > @@ -14,4 +14,5 @@
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> >  \end{itemize}
> > diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> > index 97d0cc1..951be89 100644
> > --- a/device-types/net/driver-conformance.tex
> > +++ b/device-types/net/driver-conformance.tex
> > @@ -14,4 +14,5 @@
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> >  \end{itemize}
> > diff --git a/introduction.tex b/introduction.tex
> > index 287c5fc..25c9d48 100644
> > --- a/introduction.tex
> > +++ b/introduction.tex
> > @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
> >      Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
> >  	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
> >  
> > +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> > +    Generic Routing Encapsulation
> > +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> 
> This is GRE over IPv4.
> So we are not supporting GRE over IPv6?
> 
> And we do not support optional keys?
> 
> 
> 
> > +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> > +    Virtual eXtensible Local Area Network
> > +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> > +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> > +    Generic Network Virtualization Encapsulation
> > +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> > +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> > +    IP Encapsulation within IP
> > +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> > +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
> > +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> > +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> > +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> > +    INTERNET PROTOCOL
> > +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> > +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> > +    User Datagram Protocol
> > +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> > +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> > +    TRANSMISSION CONTROL PROTOCOL
> > +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> >  \end{longtable}
> >  
> >  \section{Non-Normative References}
> > -- 
> > 2.19.1.6.gb485710b


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-20 19:43 ` [virtio-dev] " Michael S. Tsirkin
  2023-03-20 21:07   ` Michael S. Tsirkin
@ 2023-03-21  3:35   ` Jason Wang
  2023-03-21  5:12     ` Heng Qi
  2023-03-21  3:56   ` Heng Qi
  2 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2023-03-21  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-dev, virtio-comment, Parav Pandit,
	Yuri Benditovich, Xuan Zhuo

On Tue, Mar 21, 2023 at 3:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> > 1. Currently, a received encapsulated packet has an outer and an inner header, but
> > the virtio device is unable to calculate the hash for the inner header. Multiple
> > flows with the same outer header but different inner headers are steered to the
> > same receive queue. This results in poor receive performance.
> >
> > To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> > introduced, which enables the device to advertise the capability to calculate the
> > hash for the inner packet header. Compared with the outer header hash, it achieves
> > better receive performance.
>
> So this would be a very good argument; however, the cost is that we would
> seemingly have to keep extending this indefinitely as new tunneling
> protocols come to light.
> But I believe that in fact we don't, at least for this argument:
> the standard way to address this is actually by propagating entropy
> from the inner to the outer header.

Things would be more complicated when multiple layers of tunneling are
being used.
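
For reference, the entropy propagation mentioned above can be sketched for a UDP-based tunnel such as VXLAN. RFC 7348 recommends that the outer UDP source port be calculated from a hash of inner packet fields, with a value in the dynamic port range 49152-65535. The C helper below is a hypothetical illustration of that idea, not code from any particular implementation. How the inner flow hash itself is computed is left open.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fold a hash of the inner flow into the dynamic/private
 * UDP port range 49152..65535 (16384 values) for use as the outer source port.
 */
static uint16_t outer_udp_src_port(uint32_t inner_flow_hash)
{
    return (uint16_t)(49152u + (inner_flow_hash % 16384u));
}
```

A receiver that hashes only the outer 5-tuple then still spreads inner flows across queues. With nested tunnels, each encapsulation layer would have to repeat this step.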

>
> So I'd maybe reorder the commit log and give the explanation 2 below
> then say "for some legacy systems
> including entropy in IP header
> as done in modern protocols is not practical, resulting in
> bad performance under RSS".
>
>
> > 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> > packets being spread across multiple receive queues (refer to the figure below).
> > However, in certain scenarios, it becomes necessary to direct these encapsulated
> > packets of the same flow to a single receive queue. This facilitates the processing
> > of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> >
> >                client1                    client2
> >                   |                          |
> >                   |        +-------+         |
> >                   +------->|tunnels|<--------+
> >                            +-------+
> >                               |  |
> >                               |  |
> >                               v  v
> >                       +-----------------+
> >                       | processing host |
> >                       +-----------------+
>
> necessary is too strong a word I feel.
> All this is, is an optimization, we don't really know how strong it is
> even.
>
> Here's how I understand this:
>
> Imagine two clients client1 and client2 talking to each other.
> A copy of all packets is sent to a processing host over a virtio device.
> Two directions of the same flow between two clients might be
> encapsulated in two different tunnels, with current RSS
> strategies they would land on two arbitrary, unrelated queues.
> As an optimization, some hosts might wish to make sure both directions
> of the encapsulated flow land on the same queue.
>
>
> Is this a good summary?
>
>
> Now that things begin to be clearer, I kind of begin to agree with
> Jason's suggestion that this is extremely narrow.  And what if I want
> one direction on queue1 and another one queue2 e.g. adjacent numbers for
> the same flow?  If enough people agree this is needed we can accept this
> but did you at all consider using something programmable like BPF for
> this?  Considering we are putting not insignificant amount of work into
> this, making this widely useful would be better than a narrow
> optimization for a very specific usecase.

+1


>
>
> > To achieve this, the device can calculate a symmetric hash based on the inner packet
> > headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> > computing the hash.
>
> when you say symmetric hash you really mean symmetric key for toeplitz, yes?
> It's not that it disregards order, it just gives the same result if
> you reverse source and destination, no?
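
One known way to get the property described above is a Toeplitz key built from a repeating 16-bit pattern. With such a key, the 32-bit key window repeats every 16 input bits, so swapping source and destination address (a shift of 32 bits) or source and destination port (a shift of 16 bits) yields the same hash. The C sketch below illustrates this under the assumption of an IPv4 input laid out as source address, destination address, source port, destination port. The function names and the example pattern are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: XOR the current 32-bit key window for every set input bit. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    assert(key_len >= 4);

    uint32_t hash = 0;
    /* Window starts as the first 32 key bits, most significant bit first. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next_key_bit = 32;

    for (size_t i = 0; i < data_len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                hash ^= window;

            /* Slide the window by one bit, padding with zero past the key end. */
            uint32_t bit = 0;
            if (next_key_bit < key_len * 8)
                bit = (key[next_key_bit / 8] >> (7 - next_key_bit % 8)) & 1u;
            window = (window << 1) | bit;
            next_key_bit++;
        }
    }
    return hash;
}

/* Fill the key with a repeating 16-bit pattern (for example 0x6d5a). */
static void fill_symmetric_key(uint8_t *key, size_t key_len, uint16_t pattern)
{
    for (size_t i = 0; i < key_len; i++)
        key[i] = (i % 2 == 0) ? (uint8_t)(pattern >> 8)
                              : (uint8_t)(pattern & 0xff);
}
```

Hashing the 12-byte tuple for one direction and the tuple with addresses and ports swapped then gives equal results. With an arbitrary non-repeating key, the two directions would generally differ.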
>
>
> > Reviewed-by: Jason Wang <jasowang@redhat.com>

I'd like to drop my reviewed-by since the patch differs from the one
that I reviewed in many places.

Thanks

> > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> > v10->v11:
> >       1. Revise commit log for clarity for readers.
> >       2. Some modifications to avoid undefined terms. @Parav Pandit
> >       3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> >       4. Add the normative statements. @Parav Pandit
> >
> > v9->v10:
> >       1. Removed hash_report_tunnel related information. @Parav Pandit
> >       2. Re-describe the limitations of QoS for tunneling.
> >       3. Some clarification.
> >
> > v8->v9:
> >       1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> >       2. Add tunnel security section. @Michael S . Tsirkin
> >       3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> >       4. Fix some typos.
> >       5. Add more tunnel types. @Michael S . Tsirkin
> >
> > v7->v8:
> >       1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> >       2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> >       3. Removed re-definition for inner packet hashing. @Parav Pandit
> >       4. Fix some typos. @Michael S . Tsirkin
> >       5. Clarify some sentences. @Michael S . Tsirkin
> >
> > v6->v7:
> >       1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> >       2. Fix some syntax issues. @Michael S. Tsirkin
> >
> > v5->v6:
> >       1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> > >       2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> >       3. Move the links to introduction section. @Michael S. Tsirkin
> >       4. Clarify some sentences. @Michael S. Tsirkin
> >
> > v4->v5:
> >       1. Clarify some paragraphs. @Cornelia Huck
> >       2. Fix the u8 type. @Cornelia Huck
> >
> > v3->v4:
> >       1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> >       2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> >       3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> >       4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> >
> > v2->v3:
> >       1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> > >       2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> >
> > v1->v2:
> >       1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> >       2. Clarify some paragraphs. @Jason Wang
> >       3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> >
> >  device-types/net/description.tex        | 119 +++++++++++++++++++++++-
> >  device-types/net/device-conformance.tex |   1 +
> >  device-types/net/driver-conformance.tex |   1 +
> >  introduction.tex                        |  24 +++++
> >  4 files changed, 144 insertions(+), 1 deletion(-)
> >
> > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > index 0500bb6..49dee2f 100644
> > --- a/device-types/net/description.tex
> > +++ b/device-types/net/description.tex
> > @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >      channel.
> >
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> > +    for tunnel-encapsulated packets.
> > +
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
> >  \end{description}
> >
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >          u8 rss_max_key_size;
> >          le16 rss_max_indirection_table_length;
> >          le32 supported_hash_types;
> > +        le32 supported_tunnel_hash_types;
> >  };
> >  \end{lstlisting}
> >  The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> > @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >  Field \field{supported_hash_types} contains the bitmask of supported hash types.
> >  See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
> >
> > +The next field, \field{supported_tunnel_hash_types} only exists if the device
> > +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> > +
> > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> > +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> > +
> >  \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> >
> >  The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  If the feature VIRTIO_NET_F_RSS was negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> > +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> >  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
> >  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
> >  \end{itemize}
> > @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  If the feature VIRTIO_NET_F_RSS was not negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> > +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> >  \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
> >  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
> >  \end{itemize}
> > @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >
> >  \subparagraph{Supported/enabled hash types}
> >  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> >  Hash types applicable for IPv4 packets:
> >  \begin{lstlisting}
> >  #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
> >  \end{itemize}
> >
> > +\paragraph{Inner Packet Header Hash}
> > +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> > +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> > +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> > +If multiple commands are sent, the device configuration will be defined by the last command received.
> > +
> > +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> > +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> > +/ Network Device / Device Operation / Processing of Incoming Packets /
> > +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> > +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> > +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> > +
> > +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
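
The selection rule described in the quoted paragraphs can be sketched in C as follows. The bit values are those proposed in this revision of the patch (written with unsigned constants). The helper hash_uses_inner_header() and its encap_type argument are hypothetical: encap_type is assumed to be the single type bit the device detected for the packet, or 0 for a non-encapsulated packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as proposed in this revision of the patch. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE   (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE    (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1u << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP   (1u << 4)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE  (1u << 5)

/*
 * Hypothetical helper: returns true if the hash is calculated over the inner
 * header, false if it falls back to the outer header. The NONE bit never
 * selects the inner header.
 */
static bool hash_uses_inner_header(uint32_t hash_tunnel_types,
                                   uint32_t encap_type)
{
    return (hash_tunnel_types & encap_type &
            ~VIRTIO_NET_HASH_TUNNEL_TYPE_NONE) != 0;
}
```

For example, with only the GRE bit enabled, a VXLAN packet is hashed on its outer header, and a non-encapsulated packet (encap_type of 0) never takes the inner path.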
> > +
> > +\subparagraph{Tunnel/Encapsulated packet}
> > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> > +A tunnel packet is encapsulated from the original packet based on the tunneling
> > +protocol (only a single level of encapsulation is currently supported). The
> > +encapsulated packet contains an outer header and an inner header, and the device
> > +calculates the hash over either the inner header or the outer header.
> > +
> > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> > +packet's outer header matches one of the supported \field{hash_tunnel_types},
> > +the hash of the inner header is calculated. Supported encapsulation types are listed
> > +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> > > +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> > +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > +
> > +\subparagraph{Supported/enabled tunnel hash types}
> > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> > +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> > +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> > +outer header of the encapsulated packet.
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> > +\end{lstlisting}
> > +
> > +The encapsulation hash type below indicates that the hash is calculated over the
> > +inner packet header:
> > +Hash type applicable for inner payload of the gre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the geneve-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the ip-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> > +\end{lstlisting}
> > +
> > +\subparagraph{Tunnel QoS limitation}
> > +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> > +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> > +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> > +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> > +
> > +Possible mitigations:
> > +\begin{itemize}
> > +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> > +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> > +      to disable inner packet hash for encapsulated packets.
> > +\item Choose a hash key that can avoid queue collisions.
> > +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> > +\end{itemize}
> > +
> > > +The limitations mentioned above exist with or without the inner packet header hash.
> > +
> > +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > +
> > +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> > +
> > +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
>
> I'm not sure how this last one got here. It seems to have nothing to do
> with encapsulation - if we want to we should require this for all
> packets or none at all.
>
>
> > +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > +
> > +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> > +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> > +
> > +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
>
> unclear. seems to mean all types must be approved
> where you really mean "only those types". original for non tunnel is:
>
> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
>
> which is clear though a bit verbose with two negations.
>
> Also here it says "supported" but below it says "allowed".
>
>
>
> >  \paragraph{Hash reporting for incoming packets}
> >  \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> >
> > @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >      le16 reserved[4];
> >      u8 hash_key_length;
> >      u8 hash_key_data[hash_key_length];
> > +    le32 hash_tunnel_types;
> >  };
>
> Hmm this fixed type after variable type is problematic - might
> become unaligned. We could use some of reserved[4]
> for this ...
>
>
>
> >  \end{lstlisting}
> >  Field \field{hash_types} contains a bitmask of allowed hash types as
> >  defined in
> >  \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> > -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > +
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> > +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> >
> >  Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
> >  defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> > @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >      le16 max_tx_vq;
> >      u8 hash_key_length;
> >      u8 hash_key_data[hash_key_length];
> > +    le32 hash_tunnel_types;
>
>
> Same alignment problem here but I'm not sure how to solve it.
> Suggestions?
>
> >  };
> >  \end{lstlisting}
> >  Field \field{hash_types} contains a bitmask of allowed hash types as
> > @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >
> >  Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
> >
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > +
> >  \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >
> >  A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> > diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> > index 54f6783..0ff5944 100644
> > --- a/device-types/net/device-conformance.tex
> > +++ b/device-types/net/device-conformance.tex
> > @@ -14,4 +14,5 @@
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> >  \end{itemize}
> > diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> > index 97d0cc1..951be89 100644
> > --- a/device-types/net/driver-conformance.tex
> > +++ b/device-types/net/driver-conformance.tex
> > @@ -14,4 +14,5 @@
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> >  \end{itemize}
> > diff --git a/introduction.tex b/introduction.tex
> > index 287c5fc..25c9d48 100644
> > --- a/introduction.tex
> > +++ b/introduction.tex
> > @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
> >      Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
> >       \newline\url{https://www.secg.org/sec1-v2.pdf}\\
> >
> > +     \phantomsection\label{intro:GRE}\textbf{[GRE]} &
> > +    Generic Routing Encapsulation
> > +     \newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
>
> This is GRE over IPv4.
> So we are not supporting GRE over IPv6?
>
> And we do not support optional keys?
>
>
>
> > +     \phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> > +    Virtual eXtensible Local Area Network
> > +     \newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> > +     \phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> > +    Generic Network Virtualization Encapsulation
> > +     \newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> > +     \phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> > +    IP Encapsulation within IP
> > +     \newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> > +     \phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
> > +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> > +     \newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> > +     \phantomsection\label{intro:IP}\textbf{[IP]} &
> > +    INTERNET PROTOCOL
> > +     \newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> > +     \phantomsection\label{intro:UDP}\textbf{[UDP]} &
> > +    User Datagram Protocol
> > +     \newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> > +     \phantomsection\label{intro:TCP}\textbf{[TCP]} &
> > +    TRANSMISSION CONTROL PROTOCOL
> > +     \newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> >  \end{longtable}
> >
> >  \section{Non-Normative References}
> > --
> > 2.19.1.6.gb485710b
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-20 19:43 ` [virtio-dev] " Michael S. Tsirkin
  2023-03-20 21:07   ` Michael S. Tsirkin
  2023-03-21  3:35   ` Jason Wang
@ 2023-03-21  3:56   ` Heng Qi
  2023-03-21  4:19     ` Parav Pandit
  2023-03-21  7:34     ` Michael S. Tsirkin
  2 siblings, 2 replies; 19+ messages in thread
From: Heng Qi @ 2023-03-21  3:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit, Alvaro Karsz
  Cc: virtio-dev, virtio-comment, Jason Wang, Yuri Benditovich, Xuan Zhuo



On 2023/3/21 at 3:43 AM, Michael S. Tsirkin wrote:
> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
>> 1. Currently, a received encapsulated packet has an outer and an inner header, but
>> the virtio device is unable to calculate the hash for the inner header. Multiple
>> flows with the same outer header but different inner headers are steered to the
>> same receive queue. This results in poor receive performance.
>>
>> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
>> introduced, which enables the device to advertise the capability to calculate the
>> hash for the inner packet header. Compared with the outer header hash, it regains
>> better receive performance.
> So this would be a very good argument however the cost would be it would
> seem we have to keep extending this indefinitely as new tunneling
> protocols come to light.
> But I believe in fact we don't at least for this argument:
> the standard way to address this is actually by propagating entropy
> from inner to outer header.

Yes, we don't argue with this.

>
> So I'd maybe reorder the commit log and give the explanation 2 below
> then say "for some legacy systems
> including entropy in IP header
> as done in modern protocols is not practical, resulting in
> bad performance under RSS".

I agree, but this is not limited to legacy systems. Some scenarios need to
connect multiple tunnels and, for compatibility, either do not use the
optional fields or stay with an older tunnel protocol.

>
>
>> 2. The same flow can traverse through different tunnels, resulting in the encapsulated
>> packets being spread across multiple receive queues (refer to the figure below).
>> However, in certain scenarios, it becomes necessary to direct these encapsulated
>> packets of the same flow to a single receive queue. This facilitates the processing
>> of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
>>
>>                 client1                    client2
>>                    |                          |
>>                    |        +-------+         |
>>                    +------->|tunnels|<--------+
>>                             +-------+
>>                                |  |
>>                                |  |
>>                                v  v
>>                        +-----------------+
>>                        | processing host |
>>                        +-----------------+
> necessary is too strong a word I feel.
> All this is, is an optimization, we don't really know how strong it is
> even.
>
> Here's how I understand this:
>
> Imagine two clients client1 and client2 talking to each other.
> A copy of all packets is sent to a processing host over a virtio device.
> Two directions of the same flow between two clients might be
> encapsulated in two different tunnels, with current RSS
> strategies they would land on two arbitrary, unrelated queues.
> As an optimization, some hosts might wish to make sure both directions
> of the encapsulated flow land on the same queue.
>
>
> Is this a good summary?

I think yes.

>
>
> Now that things begin to be clearer, I kind of begin to agree with
> Jason's suggestion that this is extremely narrow.  And what if I want
> one direction on queue1 and another one queue2 e.g. adjacent numbers for

I don't understand why we need this, can you point out some usage scenarios?

> the same flow?  If enough people agree this is needed we can accept this
> but did you at all consider using something programmable like BPF for

I think the problem is that our virtio device cannot support eBPF. We
can also ask Alvaro and Parav whether their virtio devices can support
eBPF offloading. :)

> this?  Considering we are putting not insignificant amount of work into
> this, making this widely useful would be better than a narrow
> optimization for a very specific usecase.
>
>
>> To achieve this, the device can calculate a symmetric hash based on the inner packet
>> headers of the flow. The symmetric hash disregards the order of the 5-tuple when
>> computing the hash.
> when you say symmetric hash you really mean symmetric key for toeplitz, yes?
> It's not that it disregards order, it just gives the same result if
> you reverse source and destination, no?

Yes. A symmetric hash can use a key made of the same 2 bytes repeated,
and it only supports reversing source and destination.
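
As a rough illustration of the symmetric-key point, here is a minimal
sketch of a Toeplitz hash over an IPv4 4-tuple using a key built from one
2-byte pattern repeated. The helper name symmetric_flow_hash, the 40-byte
key length and the 0x6d 0x5a pattern are assumptions chosen for the
example; none of them come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash: for each set input bit (MSB first), XOR in the 32-bit
 * key window that starts at the same bit offset. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t window, hash = 0;
    size_t i;
    int bit;

    assert(key_len >= len + 4); /* the window slides by len * 8 bits */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    for (i = 0; i < len; i++) {
        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit, pulling in the next key bit. */
            window = (window << 1) | (((uint32_t)key[i + 4] >> bit) & 1u);
        }
    }
    return hash;
}

/* Hypothetical helper: hash an IPv4 4-tuple with a key made of the
 * 2-byte pattern 0x6d 0x5a repeated (an illustrative choice). */
static uint32_t symmetric_flow_hash(uint32_t src_ip, uint32_t dst_ip,
                                    uint16_t src_port, uint16_t dst_port)
{
    uint8_t key[40];
    uint8_t in[12];
    size_t i;

    for (i = 0; i < sizeof(key); i++)
        key[i] = (i & 1) ? 0x5a : 0x6d;

    /* Input layout: src IP, dst IP, src port, dst port, big-endian. */
    in[0] = (uint8_t)(src_ip >> 24);  in[1] = (uint8_t)(src_ip >> 16);
    in[2] = (uint8_t)(src_ip >> 8);   in[3] = (uint8_t)src_ip;
    in[4] = (uint8_t)(dst_ip >> 24);  in[5] = (uint8_t)(dst_ip >> 16);
    in[6] = (uint8_t)(dst_ip >> 8);   in[7] = (uint8_t)dst_ip;
    in[8] = (uint8_t)(src_port >> 8); in[9] = (uint8_t)src_port;
    in[10] = (uint8_t)(dst_port >> 8); in[11] = (uint8_t)dst_port;

    return toeplitz_hash(key, sizeof(key), in, sizeof(in));
}
```

Because the key repeats every 16 bits, the key window seen by a given bit
is unchanged when a field moves by 16 or 32 bits. Swapping the source and
destination addresses and ports therefore leaves the result unchanged,
which matches "same result if you reverse source and destination" rather
than "disregards order".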

>
>
>> Reviewed-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>> v10->v11:
>> 	1. Revise commit log for clarity for readers.
>> 	2. Some modifications to avoid undefined terms. @Parav Pandit
>> 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
>> 	4. Add the normative statements. @Parav Pandit
>>
>> v9->v10:
>> 	1. Removed hash_report_tunnel related information. @Parav Pandit
>> 	2. Re-describe the limitations of QoS for tunneling.
>> 	3. Some clarification.
>>
>> v8->v9:
>> 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
>> 	2. Add tunnel security section. @Michael S . Tsirkin
>> 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
>> 	4. Fix some typos.
>> 	5. Add more tunnel types. @Michael S . Tsirkin
>>
>> v7->v8:
>> 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
>> 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
>> 	3. Removed re-definition for inner packet hashing. @Parav Pandit
>> 	4. Fix some typos. @Michael S . Tsirkin
>> 	5. Clarify some sentences. @Michael S . Tsirkin
>>
>> v6->v7:
>> 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
>> 	2. Fix some syntax issues. @Michael S. Tsirkin
>>
>> v5->v6:
>> 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
>> 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
>> 	3. Move the links to introduction section. @Michael S. Tsirkin
>> 	4. Clarify some sentences. @Michael S. Tsirkin
>>
>> v4->v5:
>> 	1. Clarify some paragraphs. @Cornelia Huck
>> 	2. Fix the u8 type. @Cornelia Huck
>>
>> v3->v4:
>> 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
>> 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
>> 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
>> 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
>>
>> v2->v3:
>> 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
>> 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
>>
>> v1->v2:
>> 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
>> 	2. Clarify some paragraphs. @Jason Wang
>> 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
>>
>>   device-types/net/description.tex        | 119 +++++++++++++++++++++++-
>>   device-types/net/device-conformance.tex |   1 +
>>   device-types/net/driver-conformance.tex |   1 +
>>   introduction.tex                        |  24 +++++
>>   4 files changed, 144 insertions(+), 1 deletion(-)
>>
>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>> index 0500bb6..49dee2f 100644
>> --- a/device-types/net/description.tex
>> +++ b/device-types/net/description.tex
>> @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>   
>> +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
>> +    for tunnel-encapsulated packets.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>   
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
>>   \end{description}
>>   
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>           u8 rss_max_key_size;
>>           le16 rss_max_indirection_table_length;
>>           le32 supported_hash_types;
>> +        le32 supported_tunnel_hash_types;
>>   };
>>   \end{lstlisting}
>>   The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
>> @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>   Field \field{supported_hash_types} contains the bitmask of supported hash types.
>>   See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>>   
>> +The next field, \field{supported_tunnel_hash_types} only exists if the device
>> +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
>> +
>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
>> +
>>   \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>>   
>>   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
>> @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   If the feature VIRTIO_NET_F_RSS was negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>>   \end{itemize}
>> @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   If the feature VIRTIO_NET_F_RSS was not negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>>   \end{itemize}
>> @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   
>>   \subparagraph{Supported/enabled hash types}
>>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>   Hash types applicable for IPv4 packets:
>>   \begin{lstlisting}
>>   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>> @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>>   \end{itemize}
>>   
>> +\paragraph{Inner Packet Header Hash}
>> +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
>> +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
>> +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command.
>> +If multiple commands are sent, the device configuration will be defined by the last command received.
>> +
>> +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
>> +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
>> +/ Network Device / Device Operation / Processing of Incoming Packets /
>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
>> +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
>> +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
>> +
>> +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
>> +
>> +\subparagraph{Tunnel/Encapsulated packet}
>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
>> +A tunnel packet is encapsulated from the original packet based on the tunneling
>> +protocol (only a single level of encapsulation is currently supported). The
>> +encapsulated packet contains an outer header and an inner header, and the device
>> +calculates the hash over either the inner header or the outer header.
>> +
>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
>> +packet's outer header matches one of the supported \field{hash_tunnel_types},
>> +the hash of the inner header is calculated. Supported encapsulation types are listed
>> +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
>> +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>> +
>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
>> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>> +
>> +\subparagraph{Supported/enabled tunnel hash types}
>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
>> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
>> +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
>> +outer header of the encapsulated packet.
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
>> +\end{lstlisting}
>> +
>> +The encapsulation hash type below indicates that the hash is calculated over the
>> +inner packet header:
>> +Hash type applicable for inner payload of the gre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the geneve-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the ip-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
>> +\end{lstlisting}
>> +
>> +\subparagraph{Tunnel QoS limitation}
>> +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
>> +there is no quality of service (QoS) for these packets. For example, when the packets of certain
>> +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
>> +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
>> +
>> +Possible mitigations:
>> +\begin{itemize}
>> +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
>> +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
>> +      to disable inner packet hash for encapsulated packets.
>> +\item Choose a hash key that can avoid queue collisions.
>> +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
>> +\end{itemize}
>> +
>> +The limitations mentioned above exist with/without the inner packet header hash.
>> +
>> +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>> +
>> +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
>> +
>> +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
> I'm not sure how this last one got here. It seems to have nothing to do
> with encapsulation - if we want to we should require this for all
> packets or none at all.

Yes, you are right. It applies to all packets.

>
>
>> +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>> +
>> +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
>> +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_MQ_RSS_CONFIG.
>> +
>> +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
> unclear. seems to mean all types must be approved
> where you really mean "only those types". original for non tunnel is:
>
> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
>
> which is clear though a bit verbose with two negations.

Yes, we can use the same sentence structure here.
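
For illustration only, the "MUST NOT set any flags that are not supported"
form of the rule amounts to a subset check on the two bitmasks. The
function name tunnel_types_valid is hypothetical. The defines are copied
from the patch above, and the check is only a sketch of what a driver
might do before issuing the command.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the patch under review. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE   (1 << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE    (1 << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN  (1 << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1 << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP   (1 << 4)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE  (1 << 5)

/* Hypothetical driver-side check: true only if every bit requested in
 * hash_tunnel_types is also set in supported_tunnel_hash_types. */
static bool tunnel_types_valid(uint32_t requested, uint32_t supported)
{
    return (requested & ~supported) == 0;
}
```

With this reading the driver may enable any subset of the supported
types. It does not have to enable all of them, which is the ambiguity
pointed out in the review comment.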

>
> Also here it says "supported" but below it says "allowed".
>
>
>
>>   \paragraph{Hash reporting for incoming packets}
>>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
>>   
>> @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>       le16 reserved[4];
>>       u8 hash_key_length;
>>       u8 hash_key_data[hash_key_length];
>> +    le32 hash_tunnel_types;
>>   };
> Hmm this fixed type after variable type is problematic - might
> become unaligned. We could use some of reserved[4]
> for this ...
>

This is a problem, and perhaps Parav's proposal of using a separate 
command and structure for inner hash is correct.
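
To make the alignment concern concrete, here is a small sketch that
computes where hash_tunnel_types would land in virtio_net_hash_config
for a given key length. It assumes the structure starts with a le32
hash_types field (not visible in the hunk above, taken from the
published layout) and that fields are laid out back to back with no
padding. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes before hash_key_data, assuming no padding:
 * le32 hash_types (4) + le16 reserved[4] (8) + u8 hash_key_length (1). */
#define HASH_CONFIG_FIXED_PREFIX (4u + 8u + 1u)

/* Offset of the proposed le32 hash_tunnel_types, which follows the
 * variable-length hash_key_data array. */
static size_t hash_tunnel_types_offset(uint8_t hash_key_length)
{
    return (size_t)HASH_CONFIG_FIXED_PREFIX + hash_key_length;
}

/* Non-zero if a 32-bit field at this offset is naturally aligned. */
static int le32_aligned(size_t offset)
{
    return (offset % 4) == 0;
}
```

Under these assumptions a 40-byte key puts the field at offset 53, which
is not a multiple of 4. The field is only aligned when hash_key_length
is 3 modulo 4, so a trailing fixed-size field is fragile in this layout.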

>
>>   \end{lstlisting}
>>   Field \field{hash_types} contains a bitmask of allowed hash types as
>>   defined in
>>   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
>> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>> +
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>> +
>> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>   
>>   Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>>   defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
>> @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>       le16 max_tx_vq;
>>       u8 hash_key_length;
>>       u8 hash_key_data[hash_key_length];
>> +    le32 hash_tunnel_types;
>
> Same alignment problem here but I'm not sure how to solve it.
> Suggestions?
>
>>   };
>>   \end{lstlisting}
>>   Field \field{hash_types} contains a bitmask of allowed hash types as
>> @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   
>>   Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
>>   
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>> +
>>   \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   
>>   A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
>> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
>> index 54f6783..0ff5944 100644
>> --- a/device-types/net/device-conformance.tex
>> +++ b/device-types/net/device-conformance.tex
>> @@ -14,4 +14,5 @@
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>   \end{itemize}
>> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
>> index 97d0cc1..951be89 100644
>> --- a/device-types/net/driver-conformance.tex
>> +++ b/device-types/net/driver-conformance.tex
>> @@ -14,4 +14,5 @@
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>   \end{itemize}
>> diff --git a/introduction.tex b/introduction.tex
>> index 287c5fc..25c9d48 100644
>> --- a/introduction.tex
>> +++ b/introduction.tex
>> @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
>>       Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
>>   	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
>>   
>> +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
>> +    Generic Routing Encapsulation
>> +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> This is GRE over IPv4.
> So we are not supporting GRE over IPv6?

Yes. Do we need to add it?
https://datatracker.ietf.org/doc/rfc7676/

>
> And we do not support optional keys?

We did not disallow optional fields.

Thanks.

>
>
>
>> +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
>> +    Virtual eXtensible Local Area Network
>> +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
>> +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
>> +    Generic Network Virtualization Encapsulation
>> +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
>> +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
>> +    IP Encapsulation within IP
>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
>> +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
>> +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
>> +	\phantomsection\label{intro:IP}\textbf{[IP]} &
>> +    INTERNET PROTOCOL
>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
>> +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
>> +    User Datagram Protocol
>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
>> +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
>> +    TRANSMISSION CONTROL PROTOCOL
>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
>>   \end{longtable}
>>   
>>   \section{Non-Normative References}
>> -- 
>> 2.19.1.6.gb485710b
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  3:56   ` Heng Qi
@ 2023-03-21  4:19     ` Parav Pandit
  2023-03-21  7:37       ` Michael S. Tsirkin
  2023-03-21  7:34     ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2023-03-21  4:19 UTC (permalink / raw)
  To: Heng Qi, Michael S. Tsirkin, Alvaro Karsz
  Cc: virtio-dev, virtio-comment, Jason Wang, Yuri Benditovich, Xuan Zhuo


> From: Heng Qi <hengqi@linux.alibaba.com>
> Sent: Monday, March 20, 2023 11:56 PM

> > the same flow?  If enough people agree this is needed we can accept this
> > but did you at all consider using something programmable like BPF for
> 
> I think the problem is that our virtio device cannot support ebpf, we
> can also ask Alvaro, Parav if their virtio devices can support ebpf
> offloading. :)

tc qdisc offload for QoS is a better choice than eBPF to me, given that the role is more than just a drop action.
It requires many calculations across many queues.
But such QoS is orthogonal to what is being proposed here.

One (this proposal) solves the spread across different RSS queues.
The other is finding out exactly which packet to drop or pass when queue usage is high (eBPF and tc are ways to solve it).

eBPF sounds cooler than the real offload implementation in the hw device at the current level.
I remember Jason's good talk on eBPF a few years back; it is possible when done in sw on the hypervisor.

On the mlx5 device, inner hash is supported for IPnIP and GRE tunnels.
None of the existing users see tunnel attacks when it is used as a forwarding plane.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  3:35   ` Jason Wang
@ 2023-03-21  5:12     ` Heng Qi
  0 siblings, 0 replies; 19+ messages in thread
From: Heng Qi @ 2023-03-21  5:12 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-dev, virtio-comment, Parav Pandit, Yuri Benditovich, Xuan Zhuo



On 2023/3/21 at 11:35 AM, Jason Wang wrote:
> On Tue, Mar 21, 2023 at 3:43 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
>>> 1. Currently, a received encapsulated packet has an outer and an inner header, but
>>> the virtio device is unable to calculate the hash for the inner header. Multiple
>>> flows with the same outer header but different inner headers are steered to the
>>> same receive queue. This results in poor receive performance.
>>>
>>> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
>>> introduced, which enables the device to advertise the capability to calculate the
>>> hash for the inner packet header. Compared with the outer header hash, it regains
>>> better receive performance.
>> So this would be a very good argument however the cost would be it would
>> seem we have to keep extending this indefinitely as new tunneling
>> protocols come to light.
>> But I believe in fact we don't at least for this argument:
>> the standard way to address this is actually by propagating entropy
>> from inner to outer header.
> Things would be more complicated when multiple layers of tunneling are
> being used.
>
>> So I'd maybe reorder the commit log and give the explanation 2 below
>> then say "for some legacy systems
>> including entropy in IP header
>> as done in modern protocols is not practical, resulting in
>> bad performance under RSS".
>>
>>
>>> 2. The same flow can traverse through different tunnels, resulting in the encapsulated
>>> packets being spread across multiple receive queues (refer to the figure below).
>>> However, in certain scenarios, it becomes necessary to direct these encapsulated
>>> packets of the same flow to a single receive queue. This facilitates the processing
>>> of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
>>>
>>>                 client1                    client2
>>>                    |                          |
>>>                    |        +-------+         |
>>>                    +------->|tunnels|<--------+
>>>                             +-------+
>>>                                |  |
>>>                                |  |
>>>                                v  v
>>>                        +-----------------+
>>>                        | processing host |
>>>                        +-----------------+
>> necessary is too strong a word I feel.
>> All this is, is an optimization, we don't really know how strong it is
>> even.
>>
>> Here's how I understand this:
>>
>> Imagine two clients client1 and client2 talking to each other.
>> A copy of all packets is sent to a processing host over a virtio device.
>> Two directions of the same flow between two clients might be
>> encapsulated in two different tunnels, with current RSS
>> strategies they would land on two arbitrary, unrelated queues.
>> As an optimization, some hosts might wish to make sure both directions
>> of the encapsulated flow land on the same queue.
>>
>>
>> Is this a good summary?
>>
>>
>> Now that things begin to be clearer, I kind of begin to agree with
>> Jason's suggestion that this is extremely narrow.  And what if I want
>> one direction on queue1 and another one queue2 e.g. adjacent numbers for
>> the same flow?  If enough people agree this is needed we can accept this
>> but did you at all consider using something programmable like BPF for
>> this?  Considering we are putting not insignificant amount of work into
>> this, making this widely useful would be better than a narrow
>> optimization for a very specific usecase.
> +1
>
>
>>
>>> To achieve this, the device can calculate a symmetric hash based on the inner packet
>>> headers of the flow. The symmetric hash disregards the order of the 5-tuple when
>>> computing the hash.
>> when you say symmetric hash you really mean symmetric key for toeplitz, yes?
>> It's not that it disregards order, it just gives the same result if
>> you reverse source and destination, no?
>>
>>
>>> Reviewed-by: Jason Wang <jasowang@redhat.com>
> I'd like to drop my reviewed-by since the patch differs from the one
> that I reviewed in many places.

OK, I'll remove your Reviewed-by tag.

Thank you very much!

>
> Thanks
>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>> v10->v11:
>>>        1. Revise commit log for clarity for readers.
>>>        2. Some modifications to avoid undefined terms. @Parav Pandit
>>>        3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
>>>        4. Add the normative statements. @Parav Pandit
>>>
>>> v9->v10:
>>>        1. Removed hash_report_tunnel related information. @Parav Pandit
>>>        2. Re-describe the limitations of QoS for tunneling.
>>>        3. Some clarification.
>>>
>>> v8->v9:
>>>        1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
>>>        2. Add tunnel security section. @Michael S . Tsirkin
>>>        3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
>>>        4. Fix some typos.
>>>        5. Add more tunnel types. @Michael S . Tsirkin
>>>
>>> v7->v8:
>>>        1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
>>>        2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
>>>        3. Removed re-definition for inner packet hashing. @Parav Pandit
>>>        4. Fix some typos. @Michael S . Tsirkin
>>>        5. Clarify some sentences. @Michael S . Tsirkin
>>>
>>> v6->v7:
>>>        1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
>>>        2. Fix some syntax issues. @Michael S. Tsirkin
>>>
>>> v5->v6:
>>>        1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
>>>        2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
>>>        3. Move the links to introduction section. @Michael S. Tsirkin
>>>        4. Clarify some sentences. @Michael S. Tsirkin
>>>
>>> v4->v5:
>>>        1. Clarify some paragraphs. @Cornelia Huck
>>>        2. Fix the u8 type. @Cornelia Huck
>>>
>>> v3->v4:
>>>        1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
>>>        2. Make things clearer. @Jason Wang @Michael S. Tsirkin
>>>        3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
>>>        4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
>>>
>>> v2->v3:
>>>        1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
>>>        2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
>>>
>>> v1->v2:
>>>        1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
>>>        2. Clarify some paragraphs. @Jason Wang
>>>        3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
>>>
>>>   device-types/net/description.tex        | 119 +++++++++++++++++++++++-
>>>   device-types/net/device-conformance.tex |   1 +
>>>   device-types/net/driver-conformance.tex |   1 +
>>>   introduction.tex                        |  24 +++++
>>>   4 files changed, 144 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>>> index 0500bb6..49dee2f 100644
>>> --- a/device-types/net/description.tex
>>> +++ b/device-types/net/description.tex
>>> @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>       channel.
>>>
>>> +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
>>> +    for tunnel-encapsulated packets.
>>> +
>>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>
>>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>> @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
>>>   \end{description}
>>>
>>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>> @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>           u8 rss_max_key_size;
>>>           le16 rss_max_indirection_table_length;
>>>           le32 supported_hash_types;
>>> +        le32 supported_tunnel_hash_types;
>>>   };
>>>   \end{lstlisting}
>>>   The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
>>> @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>   Field \field{supported_hash_types} contains the bitmask of supported hash types.
>>>   See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>>>
>>> +The next field, \field{supported_tunnel_hash_types} only exists if the device
>>> +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
>>> +
>>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
>>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
>>> +
>>>   \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>>>
>>>   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
>>> @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   If the feature VIRTIO_NET_F_RSS was negotiated:
>>>   \begin{itemize}
>>>   \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>>>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>>>   \end{itemize}
>>> @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   If the feature VIRTIO_NET_F_RSS was not negotiated:
>>>   \begin{itemize}
>>>   \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>>>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>>>   \end{itemize}
>>> @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>
>>>   \subparagraph{Supported/enabled hash types}
>>>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
>>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>>   Hash types applicable for IPv4 packets:
>>>   \begin{lstlisting}
>>>   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>>> @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>>>   \end{itemize}
>>>
>>> +\paragraph{Inner Packet Header Hash}
>>> +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
>>> +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
>>> +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
>>> +If multiple commands are sent, the device configuration will be defined by the last command received.
>>> +
>>> +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
>>> +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
>>> +/ Network Device / Device Operation / Processing of Incoming Packets /
>>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
>>> +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
>>> +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
>>> +
>>> +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
>>> +
>>> +\subparagraph{Tunnel/Encapsulated packet}
>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
>>> +A tunnel packet is encapsulated from the original packet based on the tunneling
>>> +protocol (only a single level of encapsulation is currently supported). The
>>> +encapsulated packet contains an outer header and an inner header, and the device
>>> +calculates the hash over either the inner header or the outer header.
>>> +
>>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
>>> +packet's outer header matches one of the supported \field{hash_tunnel_types},
>>> +the hash of the inner header is calculated. Supported encapsulation types are listed
>>> +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
>>> +Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
>>> +
>>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
>>> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>>> +
>>> +\subparagraph{Supported/enabled tunnel hash types}
>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
>>> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
>>> +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
>>> +outer header of the encapsulated packet.
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
>>> +\end{lstlisting}
>>> +
>>> +The encapsulation hash type below indicates that the hash is calculated over the
>>> +inner packet header:
>>> +Hash type applicable for inner payload of the gre-encapsulated packet
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
>>> +\end{lstlisting}
>>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
>>> +\end{lstlisting}
>>> +Hash type applicable for inner payload of the geneve-encapsulated packet
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
>>> +\end{lstlisting}
>>> +Hash type applicable for inner payload of the ip-encapsulated packet
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
>>> +\end{lstlisting}
>>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>>> +\begin{lstlisting}
>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
>>> +\end{lstlisting}
>>> +
>>> +\subparagraph{Tunnel QoS limitation}
>>> +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
>>> +there is no quality of service (QoS) for these packets. For example, when the packets of certain
>>> +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
>>> +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
>>> +
>>> +Possible mitigations:
>>> +\begin{itemize}
>>> +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
>>> +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
>>> +      to disable inner packet hash for encapsulated packets.
>>> +\item Choose a hash key that can avoid queue collisions.
>>> +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
>>> +\end{itemize}
>>> +
>>> +The limitations mentioned above exist with/without the inner packet header hash.
>>> +
>>> +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>> +
>>> +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
>>> +
>>> +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
>> I'm not sure how this last one got here. It seems to have nothing to do
>> with encapsulation - if we want to we should require this for all
>> packets or none at all.
>>
>>
>>> +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>> +
>>> +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
>>> +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
>>> +
>>> +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
>> unclear. seems to mean all types must be approved
>> where you really mean "only those types". original for non tunnel is:
>>
>> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
>>
>> which is clear though a bit verbose with two negations.
>>
>> Also here it says "supported" but below it says "allowed".
>>
>>
>>
>>>   \paragraph{Hash reporting for incoming packets}
>>>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
>>>
>>> @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>       le16 reserved[4];
>>>       u8 hash_key_length;
>>>       u8 hash_key_data[hash_key_length];
>>> +    le32 hash_tunnel_types;
>>>   };
>> Hmm this fixed type after variable type is problematic - might
>> become unaligned. We could use some of reserved[4]
>> for this ...
>>
>>
>>
>>>   \end{lstlisting}
>>>   Field \field{hash_types} contains a bitmask of allowed hash types as
>>>   defined in
>>>   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
>>> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>> +
>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
>>> +
>>> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>>
>>>   Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>>>   defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
>>> @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>       le16 max_tx_vq;
>>>       u8 hash_key_length;
>>>       u8 hash_key_data[hash_key_length];
>>> +    le32 hash_tunnel_types;
>>
>> Same alignment problem here but I'm not sure how to solve it.
>> Suggestions?
>>
>>>   };
>>>   \end{lstlisting}
>>>   Field \field{hash_types} contains a bitmask of allowed hash types as
>>> @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>
>>>   Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
>>>
>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
>>> +
>>>   \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>
>>>   A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
>>> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
>>> index 54f6783..0ff5944 100644
>>> --- a/device-types/net/device-conformance.tex
>>> +++ b/device-types/net/device-conformance.tex
>>> @@ -14,4 +14,5 @@
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>   \end{itemize}
>>> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
>>> index 97d0cc1..951be89 100644
>>> --- a/device-types/net/driver-conformance.tex
>>> +++ b/device-types/net/driver-conformance.tex
>>> @@ -14,4 +14,5 @@
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>   \end{itemize}
>>> diff --git a/introduction.tex b/introduction.tex
>>> index 287c5fc..25c9d48 100644
>>> --- a/introduction.tex
>>> +++ b/introduction.tex
>>> @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
>>>       Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
>>>        \newline\url{https://www.secg.org/sec1-v2.pdf}\\
>>>
>>> +     \phantomsection\label{intro:GRE}\textbf{[GRE]} &
>>> +    Generic Routing Encapsulation
>>> +     \newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
>> This is GRE over IPv4.
>> So we are not supporting GRE over IPv6?
>>
>> And we do not support optional keys?
>>
>>
>>
>>> +     \phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
>>> +    Virtual eXtensible Local Area Network
>>> +     \newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
>>> +     \phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
>>> +    Generic Network Virtualization Encapsulation
>>> +     \newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
>>> +     \phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
>>> +    IP Encapsulation within IP
>>> +     \newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
>>> +     \phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
>>> +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
>>> +     \newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
>>> +     \phantomsection\label{intro:IP}\textbf{[IP]} &
>>> +    INTERNET PROTOCOL
>>> +     \newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
>>> +     \phantomsection\label{intro:UDP}\textbf{[UDP]} &
>>> +    User Datagram Protocol
>>> +     \newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
>>> +     \phantomsection\label{intro:TCP}\textbf{[TCP]} &
>>> +    TRANSMISSION CONTROL PROTOCOL
>>> +     \newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
>>>   \end{longtable}
>>>
>>>   \section{Non-Normative References}
>>> --
>>> 2.19.1.6.gb485710b
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org





* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  3:56   ` Heng Qi
  2023-03-21  4:19     ` Parav Pandit
@ 2023-03-21  7:34     ` Michael S. Tsirkin
  2023-03-21 14:49       ` Heng Qi
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-21  7:34 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Jason Wang, Yuri Benditovich, Xuan Zhuo

On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/21 at 3:43 AM, Michael S. Tsirkin wrote:
> > On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> > > 1. Currently, a received encapsulated packet has an outer and an inner header, but
> > > the virtio device is unable to calculate the hash for the inner header. Multiple
> > > flows with the same outer header but different inner headers are steered to the
> > > same receive queue. This results in poor receive performance.
> > > 
> > > To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> > > introduced, which enables the device to advertise the capability to calculate the
> > > hash for the inner packet header. Compared with the outer header hash, it regains
> > > better receive performance.
> > So this would be a very good argument however the cost would be it would
> > seem we have to keep extending this indefinitely as new tunneling
> > protocols come to light.
> > But I believe in fact we don't at least for this argument:
> > the standard way to address this is actually by propagating entropy
> > from inner to outer header.
> 
> Yes, we don't argue with this.
> 
> > 
> > So I'd maybe reorder the commit log and give the explanation 2 below
> > then say "for some legacy systems
> > including entropy in IP header
> > as done in modern protocols is not practical, resulting in
> > bad performance under RSS".
> 
> I agree. But it is not necessarily a legacy system: some scenarios need to
> connect multiple tunnels and, for compatibility, they will not use optional
> fields or will choose the old tunnel protocol.

compatibility ... with legacy systems, no?

> > 
> > 
> > > 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> > > packets being spread across multiple receive queues (refer to the figure below).
> > > However, in certain scenarios, it becomes necessary to direct these encapsulated
> > > packets of the same flow to a single receive queue. This facilitates the processing
> > > of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> > > 
> > >                 client1                    client2
> > >                    |                          |
> > >                    |        +-------+         |
> > >                    +------->|tunnels|<--------+
> > >                             +-------+
> > >                                |  |
> > >                                |  |
> > >                                v  v
> > >                        +-----------------+
> > >                        | processing host |
> > >                        +-----------------+
> > necessary is too strong a word I feel.
> > All this is, is an optimization, we don't really know how strong it is
> > even.
> > 
> > Here's how I understand this:
> > 
> > Imagine two clients client1 and client2 talking to each other.
> > A copy of all packets is sent to a processing host over a virtio device.
> > Two directions of the same flow between two clients might be
> > encapsulated in two different tunnels, with current RSS
> > strategies they would land on two arbitrary, unrelated queues.
> > As an optimization, some hosts might wish to make sure both directions
> > of the encapsulated flow land on the same queue.
> > 
> > 
> > Is this a good summary?
> 
> I think yes.
> 
> > 
> > 
> > Now that things begin to be clearer, I kind of begin to agree with
> > Jason's suggestion that this is extremely narrow.  And what if I want
> > one direction on queue1 and another one queue2 e.g. adjacent numbers for
> 
> I don't understand why we need this, can you point out some usage scenarios?

If traffic is predominantly UDP, each queue can be processed in
parallel. If you need to look at the other side of the flow once
in a while, you can find its queue by XORing the queue number with 1 (^1).
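
As a rough illustration of the ^1 lookup, here is a minimal sketch. It is not
part of the patch or of any spec text: `peer_queue` is a hypothetical helper,
and it assumes the device could be made to place the two directions of a flow
on adjacent even/odd queue numbers (0/1, 2/3, ...).

```python
def peer_queue(queue_index: int) -> int:
    """Return the adjacent queue assumed to carry the opposite direction.

    Flipping the lowest bit maps 0 <-> 1, 2 <-> 3, and so on.
    """
    if queue_index < 0:
        raise ValueError("queue index must be non-negative")
    return queue_index ^ 1


# Under the assumed layout, queues form (even, odd) pairs.
pairs = [(q, peer_queue(q)) for q in range(0, 8, 2)]
```

The lookup is its own inverse, so either side of the flow can find the other
without any table.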

> > the same flow?  If enough people agree this is needed we can accept this
> > but did you at all consider using something programmable like BPF for
> 
> I think the problem is that our virtio device cannot support eBPF. We can
> also ask Alvaro and Parav whether their virtio devices can support eBPF offloading.
> :)

This isn't eBPF, more like classic BPF. Just math done on packets,
no tables.


> > this?  Considering we are putting not insignificant amount of work into
> > this, making this widely useful would be better than a narrow
> > optimization for a very specific usecase.
> > 
> > 
> > > To achieve this, the device can calculate a symmetric hash based on the inner packet
> > > headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> > > computing the hash.
> > when you say symmetric hash you really mean symmetric key for toeplitz, yes?
> > It's not that it disregards order, it just gives the same result if
> > you reverse source and destination, no?
> 
> Yes, a symmetric hash can use a key made of the same 2 bytes repeated, and it
> only supports reversing source and destination.

So, this won't work if some inner flows are IPv4 and others IPv6, right?
You have to know the inner flow format?
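
For reference, the symmetry under discussion can be checked with a small
Toeplitz sketch. Everything here is an assumption for illustration and not
taken from the patch: the hash input is a single IPv4 tuple laid out as source
address, destination address, source port, destination port; the key is the
2-byte pattern 0x6d5a repeated to 40 bytes; the function names are
hypothetical. It only demonstrates the source/destination swap for one IPv4
tuple.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the 32-bit key window at every set input bit."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window of 32 key bits starting at input bit position i*8+b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def ipv4_tuple(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Assumed input layout: src addr, dst addr, src port, dst port."""
    return (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!HH", sport, dport)
    )


# Assumed symmetric key: one 16-bit pattern repeated over 40 bytes.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

forward = ipv4_tuple("192.0.2.1", "198.51.100.7", 40000, 443)
reverse = ipv4_tuple("198.51.100.7", "192.0.2.1", 443, 40000)
same = toeplitz_hash(SYMMETRIC_KEY, forward) == toeplitz_hash(SYMMETRIC_KEY, reverse)
```

The two directions hash equally here because swapping the addresses moves each
bit by 32 positions and swapping the ports moves each bit by 16 positions, and
the key repeats every 16 bits.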

> > 
> > 
> > > Reviewed-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > > v10->v11:
> > > 	1. Revise commit log for clarity for readers.
> > > 	2. Some modifications to avoid undefined terms. @Parav Pandit
> > > 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > > 	4. Add the normative statements. @Parav Pandit
> > > 
> > > v9->v10:
> > > 	1. Removed hash_report_tunnel related information. @Parav Pandit
> > > 	2. Re-describe the limitations of QoS for tunneling.
> > > 	3. Some clarification.
> > > 
> > > v8->v9:
> > > 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> > > 	2. Add tunnel security section. @Michael S . Tsirkin
> > > 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> > > 	4. Fix some typos.
> > > 	5. Add more tunnel types. @Michael S . Tsirkin
> > > 
> > > v7->v8:
> > > 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> > > 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> > > 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> > > 	4. Fix some typos. @Michael S . Tsirkin
> > > 	5. Clarify some sentences. @Michael S . Tsirkin
> > > 
> > > v6->v7:
> > > 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> > > 	2. Fix some syntax issues. @Michael S. Tsirkin
> > > 
> > > v5->v6:
> > > 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> > > 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> > > 	3. Move the links to introduction section. @Michael S. Tsirkin
> > > 	4. Clarify some sentences. @Michael S. Tsirkin
> > > 
> > > v4->v5:
> > > 	1. Clarify some paragraphs. @Cornelia Huck
> > > 	2. Fix the u8 type. @Cornelia Huck
> > > 
> > > v3->v4:
> > > 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> > > 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> > > 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> > > 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> > > 
> > > v2->v3:
> > > 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> > > 	2. Chang \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> > > 
> > > v1->v2:
> > > 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> > > 	2. Clarify some paragraphs. @Jason Wang
> > > 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> > > 
> > >   device-types/net/description.tex        | 119 +++++++++++++++++++++++-
> > >   device-types/net/device-conformance.tex |   1 +
> > >   device-types/net/driver-conformance.tex |   1 +
> > >   introduction.tex                        |  24 +++++
> > >   4 files changed, 144 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > > index 0500bb6..49dee2f 100644
> > > --- a/device-types/net/description.tex
> > > +++ b/device-types/net/description.tex
> > > @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > >   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >       channel.
> > > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> > > +    for tunnel-encapsulated packets.
> > > +
> > >   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > >   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > >   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > >   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > >   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
> > >   \end{description}
> > >   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > >           u8 rss_max_key_size;
> > >           le16 rss_max_indirection_table_length;
> > >           le32 supported_hash_types;
> > > +        le32 supported_tunnel_hash_types;
> > >   };
> > >   \end{lstlisting}
> > >   The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> > > @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > >   Field \field{supported_hash_types} contains the bitmask of supported hash types.
> > >   See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
> > > +The next field, \field{supported_tunnel_hash_types} only exists if the device
> > > +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> > > +
> > > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> > > +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> > > +
> > >   \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> > >   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > > @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > >   If the feature VIRTIO_NET_F_RSS was negotiated:
> > >   \begin{itemize}
> > >   \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > >   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
> > >   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
> > >   \end{itemize}
> > > @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > >   If the feature VIRTIO_NET_F_RSS was not negotiated:
> > >   \begin{itemize}
> > >   \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > >   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
> > >   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
> > >   \end{itemize}
> > > @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > >   \subparagraph{Supported/enabled hash types}
> > >   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> > > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> > >   Hash types applicable for IPv4 packets:
> > >   \begin{lstlisting}
> > >   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > > @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > >   (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
> > >   \end{itemize}
> > > +\paragraph{Inner Packet Header Hash}
> > > +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> > > +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> > > +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> > > +If multiple commands are sent, the device configuration will be defined by the last command received.
> > > +
> > > +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> > > +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> > > > +/ Network Device / Device Operation / Processing of Incoming Packets /
> > > +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> > > +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> > > +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> > > +
> > > +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
> > > +
> > > +\subparagraph{Tunnel/Encapsulated packet}
> > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> > > +A tunnel packet is encapsulated from the original packet based on the tunneling
> > > +protocol (only a single level of encapsulation is currently supported). The
> > > +encapsulated packet contains an outer header and an inner header, and the device
> > > +calculates the hash over either the inner header or the outer header.
> > > +
> > > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> > > +packet's outer header matches one of the supported \field{hash_tunnel_types},
> > > +the hash of the inner header is calculated. Supported encapsulation types are listed
> > > +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> > > +Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > +
> > > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> > > +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > > +
> > > +\subparagraph{Supported/enabled tunnel hash types}
> > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> > > +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> > > +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> > > +outer header of the encapsulated packet.
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> > > +\end{lstlisting}
> > > +
> > > +The encapsulation hash type below indicates that the hash is calculated over the
> > > +inner packet header:
> > > +Hash type applicable for inner payload of the gre-encapsulated packet
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> > > +\end{lstlisting}
> > > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> > > +\end{lstlisting}
> > > +Hash type applicable for inner payload of the geneve-encapsulated packet
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> > > +\end{lstlisting}
> > > +Hash type applicable for inner payload of the ip-encapsulated packet
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> > > +\end{lstlisting}
> > > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > > +\begin{lstlisting}
> > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> > > +\end{lstlisting}
> > > +
> > > +\subparagraph{Tunnel QoS limitation}
> > > +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> > > +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> > > +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> > > +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> > > +
> > > +Possible mitigations:
> > > +\begin{itemize}
> > > +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> > > +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> > > +      to disable inner packet hash for encapsulated packets.
> > > +\item Choose a hash key that can avoid queue collisions.
> > > +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> > > +\end{itemize}
> > > +
> > > > +The limitations mentioned above exist with/without the inner packet header hash.
> > > +
> > > +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > +
> > > +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> > > +
> > > +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
> > I'm not sure how this last one got here. It seems to have nothing to do
> > with encapsulation - if we want to we should require this for all
> > packets or none at all.
> 
> Yes, you are right. It works for all packets.
> 
> > 
> > 
> > > +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > +
> > > +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> > > +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> > > +
> > > +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
> > unclear. seems to mean all types must be approved
> > where you really mean "only those types". original for non tunnel is:
> > 
> > A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
> > 
> > which is clear though a bit verbose with two negations.
> 
> Yes, we can use the same sentence structure to illustrate.
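
The "only those types" reading discussed above amounts to a subset check on two bitmasks. A minimal sketch, assuming the bit values from the VIRTIO_NET_HASH_TUNNEL_TYPE_* defines in the quoted hunk; the helper name `tunnel_types_valid` is hypothetical and not part of the spec:

```python
# Bit values copied from the VIRTIO_NET_HASH_TUNNEL_TYPE_* defines in the patch.
VIRTIO_NET_HASH_TUNNEL_TYPE_NONE = 1 << 0
VIRTIO_NET_HASH_TUNNEL_TYPE_GRE = 1 << 1
VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN = 1 << 2
VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE = 1 << 3
VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP = 1 << 4
VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE = 1 << 5


def tunnel_types_valid(requested: int, supported: int) -> bool:
    """Hypothetical helper: True if no requested bit is outside the supported mask."""
    return (requested & ~supported) == 0


# Example: the device advertises GRE and VXLAN in supported_tunnel_hash_types.
supported = VIRTIO_NET_HASH_TUNNEL_TYPE_GRE | VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN
print(tunnel_types_valid(VIRTIO_NET_HASH_TUNNEL_TYPE_GRE, supported))
print(tunnel_types_valid(VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE, supported))
```

Setting a subset of the supported bits passes the check; setting any unsupported bit fails it, which mirrors the existing "MUST NOT set any ... flags that are not supported" wording.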
> 
> > 
> > Also here it says "supported" but below it says "allowed".
> > 
> > 
> > 
> > >   \paragraph{Hash reporting for incoming packets}
> > >   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> > > @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >       le16 reserved[4];
> > >       u8 hash_key_length;
> > >       u8 hash_key_data[hash_key_length];
> > > +    le32 hash_tunnel_types;
> > >   };
> > Hmm this fixed type after variable type is problematic - might
> > become unaligned. We could use some of reserved[4]
> > for this ...
> > 
> 
> This is a problem, and perhaps Parav's proposal of using a separate command
> and structure for inner hash is correct.
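
To make the alignment concern concrete, here is a rough offset calculation. It is only a sketch: it assumes the structure is packed and that virtio_net_hash_config starts with a le32 hash_types field ahead of the fields shown in the hunk; the function name is hypothetical.

```python
def hash_tunnel_types_offset(hash_key_length: int) -> int:
    """Byte offset of a le32 field placed right after hash_key_data[]."""
    # Assumed packed prefix: le32 hash_types (4) + le16 reserved[4] (8)
    # + u8 hash_key_length (1) = 13 bytes.
    fixed_prefix = 4 + 2 * 4 + 1
    return fixed_prefix + hash_key_length


for key_len in (3, 40, 52):
    offset = hash_tunnel_types_offset(key_len)
    print(key_len, offset, "aligned" if offset % 4 == 0 else "unaligned")
```

Under these assumptions the trailing field is 4-byte aligned only when hash_key_length % 4 == 3, so a typical 40-byte key leaves it unaligned.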
> 
> > 
> > >   \end{lstlisting}
> > >   Field \field{hash_types} contains a bitmask of allowed hash types as
> > >   defined in
> > >   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> > > -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > > +
> > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > +
> > > +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > >   Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
> > >   defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> > > @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >       le16 max_tx_vq;
> > >       u8 hash_key_length;
> > >       u8 hash_key_data[hash_key_length];
> > > +    le32 hash_tunnel_types;
> > 
> > Same alignment problem here but I'm not sure how to solve it.
> > Suggestions?
> > 
> > >   };
> > >   \end{lstlisting}
> > >   Field \field{hash_types} contains a bitmask of allowed hash types as
> > > @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >   Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
> > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > +
> > >   \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > >   A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> > > diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> > > index 54f6783..0ff5944 100644
> > > --- a/device-types/net/device-conformance.tex
> > > +++ b/device-types/net/device-conformance.tex
> > > @@ -14,4 +14,5 @@
> > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > >   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > >   \end{itemize}
> > > diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> > > index 97d0cc1..951be89 100644
> > > --- a/device-types/net/driver-conformance.tex
> > > +++ b/device-types/net/driver-conformance.tex
> > > @@ -14,4 +14,5 @@
> > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > >   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > >   \end{itemize}
> > > diff --git a/introduction.tex b/introduction.tex
> > > index 287c5fc..25c9d48 100644
> > > --- a/introduction.tex
> > > +++ b/introduction.tex
> > > @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
> > >       Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
> > >   	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
> > > +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> > > +    Generic Routing Encapsulation
> > > +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> > This is GRE over IPv4.
> > So we are not supporting GRE over IPv6?
> 
> Yes. Do we need to add it?
> https://datatracker.ietf.org/doc/rfc7676/

If you want to support it, yes.

> > 
> > And we do not support optional keys?
> 
> We did not disallow optional fields.
> 
> Thanks.

The spec you link to does not include this.

> > 
> > 
> > 
> > > +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> > > +    Virtual eXtensible Local Area Network
> > > +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> > > +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> > > +    Generic Network Virtualization Encapsulation
> > > +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> > > +    IP Encapsulation within IP
> > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> > > +	\phantomsection\label{intro:IPIP}\textbf{[NVGRE]} &
> > > +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> > > +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> > > +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> > > +    INTERNET PROTOCOL
> > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> > > +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> > > +    User Datagram Protocol
> > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> > > +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> > > +    TRANSMISSION CONTROL PROTOCOL
> > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> > >   \end{longtable}
> > >   \section{Non-Normative References}
> > > -- 
> > > 2.19.1.6.gb485710b
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  4:19     ` Parav Pandit
@ 2023-03-21  7:37       ` Michael S. Tsirkin
  2023-03-21 19:46         ` Parav Pandit
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-21  7:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo

On Tue, Mar 21, 2023 at 04:19:17AM +0000, Parav Pandit wrote:
> One (this proposal) is solving spread to different RSS queues.

Spread is mostly ok with modern protocols though. It seems to
optimize for a specific monitoring solution.

> Another one is finding out which exact packet to drop/pass when queue usage is high. (ebpf/tc other ways to solve it).
> 
> Ebpf sounds cooler than the real offload implementation in the hw device at the current level.
> I remember Jason's good talk on the ebpf a few years back, which is possible when done in sw on the hypervisor.

I was talking about classic bpf though. no state.

> On mlx5 device inner hash is supported for IPnIP and GRE tunnels.

which variants of GRE are supported?

> None of the existing users see tunnel attacks when used as forwarding plane.

good to know.

-- 
MST



* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  7:34     ` Michael S. Tsirkin
@ 2023-03-21 14:49       ` Heng Qi
  2023-03-21 15:58         ` Michael S. Tsirkin
  2023-03-23  2:52         ` Jason Wang
  0 siblings, 2 replies; 19+ messages in thread
From: Heng Qi @ 2023-03-21 14:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Jason Wang, Yuri Benditovich, Xuan Zhuo



On 2023/3/21 3:34 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
>>
>> On 2023/3/21 3:43 AM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
>>>> 1. Currently, a received encapsulated packet has an outer and an inner header, but
>>>> the virtio device is unable to calculate the hash for the inner header. Multiple
>>>> flows with the same outer header but different inner headers are steered to the
>>>> same receive queue. This results in poor receive performance.
>>>>
>>>> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
>>>> introduced, which enables the device to advertise the capability to calculate the
>>>> hash for the inner packet header. Compared with the outer header hash, this yields
>>>> better receive performance.
>>> So this would be a very good argument however the cost would be it would
>>> seem we have to keep extending this indefinitely as new tunneling
>>> protocols come to light.
>>> But I believe in fact we don't at least for this argument:
>>> the standard way to address this is actually by propagating entropy
>>> from inner to outer header.
>> Yes, we don't argue with this.
>>
>>> So I'd maybe reorder the commit log and give the explanation 2 below
>>> then say "for some legacy systems
>>> including entropy in IP header
>>> as done in modern protocols is not practical, resulting in
>>> bad performance under RSS".
>> I agree. But not necessarily the legacy system, some scenarios need to
>> connect multiple tunnels, for compatibility, they will not use optional
>> fields or choose the old tunnel protocol.
> compatibility ... with legacy systems, no?
>
>>>
>>>> 2. The same flow can traverse through different tunnels, resulting in the encapsulated
>>>> packets being spread across multiple receive queues (refer to the figure below).
>>>> However, in certain scenarios, it becomes necessary to direct these encapsulated
>>>> packets of the same flow to a single receive queue. This facilitates the processing
>>>> of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
>>>>
>>>>                  client1                    client2
>>>>                     |                          |
>>>>                     |        +-------+         |
>>>>                     +------->|tunnels|<--------+
>>>>                              +-------+
>>>>                                 |  |
>>>>                                 |  |
>>>>                                 v  v
>>>>                         +-----------------+
>>>>                         | processing host |
>>>>                         +-----------------+
>>> necessary is too strong a word I feel.
>>> All this is, is an optimization, we don't really know how strong it is
>>> even.
>>>
>>> Here's how I understand this:
>>>
>>> Imagine two clients client1 and client2 talking to each other.
>>> A copy of all packets is sent to a processing host over a virtio device.
>>> Two directions of the same flow between two clients might be
>>> encapsulated in two different tunnels, with current RSS
>>> strategies they would land on two arbitrary, unrelated queues.
>>> As an optimization, some hosts might wish to make sure both directions
>>> of the encapsulated flow land on the same queue.
>>>
>>>
>>> Is this a good summary?
>> I think yes.
>>
>>>
>>> Now that things begin to be clearer, I kind of begin to agree with
>>> Jason's suggestion that this is extremely narrow.  And what if I want
>>> one direction on queue1 and another one queue2 e.g. adjacent numbers for
>> I don't understand why we need this, can you point out some usage scenarios?
> If traffic is predominantly UDP, each queue can be processed in
> parallel. If you need to look at the other side of the flow once
> in a while, you can find it by doing ^1.

I'm not sure I fully follow, but let me try to answer. Once we place
traffic in one direction on a certain queue, we have already calculated
the hash, so we can record the five-tuple and the queue number. When
traffic in the other direction arrives, we can match it against the
recorded information and place it on the ^1 queue.
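
The bookkeeping described above could look roughly like the following. This is only a sketch, not a proposed device design: the table, the `steer` and `reverse` helpers and the tuple layout are hypothetical, and it assumes queues are used in adjacent pairs (2k, 2k+1) so that `^ 1` always names a valid peer.

```python
from typing import Dict, Tuple

# Hypothetical five-tuple layout: (src_ip, dst_ip, src_port, dst_port, proto).
Flow = Tuple[str, str, int, int, int]


def reverse(flow: Flow) -> Flow:
    """Return the five-tuple of the opposite direction of the same flow."""
    src, dst, sport, dport, proto = flow
    return (dst, src, dport, sport, proto)


def steer(table: Dict[Flow, int], flow: Flow, hashed_queue: int) -> int:
    """Pick a queue: the peer of the recorded reverse direction, else the hashed one."""
    peer = reverse(flow)
    if peer in table:
        # The other direction was seen first: use its neighbouring queue.
        return table[peer] ^ 1
    # First direction seen: remember which queue the hash selected.
    return table.setdefault(flow, hashed_queue)


table: Dict[Flow, int] = {}
a_to_b = ("10.0.0.1", "10.0.0.2", 1234, 80, 6)
b_to_a = reverse(a_to_b)
first = steer(table, a_to_b, hashed_queue=4)
second = steer(table, b_to_a, hashed_queue=7)
print(first, second)
```

Note that this variant keeps per-flow state, unlike the stateless "just math on packets" approach mentioned earlier in the thread.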

>
>>> the same flow?  If enough people agree this is needed we can accept this
>>> but did you at all consider using something programmable like BPF for
>> I think the problem is that our virtio device cannot support ebpf, we can
>> also ask Alvaro, Parav if their virtio devices can support ebpf offloading.
>> :)
> This isn't ebpf, more like classic bpf. Just math done on packets,
> no tables.

We would also really like to use simple bpf offloading, which is cool.
But it still takes time, for example to support parsing bpf
instructions on devices like FPGAs, which cannot do that easily today.
Few devices support it right now; in the kernel I only see support for
the Netronome iNIC.

    # git grep XDP_SETUP_PROG_HW
    drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    case XDP_SETUP_PROG_HW:
    drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
    drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW) {
    drivers/net/netdevsim/bpf.c:    case XDP_SETUP_PROG_HW:
    include/linux/netdevice.h:      XDP_SETUP_PROG_HW,
    net/core/dev.c: xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG;


>
>
>>> this?  Considering we are putting not insignificant amount of work into
>>> this, making this widely useful would be better than a narrow
>>> optimization for a very specific usecase.
>>>
>>>
>>>> To achieve this, the device can calculate a symmetric hash based on the inner packet
>>>> headers of the flow. The symmetric hash disregards the order of the 5-tuple when
>>>> computing the hash.
>>> when you say symmetric hash you really mean symmetric key for toeplitz, yes?
>>> It's not that it disregards order, it just gives the same result if
>>> you reverse source and destination, no?
>> Yes, symmetric hashes can use the key with 2 same bytes repeated, and only
>> support reverse source and destination.
> So, this won't work if some inner flows are IPv4 and others IPv6, right?
> You have to know the inner flow format?

Yes, we need to know the inner flow format.
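
To illustrate the "same result if you reverse source and destination" property, here is a small sketch of a Toeplitz hash over an IPv4/TCP 4-tuple. The key below (the 16-bit pattern 0x6d5a repeated) and the helper names are assumptions chosen for illustration; nothing in the patch mandates them.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window starting at each set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def ipv4_tcp_input(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Hash input: source address, destination address, source port, destination port."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack(">HH", sport, dport))


# Assumed illustrative key: a 2-byte pattern repeated to 40 bytes.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

fwd = ipv4_tcp_input("10.0.0.1", "10.0.0.2", 1234, 80)
rev = ipv4_tcp_input("10.0.0.2", "10.0.0.1", 80, 1234)
print(hex(toeplitz_hash(SYMMETRIC_KEY, fwd)), hex(toeplitz_hash(SYMMETRIC_KEY, rev)))
```

Because the key repeats every 16 bits, swapping fields whose offsets differ by a multiple of 16 bits leaves the hash unchanged. Building the input still requires parsing the inner header, since IPv4 and IPv6 place addresses at different offsets and with different lengths.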

>
>>>
>>>> Reviewed-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>> v10->v11:
>>>> 	1. Revise commit log for clarity for readers.
>>>> 	2. Some modifications to avoid undefined terms. @Parav Pandit
>>>> 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
>>>> 	4. Add the normative statements. @Parav Pandit
>>>>
>>>> v9->v10:
>>>> 	1. Removed hash_report_tunnel related information. @Parav Pandit
>>>> 	2. Re-describe the limitations of QoS for tunneling.
>>>> 	3. Some clarification.
>>>>
>>>> v8->v9:
>>>> 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
>>>> 	2. Add tunnel security section. @Michael S . Tsirkin
>>>> 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
>>>> 	4. Fix some typos.
>>>> 	5. Add more tunnel types. @Michael S . Tsirkin
>>>>
>>>> v7->v8:
>>>> 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
>>>> 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
>>>> 	3. Removed re-definition for inner packet hashing. @Parav Pandit
>>>> 	4. Fix some typos. @Michael S . Tsirkin
>>>> 	5. Clarify some sentences. @Michael S . Tsirkin
>>>>
>>>> v6->v7:
>>>> 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
>>>> 	2. Fix some syntax issues. @Michael S. Tsirkin
>>>>
>>>> v5->v6:
>>>> 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
>>>> 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
>>>> 	3. Move the links to introduction section. @Michael S. Tsirkin
>>>> 	4. Clarify some sentences. @Michael S. Tsirkin
>>>>
>>>> v4->v5:
>>>> 	1. Clarify some paragraphs. @Cornelia Huck
>>>> 	2. Fix the u8 type. @Cornelia Huck
>>>>
>>>> v3->v4:
>>>> 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
>>>> 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
>>>> 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
>>>> 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
>>>>
>>>> v2->v3:
>>>> 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
>>>> 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
>>>>
>>>> v1->v2:
>>>> 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
>>>> 	2. Clarify some paragraphs. @Jason Wang
>>>> 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
>>>>
>>>>    device-types/net/description.tex        | 119 +++++++++++++++++++++++-
>>>>    device-types/net/device-conformance.tex |   1 +
>>>>    device-types/net/driver-conformance.tex |   1 +
>>>>    introduction.tex                        |  24 +++++
>>>>    4 files changed, 144 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>>>> index 0500bb6..49dee2f 100644
>>>> --- a/device-types/net/description.tex
>>>> +++ b/device-types/net/description.tex
>>>> @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>>    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>>        channel.
>>>> +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
>>>> +    for tunnel-encapsulated packets.
>>>> +
>>>>    \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>>    \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>>> @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>>    \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>    \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>>    \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
>>>>    \end{description}
>>>>    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>>> @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>>            u8 rss_max_key_size;
>>>>            le16 rss_max_indirection_table_length;
>>>>            le32 supported_hash_types;
>>>> +        le32 supported_tunnel_hash_types;
>>>>    };
>>>>    \end{lstlisting}
>>>>    The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
>>>> @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>>    Field \field{supported_hash_types} contains the bitmask of supported hash types.
>>>>    See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>>>> +The next field, \field{supported_tunnel_hash_types} only exists if the device
>>>> +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
>>>> +
>>>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
>>>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
>>>> +
>>>>    \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>>>>    The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
>>>> @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    If the feature VIRTIO_NET_F_RSS was negotiated:
>>>>    \begin{itemize}
>>>>    \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
>>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>>    \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>>>>    \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>>>>    \end{itemize}
>>>> @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    If the feature VIRTIO_NET_F_RSS was not negotiated:
>>>>    \begin{itemize}
>>>>    \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
>>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>>    \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>>>>    \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>>>>    \end{itemize}
>>>> @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    \subparagraph{Supported/enabled hash types}
>>>>    \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
>>>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>>>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>>>    Hash types applicable for IPv4 packets:
>>>>    \begin{lstlisting}
>>>>    #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>>>> @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>>>>    \end{itemize}
>>>> +\paragraph{Inner Packet Header Hash}
>>>> +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
>>>> +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
>>>> +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
>>>> +If multiple commands are sent, the device configuration will be defined by the last command received.
>>>> +
>>>> +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
>>>> +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
>>>> +/ Network Device / Device Operation / Processing of Incoming Packets /
>>>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
>>>> +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
>>>> +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
>>>> +
>>>> +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
>>>> +
>>>> +\subparagraph{Tunnel/Encapsulated packet}
>>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
>>>> +A tunnel packet is encapsulated from the original packet based on the tunneling
>>>> +protocol (only a single level of encapsulation is currently supported). The
>>>> +encapsulated packet contains an outer header and an inner header, and the device
>>>> +calculates the hash over either the inner header or the outer header.
>>>> +
>>>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
>>>> +packet's outer header matches one of the supported \field{hash_tunnel_types},
>>>> +the hash of the inner header is calculated. Supported encapsulation types are listed
>>>> +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
>>>> +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>> +
>>>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
>>>> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>>>> +
>>>> +\subparagraph{Supported/enabled tunnel hash types}
>>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
>>>> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
>>>> +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
>>>> +outer header of the encapsulated packet.
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
>>>> +\end{lstlisting}
>>>> +
>>>> +The encapsulation hash type below indicates that the hash is calculated over the
>>>> +inner packet header:
>>>> +Hash type applicable for inner payload of the gre-encapsulated packet
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
>>>> +\end{lstlisting}
>>>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
>>>> +\end{lstlisting}
>>>> +Hash type applicable for inner payload of the geneve-encapsulated packet
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
>>>> +\end{lstlisting}
>>>> +Hash type applicable for inner payload of the ip-encapsulated packet
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
>>>> +\end{lstlisting}
>>>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
>>>> +\end{lstlisting}
>>>> +
>>>> +\subparagraph{Tunnel QoS limitation}
>>>> +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
>>>> +there is no quality of service (QoS) for these packets. For example, when the packets of certain
>>>> +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
>>>> +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
>>>> +
>>>> +Possible mitigations:
>>>> +\begin{itemize}
>>>> +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
>>>> +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
>>>> +      to disable inner packet hash for encapsulated packets.
>>>> +\item Choose a hash key that can avoid queue collisions.
>>>> +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
>>>> +\end{itemize}
>>>> +
>>>> +The limitations mentioned above exist with/without the inner packet header hash.
>>>> +
>>>> +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>> +
>>>> +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
>>>> +
>>>> +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
>>> I'm not sure how this last one got here. It seems to have nothing to do
>>> with encapsulation - if we want to we should require this for all
>>> packets or none at all.
>> Yes, you are right. It works for all packets.
>>
>>>
>>>> +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>> +
>>>> +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
>>>> +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
>>>> +
>>>> +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
>>> unclear. seems to mean all types must be approved
>>> where you really mean "only those types". original for non tunnel is:
>>>
>>> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
>>>
>>> which is clear though a bit verbose with two negations.
>> Yes, we can use the same sentence structure to illustrate.
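The "only those types" reading discussed above amounts to a subset check on the bitmask. A minimal sketch, assuming the VIRTIO_NET_HASH_TUNNEL_TYPE_* values proposed in this patch; the helper name `unsupported_tunnel_types` is hypothetical and not part of any driver or of the spec:

```python
# Bit values as proposed in this patch (v11); they may change in later revisions.
VIRTIO_NET_HASH_TUNNEL_TYPE_NONE = 1 << 0
VIRTIO_NET_HASH_TUNNEL_TYPE_GRE = 1 << 1
VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN = 1 << 2
VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE = 1 << 3
VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP = 1 << 4
VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE = 1 << 5


def unsupported_tunnel_types(requested: int, supported: int) -> int:
    """Return the bits set in `requested` that are missing from `supported`.

    Hypothetical helper: a result of zero means the driver sets only flags
    that the device advertises in supported_tunnel_hash_types.
    """
    return requested & ~supported


# Example: the device advertises GRE and VXLAN only.
supported = VIRTIO_NET_HASH_TUNNEL_TYPE_GRE | VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN
ok_request = VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN
bad_request = VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN | VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE
```

With these values, `ok_request` passes the check (no stray bits), while `bad_request` leaves the GENEVE bit set in the result, which is the case the "MUST NOT set any flags that are not supported" wording is meant to forbid.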
>>
>>> Also here it says "supported" but below it says "allowed".
>>>
>>>
>>>
>>>>    \paragraph{Hash reporting for incoming packets}
>>>>    \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
>>>> @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>        le16 reserved[4];
>>>>        u8 hash_key_length;
>>>>        u8 hash_key_data[hash_key_length];
>>>> +    le32 hash_tunnel_types;
>>>>    };
>>> Hmm this fixed type after variable type is problematic - might
>>> become unaligned. We could use some of reserved[4]
>>> for this ...
>>>
>> This is a problem, and perhaps Parav's proposal of using a separate command
>> and structure for inner hash is correct.
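To make the alignment concern concrete, here is a small sketch of where \field{hash_tunnel_types} would land. It assumes the structure starts with le32 hash_types, followed by le16 reserved[4] and u8 hash_key_length as in the current virtio_net_hash_config layout, with no padding; the function name is hypothetical:

```python
import struct


def hash_tunnel_types_offset(hash_key_length: int) -> int:
    """Byte offset of a le32 field placed right after hash_key_data.

    Assumed packed layout: le32 hash_types, le16 reserved[4],
    u8 hash_key_length, u8 hash_key_data[hash_key_length].
    """
    # "<" selects little-endian with no implicit padding: 4 + 8 + 1 = 13 bytes.
    fixed_part = struct.calcsize("<I4HB")
    return fixed_part + hash_key_length


offset_40 = hash_tunnel_types_offset(40)  # a common 40-byte Toeplitz key
is_aligned_40 = offset_40 % 4 == 0
```

Under these assumptions the field is 4-byte aligned only when hash_key_length % 4 == 3, so a 40-byte key leaves it at an odd offset. Moving the field into the fixed part (for example reusing some of reserved[4]) or into a separate command would avoid the dependency on the key length.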
>>
>>>>    \end{lstlisting}
>>>>    Field \field{hash_types} contains a bitmask of allowed hash types as
>>>>    defined in
>>>>    \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
>>>> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>>> +
>>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>> +
>>>> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>>>    Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>>>>    defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
>>>> @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>        le16 max_tx_vq;
>>>>        u8 hash_key_length;
>>>>        u8 hash_key_data[hash_key_length];
>>>> +    le32 hash_tunnel_types;
>>> Same alignment problem here but I'm not sure how to solve it.
>>> Suggestions?
>>>
>>>>    };
>>>>    \end{lstlisting}
>>>>    Field \field{hash_types} contains a bitmask of allowed hash types as
>>>> @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>    Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
>>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>> +
>>>>    \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>>    A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
>>>> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
>>>> index 54f6783..0ff5944 100644
>>>> --- a/device-types/net/device-conformance.tex
>>>> +++ b/device-types/net/device-conformance.tex
>>>> @@ -14,4 +14,5 @@
>>>>    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>>    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>>    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>    \end{itemize}
>>>> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
>>>> index 97d0cc1..951be89 100644
>>>> --- a/device-types/net/driver-conformance.tex
>>>> +++ b/device-types/net/driver-conformance.tex
>>>> @@ -14,4 +14,5 @@
>>>>    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>>    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>>    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>    \end{itemize}
>>>> diff --git a/introduction.tex b/introduction.tex
>>>> index 287c5fc..25c9d48 100644
>>>> --- a/introduction.tex
>>>> +++ b/introduction.tex
>>>> @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
>>>>        Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
>>>>    	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
>>>> +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
>>>> +    Generic Routing Encapsulation
>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
>>> This is GRE over IPv4.
>>> So we are not supporting GRE over IPv6?
>> Yes. Do we need to add it?
>> https://datatracker.ietf.org/doc/rfc7676/
> If you want to support it, yes.
>
>>> And we do not support optional keys?
>> We did not disallow optional fields.
>>
>> Thanks.
> The spec you link to does not include this.

I'll add this. :)

Thanks!

>
>>>
>>>
>>>> +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
>>>> +    Virtual eXtensible Local Area Network
>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
>>>> +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
>>>> +    Generic Network Virtualization Encapsulation
>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
>>>> +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
>>>> +    IP Encapsulation within IP
>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
>>>> +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
>>>> +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
>>>> +	\phantomsection\label{intro:IP}\textbf{[IP]} &
>>>> +    INTERNET PROTOCOL
>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
>>>> +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
>>>> +    User Datagram Protocol
>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
>>>> +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
>>>> +    TRANSMISSION CONTROL PROTOCOL
>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
>>>>    \end{longtable}
>>>>    \section{Non-Normative References}
>>>> -- 
>>>> 2.19.1.6.gb485710b
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21 14:49       ` Heng Qi
@ 2023-03-21 15:58         ` Michael S. Tsirkin
  2023-03-22 12:49           ` Heng Qi
  2023-03-23  2:52         ` Jason Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-21 15:58 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Jason Wang, Yuri Benditovich, Xuan Zhuo

On Tue, Mar 21, 2023 at 10:49:39PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/21 at 3:34 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
> > > 
> > > On 2023/3/21 at 3:43 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> > > > > 1. Currently, a received encapsulated packet has an outer and an inner header, but
> > > > > the virtio device is unable to calculate the hash for the inner header. Multiple
> > > > > flows with the same outer header but different inner headers are steered to the
> > > > > same receive queue. This results in poor receive performance.
> > > > > 
> > > > > To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> > > > > introduced, which enables the device to advertise the capability to calculate the
> > > > > hash for the inner packet header. Compared with the out header hash, it regains
> > > > > better receive performance.
> > > > So this would be a very good argument however the cost would be it would
> > > > seem we have to keep extending this indefinitely as new tunneling
> > > > protocols come to light.
> > > > But I believe in fact we don't at least for this argument:
> > > > the standard way to address this is actually by propagating entropy
> > > > from inner to outer header.
> > > Yes, we don't argue with this.
> > > 
> > > > So I'd maybe reorder the commit log and give the explanation 2 below
> > > > then say "for some legacy systems
> > > > including entropy in IP header
> > > > as done in modern protocols is not practical, resulting in
> > > > bad performance under RSS".
> > > I agree. But it is not necessarily about legacy systems: some scenarios
> > > need to connect multiple tunnels and, for compatibility, will not use
> > > optional fields or will choose an older tunnel protocol.
> > compatibility ... with legacy systems, no?
> > 
> > > > 
> > > > > 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> > > > > packets being spread across multiple receive queues (refer to the figure below).
> > > > > However, in certain scenarios, it becomes necessary to direct these encapsulated
> > > > > packets of the same flow to a single receive queue. This facilitates the processing
> > > > > of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> > > > > 
> > > > >                  client1                    client2
> > > > >                     |                          |
> > > > >                     |        +-------+         |
> > > > >                     +------->|tunnels|<--------+
> > > > >                              +-------+
> > > > >                                 |  |
> > > > >                                 |  |
> > > > >                                 v  v
> > > > >                         +-----------------+
> > > > >                         | processing host |
> > > > >                         +-----------------+
> > > > necessary is too strong a word I feel.
> > > > All this is, is an optimization, we don't really know how strong it is
> > > > even.
> > > > 
> > > > Here's how I understand this:
> > > > 
> > > > Imagine two clients client1 and client2 talking to each other.
> > > > A copy of all packets is sent to a processing host over a virtio device.
> > > > Two directions of the same flow between two clients might be
> > > > encapsulated in two different tunnels, with current RSS
> > > > strategies they would land on two arbitrary, unrelated queues.
> > > > As an optimization, some hosts might wish to make sure both directions
> > > > of the encapsulated flow land on the same queue.
> > > > 
> > > > 
> > > > Is this a good summary?
> > > I think yes.
> > > 
> > > > 
> > > > Now that things begin to be clearer, I kind of begin to agree with
> > > > Jason's suggestion that this is extremely narrow.  And what if I want
> > > > one direction on queue1 and another one queue2 e.g. adjacent numbers for
> > > I don't understand why we need this, can you point out some usage scenarios?
> > If traffic is predominantly UDP, each queue can be processed in
> > parallel. If you need to look at the other side of the flow once
> > in a while, you can find it by doing ^1.
> 
> I'm not sure I follow you, but I'll try to answer. When we place traffic
> in one direction on a certain queue, we have already calculated the hash,
> so we can record the five-tuple information and the queue number. When
> traffic in the other direction arrives, we can match it against the
> recorded information and place it on the ^1 queue.
> 
> > 
> > > > the same flow?  If enough people agree this is needed we can accept this
> > > > but did you at all consider using something programmable like BPF for
> > > I think the problem is that our virtio device cannot support ebpf, we can
> > > also ask Alvaro, Parav if their virtio devices can support ebpf offloading.
> > > :)
> > This isn't ebpf, more like classic bpf. Just math done on packets,
> > no tables.
> 
> We would also really like to use simple bpf offloading, which is cool. But
> it still takes time, for example to support parsing of bpf instructions on
> devices such as FPGAs, which cannot do this easily today. Few devices
> support it right now; in the kernel I only see support for the Netronome
> iNIC.
> 
>    # git grep XDP_SETUP_PROG_HW
>    drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    case XDP_SETUP_PROG_HW:
>    drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
>    drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW) {
>    drivers/net/netdevsim/bpf.c:    case XDP_SETUP_PROG_HW:
>    include/linux/netdevice.h:      XDP_SETUP_PROG_HW,
>    net/core/dev.c: xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG;
> 
> 
> > 
> > 
> > > > this?  Considering we are putting not insignificant amount of work into
> > > > this, making this widely useful would be better than a narrow
> > > > optimization for a very specific usecase.
> > > > 
> > > > 
> > > > > To achieve this, the device can calculate a symmetric hash based on the inner packet
> > > > > headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> > > > > computing the hash.
> > > > when you say symmetric hash you really mean symmetric key for toeplitz, yes?
> > > > It's not that it disregards order, it just gives the same result if
> > > > you reverse source and destination, no?
> > > Yes, a symmetric hash can use a key made of the same 2 bytes repeated,
> > > and it only supports swapping source and destination.
> > So, this won't work if some inner flows are IPv4 and others IPv6, right?
> > You have to know the inner flow format?
> 
> Yes, we need.

Ouch, even more narrow.  Maybe we need support for XOR hash then?
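
For reference, the symmetric-key behaviour discussed above can be sketched as follows. This is an illustration only, assuming an IPv4/TCP hash input ordered as source address, destination address, source port, destination port; the 0x6d5a pattern and all function names are choices made for this example, not something defined by the patch. The second helper shows one possible reading of "XOR hash", where source and destination fields are XORed together before hashing:

```python
import ipaddress


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: for each set input bit (MSB first), XOR in the
    32-bit window of the key that starts at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def ipv4_tcp_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Hash input: source IP, destination IP, source port, destination port."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + src_port.to_bytes(2, "big")
            + dst_port.to_bytes(2, "big"))


def xor_folded_input(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Assumed 'XOR' variant: fold source and destination together first,
    so the input itself no longer depends on direction."""
    addr = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    port = src_port ^ dst_port
    return addr.to_bytes(4, "big") + port.to_bytes(2, "big")


# A key with a 16-bit period: every 32-bit window depends only on the bit
# position modulo 16, and swapping addresses (32 bits) or ports (16 bits)
# moves each bit by a multiple of 16.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

forward = ("192.0.2.1", "198.51.100.7", 40000, 443)
reverse = ("198.51.100.7", "192.0.2.1", 443, 40000)
hash_forward = toeplitz_hash(SYMMETRIC_KEY, ipv4_tcp_input(*forward))
hash_reverse = toeplitz_hash(SYMMETRIC_KEY, ipv4_tcp_input(*reverse))
```

With the repeated key, `hash_forward` and `hash_reverse` are equal, so both directions of the flow would be steered by the same hash value. The XOR-folded input is identical for both directions whatever key is used, at the cost of also mapping some unrelated tuples to the same input.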


> > 
> > > > 
> > > > > Reviewed-by: Jason Wang <jasowang@redhat.com>
> > > > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > > v10->v11:
> > > > > 	1. Revise commit log for clarity for readers.
> > > > > 	2. Some modifications to avoid undefined terms. @Parav Pandit
> > > > > 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > > > > 	4. Add the normative statements. @Parav Pandit
> > > > > 
> > > > > v9->v10:
> > > > > 	1. Removed hash_report_tunnel related information. @Parav Pandit
> > > > > 	2. Re-describe the limitations of QoS for tunneling.
> > > > > 	3. Some clarification.
> > > > > 
> > > > > v8->v9:
> > > > > 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> > > > > 	2. Add tunnel security section. @Michael S . Tsirkin
> > > > > 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> > > > > 	4. Fix some typos.
> > > > > 	5. Add more tunnel types. @Michael S . Tsirkin
> > > > > 
> > > > > v7->v8:
> > > > > 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> > > > > 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> > > > > 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> > > > > 	4. Fix some typos. @Michael S . Tsirkin
> > > > > 	5. Clarify some sentences. @Michael S . Tsirkin
> > > > > 
> > > > > v6->v7:
> > > > > 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> > > > > 	2. Fix some syntax issues. @Michael S. Tsirkin
> > > > > 
> > > > > v5->v6:
> > > > > 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> > > > > 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> > > > > 	3. Move the links to introduction section. @Michael S. Tsirkin
> > > > > 	4. Clarify some sentences. @Michael S. Tsirkin
> > > > > 
> > > > > v4->v5:
> > > > > 	1. Clarify some paragraphs. @Cornelia Huck
> > > > > 	2. Fix the u8 type. @Cornelia Huck
> > > > > 
> > > > > v3->v4:
> > > > > 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> > > > > 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> > > > > 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> > > > > 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> > > > > 
> > > > > v2->v3:
> > > > > 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> > > > > 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> > > > > 
> > > > > v1->v2:
> > > > > 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> > > > > 	2. Clarify some paragraphs. @Jason Wang
> > > > > 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> > > > > 
> > > > >    device-types/net/description.tex        | 119 +++++++++++++++++++++++-
> > > > >    device-types/net/device-conformance.tex |   1 +
> > > > >    device-types/net/driver-conformance.tex |   1 +
> > > > >    introduction.tex                        |  24 +++++
> > > > >    4 files changed, 144 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > > > > index 0500bb6..49dee2f 100644
> > > > > --- a/device-types/net/description.tex
> > > > > +++ b/device-types/net/description.tex
> > > > > @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > >    \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > >        channel.
> > > > > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> > > > > +    for tunnel-encapsulated packets.
> > > > > +
> > > > >    \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > > >    \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > > > @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > >    \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > >    \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > >    \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
> > > > >    \end{description}
> > > > >    \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > > @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > > > >            u8 rss_max_key_size;
> > > > >            le16 rss_max_indirection_table_length;
> > > > >            le32 supported_hash_types;
> > > > > +        le32 supported_tunnel_hash_types;
> > > > >    };
> > > > >    \end{lstlisting}
> > > > >    The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> > > > > @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > > > >    Field \field{supported_hash_types} contains the bitmask of supported hash types.
> > > > >    See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
> > > > > +The next field, \field{supported_tunnel_hash_types} only exists if the device
> > > > > +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> > > > > +
> > > > > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> > > > > +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> > > > > +
> > > > >    \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> > > > >    The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > > > > @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > >    If the feature VIRTIO_NET_F_RSS was negotiated:
> > > > >    \begin{itemize}
> > > > >    \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> > > > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > > > >    \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
> > > > >    \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
> > > > >    \end{itemize}
> > > > > @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > >    If the feature VIRTIO_NET_F_RSS was not negotiated:
> > > > >    \begin{itemize}
> > > > >    \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> > > > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > > > >    \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
> > > > >    \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
> > > > >    \end{itemize}
> > > > > @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > >    \subparagraph{Supported/enabled hash types}
> > > > >    \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> > > > > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > > > > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> > > > >    Hash types applicable for IPv4 packets:
> > > > >    \begin{lstlisting}
> > > > >    #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > > > > @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > >    (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
> > > > >    \end{itemize}
> > > > > +\paragraph{Inner Packet Header Hash}
> > > > > +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> > > > > +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> > > > > +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> > > > > +If multiple commands are sent, the device configuration will be defined by the last command received.
> > > > > +
> > > > > +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> > > > > +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> > > > > +/ Network Device / Device Operation / Processing of Incoming Packets /
> > > > > +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> > > > > +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> > > > > +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> > > > > +
> > > > > +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
> > > > > +
> > > > > +\subparagraph{Tunnel/Encapsulated packet}
> > > > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> > > > > +A tunnel packet is encapsulated from the original packet based on the tunneling
> > > > > +protocol (only a single level of encapsulation is currently supported). The
> > > > > +encapsulated packet contains an outer header and an inner header, and the device
> > > > > +calculates the hash over either the inner header or the outer header.
> > > > > +
> > > > > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> > > > > +packet's outer header matches one of the supported \field{hash_tunnel_types},
> > > > > +the hash of the inner header is calculated. Supported encapsulation types are listed
> > > > > +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> > > > > +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > > > > +
> > > > > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> > > > > +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > > > > +
> > > > > +\subparagraph{Supported/enabled tunnel hash types}
> > > > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> > > > > +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> > > > > +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> > > > > +outer header of the encapsulated packet.
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +The encapsulation hash type below indicates that the hash is calculated over the
> > > > > +inner packet header:
> > > > > +Hash type applicable for inner payload of the gre-encapsulated packet
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> > > > > +\end{lstlisting}
> > > > > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> > > > > +\end{lstlisting}
> > > > > +Hash type applicable for inner payload of the geneve-encapsulated packet
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> > > > > +\end{lstlisting}
> > > > > +Hash type applicable for inner payload of the ip-encapsulated packet
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> > > > > +\end{lstlisting}
> > > > > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +\subparagraph{Tunnel QoS limitation}
> > > > > +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> > > > > +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> > > > > +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> > > > > +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> > > > > +
> > > > > +Possible mitigations:
> > > > > +\begin{itemize}
> > > > > +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> > > > > +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> > > > > +      to disable inner packet hash for encapsulated packets.
> > > > > +\item Choose a hash key that can avoid queue collisions.
> > > > > +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> > > > > +\end{itemize}
> > > > > +
> > > > > +The limitations mentioned above exist with/without the inner packet header hash.
> > > > > +
> > > > > +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > +
> > > > > +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> > > > > +
> > > > > +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
> > > > I'm not sure how this last one got here. It seems to have nothing to do
> > > > with encapsulation - if we want to we should require this for all
> > > > packets or none at all.
> > > Yes, you are right. It works for all packets.
> > > 
> > > > 
> > > > > +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > +
> > > > > +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> > > > > +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> > > > > +
> > > > > +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
> > > > unclear. seems to mean all types must be approved
> > > > where you really mean "only those types". original for non tunnel is:
> > > > 
> > > > A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
> > > > 
> > > > which is clear though a bit verbose with two negations.
> > > Yes, we can use the same sentence structure to illustrate.
> > > 
> > > > Also here it says "supported" but below it says "allowed".
> > > > 
> > > > 
> > > > 
> > > > >    \paragraph{Hash reporting for incoming packets}
> > > > >    \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> > > > > @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > >        le16 reserved[4];
> > > > >        u8 hash_key_length;
> > > > >        u8 hash_key_data[hash_key_length];
> > > > > +    le32 hash_tunnel_types;
> > > > >    };
> > > > Hmm this fixed type after variable type is problematic - might
> > > > become unaligned. We could use some of reserved[4]
> > > > for this ...
> > > > 
> > > This is a problem, and perhaps Parav's proposal of using a separate command
> > > and structure for inner hash is correct.
> > > 
> > > > >    \end{lstlisting}
> > > > >    Field \field{hash_types} contains a bitmask of allowed hash types as
> > > > >    defined in
> > > > >    \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> > > > > -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > > > > +
> > > > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > > > > +
> > > > > +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > > > >    Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
> > > > >    defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> > > > > @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > >        le16 max_tx_vq;
> > > > >        u8 hash_key_length;
> > > > >        u8 hash_key_data[hash_key_length];
> > > > > +    le32 hash_tunnel_types;
> > > > Same alignment problem here but I'm not sure how to solve it.
> > > > Suggestions?
> > > > 
> > > > >    };
> > > > >    \end{lstlisting}
> > > > >    Field \field{hash_types} contains a bitmask of allowed hash types as
> > > > > @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > >    Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
> > > > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > > > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> > > > > +
> > > > >    \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > >    A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> > > > > diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> > > > > index 54f6783..0ff5944 100644
> > > > > --- a/device-types/net/device-conformance.tex
> > > > > +++ b/device-types/net/device-conformance.tex
> > > > > @@ -14,4 +14,5 @@
> > > > >    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > >    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > >    \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > >    \end{itemize}
> > > > > diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> > > > > index 97d0cc1..951be89 100644
> > > > > --- a/device-types/net/driver-conformance.tex
> > > > > +++ b/device-types/net/driver-conformance.tex
> > > > > @@ -14,4 +14,5 @@
> > > > >    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > >    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > >    \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > >    \end{itemize}
> > > > > diff --git a/introduction.tex b/introduction.tex
> > > > > index 287c5fc..25c9d48 100644
> > > > > --- a/introduction.tex
> > > > > +++ b/introduction.tex
> > > > > @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
> > > > >        Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
> > > > >    	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
> > > > > +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> > > > > +    Generic Routing Encapsulation
> > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> > > > This is GRE over IPv4.
> > > > So we are not supporting GRE over IPv6?
> > > Yes. Do we need to add it?
> > > https://datatracker.ietf.org/doc/rfc7676/
> > If you want to support it, yes.
> > 
> > > > And we do not support optional keys?
> > > We did not disallow optional fields.
> > > 
> > > Thanks.
> > The spec you link to does not include this.
> 
> I'll add this. :)
> 
> Thanks!

Question is how common it is to support all three.
Do I understand it correctly that currently your use-case
is mostly with GRE?

> > 
> > > > 
> > > > 
> > > > > +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> > > > > +    Virtual eXtensible Local Area Network
> > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> > > > > +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> > > > > +    Generic Network Virtualization Encapsulation
> > > > > +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> > > > > +    IP Encapsulation within IP
> > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> > > > > +	\phantomsection\label{intro:IPIP}\textbf{[NVGRE]} &
> > > > > +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> > > > > +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> > > > > +    INTERNET PROTOCOL
> > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> > > > > +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> > > > > +    User Datagram Protocol
> > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> > > > > +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> > > > > +    TRANSMISSION CONTROL PROTOCOL
> > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> > > > >    \end{longtable}
> > > > >    \section{Non-Normative References}
> > > > > -- 
> > > > > 2.19.1.6.gb485710b
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* RE: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21  7:37       ` Michael S. Tsirkin
@ 2023-03-21 19:46         ` Parav Pandit
  2023-03-21 21:32           ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2023-03-21 19:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, March 21, 2023 3:37 AM
> 
> On Tue, Mar 21, 2023 at 04:19:17AM +0000, Parav Pandit wrote:
> > One (this proposal) is solving spread to different RSS queues.
> 
> Spread is mostly ok with modern protocols though. 
Yes. 
> It seems to optimize for a specific monitoring solution.
> 
Monitoring is a critical part of the infrastructure.
So, if a point solution is useful, at least I don't see a negative of it especially when there is a user of it.

> > Another one is finding out which exact packet to drop/pass when queue usage
> is high. (ebpf/tc other ways to solve it).
> >
> > Ebpf sounds cooler than the real offload implementation in the hw device at
> the current level.
> > I remember Jason's good talk on the ebpf a few years back, which is possible
> when done in sw on the hypervisor.
> 
> I was talking about classic bpf though. no state.
> 
Packet processing logic needs to keep track of past tunnel data and counters across many queues for fairness.
This involves statefulness of past/current data.

Even with outer header entropy, multiple tunnels can steer to a small set of queues, because num_tunnel >= num_queues.
Hence, fairness is orthogonal.


* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21 19:46         ` Parav Pandit
@ 2023-03-21 21:32           ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-21 21:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo

On Tue, Mar 21, 2023 at 07:46:14PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, March 21, 2023 3:37 AM
> > 
> > On Tue, Mar 21, 2023 at 04:19:17AM +0000, Parav Pandit wrote:
> > > One (this proposal) is solving spread to different RSS queues.
> > 
> > Spread is mostly ok with modern protocols though. 
> Yes. 
> > It seems to optimize for a specific monitoring solution.
> > 
> Monitoring is a critical part of the infrastructure.
> So, if a point solution is useful, at least I don't see a negative of it especially when there is a user of it.


Yes. I'd like to include just the protocols that genuinely benefit
though, so we can avoid the churn of adding more and more tunneling
protocols as they appear.

That's why I am trying to find out whether limiting this to
just classic GRE is ok (and maybe GRE IPv6).

I also feel that if symmetry is needed, we need support for an XOR-based
hash with a mask; playing with Toeplitz is fragile because, e.g., IPv4 and
IPv6 headers differ in length.
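
For illustration only, a minimal sketch in C of the kind of XOR-based
symmetric hash with a mask mentioned above. The struct flow4 layout, the
function name xor_sym_hash and the mask parameter are assumptions made up
for this example; nothing here is taken from the spec or from the patch.

```c
#include <stdint.h>

/* Hypothetical flow description for the example; not a virtio structure. */
struct flow4 {
    uint32_t saddr, daddr;   /* IPv4 source/destination address */
    uint16_t sport, dport;   /* L4 source/destination port */
};

/*
 * XOR each source field with its destination counterpart, so swapping
 * source and destination cannot change the result. Then fold the upper
 * half into the lower half and keep only the bits selected by mask.
 */
uint32_t xor_sym_hash(const struct flow4 *f, uint32_t mask)
{
    uint32_t h = f->saddr ^ f->daddr;

    h ^= (uint32_t)(f->sport ^ f->dport);
    h ^= h >> 16;            /* fold high 16 bits into low 16 bits */
    return h & mask;
}
```

Because source and destination are combined before anything else happens,
both directions of a flow get the same value without any constraint on the
key or on field offsets.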

> > > Another one is finding out which exact packet to drop/pass when queue usage
> > is high. (ebpf/tc other ways to solve it).
> > >
> > > Ebpf sounds cooler than the real offload implementation in the hw device at
> > the current level.
> > > I remember Jason's good talk on the ebpf a few years back, which is possible
> > when done in sw on the hypervisor.
> > 
> > I was talking about classic bpf though. no state.
> > 
> Packet processing logic needs to keep track of past tunnel data and counters across many queues for fairness.
> This involves statefulness of past/current data.
> 
> Even with outer header entropy, multiple tunnels can steer to a small set of queues, because num_tunnel >= num_queues.
> Hence, fairness is orthogonal.

Yes, I was not talking about fairness. That was addressed adequately by
the security note, I feel.

-- 
MST



* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21 15:58         ` Michael S. Tsirkin
@ 2023-03-22 12:49           ` Heng Qi
  2023-03-22 16:42             ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Heng Qi @ 2023-03-22 12:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Jason Wang, Yuri Benditovich, Xuan Zhuo



On 2023/3/21 11:58 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 21, 2023 at 10:49:39PM +0800, Heng Qi wrote:
>>
>> On 2023/3/21 3:34 PM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
>>>> On 2023/3/21 3:43 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
>>>>>> 1. Currently, a received encapsulated packet has an outer and an inner header, but
>>>>>> the virtio device is unable to calculate the hash for the inner header. Multiple
>>>>>> flows with the same outer header but different inner headers are steered to the
>>>>>> same receive queue. This results in poor receive performance.
>>>>>>
>>>>>> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
>>>>>> introduced, which enables the device to advertise the capability to calculate the
>>>>>> hash for the inner packet header. Compared with the outer header hash, it regains
>>>>>> better receive performance.
>>>>> So this would be a very good argument however the cost would be it would
>>>>> seem we have to keep extending this indefinitely as new tunneling
>>>>> protocols come to light.
>>>>> But I believe in fact we don't at least for this argument:
>>>>> the standard way to address this is actually by propagating entropy
>>>>> from inner to outer header.
>>>> Yes, we don't argue with this.
>>>>
>>>>> So I'd maybe reorder the commit log and give the explanation 2 below
>>>>> then say "for some legacy systems
>>>>> including entropy in IP header
>>>>> as done in modern protocols is not practical, resulting in
>>>>> bad performance under RSS".
>>>> I agree. But it is not necessarily a legacy system: some scenarios need to
>>>> connect multiple tunnels and, for compatibility, will not use optional
>>>> fields or will choose an old tunnel protocol.
>>> compatibility ... with legacy systems, no?
>>>
>>>>>> 2. The same flow can traverse through different tunnels, resulting in the encapsulated
>>>>>> packets being spread across multiple receive queues (refer to the figure below).
>>>>>> However, in certain scenarios, it becomes necessary to direct these encapsulated
>>>>>> packets of the same flow to a single receive queue. This facilitates the processing
>>>>>> of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
>>>>>>
>>>>>>                   client1                    client2
>>>>>>                      |                          |
>>>>>>                      |        +-------+         |
>>>>>>                      +------->|tunnels|<--------+
>>>>>>                               +-------+
>>>>>>                                  |  |
>>>>>>                                  |  |
>>>>>>                                  v  v
>>>>>>                          +-----------------+
>>>>>>                          | processing host |
>>>>>>                          +-----------------+
>>>>> necessary is too strong a word I feel.
>>>>> All this is, is an optimization, we don't really know how strong it is
>>>>> even.
>>>>>
>>>>> Here's how I understand this:
>>>>>
>>>>> Imagine two clients client1 and client2 talking to each other.
>>>>> A copy of all packets is sent to a processing host over a virtio device.
>>>>> Two directions of the same flow between two clients might be
>>>>> encapsulated in two different tunnels, with current RSS
>>>>> strategies they would land on two arbitrary, unrelated queues.
>>>>> As an optimization, some hosts might wish to make sure both directions
>>>>> of the encapsulated flow land on the same queue.
>>>>>
>>>>>
>>>>> Is this a good summary?
>>>> I think yes.
>>>>
>>>>> Now that things begin to be clearer, I kind of begin to agree with
>>>>> Jason's suggestion that this is extremely narrow.  And what if I want
>>>>> one direction on queue1 and another one queue2 e.g. adjacent numbers for
>>>> I don't understand why we need this, can you point out some usage scenarios?
>>> If traffic is predominantly UDP, each queue can be processed in
>>> parallel. If you need to look at the other side of the flow once
>>> in a while, you can find it by doing ^1.
>> I'm not sure I follow you, but I'll try to answer. When we place
>> traffic in one direction on a certain queue,
>> it means that we have calculated the hash, so we can record the five-tuple
>> information and the queue number. When
>> traffic in the other direction arrives, we can match it against the recorded
>> information and place it on the ^1 queue.
>>
>>>>> the same flow?  If enough people agree this is needed we can accept this
>>>>> but did you at all consider using something programmable like BPF for
>>>> I think the problem is that our virtio device cannot support ebpf, we can
>>>> also ask Alvaro, Parav if their virtio devices can support ebpf offloading.
>>>> :)
>>> This isn't ebpf, more like classic bpf. Just math done on packets,
>>> no tables.
>> We would also really like to use simple bpf offloading, which is cool. But
>> it still takes time, for example to
>> support parsing of bpf instructions etc. on devices like fpga, which they
>> can't do easily now. Few devices
>> are supported right now, I only see support for the netronome iNIC in the
>> kernel.
>>
>>     #git grep XDP_SETUP_PROG_HW
>>     drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    case XDP_SETUP_PROG_HW:
>>     drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
>>     drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW) {
>>     drivers/net/netdevsim/bpf.c:    case XDP_SETUP_PROG_HW:
>>     include/linux/netdevice.h:      XDP_SETUP_PROG_HW,
>>     net/core/dev.c: xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG;
>>
>>
>>>
>>>>> this?  Considering we are putting not insignificant amount of work into
>>>>> this, making this widely useful would be better than a narrow
>>>>> optimization for a very specific usecase.
>>>>>
>>>>>
>>>>>> To achieve this, the device can calculate a symmetric hash based on the inner packet
>>>>>> headers of the flow. The symmetric hash disregards the order of the 5-tuple when
>>>>>> computing the hash.
>>>>> when you say symmetric hash you really mean symmetric key for toeplitz, yes?
>>>>> It's not that it disregards order, it just gives the same result if
>>>>> you reverse source and destination, no?
>>>> Yes, a symmetric hash can use a key made of the same 2 bytes repeated, and
>>>> it only supports swapping source and destination.
>>> So, this won't work if some inner flows are IPv4 and others IPv6, right?
>>> You have to know the inner flow format?
>> Yes, we need.
> Ouch, even more narrow.

I may have misunderstood what you meant earlier. For the device, the IP 
families of the inner payloads of the same flow are the same.
The device can calculate a symmetric hash so that the flow can be 
placed on the same queue.
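
As a rough sketch of what "symmetric" means here (not taken from the patch):
a Toeplitz hash whose key repeats the same 2 bytes gives the same value when
source and destination are swapped, because the swapped fields sit a multiple
of 16 bits apart in the hash input. The function names toeplitz and
fill_symmetric_key, and the 0x6d 0x5a pattern, are assumptions chosen for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toeplitz hash: for every set bit of the input (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position.
 * key_len must be at least 4; key bytes past key_len are treated as zero.
 */
uint32_t toeplitz(const uint8_t *key, size_t key_len,
                  const uint8_t *data, size_t len)
{
    uint32_t hash = 0;
    /* The first 32 key bits form the initial window. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next = 4;   /* index of the key byte that feeds the window */

    for (size_t i = 0; i < len; i++) {
        uint8_t kb = next < key_len ? key[next] : 0;

        next++;
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window left one bit, pulling in the next key bit. */
            window = (window << 1) | ((uint32_t)(kb >> bit) & 1u);
        }
    }
    return hash;
}

/* Fill the key with a repeating 2-byte pattern (value is an assumption). */
void fill_symmetric_key(uint8_t *key, size_t key_len)
{
    for (size_t i = 0; i < key_len; i++)
        key[i] = (i & 1) ? 0x5a : 0x6d;
}
```

With such a key the 32-bit window depends only on the bit position modulo 16,
so exchanging the two addresses or the two ports leaves the XOR of all
selected windows unchanged.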

> Maybe we need support for XOR hash then?

I think we can. This is orthogonal to the inner header hash; I can start 
work on XOR hashing in a separate follow-up thread if you want.

>
>
>>>>>> Reviewed-by: Jason Wang <jasowang@redhat.com>
>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>> ---
>>>>>> v10->v11:
>>>>>> 	1. Revise commit log for clarity for readers.
>>>>>> 	2. Some modifications to avoid undefined terms. @Parav Pandit
>>>>>> 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
>>>>>> 	4. Add the normative statements. @Parav Pandit
>>>>>>
>>>>>> v9->v10:
>>>>>> 	1. Removed hash_report_tunnel related information. @Parav Pandit
>>>>>> 	2. Re-describe the limitations of QoS for tunneling.
>>>>>> 	3. Some clarification.
>>>>>>
>>>>>> v8->v9:
>>>>>> 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
>>>>>> 	2. Add tunnel security section. @Michael S . Tsirkin
>>>>>> 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
>>>>>> 	4. Fix some typos.
>>>>>> 	5. Add more tunnel types. @Michael S . Tsirkin
>>>>>>
>>>>>> v7->v8:
>>>>>> 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
>>>>>> 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
>>>>>> 	3. Removed re-definition for inner packet hashing. @Parav Pandit
>>>>>> 	4. Fix some typos. @Michael S . Tsirkin
>>>>>> 	5. Clarify some sentences. @Michael S . Tsirkin
>>>>>>
>>>>>> v6->v7:
>>>>>> 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
>>>>>> 	2. Fix some syntax issues. @Michael S. Tsirkin
>>>>>>
>>>>>> v5->v6:
>>>>>> 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
>>>>>> 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
>>>>>> 	3. Move the links to introduction section. @Michael S. Tsirkin
>>>>>> 	4. Clarify some sentences. @Michael S. Tsirkin
>>>>>>
>>>>>> v4->v5:
>>>>>> 	1. Clarify some paragraphs. @Cornelia Huck
>>>>>> 	2. Fix the u8 type. @Cornelia Huck
>>>>>>
>>>>>> v3->v4:
>>>>>> 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
>>>>>> 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
>>>>>> 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
>>>>>> 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
>>>>>>
>>>>>> v2->v3:
>>>>>> 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
>>>>>> 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
>>>>>>
>>>>>> v1->v2:
>>>>>> 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
>>>>>> 	2. Clarify some paragraphs. @Jason Wang
>>>>>> 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
>>>>>>
>>>>>>     device-types/net/description.tex        | 119 +++++++++++++++++++++++-
>>>>>>     device-types/net/device-conformance.tex |   1 +
>>>>>>     device-types/net/driver-conformance.tex |   1 +
>>>>>>     introduction.tex                        |  24 +++++
>>>>>>     4 files changed, 144 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
>>>>>> index 0500bb6..49dee2f 100644
>>>>>> --- a/device-types/net/description.tex
>>>>>> +++ b/device-types/net/description.tex
>>>>>> @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>>>>>     \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>>>>>         channel.
>>>>>> +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
>>>>>> +    for tunnel-encapsulated packets.
>>>>>> +
>>>>>>     \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>>>>>     \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>>>>>> @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>>>>>     \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>>>     \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>>>>>     \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>>>>>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
>>>>>>     \end{description}
>>>>>>     \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>>>>>> @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>>>>             u8 rss_max_key_size;
>>>>>>             le16 rss_max_indirection_table_length;
>>>>>>             le32 supported_hash_types;
>>>>>> +        le32 supported_tunnel_hash_types;
>>>>>>     };
>>>>>>     \end{lstlisting}
>>>>>>     The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
>>>>>> @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>>>>>>     Field \field{supported_hash_types} contains the bitmask of supported hash types.
>>>>>>     See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>>>>>> +The next field, \field{supported_tunnel_hash_types} only exists if the device
>>>>>> +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
>>>>>> +
>>>>>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
>>>>>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
>>>>>> +
>>>>>>     \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>>>>>>     The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
>>>>>> @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>>     If the feature VIRTIO_NET_F_RSS was negotiated:
>>>>>>     \begin{itemize}
>>>>>>     \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
>>>>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>>>>     \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>>>>>>     \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>>>>>>     \end{itemize}
>>>>>> @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>>     If the feature VIRTIO_NET_F_RSS was not negotiated:
>>>>>>     \begin{itemize}
>>>>>>     \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
>>>>>> +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
>>>>>>     \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>>>>>>     \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>>>>>>     \end{itemize}
>>>>>> @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>>     \subparagraph{Supported/enabled hash types}
>>>>>>     \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
>>>>>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>>>>>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>>>>>     Hash types applicable for IPv4 packets:
>>>>>>     \begin{lstlisting}
>>>>>>     #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>>>>>> @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>>     (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>>>>>>     \end{itemize}
>>>>>> +\paragraph{Inner Packet Header Hash}
>>>>>> +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
>>>>>> +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
>>>>>> +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
>>>>>> +If multiple commands are sent, the device configuration will be defined by the last command received.
>>>>>> +
>>>>>> +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
>>>>>> +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
>>>>>> +/ Network Device / Device Operation / Processing of Incoming Packets /
>>>>>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
>>>>>> +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
>>>>>> +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
>>>>>> +
>>>>>> +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
>>>>>> +
>>>>>> +\subparagraph{Tunnel/Encapsulated packet}
>>>>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
>>>>>> +A tunnel packet is encapsulated from the original packet based on the tunneling
>>>>>> +protocol (only a single level of encapsulation is currently supported). The
>>>>>> +encapsulated packet contains an outer header and an inner header, and the device
>>>>>> +calculates the hash over either the inner header or the outer header.
>>>>>> +
>>>>>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
>>>>>> +packet's outer header matches one of the supported \field{hash_tunnel_types},
>>>>>> +the hash of the inner header is calculated. Supported encapsulation types are listed
>>>>>> +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
>>>>>> +Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>>>> +
>>>>>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
>>>>>> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>>>>>> +
>>>>>> +\subparagraph{Supported/enabled tunnel hash types}
>>>>>> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
>>>>>> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
>>>>>> +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
>>>>>> +outer header of the encapsulated packet.
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The encapsulation hash type below indicates that the hash is calculated over the
>>>>>> +inner packet header:
>>>>>> +Hash type applicable for inner payload of the gre-encapsulated packet
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
>>>>>> +\end{lstlisting}
>>>>>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
>>>>>> +\end{lstlisting}
>>>>>> +Hash type applicable for inner payload of the geneve-encapsulated packet
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
>>>>>> +\end{lstlisting}
>>>>>> +Hash type applicable for inner payload of the ip-encapsulated packet
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
>>>>>> +\end{lstlisting}
>>>>>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +\subparagraph{Tunnel QoS limitation}
>>>>>> +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
>>>>>> +there is no quality of service (QoS) for these packets. For example, when the packets of certain
>>>>>> +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
>>>>>> +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
>>>>>> +
>>>>>> +Possible mitigations:
>>>>>> +\begin{itemize}
>>>>>> +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
>>>>>> +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
>>>>>> +      to disable inner packet hash for encapsulated packets.
>>>>>> +\item Choose a hash key that can avoid queue collisions.
>>>>>> +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
>>>>>> +\end{itemize}
>>>>>> +
>>>>>> +The limitations mentioned above exist with or without the inner packet header hash.
>>>>>> +
>>>>>> +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>>> +
>>>>>> +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
>>>>>> +
>>>>>> +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
>>>>> I'm not sure how this last one got here. It seems to have nothing to do
>>>>> with encapsulation - if we want to we should require this for all
>>>>> packets or none at all.
>>>> Yes, you are right. It works for all packets.
>>>>
>>>>>> +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>>> +
>>>>>> +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
>>>>>> +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
>>>>>> +
>>>>>> +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
>>>>> unclear. seems to mean all types must be approved
>>>>> where you really mean "only those types". original for non tunnel is:
>>>>>
>>>>> A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
>>>>>
>>>>> which is clear though a bit verbose with two negations.
>>>> Yes, we can use the same sentence structure to illustrate.
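
The "MUST NOT set unsupported flags" wording discussed above amounts to a subset check on two bitmasks. A minimal sketch in C follows; the function name tunnel_types_acceptable is hypothetical, and treating VIRTIO_NET_HASH_TUNNEL_TYPE_NONE as always permitted is an assumption that the patch does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the patch (written unsigned here). */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE   (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE    (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1u << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP   (1u << 4)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE  (1u << 5)

/* Returns 1 if every requested bit is offered by the device, else 0.
 * Assumption: the NONE bit is accepted even if the device does not list it. */
int tunnel_types_acceptable(uint32_t supported_tunnel_hash_types,
                            uint32_t hash_tunnel_types)
{
    uint32_t allowed = supported_tunnel_hash_types |
                       VIRTIO_NET_HASH_TUNNEL_TYPE_NONE;

    return (hash_tunnel_types & ~allowed) == 0;
}
```

A driver-side helper along these lines would run before the configuration command is issued, so that only types the device offers are ever enabled.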
>>>>
>>>>> Also here it says "supported" but below it says "allowed".
>>>>>
>>>>>
>>>>>
>>>>>>     \paragraph{Hash reporting for incoming packets}
>>>>>>     \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
>>>>>> @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>>>         le16 reserved[4];
>>>>>>         u8 hash_key_length;
>>>>>>         u8 hash_key_data[hash_key_length];
>>>>>> +    le32 hash_tunnel_types;
>>>>>>     };
>>>>> Hmm this fixed type after variable type is problematic - might
>>>>> become unaligned. We could use some of reserved[4]
>>>>> for this ...
>>>>>
>>>> This is a problem, and perhaps Parav's proposal of using a separate command
>>>> and structure for inner hash is correct.
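
The alignment concern can be made concrete with a small offset calculation. This is a sketch in C that assumes the structure starts with le32 hash_types, as in the existing virtio_net_hash_config definition, followed by the fields shown in the hunk; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of hash_tunnel_types in the proposed layout:
 *   le32 hash_types;
 *   le16 reserved[4];
 *   u8   hash_key_length;
 *   u8   hash_key_data[hash_key_length];
 *   le32 hash_tunnel_types;
 */
size_t hash_cfg_tunnel_types_offset(uint8_t hash_key_length)
{
    return sizeof(uint32_t)          /* hash_types */
         + 4 * sizeof(uint16_t)      /* reserved[4] */
         + sizeof(uint8_t)           /* hash_key_length */
         + (size_t)hash_key_length;  /* hash_key_data[] */
}

/* A le32 field is naturally aligned when its offset is a multiple of 4. */
int is_le32_aligned(size_t offset)
{
    return offset % 4 == 0;
}
```

The offset is 13 + hash_key_length, so the trailing field is naturally aligned only when hash_key_length % 4 == 3. With a 40-byte key, for example, it lands at offset 53.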
>>>>
>>>>>>     \end{lstlisting}
>>>>>>     Field \field{hash_types} contains a bitmask of allowed hash types as
>>>>>>     defined in
>>>>>>     \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
>>>>>> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>>>>> +
>>>>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>>>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>>>> +
>>>>>> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>>>>>>     Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>>>>>>     defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
>>>>>> @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>>>         le16 max_tx_vq;
>>>>>>         u8 hash_key_length;
>>>>>>         u8 hash_key_data[hash_key_length];
>>>>>> +    le32 hash_tunnel_types;
>>>>> Same alignment problem here but I'm not sure how to solve it.
>>>>> Suggestions?
>>>>>
>>>>>>     };
>>>>>>     \end{lstlisting}
>>>>>>     Field \field{hash_types} contains a bitmask of allowed hash types as
>>>>>> @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>>>>>     Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
>>>>>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
>>>>>> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
>>>>>> +
>>>>>>     \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>>>>     A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
>>>>>> diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
>>>>>> index 54f6783..0ff5944 100644
>>>>>> --- a/device-types/net/device-conformance.tex
>>>>>> +++ b/device-types/net/device-conformance.tex
>>>>>> @@ -14,4 +14,5 @@
>>>>>>     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>>>>>     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>>>>>     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>>>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>>>     \end{itemize}
>>>>>> diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
>>>>>> index 97d0cc1..951be89 100644
>>>>>> --- a/device-types/net/driver-conformance.tex
>>>>>> +++ b/device-types/net/driver-conformance.tex
>>>>>> @@ -14,4 +14,5 @@
>>>>>>     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>>>>>     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>>>>>     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>>>>>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
>>>>>>     \end{itemize}
>>>>>> diff --git a/introduction.tex b/introduction.tex
>>>>>> index 287c5fc..25c9d48 100644
>>>>>> --- a/introduction.tex
>>>>>> +++ b/introduction.tex
>>>>>> @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
>>>>>>         Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
>>>>>>     	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
>>>>>> +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
>>>>>> +    Generic Routing Encapsulation
>>>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
>>>>> This is GRE over IPv4.
>>>>> So we are not supporting GRE over IPv6?
>>>> Yes. Do we need to add it?
>>>> https://datatracker.ietf.org/doc/rfc7676/
>>> If you want to support it, yes.
>>>
>>>>> And we do not support optional keys?
>>>> We did not disallow optional fields.
>>>>
>>>> Thanks.
>>> The spec you link to does not include this.
>> I'll add this. :)
>>
>> Thanks!
> Question is how common it is to support all three.
> Do I understand it correctly that currently your use-case
> is mostly with GRE?

Our main use cases are GRE (https://datatracker.ietf.org/doc/rfc2784),
VXLAN and GENEVE.

GRE traffic needs to be spread across multiple queues using the inner header hash.
VXLAN and GENEVE require inner symmetric hashing so that the same CPU
processes both directions of a flow, which improves performance.
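
The header selection rule described in the patch can be sketched in C as follows. The bit values are those listed in the patch; the function select_hash_header, the enum, and the pkt_tunnel_type argument (the single type bit a hypothetical packet parser would report, or 0 for a non-encapsulated packet) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in the patch (written unsigned here). */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE   (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE    (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE (1u << 3)

enum hash_header { HASH_OUTER_HEADER = 0, HASH_INNER_HEADER = 1 };

/* Decide which header the hash is computed over.
 * hash_tunnel_types: bitmask enabled by the driver.
 * pkt_tunnel_type:   0 if the packet is not encapsulated, otherwise the
 *                    one TYPE_* bit matching its encapsulation. */
enum hash_header select_hash_header(uint32_t hash_tunnel_types,
                                    uint32_t pkt_tunnel_type)
{
    if (pkt_tunnel_type == 0)
        return HASH_OUTER_HEADER;   /* plain packet: only one header */
    if (hash_tunnel_types & pkt_tunnel_type)
        return HASH_INNER_HEADER;   /* encapsulation type is enabled */
    return HASH_OUTER_HEADER;       /* NONE, or type not enabled */
}
```

With hash_tunnel_types set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, no encapsulation bit matches, so every packet falls back to the outer header.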

Thanks.


>>>>>
>>>>>> +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
>>>>>> +    Virtual eXtensible Local Area Network
>>>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
>>>>>> +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
>>>>>> +    Generic Network Virtualization Encapsulation
>>>>>> +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
>>>>>> +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
>>>>>> +    IP Encapsulation within IP
>>>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
>>>>>> +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
>>>>>> +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
>>>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
>>>>>> +	\phantomsection\label{intro:IP}\textbf{[IP]} &
>>>>>> +    INTERNET PROTOCOL
>>>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
>>>>>> +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
>>>>>> +    User Datagram Protocol
>>>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
>>>>>> +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
>>>>>> +    TRANSMISSION CONTROL PROTOCOL
>>>>>> +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
>>>>>>     \end{longtable}
>>>>>>     \section{Non-Normative References}
>>>>>> -- 
>>>>>> 2.19.1.6.gb485710b
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org





* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-22 12:49           ` Heng Qi
@ 2023-03-22 16:42             ` Michael S. Tsirkin
  2023-03-23  3:13               ` Parav Pandit
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2023-03-22 16:42 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Jason Wang, Yuri Benditovich, Xuan Zhuo

On Wed, Mar 22, 2023 at 08:49:40PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/21 11:58 PM, Michael S. Tsirkin wrote:
> > On Tue, Mar 21, 2023 at 10:49:39PM +0800, Heng Qi wrote:
> > > 
> > > On 2023/3/21 3:34 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
> > > > > On 2023/3/21 3:43 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
> > > > > > > 1. Currently, a received encapsulated packet has an outer and an inner header, but
> > > > > > > the virtio device is unable to calculate the hash for the inner header. Multiple
> > > > > > > flows with the same outer header but different inner headers are steered to the
> > > > > > > same receive queue. This results in poor receive performance.
> > > > > > > 
> > > > > > > To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL has been
> > > > > > > introduced, which enables the device to advertise the capability to calculate the
> > > > > > > hash for the inner packet header. Compared with the outer header hash, it achieves
> > > > > > > better receive performance.
> > > > > > So this would be a very good argument however the cost would be it would
> > > > > > seem we have to keep extending this indefinitely as new tunneling
> > > > > > protocols come to light.
> > > > > > But I believe in fact we don't at least for this argument:
> > > > > > the standard way to address this is actually by propagating entropy
> > > > > > from inner to outer header.
> > > > > Yes, we don't argue with this.
> > > > > 
> > > > > > So I'd maybe reorder the commit log and give the explanation 2 below
> > > > > > then say "for some legacy systems
> > > > > > including entropy in IP header
> > > > > > as done in modern protocols is not practical, resulting in
> > > > > > bad performance under RSS".
> > > > > I agree. But it is not necessarily about legacy systems: some scenarios need to
> > > > > connect multiple tunnels and, for compatibility, will not use optional
> > > > > fields or will choose an older tunnel protocol.
> > > > compatibility ... with legacy systems, no?
> > > > 
> > > > > > > 2. The same flow can traverse through different tunnels, resulting in the encapsulated
> > > > > > > packets being spread across multiple receive queues (refer to the figure below).
> > > > > > > However, in certain scenarios, it becomes necessary to direct these encapsulated
> > > > > > > packets of the same flow to a single receive queue. This facilitates the processing
> > > > > > > of the flow by the same CPU to improve performance (warm caches, less locking, etc.).
> > > > > > > 
> > > > > > >                   client1                    client2
> > > > > > >                      |                          |
> > > > > > >                      |        +-------+         |
> > > > > > >                      +------->|tunnels|<--------+
> > > > > > >                               +-------+
> > > > > > >                                  |  |
> > > > > > >                                  |  |
> > > > > > >                                  v  v
> > > > > > >                          +-----------------+
> > > > > > >                          | processing host |
> > > > > > >                          +-----------------+
> > > > > > necessary is too strong a word I feel.
> > > > > > All this is, is an optimization, we don't really know how strong it is
> > > > > > even.
> > > > > > 
> > > > > > Here's how I understand this:
> > > > > > 
> > > > > > Imagine two clients client1 and client2 talking to each other.
> > > > > > A copy of all packets is sent to a processing host over a virtio device.
> > > > > > Two directions of the same flow between two clients might be
> > > > > > encapsulated in two different tunnels, with current RSS
> > > > > > strategies they would land on two arbitrary, unrelated queues.
> > > > > > As an optimization, some hosts might wish to make sure both directions
> > > > > > of the encapsulated flow land on the same queue.
> > > > > > 
> > > > > > 
> > > > > > Is this a good summary?
> > > > > I think yes.
> > > > > 
> > > > > > Now that things begin to be clearer, I kind of begin to agree with
> > > > > > Jason's suggestion that this is extremely narrow.  And what if I want
> > > > > > one direction on queue1 and another one queue2 e.g. adjacent numbers for
> > > > > I don't understand why we need this, can you point out some usage scenarios?
> > > > If traffic is predominantly UDP, each queue can be processed in
> > > > parallel. If you need to look at the other side of the flow once
> > > > in a while, you can find it by doing ^1.
> > > I'm not sure I follow you, but I will try to answer. When we place
> > > traffic in one direction on a certain queue,
> > > we have already calculated the hash, so we can record the five-tuple
> > > and the queue number. When
> > > traffic in the other direction arrives, we can match it against the
> > > recorded information and place it on the ^1 queue.
> > > 
> > > > > > the same flow?  If enough people agree this is needed we can accept this
> > > > > > but did you at all consider using something programmable like BPF for
> > > > > I think the problem is that our virtio device cannot support ebpf, we can
> > > > > also ask Alvaro, Parav if their virtio devices can support ebpf offloading.
> > > > > :)
> > > > This isn't ebpf, more like classic bpf. Just math done on packets,
> > > > no tables.
> > > We would also really like to use simple bpf offloading, which is cool. But
> > > it still takes time, for example to
> > > support parsing of bpf instructions on devices such as FPGAs, which
> > > cannot do this easily today. Few devices
> > > support it right now; I only see support for the netronome iNIC in the
> > > kernel.
> > > 
> > >     #git grep XDP_SETUP_PROG_HW
> > >     drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    case XDP_SETUP_PROG_HW:
> > >     drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
> > >     drivers/net/netdevsim/bpf.c:    if (bpf->command == XDP_SETUP_PROG_HW) {
> > >     drivers/net/netdevsim/bpf.c:    case XDP_SETUP_PROG_HW:
> > >     include/linux/netdevice.h:      XDP_SETUP_PROG_HW,
> > >     net/core/dev.c: xdp.command = mode == XDP_MODE_HW ? XDP_SETUP_PROG_HW : XDP_SETUP_PROG;
> > > 
> > > 
> > > > 
> > > > > > this?  Considering we are putting not insignificant amount of work into
> > > > > > this, making this widely useful would be better than a narrow
> > > > > > optimization for a very specific usecase.
> > > > > > 
> > > > > > 
> > > > > > > To achieve this, the device can calculate a symmetric hash based on the inner packet
> > > > > > > headers of the flow. The symmetric hash disregards the order of the 5-tuple when
> > > > > > > computing the hash.
> > > > > > when you say symmetric hash you really mean symmetric key for toeplitz, yes?
> > > > > > It's not that it disregards order, it just gives the same result if
> > > > > > you reverse source and destination, no?
> > > > > Yes, symmetric hashing can use a key made of the same 2 bytes repeated, and it only
> > > > > supports reversing source and destination.
> > > > So, this won't work if some inner flows are IPv4 and others IPv6, right?
> > > > You have to know the inner flow format?
> > > Yes, we need.
> > Ouch, even more narrow.
> 
> I may have misunderstood what you meant earlier. From the device's point of view,
> all inner payloads of a given flow belong to the same IP family.

Yes. But my point is this: some flows can be IPv4, others IPv6.
Do you see a way to have a key that will result in a symmetric hash
for both IPv4 and IPv6? Can you give an example please?


> The device can calculate a symmetric hash so that the flow is placed
> on the same queue.
> 
> > Maybe we need support for XOR hash then?
> 
> I think we can. This is orthogonal to the inner header hash; I can start
> work on XOR hashing in a separate follow-up thread if you want.

Hmm can or should?

> > 
> > 
> > > > > > > Reviewed-by: Jason Wang <jasowang@redhat.com>
> > > > > > > Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > > > > > Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > > v10->v11:
> > > > > > > 	1. Revise commit log for clarity for readers.
> > > > > > > 	2. Some modifications to avoid undefined terms. @Parav Pandit
> > > > > > > 	3. Change VIRTIO_NET_F_HASH_TUNNEL dependency. @Parav Pandit
> > > > > > > 	4. Add the normative statements. @Parav Pandit
> > > > > > > 
> > > > > > > v9->v10:
> > > > > > > 	1. Removed hash_report_tunnel related information. @Parav Pandit
> > > > > > > 	2. Re-describe the limitations of QoS for tunneling.
> > > > > > > 	3. Some clarification.
> > > > > > > 
> > > > > > > v8->v9:
> > > > > > > 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> > > > > > > 	2. Add tunnel security section. @Michael S . Tsirkin
> > > > > > > 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> > > > > > > 	4. Fix some typos.
> > > > > > > 	5. Add more tunnel types. @Michael S . Tsirkin
> > > > > > > 
> > > > > > > v7->v8:
> > > > > > > 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> > > > > > > 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> > > > > > > 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> > > > > > > 	4. Fix some typos. @Michael S . Tsirkin
> > > > > > > 	5. Clarify some sentences. @Michael S . Tsirkin
> > > > > > > 
> > > > > > > v6->v7:
> > > > > > > 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> > > > > > > 	2. Fix some syntax issues. @Michael S. Tsirkin
> > > > > > > 
> > > > > > > v5->v6:
> > > > > > > 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> > > > > > > 	2. Use encapsulated/encaptulation uniformly. @Michael S. Tsirkin
> > > > > > > 	3. Move the links to introduction section. @Michael S. Tsirkin
> > > > > > > 	4. Clarify some sentences. @Michael S. Tsirkin
> > > > > > > 
> > > > > > > v4->v5:
> > > > > > > 	1. Clarify some paragraphs. @Cornelia Huck
> > > > > > > 	2. Fix the u8 type. @Cornelia Huck
> > > > > > > 
> > > > > > > v3->v4:
> > > > > > > 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> > > > > > > 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> > > > > > > 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> > > > > > > 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
> > > > > > > 
> > > > > > > v2->v3:
> > > > > > > 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> > > > > > > 	2. Chang \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
> > > > > > > 
> > > > > > > v1->v2:
> > > > > > > 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> > > > > > > 	2. Clarify some paragraphs. @Jason Wang
> > > > > > > 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
> > > > > > > 
> > > > > > >     device-types/net/description.tex        | 119 +++++++++++++++++++++++-
> > > > > > >     device-types/net/device-conformance.tex |   1 +
> > > > > > >     device-types/net/driver-conformance.tex |   1 +
> > > > > > >     introduction.tex                        |  24 +++++
> > > > > > >     4 files changed, 144 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> > > > > > > index 0500bb6..49dee2f 100644
> > > > > > > --- a/device-types/net/description.tex
> > > > > > > +++ b/device-types/net/description.tex
> > > > > > > @@ -83,6 +83,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > > > > > >     \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > > > > > >         channel.
> > > > > > > +\item[VIRTIO_NET_F_HASH_TUNNEL(52)] Device supports inner packet header hash
> > > > > > > +    for tunnel-encapsulated packets.
> > > > > > > +
> > > > > > >     \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> > > > > > >     \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > > > > > > @@ -139,6 +142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > > > > > >     \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > > >     \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > > > > > >     \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > > > > > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ along with VIRTIO_NET_F_RSS and/or VIRTIO_NET_F_HASH_REPORT.
> > > > > > >     \end{description}
> > > > > > >     \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > > > > > @@ -198,6 +202,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > > > > > >             u8 rss_max_key_size;
> > > > > > >             le16 rss_max_indirection_table_length;
> > > > > > >             le32 supported_hash_types;
> > > > > > > +        le32 supported_tunnel_hash_types;
> > > > > > >     };
> > > > > > >     \end{lstlisting}
> > > > > > >     The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> > > > > > > @@ -212,6 +217,12 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> > > > > > >     Field \field{supported_hash_types} contains the bitmask of supported hash types.
> > > > > > >     See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
> > > > > > > +The next field, \field{supported_tunnel_hash_types} only exists if the device
> > > > > > > +supports inner packet header hash, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> > > > > > > +
> > > > > > > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> > > > > > > +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> > > > > > > +
> > > > > > >     \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
> > > > > > >     The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > > > > > > @@ -848,6 +859,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > > >     If the feature VIRTIO_NET_F_RSS was negotiated:
> > > > > > >     \begin{itemize}
> > > > > > >     \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> > > > > > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > > > > > >     \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
> > > > > > >     \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
> > > > > > >     \end{itemize}
> > > > > > > @@ -855,6 +867,7 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > > >     If the feature VIRTIO_NET_F_RSS was not negotiated:
> > > > > > >     \begin{itemize}
> > > > > > >     \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> > > > > > > +\item The device uses \field{hash_tunnel_types} of the virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask if VIRTIO_NET_F_HASH_TUNNEL was negotiated.
> > > > > > >     \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
> > > > > > >     \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
> > > > > > >     \end{itemize}
> > > > > > > @@ -870,6 +883,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > > >     \subparagraph{Supported/enabled hash types}
> > > > > > >     \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> > > > > > > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > > > > > > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> > > > > > >     Hash types applicable for IPv4 packets:
> > > > > > >     \begin{lstlisting}
> > > > > > >     #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > > > > > > @@ -980,6 +995,99 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> > > > > > >     (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
> > > > > > >     \end{itemize}
> > > > > > > +\paragraph{Inner Packet Header Hash}
> > > > > > > +If the driver negotiates the VIRTIO_NET_F_HASH_TUNNEL feature, it can configure the
> > > > > > > +hash parameters (including \field{hash_tunnel_types}) for inner packet header hash
> > > > > > > +through the VIRTIO_NET_CTRL_MQ_HASH_CONFIG or the VIRTIO_NET_CTRL_RSS_CONFIG command.
> > > > > > > +If multiple commands are sent, the device configuration will be defined by the last command received.
> > > > > > > +
> > > > > > > +If a specific encapsulation type is set in \field{hash_tunnel_types}, the device will calculate the
> > > > > > > +hash on the inner packet header of the encapsulated packet (See \ref{sec:Device Types
> > > > > > > +/ Network Device / Device Operation / Processing of Incoming Packets /
> > > > > > > +Hash calculation for incoming packets / Tunnel/Encapsulated packet}). If the encapsulation
> > > > > > > +type is not included in \field{hash_tunnel_types} or the value of \field{hash_tunnel_types}
> > > > > > > +is VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash on the outer header.
> > > > > > > +
> > > > > > > +\field{hash_tunnel_types} is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE by the device for non-encapsulated packets.
> > > > > > > +
> > > > > > > +\subparagraph{Tunnel/Encapsulated packet}
> > > > > > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> > > > > > > +A tunnel packet is encapsulated from the original packet based on the tunneling
> > > > > > > +protocol (only a single level of encapsulation is currently supported). The
> > > > > > > +encapsulated packet contains an outer header and an inner header, and the device
> > > > > > > +calculates the hash over either the inner header or the outer header.
> > > > > > > +
> > > > > > > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received encapsulated
> > > > > > > +packet's outer header matches one of the supported \field{hash_tunnel_types},
> > > > > > > +the hash of the inner header is calculated. Supported encapsulation types are listed
> > > > > > > +in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming
> > > > > > > +Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > > > > > +
> > > > > > > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> > > > > > > +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > > > > > > +
> > > > > > > +\subparagraph{Supported/enabled tunnel hash types}
> > > > > > > +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> > > > > > > +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and \field{hash_tunnel_types}
> > > > > > > +is set to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE, the device calculates the hash using the
> > > > > > > +outer header of the encapsulated packet.
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NONE        (1 << 0)
> > > > > > > +\end{lstlisting}
> > > > > > > +
> > > > > > > +The encapsulation hash type below indicates that the hash is calculated over the
> > > > > > > +inner packet header:
> > > > > > > +Hash type applicable for inner payload of the gre-encapsulated packet
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 1)
> > > > > > > +\end{lstlisting}
> > > > > > > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 2)
> > > > > > > +\end{lstlisting}
> > > > > > > +Hash type applicable for inner payload of the geneve-encapsulated packet
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 3)
> > > > > > > +\end{lstlisting}
> > > > > > > +Hash type applicable for inner payload of the ip-encapsulated packet
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 4)
> > > > > > > +\end{lstlisting}
> > > > > > > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 5)
> > > > > > > +\end{lstlisting}
> > > > > > > +
> > > > > > > +\subparagraph{Tunnel QoS limitation}
> > > > > > > +When a specific receive queue is shared by multiple tunnels to receive encapsulated packets,
> > > > > > > +there is no quality of service (QoS) for these packets. For example, when the packets of certain
> > > > > > > +tunnels are spread across multiple receive queues, these receive queues may have an unbalanced
> > > > > > > +amount of packets. This can cause a specific receive queue to become full, resulting in packet loss.
> > > > > > > +
> > > > > > > +Possible mitigations:
> > > > > > > +\begin{itemize}
> > > > > > > +\item Use a tool with good forwarding performance to keep the receive queue from filling up.
> > > > > > > +\item If the QoS is unavailable, the driver can set \field{hash_tunnel_types} to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE
> > > > > > > +      to disable inner packet hash for encapsulated packets.
> > > > > > > +\item Choose a hash key that can avoid queue collisions.
> > > > > > > +\item Perform appropriate QoS before packets consume the receive buffers of the receive queues.
> > > > > > > +\end{itemize}
> > > > > > > +
> > > > > > > +The limitations mentioned above exist with/without the inner packet header hash.
> > > > > > > +
> > > > > > > +\devicenormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > > > +
> > > > > > > +The device MUST calculate the outer packet hash if the received encapsulated packet has an encapsulation type not in \field{supported_tunnel_hash_types}.
> > > > > > > +
> > > > > > > +The device MUST drop the encapsulated packet if the destination receive queue is being reset.
> > > > > > I'm not sure how this last one got here. It seems to have nothing to do
> > > > > > with encapsulation - if we want to we should require this for all
> > > > > > packets or none at all.
> > > > > Yes, you are right. It works for all packets.
> > > > > 
> > > > > > > +\drivernormative{\subparagraph}{Inner Packet Header Hash}{Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > > > +
> > > > > > > +If the driver does not negotiate the VIRTIO_NET_F_HASH_TUNNEL feature, it MUST set \field{hash_tunnel_types}
> > > > > > > +to VIRTIO_NET_HASH_TUNNEL_TYPE_NONE before issuing the command VIRTIO_NET_CTRL_MQ_HASH_CONFIG or VIRTIO_NET_CTRL_RSS_CONFIG.
> > > > > > > +
> > > > > > > +The driver MUST set \field{hash_tunnel_types} to the encapsulation types supported by the device.
> > > > > > unclear. seems to mean all types must be approved
> > > > > > where you really mean "only those types". original for non tunnel is:
> > > > > > 
> > > > > > A driver MUST NOT set any VIRTIO_NET_HASH_TYPE_ flags that are not supported by a device.
> > > > > > 
> > > > > > which is clear though a bit verbose with two negations.
> > > > > Yes, we can use the same sentence structure to illustrate.
> > > > > 
> > > > > > Also here it says "supported" but below it says "allowed".
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > >     \paragraph{Hash reporting for incoming packets}
> > > > > > >     \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> > > > > > > @@ -1392,12 +1500,17 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > > >         le16 reserved[4];
> > > > > > >         u8 hash_key_length;
> > > > > > >         u8 hash_key_data[hash_key_length];
> > > > > > > +    le32 hash_tunnel_types;
> > > > > > >     };
> > > > > > Hmm this fixed type after variable type is problematic - might
> > > > > > become unaligned. We could use some of reserved[4]
> > > > > > for this ...
> > > > > > 
> > > > > This is a problem, and perhaps Parav's proposal of using a separate command
> > > > > and structure for inner hash is correct.
> > > > > 
> > > > > > >     \end{lstlisting}
> > > > > > >     Field \field{hash_types} contains a bitmask of allowed hash types as
> > > > > > >     defined in
> > > > > > >     \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> > > > > > > -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > > > > > > +
> > > > > > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > > > > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > > > > > +
> > > > > > > +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> > > > > > >     Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
> > > > > > >     defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> > > > > > > @@ -1421,6 +1534,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > > >         le16 max_tx_vq;
> > > > > > >         u8 hash_key_length;
> > > > > > >         u8 hash_key_data[hash_key_length];
> > > > > > > +    le32 hash_tunnel_types;
> > > > > > Same alignment problem here but I'm not sure how to solve it.
> > > > > > Suggestions?
> > > > > > 
> > > > > > >     };
> > > > > > >     \end{lstlisting}
> > > > > > >     Field \field{hash_types} contains a bitmask of allowed hash types as
> > > > > > > @@ -1441,6 +1555,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > > > > > >     Fields \field{hash_key_length} and \field{hash_key_data} define the key to be used in hash calculation.
> > > > > > > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> > > > > > > +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash tunnel types}.
> > > > > > > +
> > > > > > >     \drivernormative{\subparagraph}{Setting RSS parameters}{Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > > >     A driver MUST NOT send the VIRTIO_NET_CTRL_MQ_RSS_CONFIG command if the feature VIRTIO_NET_F_RSS has not been negotiated.
> > > > > > > diff --git a/device-types/net/device-conformance.tex b/device-types/net/device-conformance.tex
> > > > > > > index 54f6783..0ff5944 100644
> > > > > > > --- a/device-types/net/device-conformance.tex
> > > > > > > +++ b/device-types/net/device-conformance.tex
> > > > > > > @@ -14,4 +14,5 @@
> > > > > > >     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > > > > > >     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > > > > >     \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > > >     \end{itemize}
> > > > > > > diff --git a/device-types/net/driver-conformance.tex b/device-types/net/driver-conformance.tex
> > > > > > > index 97d0cc1..951be89 100644
> > > > > > > --- a/device-types/net/driver-conformance.tex
> > > > > > > +++ b/device-types/net/driver-conformance.tex
> > > > > > > @@ -14,4 +14,5 @@
> > > > > > >     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > > > > > >     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > > > > >     \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> > > > > > > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Inner Packet Header Hash}
> > > > > > >     \end{itemize}
> > > > > > > diff --git a/introduction.tex b/introduction.tex
> > > > > > > index 287c5fc..25c9d48 100644
> > > > > > > --- a/introduction.tex
> > > > > > > +++ b/introduction.tex
> > > > > > > @@ -99,6 +99,30 @@ \section{Normative References}\label{sec:Normative References}
> > > > > > >         Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
> > > > > > >     	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
> > > > > > > +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> > > > > > > +    Generic Routing Encapsulation
> > > > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> > > > > > This is GRE over IPv4.
> > > > > > So we are not supporting GRE over IPv6?
> > > > > Yes. Do we need to add it?
> > > > > https://datatracker.ietf.org/doc/rfc7676/
> > > > If you want to support it, yes.
> > > > 
> > > > > > And we do not support optional keys?
> > > > > We did not disallow optional fields.
> > > > > 
> > > > > Thanks.
> > > > The spec you link to does not include this.
> > > I'll add this. :)
> > > 
> > > Thanks!
> > Question is how common it is to support all three.
> > Do I understand it correctly that currently your use-case
> > is mostly with GRE?
> 
> Our main use-cases are GRE(https://datatracker.ietf.org/doc/rfc2784), VXLAN
> and GENEVE.
> 
> GRE needs to spread across multiple queues using the inner header hash.
> VXLAN and GENEVE require inner symmetric hashing to allow the same CPU to
> process and improve performance.
> 
> Thanks.
> 
> 
> > > > > > 
> > > > > > > +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> > > > > > > +    Virtual eXtensible Local Area Network
> > > > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> > > > > > > +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> > > > > > > +    Generic Network Virtualization Encapsulation
> > > > > > > +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> > > > > > > +    IP Encapsulation within IP
> > > > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> > > > > > > +	\phantomsection\label{intro:IPIP}\textbf{[NVGRE]} &
> > > > > > > +    NVGRE: Network Virtualization Using Generic Routing Encapsulation
> > > > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> > > > > > > +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> > > > > > > +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> > > > > > > +    INTERNET PROTOCOL
> > > > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> > > > > > > +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> > > > > > > +    User Datagram Protocol
> > > > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> > > > > > > +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> > > > > > > +    TRANSMISSION CONTROL PROTOCOL
> > > > > > > +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> > > > > > >     \end{longtable}
> > > > > > >     \section{Non-Normative References}
> > > > > > > -- 
> > > > > > > 2.19.1.6.gb485710b
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-21 14:49       ` Heng Qi
  2023-03-21 15:58         ` Michael S. Tsirkin
@ 2023-03-23  2:52         ` Jason Wang
  1 sibling, 0 replies; 19+ messages in thread
From: Jason Wang @ 2023-03-23  2:52 UTC (permalink / raw)
  To: Heng Qi, Michael S. Tsirkin
  Cc: Parav Pandit, Alvaro Karsz, virtio-dev, virtio-comment,
	Yuri Benditovich, Xuan Zhuo


On 2023/3/21 22:49, Heng Qi wrote:
>
>
> On 2023/3/21 3:34 PM, Michael S. Tsirkin wrote:
>> On Tue, Mar 21, 2023 at 11:56:14AM +0800, Heng Qi wrote:
>>>
>>> On 2023/3/21 3:43 AM, Michael S. Tsirkin wrote:
>>>> On Mon, Mar 20, 2023 at 07:18:40PM +0800, Heng Qi wrote:
>>>>> 1. Currently, a received encapsulated packet has an outer and an 
>>>>> inner header, but
>>>>> the virtio device is unable to calculate the hash for the inner 
>>>>> header. Multiple
>>>>> flows with the same outer header but different inner headers are 
>>>>> steered to the
>>>>> same receive queue. This results in poor receive performance.
>>>>>
>>>>> To address this limitation, a new feature VIRTIO_NET_F_HASH_TUNNEL 
>>>>> has been
>>>>> introduced, which enables the device to advertise the capability 
>>>>> to calculate the
>>>>> hash for the inner packet header. Compared with the out header 
>>>>> hash, it regains
>>>>> better receive performance.
>>>> So this would be a very good argument however the cost would be it 
>>>> would
>>>> seem we have to keep extending this indefinitely as new tunneling
>>>> protocols come to light.
>>>> But I believe in fact we don't at least for this argument:
>>>> the standard way to address this is actually by propagating entropy
>>>> from inner to outer header.
>>> Yes, we don't argue with this.
>>>
>>>> So I'd maybe reorder the commit log and give the explanation 2 below
>>>> then say "for some legacy systems
>>>> including entropy in IP header
>>>> as done in modern protocols is not practical, resulting in
>>>> bad performance under RSS".
>>> I agree. But not necessarily the legacy system, some scenarios need to
>>> connect multiple tunnels, for compatibility, they will not use optional
>>> fields or choose the old tunnel protocol.
>> compatibility ... with legacy systems, no?
>>
>>>>
>>>>> 2. The same flow can traverse through different tunnels, resulting 
>>>>> in the encapsulated
>>>>> packets being spread across multiple receive queues (refer to the 
>>>>> figure below).
>>>>> However, in certain scenarios, it becomes necessary to direct 
>>>>> these encapsulated
>>>>> packets of the same flow to a single receive queue. This 
>>>>> facilitates the processing
>>>>> of the flow by the same CPU to improve performance (warm caches, 
>>>>> less locking, etc.).
>>>>>
>>>>>                  client1                    client2
>>>>>                     |                          |
>>>>>                     |        +-------+         |
>>>>>                     +------->|tunnels|<--------+
>>>>>                              +-------+
>>>>>                                 |  |
>>>>>                                 |  |
>>>>>                                 v  v
>>>>>                         +-----------------+
>>>>>                         | processing host |
>>>>>                         +-----------------+
>>>> necessary is too strong a word I feel.
>>>> All this is, is an optimization, we don't really know how strong it is
>>>> even.
>>>>
>>>> Here's how I understand this:
>>>>
>>>> Imagine two clients client1 and client2 talking to each other.
>>>> A copy of all packets is sent to a processing host over a virtio 
>>>> device.
>>>> Two directions of the same flow between two clients might be
>>>> encapsulated in two different tunnels, with current RSS
>>>> strategies they would land on two arbitrary, unrelated queues.
>>>> As an optimization, some hosts might wish to make sure both directions
>>>> of the encapsulated flow land on the same queue.
>>>>
>>>>
>>>> Is this a good summary?
>>> I think yes.
>>>
>>>>
>>>> Now that things begin to be clearer, I kind of begin to agree with
>>>> Jason's suggestion that this is extremely narrow.  And what if I want
>>>> one direction on queue1 and another one queue2 e.g. adjacent 
>>>> numbers for
>>> I don't understand why we need this, can you point out some usage 
>>> scenarios?
>> If traffic is predominantly UDP, each queue can be processed in
>> parallel. If you need to look at the other side of the flow once
>> in a while, you can find it by doing ^1.
>
> I'm not sure if I align with you, but I try to answer. When we try to 
> place traffic in one direction on a certain queue,
> it means that we have calculated the hash, we can record the 
> five-tuple information and the queue number. When
> the traffic in the other direction comes, we can match what we just 
> recorded information and place it on the ^1 queue.
>
>>
>>>> the same flow?  If enough people agree this is needed we can accept 
>>>> this
>>>> but did you at all consider using something programmable like BPF for
>>> I think the problem is that our virtio device cannot support ebpf, 
>>> we can
>>> also ask Alvaro, Parav if their virtio devices can support ebpf 
>>> offloading.
>>> :)
>> This isn't ebpf, more like classic bpf. Just math done on packets,
>> no tables.
>
> We would also really like to use simple bpf offloading, which is cool. 
> But it still takes time, for example to
> support parsing of bpf instructions etc. on devices like fpga, which 
> they can't do easily now. Few devices
> are supported right now, I only see support for the netronome iNIC in 
> the kernel.
>
>    #git grep XDP_SETUP_PROG_HW
>    drivers/net/ethernet/netronome/nfp/nfp_net_common.c:    case 
> XDP_SETUP_PROG_HW:
>    drivers/net/netdevsim/bpf.c:    if (bpf->command == 
> XDP_SETUP_PROG_HW && !ns->bpf_xdpoffload_accept) {
>    drivers/net/netdevsim/bpf.c:    if (bpf->command == 
> XDP_SETUP_PROG_HW) {
>    drivers/net/netdevsim/bpf.c:    case XDP_SETUP_PROG_HW:
>    include/linux/netdevice.h:      XDP_SETUP_PROG_HW,
>    net/core/dev.c: xdp.command = mode == XDP_MODE_HW ? 
> XDP_SETUP_PROG_HW : XDP_SETUP_PROG;


Note that this is eBPF hardware offloading, which is much more
complicated than what we propose now. For hash calculation, a simple
classic BPF program, or something similar such as P4, would be sufficient.
The point is to allow the user to customize the hash calculation.

If this is too flexible for the hardware, it would still be better to
consider a more general hash calculation pipeline (XOR, swap, hash
masks, hash key customization) like:

https://docs.napatech.com/r/Feature-Set-N-ANL10/Hash-Value-Generation
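
As a rough illustration of the "swap" stage in such a pipeline, here is a
minimal Python sketch. It assumes the endpoints are ordered before hashing so
that both directions of a flow present the same input. The function names are
hypothetical, and CRC32 is only a stand-in for whatever hash a device would
really use; none of this is taken from the proposal or from the Napatech
documentation.

```python
import ipaddress
import zlib


def normalize_swap(src_ip, dst_ip, src_port, dst_port):
    """Order the two endpoints so both directions give the same tuple."""
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    # Swap when the source endpoint sorts after the destination endpoint.
    return (a, b) if a <= b else (b, a)


def swap_hash(src_ip, dst_ip, src_port, dst_port, seed=0):
    """Hash the normalized tuple; CRC32 is a placeholder hash (assumption)."""
    (ip_lo, port_lo), (ip_hi, port_hi) = normalize_swap(
        src_ip, dst_ip, src_port, dst_port
    )
    data = ip_lo + ip_hi + port_lo.to_bytes(2, "big") + port_hi.to_bytes(2, "big")
    return zlib.crc32(data, seed)


forward = swap_hash("10.0.0.1", "10.0.0.2", 1234, 80)
reverse = swap_hash("10.0.0.2", "10.0.0.1", 80, 1234)
print(forward == reverse)
```

Because the ordering happens before the hash, the choice of hash function and
key stays independent of the symmetry requirement.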

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-22 16:42             ` Michael S. Tsirkin
@ 2023-03-23  3:13               ` Parav Pandit
  2023-03-23  3:58                 ` Heng Qi
  0 siblings, 1 reply; 19+ messages in thread
From: Parav Pandit @ 2023-03-23  3:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, Heng Qi
  Cc: Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, March 22, 2023 12:42 PM
 
> Yes. But my point is this. Some flows can be IPv4 others IPv6.
> Do you see a way to have a key that will result in a symmetrical hash for both
> IPv4 and IPv6? Can you give an example please?
>
Heng,

Is the requirement to steer two completely different flows (IPv4, IPv6) to a single RQ?
The requirement, as I understood it, is for the two directions of a flow to produce a single hash value.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-23  3:13               ` Parav Pandit
@ 2023-03-23  3:58                 ` Heng Qi
  2023-03-23  5:03                   ` Heng Qi
  0 siblings, 1 reply; 19+ messages in thread
From: Heng Qi @ 2023-03-23  3:58 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo



On 2023/3/23 11:13 AM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Wednesday, March 22, 2023 12:42 PM
>   
>> Yes. But my point is this. Some flows can be IPv4 others IPv6.
>> Do you see a way to have a key that will result in a symmetrical hash for both
>> IPv4 and IPv6? Can you give an example please?
>>
> Heng,
>
> Is the requirement to steer two completely different flows (IPv4, IPv6) to a single RQ?

I think Michael is asking whether there is a symmetric key that works 
for both IPv4 and IPv6, so that each of them achieves symmetric 
hashing.
I am not an expert in hashing, but this article [1] derives a symmetric 
hash key, and I think it should be possible to derive a specific key 
that meets this requirement.

Or we should support XOR hashing.
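
To make the XOR option concrete, here is a minimal Python sketch. It assumes
"XOR hashing" means XOR-ing the source and destination fields together before
hashing, so the input no longer depends on direction. The function names are
hypothetical and CRC32 is only a placeholder hash, not something the proposal
specifies.

```python
import ipaddress
import zlib


def xor_fold(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """XOR source with destination so the result is direction-independent."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    ip_x = bytes(a ^ b for a, b in zip(src, dst))
    port_x = (src_port ^ dst_port).to_bytes(2, "big")
    return ip_x + port_x


def xor_hash(src_ip, dst_ip, src_port, dst_port) -> int:
    """Hash the folded fields; CRC32 is a placeholder hash (assumption)."""
    return zlib.crc32(xor_fold(src_ip, dst_ip, src_port, dst_port))


print(xor_hash("10.0.0.1", "10.0.0.2", 1234, 80)
      == xor_hash("10.0.0.2", "10.0.0.1", 80, 1234))
```

The trade-off is that the fold discards information: two different flows whose
addresses and ports XOR to the same values always collide, whatever hash
follows.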

Thanks.

> The requirement, as I understood it, is for the two directions of a flow to produce a single hash value.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [virtio-dev] Re: [PATCH v11] virtio-net: support inner header hash
  2023-03-23  3:58                 ` Heng Qi
@ 2023-03-23  5:03                   ` Heng Qi
  0 siblings, 0 replies; 19+ messages in thread
From: Heng Qi @ 2023-03-23  5:03 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Alvaro Karsz, virtio-dev, virtio-comment, Jason Wang,
	Yuri Benditovich, Xuan Zhuo



On 2023/3/23 11:58 AM, Heng Qi wrote:
>
>
> On 2023/3/23 11:13 AM, Parav Pandit wrote:
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Wednesday, March 22, 2023 12:42 PM
>>> Yes. But my point is this. Some flows can be IPv4 others IPv6.
>>> Do you see a way to have a key that will result in a symmetrical 
>>> hash for both
>>> IPv4 and IPv6? Can you give an example please?
>>>
>> Heng,
>>
>> Is the requirement to steer two completely different flows (IPv4, 
>> IPv6) to a single RQ?
>
> I think Michael is asking whether there is a symmetric key that works 
> for both IPv4 and IPv6, so that each of them achieves symmetric 
> hashing.
> I am not an expert in hashing, but this article [1] derives a 
> symmetric hash key, and I think it should be possible to derive a 
> specific key that meets this requirement.

Sorry for the lost link:
[1] https://www.ndsl.kaist.edu/~kyoungsoo/papers/TR-symRSS.pdf
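
As a sanity check on the idea of one key for both address families, here is a
minimal Python sketch of a Toeplitz hash using a key that repeats a single
16-bit pattern. The 0x6d5a pattern and the 40-byte key length are assumptions
made for illustration, as are the function names. The reasoning is that a
Toeplitz hash XORs one 32-bit key window per set input bit, and with a
16-bit-periodic key that window is unchanged when a bit moves by any multiple
of 16 bits. Swapping the addresses (32 or 128 bits apart) or the ports
(16 bits apart) therefore leaves the hash unchanged.

```python
import ipaddress

# Assumed key: one 16-bit pattern repeated to 40 bytes (320 bits).
SYM_KEY = bytes([0x6D, 0x5A]) * 20


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """XOR together the 32-bit key window at every set bit of the input."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 32:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window starts at input bit position i * 8 + b.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_hash(src_ip, dst_ip, src_port, dst_port, key=SYM_KEY) -> int:
    """Hash src addr, dst addr, src port, dst port in that order."""
    data = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + src_port.to_bytes(2, "big")
        + dst_port.to_bytes(2, "big")
    )
    return toeplitz_hash(key, data)


v4 = flow_hash("192.168.0.1", "10.0.0.7", 40000, 443)
v6 = flow_hash("2001:db8::1", "2001:db8::2", 40000, 443)
print(v4 == flow_hash("10.0.0.7", "192.168.0.1", 443, 40000))
print(v6 == flow_hash("2001:db8::2", "2001:db8::1", 443, 40000))
```

This only demonstrates symmetry. Whether such a periodic key spreads flows
evenly enough across queues is a separate question.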

>
> Or we should support XOR hashing.
>
> Thanks.
>
>> The requirement, as I understood it, is for the two directions of a 
>> flow to produce a single hash value.
>
>




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-03-23  5:03 UTC | newest]

Thread overview: 19+ messages
2023-03-20 11:18 [virtio-dev] [PATCH v11] virtio-net: support inner header hash Heng Qi
2023-03-20 19:43 ` [virtio-dev] " Michael S. Tsirkin
2023-03-20 21:07   ` Michael S. Tsirkin
2023-03-21  3:35   ` Jason Wang
2023-03-21  5:12     ` Heng Qi
2023-03-21  3:56   ` Heng Qi
2023-03-21  4:19     ` Parav Pandit
2023-03-21  7:37       ` Michael S. Tsirkin
2023-03-21 19:46         ` Parav Pandit
2023-03-21 21:32           ` Michael S. Tsirkin
2023-03-21  7:34     ` Michael S. Tsirkin
2023-03-21 14:49       ` Heng Qi
2023-03-21 15:58         ` Michael S. Tsirkin
2023-03-22 12:49           ` Heng Qi
2023-03-22 16:42             ` Michael S. Tsirkin
2023-03-23  3:13               ` Parav Pandit
2023-03-23  3:58                 ` Heng Qi
2023-03-23  5:03                   ` Heng Qi
2023-03-23  2:52         ` Jason Wang
