virtio-dev.lists.oasis-open.org archive mirror
* [PATCH v9] virtio-net: support inner header hash
@ 2023-02-18 14:37 Heng Qi
  2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
                   ` (4 more replies)
  0 siblings, 5 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-18 14:37 UTC
  To: virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Parav Pandit, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo

If a tunnel is used to encapsulate packets, the hash calculated over the
outer header of the received packets is identical for all packets of the
same flow, i.e. they are all steered to the same receive queue.

We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks
in \field{hash_tunnel_types}, which instruct the device to calculate the
hash using the inner headers of tunnel-encapsulated packets. Note that
VIRTIO_NET_F_HASH_TUNNEL only indicates support for the inner header
hash; it does not by itself let the device use the hash value to select
a receive queue for the packet.

Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL is added to report
an encapsulation type; this feature depends on VIRTIO_NET_F_HASH_REPORT.
It only means that the encapsulation type can be reported; it does not
instruct the device to calculate the hash.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/151

Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
v8->v9:
	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
	2. Add tunnel security section. @Michael S . Tsirkin
	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
	4. Fix some typos.
	5. Add more tunnel types. @Michael S . Tsirkin

v7->v8:
	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
	3. Removed re-definition for inner packet hashing. @Parav Pandit
	4. Fix some typos. @Michael S . Tsirkin
	5. Clarify some sentences. @Michael S . Tsirkin

v6->v7:
	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
	2. Fix some syntax issues. @Michael S. Tsirkin

v5->v6:
	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
	3. Move the links to introduction section. @Michael S. Tsirkin
	4. Clarify some sentences. @Michael S. Tsirkin

v4->v5:
	1. Clarify some paragraphs. @Cornelia Huck
	2. Fix the u8 type. @Cornelia Huck

v3->v4:
	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin

v2->v3:
	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin

v1->v2:
	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
	2. Clarify some paragraphs. @Jason Wang
	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
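
As a rough illustration of how a driver might combine the two bitmasks
introduced here, the sketch below intersects the tunnel types a driver would
like to enable with those the device advertises before filling
\field{hash_tunnel_types}. The helper name pick_hash_tunnel_types and its
parameters are hypothetical and not part of this patch; only the
VIRTIO_NET_HASH_TUNNEL_TYPE_* values are taken from it. It assumes, by analogy
with \field{supported_hash_types}, that a driver should not enable a type the
device does not advertise.

```c
#include <stdint.h>

/* Bit values as proposed in this patch. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE     (UINT32_C(1) << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (UINT32_C(1) << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (UINT32_C(1) << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP    (UINT32_C(1) << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (UINT32_C(1) << 4)

/*
 * Hypothetical helper (not defined by the patch): returns the value a
 * driver could place in hash_tunnel_types, i.e. the wanted tunnel types
 * restricted to those the device reports as supported.
 */
static uint32_t pick_hash_tunnel_types(uint32_t supported, uint32_t wanted)
{
    return supported & wanted;
}
```

For example, if the device advertises VXLAN and GENEVE and the driver asks for
GRE and VXLAN, only the VXLAN bit remains set.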

 content.tex      | 164 ++++++++++++++++++++++++++++++++++++++++++-----
 introduction.tex |  25 ++++++++
 2 files changed, 174 insertions(+), 15 deletions(-)

diff --git a/content.tex b/content.tex
index e863709..8e352d2 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,11 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
 
+\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
+	for tunnel-encapsulated packets.
+
+\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an encapsulation type.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3145,8 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires VIRTIO_NET_F_HASH_REPORT.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3199,20 +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
         u8 rss_max_key_size;
         le16 rss_max_indirection_table_length;
         le32 supported_hash_types;
+        le32 supported_tunnel_hash_types;
 };
 \end{lstlisting}
-The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
 It specifies the maximum supported length of RSS key in bytes.
 
 The following field, \field{rss_max_indirection_table_length} only exists if VIRTIO_NET_F_RSS is set.
 It specifies the maximum number of 16-bit entries in RSS indirection table.
 
 The next field, \field{supported_hash_types} only exists if the device supports hash calculation,
-i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
+i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
 
 Field \field{supported_hash_types} contains the bitmask of supported hash types.
 See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
 
+The next field, \field{supported_tunnel_hash_types} only exists if the device
+supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
+
+Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
+See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
+
 \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
 
 The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
@@ -3236,7 +3250,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
 negotiated.
 
 The device MUST set \field{rss_max_key_size} to at least 40, if it offers
-VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
+VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL.
 
 The device MUST set \field{rss_max_indirection_table_length} to at least 128, if it offers
 VIRTIO_NET_F_RSS.
@@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
         le16 csum_offset;
         le16 num_buffers;
         le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
-        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
+        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated; the upper 8 bits indicate the
+                                 encapsulation type if VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
         le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
 };
 \end{lstlisting}
@@ -3838,11 +3853,15 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 \begin{itemize}
 \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses the hash to determine the receive virtqueue to place incoming packets.
 \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device reports the hash value and the hash type with the packet.
+\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device supports inner hash calculation. If additionally
+      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports the encapsulation type as well.
 \end{itemize}
 
 If the feature VIRTIO_NET_F_RSS was negotiated:
 \begin{itemize}
 \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
+	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the device uses \field{hash_tunnel_types} of the
+	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
 \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
 \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
 \end{itemize}
@@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 If the feature VIRTIO_NET_F_RSS was not negotiated:
 \begin{itemize}
 \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
+	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the device uses \field{hash_tunnel_types} of the
+	virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask.
 \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
 \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
 \end{itemize}
 
-Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it supports only one pair of virtqueues, it MUST support
+Note that if the device offers VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of virtqueues, it MUST support
 at least one of commands of VIRTIO_NET_CTRL_MQ class to configure reported hash parameters:
 \begin{itemize}
 \item If the device offers VIRTIO_NET_F_RSS, it MUST support VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per
@@ -3863,8 +3884,36 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
  \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
 \end{itemize}
 
+\subparagraph{Tunnel/Encapsulated packet}
+\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
+A tunnel packet is formed by encapsulating the original packet according to a
+tunneling protocol (only a single level of encapsulation is currently supported).
+The encapsulated packet contains an outer header and an inner header, and the
+device calculates the hash over either the inner header or the outer header.
+
+When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
+encapsulation type is set in \field{hash_tunnel_types}, the hash for a specific
+type of encapsulated packet is calculated over the inner header instead of the outer header.
+Supported encapsulation types are listed in \ref{sec:Device Types / Network Device /
+Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets /
+Supported/enabled tunnel hash types}.
+
+If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and VIRTIO_NET_F_HASH_REPORT are negotiated,
+and a hash is calculated for an encapsulated packet, the device reports the encapsulation
+type in addition to the hash value and hash type, regardless of whether the hash is
+calculated over the inner header or the outer header.
+
+If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL are negotiated
+but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device calculates the hash over
+the outer header, and \field{hash_report} reports the hash type and encapsulation type.
+
+Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
+\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
+
 \subparagraph{Supported/enabled hash types}
 \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
+This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
+\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
 Hash types applicable for IPv4 packets:
 \begin{lstlisting}
 #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
@@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
 \end{lstlisting}
 
+\subparagraph{Supported/enabled tunnel hash types}
+\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
+If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated, an enabled tunnel
+hash type indicates that the hash is calculated over the inner header of
+packets with the corresponding encapsulation.
+Hash type applicable to the inner payload of a GRE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
+\end{lstlisting}
+Hash type applicable to the inner payload of a VXLAN-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
+\end{lstlisting}
+Hash type applicable to the inner payload of a GENEVE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
+\end{lstlisting}
+Hash type applicable to the inner payload of an IPIP-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
+\end{lstlisting}
+Hash type applicable to the inner payload of an NVGRE-encapsulated packet:
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
+\end{lstlisting}
+
 \subparagraph{IPv4 packets}
 \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv4 packets}
 The device calculates the hash on IPv4 packets according to 'Enabled hash types' bitmask as follows:
@@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
 \end{itemize}
 
+\subparagraph{Inner hash calculation of an encapsulated packet}
+If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
+encapsulation hash type is set in \field{hash_tunnel_types}, the device calculates the
+hash on the inner header of an encapsulated packet (See \ref{sec:Device Types
+/ Network Device / Device Operation / Processing of Incoming Packets /
+Hash calculation for incoming packets / Tunnel/Encapsulated packet}).
+
+\subparagraph{Security risks between encapsulated packets and RSS}
+There may be security risks when RSS is used to select queues for
+encapsulated packets. If a user inside a tunnel tries to control the
+enqueuing of encapsulated packets, the user can flood the device with invalid
+packets, and the flooded packets may be hashed into the same queue as packets in
+other normal tunnels, causing the queue to overflow.
+
+This can pose several security risks:
+\begin{itemize}
+\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
+       overflow, resulting in a large amount of packet loss.
+\item  The delay and retransmission of packets in the normal tunnels increase significantly.
+\item  The user can observe the traffic information and enqueue information of other normal
+       tunnels, and conduct targeted DoS attacks.
+\end{itemize}
+
 \paragraph{Hash reporting for incoming packets}
 \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
-
-If VIRTIO_NET_F_HASH_REPORT was negotiated and
- the device has calculated the hash for the packet, the device fills \field{hash_report} with the report type of calculated hash
-and \field{hash_value} with the value of calculated hash.
-
-If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
-hash was not calculated, the device sets \field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
-
-Possible values that the device can report in \field{hash_report} are defined below.
+If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has calculated the
+hash for the packet, the device fills the lower 8 bits of \field{hash_report} with
+the report type of calculated hash, and \field{hash_value} with the value of calculated
+hash. Also, if VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to fill
+the upper 8 bits of \field{hash_report} with the encapsulation type.
+
+If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the hash was not
+calculated, the device sets the lower 8 bits of \field{hash_report} to
+VIRTIO_NET_HASH_REPORT_NONE.
+
+If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the upper
+8 bits of \field{hash_report} with the encapsulation type for an encapsulated
+packet. Note that the upper 8 bits are all set to 0 for an unencapsulated
+packet, regardless of whether VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
+
+Possible hash types that the device can report in \field{hash_report} are defined below.
 They correspond to supported hash types defined in
 \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
 as follows:
@@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
 \end{lstlisting}
 
+The upper 8 bits of \field{hash_report} can report the encapsulation type to
+the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
+Possible encapsulation types that the device can report in \field{hash_report} are defined below.
+They correspond to supported hash tunnel types defined in
+\ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
+as follows:
+
+VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 << (VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
+
+\begin{lstlisting}
+#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
+#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
+#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
+#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
+#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
+\end{lstlisting}
+
+They correspond to supported hash types defined in
+\ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
+
 \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue}
 
 The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
@@ -4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 \begin{lstlisting}
 struct virtio_net_hash_config {
     le32 hash_types;
+    le32 hash_tunnel_types;
     le16 reserved[4];
     u8 hash_key_length;
     u8 hash_key_data[hash_key_length];
@@ -4372,7 +4498,11 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 Field \field{hash_types} contains a bitmask of allowed hash types as
 defined in
 \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
-Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
+
+Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
+
+Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
 
 Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
 defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
@@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 \begin{lstlisting}
 struct virtio_net_rss_config {
     le32 hash_types;
+    le32 hash_tunnel_types;
     le16 indirection_table_mask;
     le16 unclassified_queue;
     le16 indirection_table[indirection_table_length];
@@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 defined in
 \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
 
+Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
+defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
+
 Field \field{indirection_table_mask} is a mask to be applied to
 the calculated hash to produce an index in the
 \field{indirection_table} array.
diff --git a/introduction.tex b/introduction.tex
index 287c5fc..69b95ae 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -98,6 +98,31 @@ \section{Normative References}\label{sec:Normative References}
 	\phantomsection\label{intro:SEC1}\textbf{[SEC1]} &
     Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
 	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
+	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
+	Generic Routing Encapsulation
+	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
+	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
+	Virtual eXtensible Local Area Network
+	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
+	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
+	Generic Network Virtualization Encapsulation
+	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
+	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
+	IP Encapsulation within IP
+	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
+	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
+	NVGRE: Network Virtualization Using Generic Routing Encapsulation
+	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
+	\phantomsection\label{intro:IP}\textbf{[IP]} &
+	INTERNET PROTOCOL
+	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
+	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
+	User Datagram Protocol
+	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
+	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
+	TRANSMISSION CONTROL PROTOCOL
+	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
+
 
 \end{longtable}
 
-- 
2.19.1.6.gb485710b
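
The relation between the reported encapsulation values and the bits of
\field{hash_tunnel_types} stated in the patch,
VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 << (VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256),
can be sketched as a small C helper. The function name
tunnel_report_to_type_bit is hypothetical; the constants and the formula are
taken from the patch as posted, and values outside the defined range are
mapped to 0 here purely as an assumption of this sketch.

```c
#include <stdint.h>

/* First and last report values as defined in this patch. */
#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260

/*
 * Hypothetical helper: maps a VIRTIO_NET_HASH_TUNNEL_REPORT_* value to
 * the matching VIRTIO_NET_HASH_TUNNEL_TYPE_* bit using
 * TYPE = 1 << (REPORT - 256). Returns 0 for values outside the range
 * defined by the patch.
 */
static uint32_t tunnel_report_to_type_bit(uint16_t report)
{
    if (report < VIRTIO_NET_HASH_TUNNEL_REPORT_GRE ||
        report > VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE)
        return 0;
    return UINT32_C(1) << (report - VIRTIO_NET_HASH_TUNNEL_REPORT_GRE);
}
```

With these definitions, the GRE report value maps to bit 0 and the NVGRE
report value maps to bit 4, matching the VIRTIO_NET_HASH_TUNNEL_TYPE_* list.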



* [virtio-comment] Re: [virtio-dev] [PATCH v9] virtio-net: support inner header hash
  2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
@ 2023-02-20 15:53 ` Heng Qi
  2023-02-20 16:12   ` Michael S. Tsirkin
  2023-02-21  4:20 ` Parav Pandit
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-20 15:53 UTC
  To: virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Parav Pandit, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo

Hi, all.
Do you have any comments on this?

Thanks!

On 2023/2/18 10:37 PM, Heng Qi wrote:
> If a tunnel is used to encapsulate packets, the hash calculated over the
> outer header of the received packets is identical for all packets of the
> same flow, i.e. they are all steered to the same receive queue.
>
> We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks
> in \field{hash_tunnel_types}, which instruct the device to calculate the
> hash using the inner headers of tunnel-encapsulated packets. Note that
> VIRTIO_NET_F_HASH_TUNNEL only indicates support for the inner header
> hash; it does not by itself let the device use the hash value to select
> a receive queue for the packet.
>
> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL is added to report
> an encapsulation type; this feature depends on VIRTIO_NET_F_HASH_REPORT.
> It only means that the encapsulation type can be reported; it does not
> instruct the device to calculate the hash.
>
> Fixes: https://github.com/oasis-tcs/virtio-spec/issues/151
>
> Reviewed-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> v8->v9:
> 	1. Merge hash_report_tunnel_types into hash_report. @Parav Pandit
> 	2. Add tunnel security section. @Michael S . Tsirkin
> 	3. Add VIRTIO_NET_F_HASH_REPORT_TUNNEL.
> 	4. Fix some typos.
> 	5. Add more tunnel types. @Michael S . Tsirkin
>
> v7->v8:
> 	1. Add supported_hash_tunnel_types. @Jason Wang, @Parav Pandit
> 	2. Change hash_report_tunnel to hash_report_tunnel_types. @Parav Pandit
> 	3. Removed re-definition for inner packet hashing. @Parav Pandit
> 	4. Fix some typos. @Michael S . Tsirkin
> 	5. Clarify some sentences. @Michael S . Tsirkin
>
> v6->v7:
> 	1. Modify the wording of some sentences for clarity. @Michael S. Tsirkin
> 	2. Fix some syntax issues. @Michael S. Tsirkin
>
> v5->v6:
> 	1. Fix some syntax and capitalization issues. @Michael S. Tsirkin
> 	2. Use encapsulated/encapsulation uniformly. @Michael S. Tsirkin
> 	3. Move the links to introduction section. @Michael S. Tsirkin
> 	4. Clarify some sentences. @Michael S. Tsirkin
>
> v4->v5:
> 	1. Clarify some paragraphs. @Cornelia Huck
> 	2. Fix the u8 type. @Cornelia Huck
>
> v3->v4:
> 	1. Rename VIRTIO_NET_F_HASH_GRE_VXLAN_GENEVE_INNER to VIRTIO_NET_F_HASH_TUNNEL. @Jason Wang
> 	2. Make things clearer. @Jason Wang @Michael S. Tsirkin
> 	3. Keep the possibility to use inner hash for automatic receive steering. @Jason Wang
> 	4. Add the "Tunnel packet" paragraph to avoid repeating the GRE etc. many times. @Michael S. Tsirkin
>
> v2->v3:
> 	1. Add a feature bit for GRE/VXLAN/GENEVE inner hash. @Jason Wang
> 	2. Change \field{hash_tunnel} to \field{hash_report_tunnel}. @Jason Wang, @Michael S. Tsirkin
>
> v1->v2:
> 	1. Remove the patch for the bitmask fix. @Michael S. Tsirkin
> 	2. Clarify some paragraphs. @Jason Wang
> 	3. Add \field{hash_tunnel} and VIRTIO_NET_HASH_REPORT_GRE. @Yuri Benditovich
>
>   content.tex      | 164 ++++++++++++++++++++++++++++++++++++++++++-----
>   introduction.tex |  25 ++++++++
>   2 files changed, 174 insertions(+), 15 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index e863709..8e352d2 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,11 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
> +	for tunnel-encapsulated packets.
> +
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an encapsulation type.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires VIRTIO_NET_F_HASH_REPORT.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3199,20 +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>           u8 rss_max_key_size;
>           le16 rss_max_indirection_table_length;
>           le32 supported_hash_types;
> +        le32 supported_tunnel_hash_types;
>   };
>   \end{lstlisting}
> -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>   It specifies the maximum supported length of RSS key in bytes.
>   
>   The following field, \field{rss_max_indirection_table_length} only exists if VIRTIO_NET_F_RSS is set.
>   It specifies the maximum number of 16-bit entries in RSS indirection table.
>   
>   The next field, \field{supported_hash_types} only exists if the device supports hash calculation,
> -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is set.
> +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>   
>   Field \field{supported_hash_types} contains the bitmask of supported hash types.
>   See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types} for details of supported hash types.
>   
> +The next field, \field{supported_tunnel_hash_types} only exists if the device
> +supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is set.
> +
> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported tunnel hash types.
> +See \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types} for details of supported tunnel hash types.
> +
>   \devicenormative{\subsubsection}{Device configuration layout}{Device Types / Network Device / Device configuration layout}
>   
>   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> @@ -3236,7 +3250,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
>   negotiated.
>   
>   The device MUST set \field{rss_max_key_size} to at least 40, if it offers
> -VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
> +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL.
>   
>   The device MUST set \field{rss_max_indirection_table_length} to at least 128, if it offers
>   VIRTIO_NET_F_RSS.
> @@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>           le16 csum_offset;
>           le16 num_buffers;
>           le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> -        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> +        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated; the upper 8 bits indicate the
> +                                 encapsulation type if VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
>           le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>   };
>   \end{lstlisting}
> @@ -3838,11 +3853,15 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   \begin{itemize}
>   \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses the hash to determine the receive virtqueue to place incoming packets.
>   \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device reports the hash value and the hash type with the packet.
> +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device supports inner hash calculation. If additionally
> +      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports the encapsulation type as well.
>   \end{itemize}
>   
>   If the feature VIRTIO_NET_F_RSS was negotiated:
>   \begin{itemize}
>   \item The device uses \field{hash_types} of the virtio_net_rss_config structure as 'Enabled hash types' bitmask.
> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the device uses \field{hash_tunnel_types} of the
> +	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_rss_config structure (see
>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / Setting RSS parameters}).
>   \end{itemize}
> @@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   If the feature VIRTIO_NET_F_RSS was not negotiated:
>   \begin{itemize}
>   \item The device uses \field{hash_types} of the virtio_net_hash_config structure as 'Enabled hash types' bitmask.
> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the device uses \field{hash_tunnel_types} of the
> +	virtio_net_hash_config structure as 'Enabled hash tunnel types' bitmask.
>   \item The device uses a key as defined in \field{hash_key_data} and \field{hash_key_length} of the virtio_net_hash_config structure (see
>   \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}).
>   \end{itemize}
>   
> -Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it supports only one pair of virtqueues, it MUST support
> +Note that if the device offers VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of virtqueues, it MUST support
>   at least one of commands of VIRTIO_NET_CTRL_MQ class to configure reported hash parameters:
>   \begin{itemize}
>   \item If the device offers VIRTIO_NET_F_RSS, it MUST support VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per
> @@ -3863,8 +3884,36 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>    \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
>   \end{itemize}
>   
> +\subparagraph{Tunnel/Encapsulated packet}
> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Tunnel/Encapsulated packet}
> +A tunnel packet is encapsulated from the original packet based on the tunneling
> +protocol (only a single level of encapsulation is currently supported). The
> +encapsulated packet contains an outer header and an inner header, and the device
> +calculates the hash over either the inner header or the outer header.
> +
> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
> +encapsulation type is set in \field{hash_tunnel_types}, the hash for a specific
> +type of encapsulated packet is calculated over the inner header instead of the outer header.
> +Supported encapsulation types are listed in \ref{sec:Device Types / Network Device /
> +Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets /
> +Supported/enabled tunnel hash types}.
> +
> +If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and VIRTIO_NET_F_HASH_REPORT are negotiated,
> +and the hash is calculated for an encapsulated packet, the device reports the encapsulation
> +type in addition to the hash value and hash type, regardless of whether the hash is
> +calculated on the inner header or the outer header.
> +
> +If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL are negotiated
> +but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device calculates the hash over
> +the outer header, and \field{hash_report} reports the hash type and encapsulation type.
> +
> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]}, \hyperref[intro:VXLAN]{[VXLAN]},
> +\hyperref[intro:GENEVE]{[GENEVE]}, \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> +
>   \subparagraph{Supported/enabled hash types}
>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>   Hash types applicable for IPv4 packets:
>   \begin{lstlisting}
>   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> @@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>   \end{lstlisting}
>   
> +\subparagraph{Supported/enabled tunnel hash types}
> +\label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated, the encapsulation
> +hash type indicates that the hash is calculated over the inner header of
> +the encapsulated packet:
> +Hash type applicable for inner payload of the gre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the vxlan-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the geneve-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the ip-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the nvgre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
> +\end{lstlisting}
> +
>   \subparagraph{IPv4 packets}
>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv4 packets}
>   The device calculates the hash on IPv4 packets according to 'Enabled hash types' bitmask as follows:
> @@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   (see \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / IPv6 packets without extension header}).
>   \end{itemize}
>   
> +\subparagraph{Inner hash calculation of an encapsulated packet}
> +If the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
> +encapsulation hash type is set in \field{hash_tunnel_types}, the device calculates the
> +hash on the inner header of an encapsulated packet (See \ref{sec:Device Types
> +/ Network Device / Device Operation / Processing of Incoming Packets /
> +Hash calculation for incoming packets / Tunnel/Encapsulated packet}).
> +
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when RSS is used to select the receive
> +queue for encapsulated packets. A user inside a tunnel who tries to control the
> +enqueuing of encapsulated packets can flood the device with invalid
> +packets, and the flooded packets may be hashed into the same queue as packets in
> +other normal tunnels, causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> +       overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels increase significantly.
> +\item  The user can observe the traffic information and enqueue information of other normal
> +       tunnels, and conduct targeted DoS attacks.
> +\end{itemize}
> +
>   \paragraph{Hash reporting for incoming packets}
>   \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash reporting for incoming packets}
> -
> -If VIRTIO_NET_F_HASH_REPORT was negotiated and
> - the device has calculated the hash for the packet, the device fills \field{hash_report} with the report type of calculated hash
> -and \field{hash_value} with the value of calculated hash.
> -
> -If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
> -hash was not calculated, the device sets \field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
> -
> -Possible values that the device can report in \field{hash_report} are defined below.
> +If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has calculated the
> +hash for the packet, the device fills the lower 8 bits of \field{hash_report} with
> +the report type of calculated hash, and \field{hash_value} with the value of calculated
> +hash. Also, if VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to fill
> +the upper 8 bits of \field{hash_report} with the encapsulation type.
> +
> +If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the hash was not
> +calculated, the device sets the lower 8 bits of \field{hash_report} to
> +VIRTIO_NET_HASH_REPORT_NONE.
> +
> +If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the upper
> +8 bits of \field{hash_report} with the encapsulation type for an encapsulated
> +packet. Note that the upper 8 bits are all set to 0 for an unencapsulated
> +packet, regardless of whether VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
> +
> +Possible hash types that the device can report in \field{hash_report} are defined below.
>   They correspond to supported hash types defined in
>   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}
>   as follows:
> @@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
>   \end{lstlisting}
>   
> +The upper 8 bits of \field{hash_report} can report the encapsulation type to
> +the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
> +Possible encapsulation types that the device can report in \field{hash_report} are defined below.
> +They correspond to supported hash tunnel types defined in
> +\ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}
> +as follows:
> +
> +VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 << (VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
> +
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
> +\end{lstlisting}
> +
> +They correspond to supported hash types defined in
> +\ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> +
>   \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue}
>   
>   The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
> @@ -4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   \begin{lstlisting}
>   struct virtio_net_hash_config {
>       le32 hash_types;
> +    le32 hash_tunnel_types;
>       le16 reserved[4];
>       u8 hash_key_length;
>       u8 hash_key_data[hash_key_length];
> @@ -4372,7 +4498,11 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   Field \field{hash_types} contains a bitmask of allowed hash types as
>   defined in
>   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
> -Initially the device has all hash types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
> +
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> +
> +Initially the device has all hash types and hash tunnel types disabled and reports only VIRTIO_NET_HASH_REPORT_NONE.
>   
>   Field \field{reserved} MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure,
>   defined in \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS)}.
> @@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   \begin{lstlisting}
>   struct virtio_net_rss_config {
>       le32 hash_types;
> +    le32 hash_tunnel_types;
>       le16 indirection_table_mask;
>       le16 unclassified_queue;
>       le16 indirection_table[indirection_table_length];
> @@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   defined in
>   \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled hash types}.
>   
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash tunnel types as
> +defined in \ref{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets / Supported/enabled tunnel hash types}.
> +
>   Field \field{indirection_table_mask} is a mask to be applied to
>   the calculated hash to produce an index in the
>   \field{indirection_table} array.
> diff --git a/introduction.tex b/introduction.tex
> index 287c5fc..69b95ae 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -98,6 +98,31 @@ \section{Normative References}\label{sec:Normative References}
>   	\phantomsection\label{intro:SEC1}\textbf{[SEC1]} &
>       Standards for Efficient Cryptography Group(SECG), ``SEC1: Elliptic Cureve Cryptography'', Version 1.0, September 2000.
>   	\newline\url{https://www.secg.org/sec1-v2.pdf}\\
> +	\phantomsection\label{intro:GRE}\textbf{[GRE]} &
> +	Generic Routing Encapsulation
> +	\newline\url{https://datatracker.ietf.org/doc/rfc2784/}\\
> +	\phantomsection\label{intro:VXLAN}\textbf{[VXLAN]} &
> +	Virtual eXtensible Local Area Network
> +	\newline\url{https://datatracker.ietf.org/doc/rfc7348/}\\
> +	\phantomsection\label{intro:GENEVE}\textbf{[GENEVE]} &
> +	Generic Network Virtualization Encapsulation
> +	\newline\url{https://datatracker.ietf.org/doc/rfc8926/}\\
> +	\phantomsection\label{intro:IPIP}\textbf{[IPIP]} &
> +	IP Encapsulation within IP
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc2003}\\
> +	\phantomsection\label{intro:NVGRE}\textbf{[NVGRE]} &
> +	NVGRE: Network Virtualization Using Generic Routing Encapsulation
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc7637.html}\\
> +	\phantomsection\label{intro:IP}\textbf{[IP]} &
> +	INTERNET PROTOCOL
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc791}\\
> +	\phantomsection\label{intro:UDP}\textbf{[UDP]} &
> +	User Datagram Protocol
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc768}\\
> +	\phantomsection\label{intro:TCP}\textbf{[TCP]} &
> +	TRANSMISSION CONTROL PROTOCOL
> +	\newline\url{https://www.rfc-editor.org/rfc/rfc793}\\
> +
>   
>   \end{longtable}
>   

* Re: [virtio-dev] [PATCH v9] virtio-net: support inner header hash
  2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
@ 2023-02-20 16:12   ` Michael S. Tsirkin
  0 siblings, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-20 16:12 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Mon, Feb 20, 2023 at 11:53:31PM +0800, Heng Qi wrote:
> Hi, all.
> Do you have any comments on this?
> 
> Thanks!

It's just been 2 days and you made lots of changes since v8.
Sit tight.


* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
  2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
@ 2023-02-21  4:20 ` Parav Pandit
  2023-02-21  6:14   ` [virtio-comment] " Heng Qi
                     ` (2 more replies)
  2023-02-21 17:50 ` Michael S. Tsirkin
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21  4:20 UTC (permalink / raw)
  To: Heng Qi, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


> From: Heng Qi <hengqi@linux.alibaba.com>
> Sent: Saturday, February 18, 2023 9:37 AM

> If the tunnel is used to encapsulate the packets, the hash calculated using the
s/hash calculated/hash is calculated

> outer header of the receive packets is always fixed for the same flow packets,
> i.e. they will be steered to the same receive queue.
> 
A slightly more descriptive commit message like the one below reads better to me.

Currently, when a received packet is an encapsulated packet, meaning there is an outer and an inner header, the virtio device is unable to calculate the hash for the inner header.
Due to this limitation, multiple different flows identified by the inner header for the same outer header result in selecting the same receive queue.
This effectively disables RSS, resulting in poor receive performance.

Hence, to overcome this limitation, a new feature is introduced using the feature bit VIRTIO_NET_F_HASH_TUNNEL.
This feature enables the device to advertise the capability to calculate the hash for the inner packet header,
thereby regaining better RSS performance in the presence of an outer packet header.
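
A minimal sketch of the limitation described above, for illustration only. The struct layouts, the helper names and the FNV-1a stand-in hash are assumptions and do not come from the patch or the spec; a real device would apply its configured hash function and key, and with RSS would index an indirection table rather than take a plain modulo.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hash input; not a structure defined by the spec. */
struct flow_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Hypothetical parsed view of a packet with one level of encapsulation. */
struct encap_packet {
    struct flow_tuple outer;
    struct flow_tuple inner;
};

/* Mix the nbytes low-order bytes of v into an FNV-1a state. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in for the device hash (assumption); only determinism matters here. */
static uint32_t tuple_hash(const struct flow_tuple *t)
{
    uint32_t h = 2166136261u;
    h = fnv1a_mix(h, t->src_ip, 4);
    h = fnv1a_mix(h, t->dst_ip, 4);
    h = fnv1a_mix(h, t->src_port, 2);
    h = fnv1a_mix(h, t->dst_port, 2);
    return h;
}

/* Queue chosen when only the outer header is hashed. */
static unsigned rx_queue_outer(const struct encap_packet *p, unsigned num_queues)
{
    return tuple_hash(&p->outer) % num_queues;
}

/* Queue chosen when the inner header is hashed instead. */
static unsigned rx_queue_inner(const struct encap_packet *p, unsigned num_queues)
{
    return tuple_hash(&p->inner) % num_queues;
}
```

Two packets that share an outer tuple always get the same result from rx_queue_outer(), whatever their inner flows are. rx_queue_inner() depends only on the inner tuple, so distinct inner flows are free to land on different queues.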

> We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
> \field{hash_tunnel_types}, which instructs the device to calculate the hash
> using the inner headers of tunnel-encapsulated packets. Note that
> VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
> hash, and does not give the device the ability to use the hash value to select a
> receiving queue to place the packet.
> 
> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to report
> an encapsulation type, and the feature depends on
> VIRTIO_NET_F_HASH_REPORT.

As we discussed, the tunnel type alone is not useful to the software, neither as an individual field nor merged with some other field.
Hence, please remove this feature bit. HASH_TUNNEL is good enough.
Please also remove the references to it in the places below.

> It only means that the encapsulation type can be reported, it cannot instruct
> the device to calculate the hash.
> 

> 
> +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
> +	for tunnel-encapsulated packets.
> +
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
> encapsulation type.
> +
Please remove this.

>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
> coalescing.
> 
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
> requirements}\label{sec:Device Types / Network Device
> \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
> VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
> VIRTIO_NET_F_HASH_REPORT.
>  \end{description}
> 
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
> Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
> +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
> / Network Device
>          u8 rss_max_key_size;
>          le16 rss_max_indirection_table_length;
>          le32 supported_hash_types;
> +        le32 supported_tunnel_hash_types;
>  };
>  \end{lstlisting}
> -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS
> or VIRTIO_NET_F_HASH_REPORT is set.
> +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS,
> VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>  It specifies the maximum supported length of RSS key in bytes.
> 
>  The following field, \field{rss_max_indirection_table_length} only exists if
> VIRTIO_NET_F_RSS is set.
>  It specifies the maximum number of 16-bit entries in RSS indirection table.
> 
>  The next field, \field{supported_hash_types} only exists if the device supports
> hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
> set.
> +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
> VIRTIO_NET_F_HASH_TUNNEL is set.
> 
>  Field \field{supported_hash_types} contains the bitmask of supported hash
> types.
>  See \ref{sec:Device Types / Network Device / Device Operation / Processing of
> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
> hash types} for details of supported hash types.
> 
> +The next field, \field{supported_tunnel_hash_types} only exists if the
> +device supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is
> set.
> +
> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported
> tunnel hash types.
> +See \ref{sec:Device Types / Network Device / Device Operation / Processing
> of Incoming Packets / Hash calculation for incoming packets /
> Supported/enabled tunnel hash types} for details of supported tunnel hash
> types.
> +
>  \devicenormative{\subsubsection}{Device configuration layout}{Device Types /
> Network Device / Device configuration layout}
> 
>  The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000
> inclusive, @@ -3236,7 +3250,7 @@ \subsection{Device configuration
> layout}\label{sec:Device Types / Network Device  negotiated.
> 
>  The device MUST set \field{rss_max_key_size} to at least 40, if it offers -
> VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
> +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
> VIRTIO_NET_F_HASH_TUNNEL.
> 
>  The device MUST set \field{rss_max_indirection_table_length} to at least 128,
> if it offers  VIRTIO_NET_F_RSS.
> @@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device
> Types / Network Device / Device O
>          le16 csum_offset;
>          le16 num_buffers;
>          le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> -        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> +        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated,
> and the upper 8 bits indicates the
> +                                 encapsulation type if
> + VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
>          le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT
> negotiated)  };  \end{lstlisting} @@ -3838,11 +3853,15 @@
> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
> Network  \begin{itemize}  \item The feature VIRTIO_NET_F_RSS was
> negotiated. The device uses the hash to determine the receive virtqueue to
> place incoming packets.
>  \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device
> reports the hash value and the hash type with the packet.
> +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device
> supports inner hash calculation. If additionally
> +      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports
> the encapsulation type as well.
>  \end{itemize}
> 
>  If the feature VIRTIO_NET_F_RSS was negotiated:
>  \begin{itemize}
>  \item The device uses \field{hash_types} of the virtio_net_rss_config structure
> as 'Enabled hash types' bitmask.
> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
> device uses \field{hash_tunnel_types} of the
> +	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
>  \item The device uses a key as defined in \field{hash_key_data} and
> \field{hash_key_length} of the virtio_net_rss_config structure (see
> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> / Receive-side scaling (RSS) / Setting RSS parameters}).
>  \end{itemize}
> @@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming
> Packets}\label{sec:Device Types / Network  If the feature VIRTIO_NET_F_RSS
> was not negotiated:
>  \begin{itemize}
>  \item The device uses \field{hash_types} of the virtio_net_hash_config
> structure as 'Enabled hash types' bitmask.
> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
> device uses \field{hash_tunnel_types} of the
> +	virtio_net_hash_config structure as 'Enabled hash tunnel types'
> bitmask.
>  \item The device uses a key as defined in \field{hash_key_data} and
> \field{hash_key_length} of the virtio_net_hash_config structure (see
> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> / Automatic receive steering in multiqueue mode / Hash calculation}).
>  \end{itemize}
> 
> -Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it
> supports only one pair of virtqueues, it MUST support
> +Note that if the device offers VIRTIO_NET_F_HASH_REPORT or
> +VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of
> +virtqueues, it MUST support
>  at least one of commands of VIRTIO_NET_CTRL_MQ class to configure
> reported hash parameters:
>  \begin{itemize}
>  \item If the device offers VIRTIO_NET_F_RSS, it MUST support
> VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per @@ -3863,8 +3884,36 @@
> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
> Network
>   \ref{sec:Device Types / Network Device / Device Operation / Control
> Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
>  \end{itemize}
> 
> +\subparagraph{Tunnel/Encapsulated packet} \label{sec:Device Types /
> +Network Device / Device Operation / Processing of Incoming Packets /
> +Hash calculation for incoming packets / Tunnel/Encapsulated packet} A
> +tunnel packet is encapsulated from the original packet based on the
> +tunneling protocol (only a single level of encapsulation is currently
> +supported). The encapsulated packet contains an outer header and an inner
> header, and the device calculates the hash over either the inner header or the
> outer header.
> +
> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the
> +corresponding encapsulation type is set in \field{hash_tunnel_types},
> +the hash for a specific type of encapsulated packet is calculated over the inner
> as opposed to outer header.
To the outer header.

Here, you want to say that 
When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received packet's outer header matches one of the supported hash_tunnel_types, the hash of the inner header is calculated.
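
The rule as worded above can be sketched as follows. The bit values are copied from the v9 patch (with an unsigned suffix added); the enum, the function name and the idea that the device's parser yields a single tunnel-type bit per packet (0 for an unencapsulated packet) are assumptions made for illustration, not something the patch defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tunnel hash type bits as proposed in the v9 patch. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE     (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP    (1u << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1u << 4)

/* Hypothetical result type: which header the hash is calculated over. */
enum hash_header {
    HASH_OUTER_HEADER,
    HASH_INNER_HEADER
};

/*
 * Hypothetical helper. pkt_tunnel_type is the single TYPE bit matching the
 * received packet's outer header, or 0 if the packet is not encapsulated.
 */
static enum hash_header select_hash_header(bool hash_tunnel_negotiated,
                                           uint32_t hash_tunnel_types,
                                           uint32_t pkt_tunnel_type)
{
    if (hash_tunnel_negotiated && (hash_tunnel_types & pkt_tunnel_type) != 0)
        return HASH_INNER_HEADER;

    /* Feature not negotiated, type not enabled, or packet not encapsulated. */
    return HASH_OUTER_HEADER;
}
```

With GRE and VXLAN enabled in hash_tunnel_types, a VXLAN packet selects the inner header, while a GENEVE packet or an unencapsulated packet falls back to the outer header.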

> +Supported encapsulation types are listed in \ref{sec:Device Types /
> +Network Device / Device Operation / Processing of Incoming Packets /
> +Hash calculation for incoming packets / Supported/enabled hash tunnel
> types}.
> +
> +If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and
> VIRTIO_NET_F_HASH_REPORT
> +are negotiated, and hash is calculated for an encapsulated  packet, the
> +device reports the encapsulation type in addition to the hash value and
> +hash type, regardless of whether the hash is calculated on the inner header or
> the outer header.
> +
> +If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL
> are
> +negotiated but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device
> +calculates the hash over the outer header, and \field{hash_report} reports the
> hash type and encapsulation type.
> +
> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]},
> +\hyperref[intro:VXLAN]{[VXLAN]}, \hyperref[intro:GENEVE]{[GENEVE]},
> \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> +
>  \subparagraph{Supported/enabled hash types}  \label{sec:Device Types /
> Network Device / Device Operation / Processing of Incoming Packets / Hash
> calculation for incoming packets / Supported/enabled hash types}
> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>  Hash types applicable for IPv4 packets:
>  \begin{lstlisting}
>  #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> @@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming
> Packets}\label{sec:Device Types / Network
>  #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>  \end{lstlisting}
> 
Let's please remove the encoding below.

> +\subparagraph{Supported/enabled tunnel hash types} \label{sec:Device
> +Types / Network Device / Device Operation / Processing of Incoming
> +Packets / Hash calculation for incoming packets / Supported/enabled
> +tunnel hash types} If the feature VIRTIO_NET_F_HASH_TUNNEL is
> +negotiated, the encapsulation hash type indicates that the hash is calculated
> over the inner header of the encapsulated packet:
> +Hash type applicable for inner payload of the gre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the vxlan-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the geneve-encapsulated
> +packet \begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the ip-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
> +\end{lstlisting}
> +Hash type applicable for inner payload of the nvgre-encapsulated packet
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
> +\end{lstlisting}
> +
>  \subparagraph{IPv4 packets}
>  \label{sec:Device Types / Network Device / Device Operation / Processing of
> Incoming Packets / Hash calculation for incoming packets / IPv4 packets}  The
> device calculates the hash on IPv4 packets according to 'Enabled hash types'
> bitmask as follows:
> @@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming
> Packets}\label{sec:Device Types / Network  (see \ref{sec:Device Types /
> Network Device / Device Operation / Processing of Incoming Packets / Hash
> calculation for incoming packets / IPv6 packets without extension header}).
>  \end{itemize}
> 
> +\subparagraph{Inner hash calculation of an encapsulated packet} If the
> +feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
> +encapsulation hash type is set in \field{hash_tunnel_types}, the device
> +calculates the hash on the inner header of an encapsulated packet (See
> +\ref{sec:Device Types / Network Device / Device Operation / Processing
> +of Incoming Packets / Hash calculation for incoming packets /
> Tunnel/Encapsulated packet}).
> +
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets using
s/when encapsulated/when encapsulating/

> +RSS to select queues for placement. When a user inside a tunnel tries
> +to control the enqueuing of encapsulated packets, then the user can
> +flood the device with invaild packets, and the flooded packets may be
> +hashed into the same queue as packets in other normal tunnels, which causing
> the queue to overflow.
> 
"Invalid packets" is confusing, and the wording "which causing" is not correct.
There is some duplicate wording below too.

I think the risks above and below can be summarized in a slightly simpler manner.

How about:

When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels.

+
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to
> queue
> +       overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are
> extremely increased.
This is something very protocol specific and doesn't belong here.

> +\item  The user can observe the traffic information and enqueue information
> of other normal
> +       tunnels, and conduct targeted DoS attacks.
Once hash_report_tunnel_types is removed, this second attack is no longer applicable.
Hence, please remove this too.

> +\end{\itemize}
> +
>  \paragraph{Hash reporting for incoming packets}  \label{sec:Device Types /
> Network Device / Device Operation / Processing of Incoming Packets / Hash
> reporting for incoming packets}
> -
> -If VIRTIO_NET_F_HASH_REPORT was negotiated and
> - the device has calculated the hash for the packet, the device fills
> \field{hash_report} with the report type of calculated hash -and
> \field{hash_value} with the value of calculated hash.
> -
> -If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the -
> hash was not calculated, the device sets \field{hash_report} to
> VIRTIO_NET_HASH_REPORT_NONE.
> -
> -Possible values that the device can report in \field{hash_report} are defined
> below.
> +If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has
> +calculated the hash for the packet, the device fills the lower 8 bits
> +of \field{hash_report} with the report type of calculated hash, and
> +\field{hash_value} with the value of calculated hash. Also, if
> +VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to
> fill the upper 8 bits of \field{hash_report} with the encapsulation type.
> +
> +If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
> +hash was not calculated, the device sets the lower 8 bits of
> +\field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
> +
> +If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the
> +upper
> +8 bits of \field{hash_report} with the encapsulation type for an
> +encapsulated packet. Note that the upper 8 bits are all set to 0 for an
> +unencapsulated packet, regardless of whether
> VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
> +
> +Possible hash types that the device can report in \field{hash_report} are
> defined below.
>  They correspond to supported hash types defined in  \ref{sec:Device Types /
> Network Device / Device Operation / Processing of Incoming Packets / Hash
> calculation for incoming packets / Supported/enabled hash types}  as follows:
> @@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming
> Packets}\label{sec:Device Types / Network
>  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
>  \end{lstlisting}
> 
> +The upper 8 bits of \field{hash_report} can report the encapsulation
> +type to the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
> +Possible encapsulation types that the device can report in \field{hash_report}
> are defined below.
> +They correspond to supported hash tunnel types defined in
> +\ref{sec:Device Types / Network Device / Device Operation / Processing
> +of Incoming Packets / Hash calculation for incoming packets /
> Supported/enabled hash tunnel types} as follows:
> +
> +VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 <<
> +(VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
> +
> +\begin{lstlisting}
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
> +\end{lstlisting}
> +
> +They correspond to supported hash types defined in \ref{sec:Device
> +Types / Network Device / Device Operation / Processing of Incoming Packets /
> Hash calculation for incoming packets / Supported/enabled hash types}.
> +
>  \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device /
> Device Operation / Control Virtqueue}
> 
>  The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is @@ -
> 4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types
> / Network Device / Devi  \begin{lstlisting}  struct virtio_net_hash_config {
>      le32 hash_types;
> +    le32 hash_tunnel_types;
>      le16 reserved[4];
>      u8 hash_key_length;
>      u8 hash_key_data[hash_key_length];
> @@ -4372,7 +4498,11 @@ \subsubsection{Control
> Virtqueue}\label{sec:Device Types / Network Device / Devi  Field
> \field{hash_types} contains a bitmask of allowed hash types as  defined in
> \ref{sec:Device Types / Network Device / Device Operation / Processing of
> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
> hash types}.
> -Initially the device has all hash types disabled and reports only
> VIRTIO_NET_HASH_REPORT_NONE.
> +
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
> Operation / Processing of Incoming Packets / Hash calculation for incoming
> packets / Supported/enabled hash tunnel types}.
> +
> +Initially the device has all hash types and hash tunnel types disabled and
> reports only VIRTIO_NET_HASH_REPORT_NONE.
> 
>  Field \field{reserved} MUST contain zeroes. It is defined to make the structure
> to match the layout of virtio_net_rss_config structure,  defined in
> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> / Receive-side scaling (RSS)}.
> @@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device
> Types / Network Device / Devi  \begin{lstlisting}  struct virtio_net_rss_config {
>      le32 hash_types;
> +    le32 hash_tunnel_types;
This field is not needed, as advertising the support in the device config space is enough.

If the intent is to enable hashing for specific tunnel(s), an individual command is better.

Regardless, this new field cannot be placed in the middle of the structure, as that breaks backward compatibility.

>      le16 indirection_table_mask;
>      le16 unclassified_queue;
>      le16 indirection_table[indirection_table_length];
> @@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device
> Types / Network Device / Devi  defined in  \ref{sec:Device Types / Network
> Device / Device Operation / Processing of Incoming Packets / Hash calculation
> for incoming packets / Supported/enabled hash types}.
> 
> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
> Operation / Processing of Incoming Packets / Hash calculation for incoming
> packets / Supported/enabled hash tunnel types}.
> +
>  Field \field{indirection_table_mask} is a mask to be applied to  the calculated
> hash to produce an index in the  \field{indirection_table} array.
> diff --git a/introduction.tex b/introduction.tex index 287c5fc..69b95ae 100644
> --- a/introduction.tex
> +++ b/introduction.tex


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21  4:20 ` Parav Pandit
@ 2023-02-21  6:14   ` Heng Qi
  2023-02-21 12:47     ` Parav Pandit
  2023-02-21 17:05   ` Michael S. Tsirkin
  2023-03-01 14:32   ` [virtio-dev] " Heng Qi
  2 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-21  6:14 UTC (permalink / raw)
  To: Parav Pandit, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo

Hi, Parav! Thanks for your reply!

On 2023/2/21 at 12:20 PM, Parav Pandit wrote:
>> From: Heng Qi <hengqi@linux.alibaba.com>
>> Sent: Saturday, February 18, 2023 9:37 AM
>> If the tunnel is used to encapsulate the packets, the hash calculated using the
> s/hash calculated/hash is calculated

Sorry, I'm not a native English speaker. I meant to use an attributive
clause here, but there seems to be a grammatical problem.
I'll use a grammar checker later. :)

>
>> outer header of the receive packets is always fixed for the same flow packets,
>> i.e. they will be steered to the same receive queue.
>>
> A little descriptive commit message like below reads better to me.

Thanks for the suggestion, this is indeed clearer.

>
> Currently, when a received packet is an encapsulated packet meaning there is an outer and an inner header, virtio device is unable to calculate the hash for the inner header.
> Due to this limitation, multiple different flows identified by the inner header for the same outer header result in selecting the same receive queue.
> This effectively disables the RSS, resulting in poor receive performance.
>
> Hence, to overcome this limitation, a new feature is introduced using a feature bit VIRTIO_NET_F_HASH_TUNNEL.
> This feature enables the device to advertise the capability to calculate the hash for the inner packet header.
> Thereby regaining better RSS performance in presence of outer packet header.
>
>> We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
>> \field{hash_tunnel_types}, which instructs the device to calculate the hash
>> using the inner headers of tunnel-encapsulated packets. Note that
>> VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
>> hash, and does not give the device the ability to use the hash value to select a
>> receiving queue to place the packet.
>>
>> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to report
>> an encapsulation type, and the feature depends on
>> VIRTIO_NET_F_HASH_REPORT.
> As we discussed that tunnel type alone is not useful the sw, neither as an individual field nor merged with some other field.
> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
> Please remove the references to it at more places below.

If we don't want it at all, I'll remove it and references to it.

>
>> It only means that the encapsulation type can be reported, it cannot instruct
>> the device to calculate the hash.
>>
>> +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
>> +	for tunnel-encapsulated packets.
>> +
>> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
>> encapsulation type.
>> +
> Please remove this.

Ok.

>
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
>> coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
>> requirements}\label{sec:Device Types / Network Device
>> \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
>> VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
>> VIRTIO_NET_F_HASH_REPORT.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
>> Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
>> +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
>> / Network Device
>>           u8 rss_max_key_size;
>>           le16 rss_max_indirection_table_length;
>>           le32 supported_hash_types;
>> +        le32 supported_tunnel_hash_types;
>>   };
>>   \end{lstlisting}
>> -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS
>> or VIRTIO_NET_F_HASH_REPORT is set.
>> +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS,
>> VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>>   It specifies the maximum supported length of RSS key in bytes.
>>
>>   The following field, \field{rss_max_indirection_table_length} only exists if
>> VIRTIO_NET_F_RSS is set.
>>   It specifies the maximum number of 16-bit entries in RSS indirection table.
>>
>>   The next field, \field{supported_hash_types} only exists if the device supports
>> hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
>> set.
>> +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
>> VIRTIO_NET_F_HASH_TUNNEL is set.
>>
>>   Field \field{supported_hash_types} contains the bitmask of supported hash
>> types.
>>   See \ref{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
>> hash types} for details of supported hash types.
>>
>> +The next field, \field{supported_tunnel_hash_types} only exists if the
>> +device supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is
>> set.
>> +
>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported
>> tunnel hash types.
>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing
>> of Incoming Packets / Hash calculation for incoming packets /
>> Supported/enabled tunnel hash types} for details of supported tunnel hash
>> types.
>> +
>>   \devicenormative{\subsubsection}{Device configuration layout}{Device Types /
>> Network Device / Device configuration layout}
>>
>>   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000
>> inclusive, @@ -3236,7 +3250,7 @@ \subsection{Device configuration
>> layout}\label{sec:Device Types / Network Device  negotiated.
>>
>>   The device MUST set \field{rss_max_key_size} to at least 40, if it offers -
>> VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
>> +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
>> VIRTIO_NET_F_HASH_TUNNEL.
>>
>>   The device MUST set \field{rss_max_indirection_table_length} to at least 128,
>> if it offers  VIRTIO_NET_F_RSS.
>> @@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device
>> Types / Network Device / Device O
>>           le16 csum_offset;
>>           le16 num_buffers;
>>           le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>> -        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>> +        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated,
>> and the upper 8 bits indicates the
>> +                                 encapsulation type if
>> + VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
>>           le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT
>> negotiated)  };  \end{lstlisting} @@ -3838,11 +3853,15 @@
>> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
>> Network  \begin{itemize}  \item The feature VIRTIO_NET_F_RSS was
>> negotiated. The device uses the hash to determine the receive virtqueue to
>> place incoming packets.
>>   \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device
>> reports the hash value and the hash type with the packet.
>> +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device
>> supports inner hash calculation. If additionally
>> +      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports
>> the encapsulation type as well.
>>   \end{itemize}
>>
>>   If the feature VIRTIO_NET_F_RSS was negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_rss_config structure
>> as 'Enabled hash types' bitmask.
>> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
>> device uses \field{hash_tunnel_types} of the
>> +	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
>>   \item The device uses a key as defined in \field{hash_key_data} and
>> \field{hash_key_length} of the virtio_net_rss_config structure (see
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Receive-side scaling (RSS) / Setting RSS parameters}).
>>   \end{itemize}
>> @@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network  If the feature VIRTIO_NET_F_RSS
>> was not negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_hash_config
>> structure as 'Enabled hash types' bitmask.
>> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
>> device uses \field{hash_tunnel_types} of the
>> +	virtio_net_hash_config structure as 'Enabled hash tunnel types'
>> bitmask.
>>   \item The device uses a key as defined in \field{hash_key_data} and
>> \field{hash_key_length} of the virtio_net_hash_config structure (see
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Automatic receive steering in multiqueue mode / Hash calculation}).
>>   \end{itemize}
>>
>> -Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it
>> supports only one pair of virtqueues, it MUST support
>> +Note that if the device offers VIRTIO_NET_F_HASH_REPORT or
>> +VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of
>> +virtqueues, it MUST support
>>   at least one of commands of VIRTIO_NET_CTRL_MQ class to configure
>> reported hash parameters:
>>   \begin{itemize}
>>   \item If the device offers VIRTIO_NET_F_RSS, it MUST support
>> VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per @@ -3863,8 +3884,36 @@
>> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
>> Network
>>    \ref{sec:Device Types / Network Device / Device Operation / Control
>> Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
>>   \end{itemize}
>>
>> +\subparagraph{Tunnel/Encapsulated packet} \label{sec:Device Types /
>> +Network Device / Device Operation / Processing of Incoming Packets /
>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet} A
>> +tunnel packet is encapsulated from the original packet based on the
>> +tunneling protocol (only a single level of encapsulation is currently
>> +supported). The encapsulated packet contains an outer header and an inner
>> header, and the device calculates the hash over either the inner header or the
>> outer header.
>> +
>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the
>> +corresponding encapsulation type is set in \field{hash_tunnel_types},
>> +the hash for a specific type of encapsulated packet is calculated over the inner
>> as opposed to outer header.
> To the outer header.

Ok. Will fix.

> Here, you want to say that
> When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received packet's outer header matches one of the supported hash_tunnel_types, the hash of the inner header is calculated.

Yes, and I'll take this description.
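
To make the agreed rule concrete, here is a minimal sketch in C of the decision the device would make. It is an illustration only, not spec text: `struct pkt_info`, its `tunnel_type` field and `hash_over_inner()` are hypothetical names, and the parser that fills `tunnel_type` is assumed to exist. Only the VIRTIO_NET_HASH_TUNNEL_TYPE_* values are taken from this patch (written with an unsigned literal here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as proposed in this patch (v9). */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE     (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1u << 2)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP    (1u << 3)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE   (1u << 4)

/* Hypothetical: what a device-side parser learned about a received packet. */
struct pkt_info {
    uint32_t tunnel_type; /* 0 if not encapsulated, else one TYPE_* bit */
};

/*
 * Returns true when the hash should be calculated over the inner header:
 * the feature is negotiated and the packet's outer header matches one of
 * the enabled tunnel types. Otherwise the outer header is used.
 */
static bool hash_over_inner(bool hash_tunnel_negotiated,
                            uint32_t hash_tunnel_types,
                            const struct pkt_info *pkt)
{
    assert(pkt != NULL);
    if (!hash_tunnel_negotiated)
        return false;
    return (pkt->tunnel_type & hash_tunnel_types) != 0;
}
```

For example, with only VXLAN enabled, a VXLAN packet is hashed over its inner header, while a GRE packet or an unencapsulated packet falls back to the outer header.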

>> +Supported encapsulation types are listed in \ref{sec:Device Types /
>> +Network Device / Device Operation / Processing of Incoming Packets /
>> +Hash calculation for incoming packets / Supported/enabled hash tunnel
>> types}.
>> +
>> +If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and
>> VIRTIO_NET_F_HASH_REPORT
>> +are negotiated, and hash is calculated for an encapsulated  packet, the
>> +device reports the encapsulation type in addition to the hash value and
>> +hash type, regardless of whether the hash is calculated on the inner header or
>> the outer header.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL
>> are
>> +negotiated but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device
>> +calculates the hash over the outer header, and \field{hash_report} reports the
>> hash type and encapsulation type.
>> +
>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]},
>> +\hyperref[intro:VXLAN]{[VXLAN]}, \hyperref[intro:GENEVE]{[GENEVE]},
>> \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>> +
>>   \subparagraph{Supported/enabled hash types}  \label{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / Supported/enabled hash types}
>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>   Hash types applicable for IPv4 packets:
>>   \begin{lstlisting}
>>   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>> @@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network
>>   #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>>   \end{lstlisting}
>>
> Lets please remove the below encoding.

This is the encoding of the hash tunnel types. I guess you mean
removing the encoding of the hash_report_tunnel types?

>
>> +\subparagraph{Supported/enabled tunnel hash types} \label{sec:Device
>> +Types / Network Device / Device Operation / Processing of Incoming
>> +Packets / Hash calculation for incoming packets / Supported/enabled
>> +tunnel hash types} If the feature VIRTIO_NET_F_HASH_TUNNEL is
>> +negotiated, the encapsulation hash type indicates that the hash is calculated
>> over the inner header of the encapsulated packet:
>> +Hash type applicable for inner payload of the gre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the geneve-encapsulated
>> +packet \begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the ip-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
>> +\end{lstlisting}
>> +
>>   \subparagraph{IPv4 packets}
>>   \label{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / IPv4 packets}  The
>> device calculates the hash on IPv4 packets according to 'Enabled hash types'
>> bitmask as follows:
>> @@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network  (see \ref{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / IPv6 packets without extension header}).
>>   \end{itemize}
>>
>> +\subparagraph{Inner hash calculation of an encapsulated packet} If the
>> +feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
>> +encapsulation hash type is set in \field{hash_tunnel_types}, the device
>> +calculates the hash on the inner header of an encapsulated packet (See
>> +\ref{sec:Device Types / Network Device / Device Operation / Processing
>> +of Incoming Packets / Hash calculation for incoming packets /
>> Tunnel/Encapsulated packet}).
>> +
>> +\subparagraph{Security risks between encapsulated packets and RSS}
>> +There may be potential security risks when encapsulated packets using
> s/when encapsulated/when encapsulating/
Ok.
>
>> +RSS to select queues for placement. When a user inside a tunnel tries
>> +to control the enqueuing of encapsulated packets, then the user can
>> +flood the device with invaild packets, and the flooded packets may be
>> +hashed into the same queue as packets in other normal tunnels, which causing
>> the queue to overflow.
>>
> Invalid packets are confusing and the wording of "which causing" is not proper.
> There is some duplicate wording below too.
>
> I think above and below risk can be summarized in bit simpler manner.
>
> How about,
>
> When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels.
>
> +

I think this sentence can be used as an opening summary, but readers may
still need a fuller explanation.
Do you think the following description is ok?
"
When a specific receive queue is shared to receive encapsulated packets
of multiple tunnels,
there is no quality of service for the packets of those tunnels.
For example:
a user inside one tunnel floods the device with packets; the packets
are hashed into the shared receive queue
and cause the queue to overflow, which increases packet loss and
latency for the other tunnels.
"
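
To illustrate how a queue ends up shared, here is a small C sketch of the queue selection described for RSS, where \field{indirection_table_mask} is applied to the calculated hash to index \field{indirection_table}. The helper name `select_queue()` and the example table are hypothetical; this is a sketch under those assumptions, not the device implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask the calculated hash and use the result as an
 * index into the indirection table to pick a receive queue.
 */
static uint16_t select_queue(uint32_t hash,
                             uint16_t indirection_table_mask,
                             const uint16_t *indirection_table)
{
    assert(indirection_table != NULL);
    return indirection_table[hash & indirection_table_mask];
}
```

With a 4-entry table {0, 1, 0, 1} and a mask of 3, inner-header hashes 0x10 and 0x12 coming from two different tunnels both select queue 0, so a flood in one tunnel competes with the other tunnel for the same queue.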

>> +This can pose several security risks:
>> +\begin{itemize}
>> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to
>> queue
>> +       overflow, resulting in a large amount of packet loss.
>> +\item  The delay and retransmission of packets in the normal tunnels are
>> extremely increased.
> This is something very protocol specific and doesn't belong here.
Ok.

>
>> +\item  The user can observe the traffic information and enqueue information
>> of other normal
>> +       tunnels, and conduct targeted DoS attacks.
> Once hash_report_tunnel_types is removed, this second attack is no longer applicable.
> Hence, please remove this too.
Ok.

>
>> +\end{\itemize}
>> +
>>   \paragraph{Hash reporting for incoming packets}  \label{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> reporting for incoming packets}
>> -
>> -If VIRTIO_NET_F_HASH_REPORT was negotiated and
>> - the device has calculated the hash for the packet, the device fills
>> \field{hash_report} with the report type of calculated hash -and
>> \field{hash_value} with the value of calculated hash.
>> -
>> -If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the -
>> hash was not calculated, the device sets \field{hash_report} to
>> VIRTIO_NET_HASH_REPORT_NONE.
>> -
>> -Possible values that the device can report in \field{hash_report} are defined
>> below.
>> +If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has
>> +calculated the hash for the packet, the device fills the lower 8 bits
>> +of \field{hash_report} with the report type of calculated hash, and
>> +\field{hash_value} with the value of calculated hash. Also, if
>> +VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to
>> fill the upper 8 bits of \field{hash_report} with the encapsulation type.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
>> +hash was not calculated, the device sets the lower 8 bits of
>> +\field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the
>> +upper
>> +8 bits of \field{hash_report} with the encapsulation type for an
>> +encapsulated packet. Note that the upper 8 bits are all set to 0 for an
>> +unencapsulated packet, regardless of whether
>> VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
>> +
>> +Possible hash types that the device can report in \field{hash_report} are
>> defined below.
>>   They correspond to supported hash types defined in  \ref{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / Supported/enabled hash types}  as follows:
>> @@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network
>>   #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
>>   \end{lstlisting}
>>
>> +The upper 8 bits of \field{hash_report} can report the encapsulation
>> +type to the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
>> +Possible encapsulation types that the device can report in \field{hash_report}
>> are defined below.
>> +They correspond to supported hash tunnel types defined in
>> +\ref{sec:Device Types / Network Device / Device Operation / Processing
>> +of Incoming Packets / Hash calculation for incoming packets /
>> Supported/enabled hash tunnel types} as follows:
>> +
>> +VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 <<
>> +(VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
>> +\end{lstlisting}
>> +
>> +They correspond to supported hash types defined in \ref{sec:Device
>> +Types / Network Device / Device Operation / Processing of Incoming Packets /
>> Hash calculation for incoming packets / Supported/enabled hash types}.
>> +
>>   \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device /
>> Device Operation / Control Virtqueue}
>>
>>   The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is @@ -
>> 4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types
>> / Network Device / Devi  \begin{lstlisting}  struct virtio_net_hash_config {
>>       le32 hash_types;
>> +    le32 hash_tunnel_types;
>>       le16 reserved[4];
>>       u8 hash_key_length;
>>       u8 hash_key_data[hash_key_length];
>> @@ -4372,7 +4498,11 @@ \subsubsection{Control
>> Virtqueue}\label{sec:Device Types / Network Device / Devi  Field
>> \field{hash_types} contains a bitmask of allowed hash types as  defined in
>> \ref{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
>> hash types}.
>> -Initially the device has all hash types disabled and reports only
>> VIRTIO_NET_HASH_REPORT_NONE.
>> +
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
>> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
>> Operation / Processing of Incoming Packets / Hash calculation for incoming
>> packets / Supported/enabled hash tunnel types}.
>> +
>> +Initially the device has all hash types and hash tunnel types disabled and
>> reports only VIRTIO_NET_HASH_REPORT_NONE.
>>
>>   Field \field{reserved} MUST contain zeroes. It is defined to make the structure
>> to match the layout of virtio_net_rss_config structure,  defined in
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Receive-side scaling (RSS)}.
>> @@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device
>> Types / Network Device / Devi  \begin{lstlisting}  struct virtio_net_rss_config {
>>       le32 hash_types;
>> +    le32 hash_tunnel_types;
> This field is not needed as device config space advertisement for the support is enough.

If so, virtio_net_hash_config does not need hash_tunnel_types either when
there is no need to configure specific tunnel(s).

>
> If the intent is to enable hashing for the specific tunnel(s), an individual command is better.

Drivers MAY filter out some tunneling types when negotiating this feature.
Do you think it would be better for us to add a separate command? I 
don't see tools like ethtool that can configure specific tunnels in 
userspace.

>
> Regardless, this new field cannot be in the middle of the new structure as it breaks backward compatibility.
>

Yes, you are right. I'll fix this.

Thank you very much!
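
As a rough illustration of the compatibility point above, the following C sketch models only the fixed-size leading fields of virtio_net_rss_config, before and after the v9 insertion. The struct names, the helper legacy_read_mask(), and the use of plain uintN_t types in place of le16/le32 are assumptions made for this example, not spec text; the variable-length members are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Leading fields as laid out before the patch (illustrative model only). */
struct rss_config_old {
    uint32_t hash_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Leading fields with hash_tunnel_types inserted in the middle, as in v9. */
struct rss_config_v9 {
    uint32_t hash_types;
    uint32_t hash_tunnel_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Offset at which a parser built against the old layout expects the mask. */
size_t old_mask_offset(void)
{
    return offsetof(struct rss_config_old, indirection_table_mask);
}

/* Offset at which the mask actually sits in the v9 layout. */
size_t v9_mask_offset(void)
{
    return offsetof(struct rss_config_v9, indirection_table_mask);
}

/*
 * Hypothetical helper: the value a parser built against the old layout
 * would take as indirection_table_mask when handed a v9-layout buffer.
 */
uint16_t legacy_read_mask(const struct rss_config_v9 *sent)
{
    uint16_t mask;

    assert(sent != NULL);
    memcpy(&mask, (const unsigned char *)sent + old_mask_offset(), sizeof mask);
    return mask;
}
```

With hash_tunnel_types = 0x00010001 and indirection_table_mask = 0x7f in the v9 layout, the old-layout parser reads 0x0001 as the mask, because the bytes at offset 4 now belong to the new field.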

>>       le16 indirection_table_mask;
>>       le16 unclassified_queue;
>>       le16 indirection_table[indirection_table_length];
>> @@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device
>> Types / Network Device / Devi  defined in  \ref{sec:Device Types / Network
>> Device / Device Operation / Processing of Incoming Packets / Hash calculation
>> for incoming packets / Supported/enabled hash types}.
>>
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
>> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
>> Operation / Processing of Incoming Packets / Hash calculation for incoming
>> packets / Supported/enabled hash tunnel types}.
>> +
>>   Field \field{indirection_table_mask} is a mask to be applied to  the calculated
>> hash to produce an index in the  \field{indirection_table} array.
>> diff --git a/introduction.tex b/introduction.tex index 287c5fc..69b95ae 100644
>> --- a/introduction.tex
>> +++ b/introduction.tex
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21  6:14   ` [virtio-comment] " Heng Qi
@ 2023-02-21 12:47     ` Parav Pandit
  2023-02-21 13:34       ` Heng Qi
  2023-02-21 13:37       ` Heng Qi
  0 siblings, 2 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 12:47 UTC (permalink / raw)
  To: Heng Qi, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo



> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Heng Qi

> >> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to
> >> report an encapsulation type, and the feature depends on
> >> VIRTIO_NET_F_HASH_REPORT.
> > As we discussed that tunnel type alone is not useful the sw, neither as an
> individual field nor merged with some other field.
> > Hence, please remove this feature bit. HASH_TUNNEL is good enough.
> > Please remove the references to it at more places below.
> 
> If we don't want it at all, I'll remove it and references to it.
>
Ok. thanks.
[..]
> >>   #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
> >>   \end{lstlisting}
> >>
> > Lets please remove the below encoding.
> 
> Here is the encoding of the hash tunnel types, I guess you are referring to
> remove the encoding of the hash_report_tunnel types?
> 
Right.
[..]
> >> +RSS to select queues for placement. When a user inside a tunnel
> >> +tries to control the enqueuing of encapsulated packets, then the
> >> +user can flood the device with invaild packets, and the flooded
> >> +packets may be hashed into the same queue as packets in other normal
> >> +tunnels, which causing
> >> the queue to overflow.
> >>
> > Invalid packets are confusing and the wording of "which causing" is not
> proper.
> > There is some duplicate wording below too.
> >
> > I think above and below risk can be summarized in bit simpler manner.
> >
> > How about,
> >
> > When a specific receive queue is shared to receive packets of multiple
> tunnels, there is no quality of service for packets of multiple tunnels.
> >
> > +
> 
> I think this sentence can be used as a starting summary, and readers may still
> need to expand the explanation.
> Do you think the following description is ok?
> "
> When a specific receive queue is shared to receive encapsulating packets of
> multiple tunnels, there is no quality of service for these packets of multiple
> tunnels.
> For example:
> A user inside the tunnel floods a device with packets, then the packets are
> hashed into the shared receive queue and cause the queue to overflow, and
> this increases packet loss and latency for other tunnels.
> "
Even without a queue overflow, this shared receive queue may not have a balanced number of packets.
For example, tunnel-2 may occupy 90% of the queue and leave only 10% for tunnel-1.
So your example is right (and extreme), but a generic mention of QoS covers both aspects.

Second, "user inside the tunnel" is challenging to explain.
The sentence above talks specifically about the "receive queue" as an object.
 
> struct virtio_net_rss_config {
> >>       le32 hash_types;
> >> +    le32 hash_tunnel_types;
> > This field is not needed as device config space advertisement for the support
> is enough.
> 
> If so, virtio_net_hash_config does not require hash_tunnel_types when it does
> not need to configure the specific tunnel(s).
> 
> >
> > If the intent is to enable hashing for the specific tunnel(s), an individual
> command is better.
> 
> Drivers MAY filter out some tunneling types when negotiating this feature.
> Do you think it would be better for us to add a separate command? I don't see
> tools like ethtool that can configure specific tunnels in userspace.
> 
The reason I proposed a different command is this.
Let's say we have only a single command.
The RSS config command has many other fields unrelated to the inner hash.
Hence, to enable inner hash, the driver needs to resupply the same values for all the unrelated fields.
And the device needs to compare them with the old values and maintain some sort of cache to determine that nothing has changed from the previous hash config, and hence skip the hw reconfiguration for RSS.

This mechanism slows down the command for the unrelated task.
Hence, I am considering a separate command that would be simple for the device and driver to implement.
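
The cost described above can be sketched as follows. This is a toy device-side model, assuming hypothetical names (demo_rss_config, demo_device, demo_handle_combined, demo_handle_tunnel_only) and small fixed table and key sizes; it is not the command layout of any patch revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TABLE_LEN 4 /* hypothetical, far smaller than a real table */
#define DEMO_KEY_LEN   8 /* hypothetical, far smaller than a real key */

struct demo_rss_config {
    uint32_t hash_types;
    uint32_t hash_tunnel_types;
    uint16_t indirection_table[DEMO_TABLE_LEN];
    uint8_t  hash_key[DEMO_KEY_LEN];
};

struct demo_device {
    struct demo_rss_config cached;  /* last configuration seen */
    unsigned rss_reprogram_count;   /* how often RSS state was reprogrammed */
};

/*
 * Single combined command: the device must diff every RSS-related field
 * against its cached copy to find out whether only the tunnel types changed.
 */
void demo_handle_combined(struct demo_device *dev,
                          const struct demo_rss_config *cmd)
{
    bool rss_changed;

    assert(dev != NULL && cmd != NULL);
    rss_changed =
        dev->cached.hash_types != cmd->hash_types ||
        memcmp(dev->cached.indirection_table, cmd->indirection_table,
               sizeof cmd->indirection_table) != 0 ||
        memcmp(dev->cached.hash_key, cmd->hash_key,
               sizeof cmd->hash_key) != 0;

    if (rss_changed)
        dev->rss_reprogram_count++;
    dev->cached = *cmd;
}

/* Dedicated command: nothing to compare, only the tunnel types are touched. */
void demo_handle_tunnel_only(struct demo_device *dev,
                             uint32_t hash_tunnel_types)
{
    assert(dev != NULL);
    dev->cached.hash_tunnel_types = hash_tunnel_types;
}
```

In this model the combined handler needs both the cache and the comparisons just to avoid touching RSS state, while the dedicated handler needs neither.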

> >
> > Regardless, this new field cannot be in the middle of the new structure as it
> breaks backward compatibility.
> >
> 
> Yes, you are right. I'll fix this.
> 
> Thank you very much!


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 12:47     ` Parav Pandit
@ 2023-02-21 13:34       ` Heng Qi
  2023-02-21 15:32         ` Parav Pandit
  2023-02-21 13:37       ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-21 13:34 UTC (permalink / raw)
  To: Parav Pandit, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


On 2023/2/21 at 8:47 PM, Parav Pandit wrote:
>
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Heng Qi
>>>> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to
>>>> report an encapsulation type, and the feature depends on
>>>> VIRTIO_NET_F_HASH_REPORT.
>>> As we discussed that tunnel type alone is not useful the sw, neither as an
>> individual field nor merged with some other field.
>>> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
>>> Please remove the references to it at more places below.
>> If we don't want it at all, I'll remove it and references to it.
>>
> Ok. thanks.
> [..]
>>>>    #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>>>>    \end{lstlisting}
>>>>
>>> Lets please remove the below encoding.
>> Here is the encoding of the hash tunnel types, I guess you are referring to
>> remove the encoding of the hash_report_tunnel types?
>>
> Right.
> [..]
>>>> +RSS to select queues for placement. When a user inside a tunnel
>>>> +tries to control the enqueuing of encapsulated packets, then the
>>>> +user can flood the device with invaild packets, and the flooded
>>>> +packets may be hashed into the same queue as packets in other normal
>>>> +tunnels, which causing
>>>> the queue to overflow.
>>>>
>>> Invalid packets are confusing and the wording of "which causing" is not
>> proper.
>>> There is some duplicate wording below too.
>>>
>>> I think above and below risk can be summarized in bit simpler manner.
>>>
>>> How about,
>>>
>>> When a specific receive queue is shared to receive packets of multiple
>> tunnels, there is no quality of service for packets of multiple tunnels.
>>> +
>> I think this sentence can be used as a starting summary, and readers may still
>> need to expand the explanation.
>> Do you think the following description is ok?
>> "
>> When a specific receive queue is shared to receive encapsulating packets of
>> multiple tunnels, there is no quality of service for these packets of multiple
>> tunnels.
>> For example:
>> A user inside the tunnel floods a device with packets, then the packets are
>> hashed into the shared receive queue and cause the queue to overflow, and
>> this increases packet loss and latency for other tunnels.
>> "
> Even without a queue overflow, this shared receive queue may not have a balanced number of packets.
> For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1.
> So, your example is right (and extreme), a generic mention of QoS covers both aspects.
>
> Secondly "user inside the tunnel" is challenging to explain.
> In above sentence talks specifically about the "receive queue" as an object.
>   

"overflow" in my example. If we only use a one-sentence summary to 
describe tunnel risks, then I think this subparagraph is called "QoS 
problem for tunnel hash".

>> struct virtio_net_rss_config {
>>>>        le32 hash_types;
>>>> +    le32 hash_tunnel_types;
>>> This field is not needed as device config space advertisement for the support
>> is enough.
>>
>> If so, virtio_net_hash_config does not require hash_tunnel_types when it does
>> not need to configure the specific tunnel(s).
>>
>>> If the intent is to enable hashing for the specific tunnel(s), an individual
>> command is better.
>>
>> Drivers MAY filter out some tunneling types when negotiating this feature.
>> Do you think it would be better for us to add a separate command? I don't see
>> tools like ethtool that can configure specific tunnels in userspace.
>>
> The reason I proposed different command is,
> Let's say we have only single command.
> Rss config command has many other fields unrelated to the inner hash.

Yes. For inner hash, fields such as indirection table are irrelevant.

> Hence, to enable inner hash, the driver needs to supply the exact same value for unrelated fields to the same value.
> And device needs to compare it with the old value and maintain some sort of cache to derive that nothing changes from the previous hash config, hence, ignore hw configuration for rss.
>
> This mechanism slows down the command for the unrelated task.

Totally agree.

> Hence, I am considering a separate command that would be simple for the device and driver to implement.

I agree with you. Do you want me to do it in this patch, or should we do 
it in another patch?

Thanks! :)

>
>>> Regardless, this new field cannot be in the middle of the new structure as it
>> breaks backward compatibility.
>> Yes, you are right. I'll fix this.
>>
>> Thank you very much!


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 12:47     ` Parav Pandit
  2023-02-21 13:34       ` Heng Qi
@ 2023-02-21 13:37       ` Heng Qi
  1 sibling, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-21 13:37 UTC (permalink / raw)
  To: Parav Pandit, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


On 2023/2/21 at 8:47 PM, Parav Pandit wrote:
>
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Heng Qi
>>>> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to
>>>> report an encapsulation type, and the feature depends on
>>>> VIRTIO_NET_F_HASH_REPORT.
>>> As we discussed that tunnel type alone is not useful the sw, neither as an
>> individual field nor merged with some other field.
>>> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
>>> Please remove the references to it at more places below.
>> If we don't want it at all, I'll remove it and references to it.
>>
> Ok. thanks.
> [..]
>>>>    #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>>>>    \end{lstlisting}
>>>>
>>> Lets please remove the below encoding.
>> Here is the encoding of the hash tunnel types, I guess you are referring to
>> remove the encoding of the hash_report_tunnel types?
>>
> Right.
> [..]
>>>> +RSS to select queues for placement. When a user inside a tunnel
>>>> +tries to control the enqueuing of encapsulated packets, then the
>>>> +user can flood the device with invaild packets, and the flooded
>>>> +packets may be hashed into the same queue as packets in other normal
>>>> +tunnels, which causing
>>>> the queue to overflow.
>>>>
>>> Invalid packets are confusing and the wording of "which causing" is not
>> proper.
>>> There is some duplicate wording below too.
>>>
>>> I think above and below risk can be summarized in bit simpler manner.
>>>
>>> How about,
>>>
>>> When a specific receive queue is shared to receive packets of multiple
>> tunnels, there is no quality of service for packets of multiple tunnels.
>>> +
>> I think this sentence can be used as a starting summary, and readers may still
>> need to expand the explanation.
>> Do you think the following description is ok?
>> "
>> When a specific receive queue is shared to receive encapsulating packets of
>> multiple tunnels, there is no quality of service for these packets of multiple
>> tunnels.
>> For example:
>> A user inside the tunnel floods a device with packets, then the packets are
>> hashed into the shared receive queue and cause the queue to overflow, and
>> this increases packet loss and latency for other tunnels.
>> "
> Even without a queue overflow, this shared receive queue may not have a balanced number of packets.
> For example, tunnel-2 occupied 90% of the queue and left only 10% for tunnel-1.
> So, your example is right (and extreme), a generic mention of QoS covers both aspects.
>
> Secondly "user inside the tunnel" is challenging to explain.
> In above sentence talks specifically about the "receive queue" as an object.
>   
>> struct virtio_net_rss_config {
>>>>        le32 hash_types;
>>>> +    le32 hash_tunnel_types;
>>> This field is not needed as device config space advertisement for the support
>> is enough.
>>
>> If so, virtio_net_hash_config does not require hash_tunnel_types when it does
>> not need to configure the specific tunnel(s).
>>
>>> If the intent is to enable hashing for the specific tunnel(s), an individual
>> command is better.
>>
>> Drivers MAY filter out some tunneling types when negotiating this feature.
>> Do you think it would be better for us to add a separate command? I don't see
>> tools like ethtool that can configure specific tunnels in userspace.
>>
> The reason I proposed different command is,
> Let's say we have only single command.
> Rss config command has many other fields unrelated to the inner hash.

And virtio_net_hash_config seems to suffice except for le16 reserved[4].

> Hence, to enable inner hash, the driver needs to supply the exact same value for unrelated fields to the same value.
> And device needs to compare it with the old value and maintain some sort of cache to derive that nothing changes from the previous hash config, hence, ignore hw configuration for rss.
>
> This mechanism slows down the command for the unrelated task.
> Hence, I am considering a separate command that would be simple for the device and driver to implement.
>
>>> Regardless, this new field cannot be in the middle of the new structure as it
>> breaks backward compatibility.
>> Yes, you are right. I'll fix this.
>>
>> Thank you very much!
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 13:34       ` Heng Qi
@ 2023-02-21 15:32         ` Parav Pandit
  2023-02-21 16:44           ` [virtio-comment] Re: [virtio-dev] " Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 15:32 UTC (permalink / raw)
  To: Heng Qi, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


> From: Heng Qi <hengqi@linux.alibaba.com>
> Sent: Tuesday, February 21, 2023 8:34 AM

> > Even without a queue overflow, this shared receive queue may not have a
> balanced number of packets.
> > For example, tunnel-2 occupied 90% of the queue and left only 10% for
> tunnel-1.
> > So, your example is right (and extreme), a generic mention of QoS covers
> both aspects.
> >
> > Secondly "user inside the tunnel" is challenging to explain.
> > In above sentence talks specifically about the "receive queue" as an object.
> >
> 
> "overflow" in my example. If we only use a one-sentence summary to describe
> tunnel risks, then I think this subparagraph is called "QoS problem for tunnel
> hash".
> 
Yes, "Tunnel QoS limitations".

> >> struct virtio_net_rss_config {
> >>>>        le32 hash_types;
> >>>> +    le32 hash_tunnel_types;
> >>> This field is not needed as device config space advertisement for
> >>> the support
> >> is enough.
> >>
> >> If so, virtio_net_hash_config does not require hash_tunnel_types when
> >> it does not need to configure the specific tunnel(s).
> >>
> >>> If the intent is to enable hashing for the specific tunnel(s), an
> >>> individual
> >> command is better.
> >>
> >> Drivers MAY filter out some tunneling types when negotiating this feature.
> >> Do you think it would be better for us to add a separate command? I
> >> don't see tools like ethtool that can configure specific tunnels in userspace.
> >>
> > The reason I proposed different command is, Let's say we have only
> > single command.
> > Rss config command has many other fields unrelated to the inner hash.
> 
> Yes. For inner hash, fields such as indirection table are irrelevant.
> 
Right.

> > Hence, to enable inner hash, the driver needs to supply the exact same value
> for unrelated fields to the same value.
> > And device needs to compare it with the old value and maintain some sort of
> cache to derive that nothing changes from the previous hash config, hence,
> ignore hw configuration for rss.
> >
> > This mechanism slows down the command for the unrelated task.
> 
> Totally agree.
> 
> > Hence, I am considering a separate command that would be simple for the
> device and driver to implement.
> 
> I agree with you. Do you want me to do it in this patch, or should we do it in
> another patch?
> 

It should be in this patch set, because enablement in the device is linked to the capability exposed in this patch.

From a patch-split point of view:

Patch-1 to introduce the feature bit, description, and link to CVQ dependency.
Patch-2 for its link in virtio_net_config structure and description.
Patch-3 for new command touching control VQ pieces.

We can always squash the patches into a single one if they turn out to be hard to understand as multiple patches.

^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-comment] Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 15:32         ` Parav Pandit
@ 2023-02-21 16:44           ` Heng Qi
  2023-02-21 16:50             ` Parav Pandit
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-21 16:44 UTC (permalink / raw)
  To: Parav Pandit, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo



On 2023/2/21 at 11:32 PM, Parav Pandit wrote:
>> From: Heng Qi <hengqi@linux.alibaba.com>
>> Sent: Tuesday, February 21, 2023 8:34 AM
>>> Even without a queue overflow, this shared receive queue may not have a
>> balanced number of packets.
>>> For example, tunnel-2 occupied 90% of the queue and left only 10% for
>> tunnel-1.
>>> So, your example is right (and extreme), a generic mention of QoS covers
>> both aspects.
>>> Secondly "user inside the tunnel" is challenging to explain.
>>> In above sentence talks specifically about the "receive queue" as an object.
>>>
>> "overflow" in my example. If we only use a one-sentence summary to describe
>> tunnel risks, then I think this subparagraph is called "QoS problem for tunnel
>> hash".
>>
> Yes, Tunnel Qos limitations
>
>>>> struct virtio_net_rss_config {
>>>>>>         le32 hash_types;
>>>>>> +    le32 hash_tunnel_types;
>>>>> This field is not needed as device config space advertisement for
>>>>> the support
>>>> is enough.
>>>>
>>>> If so, virtio_net_hash_config does not require hash_tunnel_types when
>>>> it does not need to configure the specific tunnel(s).
>>>>
>>>>> If the intent is to enable hashing for the specific tunnel(s), an
>>>>> individual
>>>> command is better.
>>>>
>>>> Drivers MAY filter out some tunneling types when negotiating this feature.
>>>> Do you think it would be better for us to add a separate command? I
>>>> don't see tools like ethtool that can configure specific tunnels in userspace.
>>>>
>>> The reason I proposed different command is, Let's say we have only
>>> single command.
>>> Rss config command has many other fields unrelated to the inner hash.
>> Yes. For inner hash, fields such as indirection table are irrelevant.
>>
> Right.
>
>>> Hence, to enable inner hash, the driver needs to supply the exact same value
>> for unrelated fields to the same value.
>>> And device needs to compare it with the old value and maintain some sort of
>> cache to derive that nothing changes from the previous hash config, hence,
>> ignore hw configuration for rss.
>>> This mechanism slows down the command for the unrelated task.
>> Totally agree.
>>
>>> Hence, I am considering a separate command that would be simple for the
>> device and driver to implement.
>>
>> I agree with you. Do you want me to do it in this patch, or should we do it in
>> another patch?
>>
> It should be in this patch set because enablement in the device is linked to this capability exposed in this patch.
>
>  From patch split point of view,
>
> Patch-1 to introduce the feature bit, description, and link to CVQ dependency.
> Patch-2 for its link in virtio_net_config structure and description.
> Patch-3 for new command touching control VQ pieces.

Yes, and you seem to have missed my other reply in this thread :), so I 
will rephrase it here:
virtio_net_hash_config seems to be reusable, as the v9 patch is doing, 
so why don't we reuse it?
The only reason I can think of is to avoid expanding the 
virtio_net_hash_config structure, and instead define a separate
structure for VIRTIO_NET_F_HASH_TUNNEL that includes hash_tunnel_types?

Thanks.
>
> We can always squash the patch to single when/if it is hard to understand in multiple patches.



^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 16:44           ` [virtio-comment] Re: [virtio-dev] " Heng Qi
@ 2023-02-21 16:50             ` Parav Pandit
  2023-02-21 17:13               ` Michael S. Tsirkin
  2023-02-21 17:17               ` [virtio-comment] " Heng Qi
  0 siblings, 2 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 16:50 UTC (permalink / raw)
  To: Heng Qi, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


> From: Heng Qi <hengqi@linux.alibaba.com>
> Sent: Tuesday, February 21, 2023 11:44 AM

> > Patch-1 to introduce the feature bit, description, and link to CVQ dependency.
> > Patch-2 for its link in virtio_net_config structure and description.
> > Patch-3 for new command touching control VQ pieces.
> 
> Yes, and you seem to have missed my other replies in this thread:), I rephrased
Was it the comment "And virtio_net_hash_config seems to suffice except for le16 reserved[4]."?
I don’t see these reserved fields in the current structure.

> it here:
> virtio_net_hash_config seems to be reusable, as the v9 patch is doing, why
> don't we re-use it?
> The reason I can think of is to not expand the virtio_net_hash_config structure,
> just set a separate structure for VIRTIO_NET_F_HASH_TUNNEL and include
> hash_tunnel_types?

The part that I am missing is how to reuse virtio_net_hash_config and say: ignore all the existing fields related to RSS, and consider only hash_tunnel_types?

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21  4:20 ` Parav Pandit
  2023-02-21  6:14   ` [virtio-comment] " Heng Qi
@ 2023-02-21 17:05   ` Michael S. Tsirkin
  2023-02-21 19:29     ` Parav Pandit
  2023-03-01 14:32   ` [virtio-dev] " Heng Qi
  2 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 17:05 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 04:20:59AM +0000, Parav Pandit wrote:
> 
> > From: Heng Qi <hengqi@linux.alibaba.com>
> > Sent: Saturday, February 18, 2023 9:37 AM
> 
> > If the tunnel is used to encapsulate the packets, the hash calculated using the
> s/hash calculated/hash is calculated
> 
> > outer header of the receive packets is always fixed for the same flow packets,
> > i.e. they will be steered to the same receive queue.
> > 
> A little descriptive commit message like below reads better to me.
> Currently, when a received packet is an encapsulated packet meaning there is an outer and an inner header, virtio device is unable to calculate the hash for the inner header.
> Due to this limitation, multiple different flows identified by the inner header for the same outer header result in selecting the same receive queue.
> This effectively disables the RSS, resulting in poor receive performance.
> 
> Hence, to overcome this limitation, a new feature is introduced using a feature bit VIRTIO_NET_F_HASH_TUNNEL.
> This feature enables the device to advertise the capability to calculate the hash for the inner packet header.
> Thereby regaining better RSS performance in presence of outer packet header.

I think this is a good description. However, Parav, I think it is important
to make contributors write their own commit messages so they know what
the reason for the proposed change is. What's good for the goose is good
for the gander - contributors should explain why their change to the spec is
beneficial, but reviewers should also explain why their changes to the
patch are beneficial, and "reads better to me" does not cut it - it does
not allow the contributor to improve with time.  It's about more than a
single contribution, see?

In this case I would say the issue is that the motivation for the
change is never explained.

I am yet to review the patchset.
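
To illustrate the limitation described in the quoted commit message, here is a minimal C sketch of queue selection by outer versus inner header. Everything in it is an assumption made for illustration: the demo_* names, the 4-tuple model of a header, and the FNV-1a hash, which stands in for the Toeplitz hash that RSS typically uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 4-tuple; real headers carry more than this. */
struct demo_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* An encapsulated packet: tunnel endpoints outside, the real flow inside. */
struct demo_packet {
    struct demo_tuple outer;
    struct demo_tuple inner;
};

/* FNV-1a over the tuple's field values (a stand-in, NOT the Toeplitz hash). */
static uint32_t demo_hash(const struct demo_tuple *t)
{
    uint32_t words[4];
    uint32_t h = 2166136261u;
    size_t i;
    int shift;

    words[0] = t->src_ip;
    words[1] = t->dst_ip;
    words[2] = t->src_port;
    words[3] = t->dst_port;

    for (i = 0; i < 4; i++) {
        for (shift = 0; shift < 32; shift += 8) {
            h ^= (words[i] >> shift) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

/* Pick a receive queue from either the outer or the inner header. */
unsigned demo_select_queue(const struct demo_packet *p, int use_inner,
                           unsigned num_queues)
{
    const struct demo_tuple *t = use_inner ? &p->inner : &p->outer;

    assert(num_queues > 0);
    return demo_hash(t) % num_queues;
}
```

Two packets that share the same tunnel endpoints but carry different inner flows always land on the same queue when use_inner is 0, since the hashed bytes are identical. With use_inner set, the inner flows can be spread across queues.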

> 
> > We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
> > \field{hash_tunnel_types}, which instructs the device to calculate the hash
> > using the inner headers of tunnel-encapsulated packets. Note that
> > VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
> > hash, and does not give the device the ability to use the hash value to select a
> > receiving queue to place the packet.
> > 
> > Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL are added to report
> > an encapsulation type, and the feature depends on
> > VIRTIO_NET_F_HASH_REPORT.
> 
> As we discussed that tunnel type alone is not useful the sw, neither as an individual field nor merged with some other field.
> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
> Please remove the references to it at more places below.
> 
> > It only means that the encapsulation type can be reported, it cannot instruct
> > the device to calculate the hash.
> > 
> 
> > 
> > +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
> > +	for tunnel-encapsulated packets.
> > +
> > +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
> > encapsulation type.
> > +
> Please remove this.
> 
> >  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
> > coalescing.
> > 
> >  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> > @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
> > requirements}\label{sec:Device Types / Network Device
> > \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
> > VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
> > VIRTIO_NET_F_HASH_REPORT.
> >  \end{description}
> > 
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
> > Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
> > +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
> > / Network Device
> >          u8 rss_max_key_size;
> >          le16 rss_max_indirection_table_length;
> >          le32 supported_hash_types;
> > +        le32 supported_tunnel_hash_types;
> >  };
> >  \end{lstlisting}
> > -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS
> > or VIRTIO_NET_F_HASH_REPORT is set.
> > +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS,
> > VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
> >  It specifies the maximum supported length of RSS key in bytes.
> > 
> >  The following field, \field{rss_max_indirection_table_length} only exists if
> > VIRTIO_NET_F_RSS is set.
> >  It specifies the maximum number of 16-bit entries in RSS indirection table.
> > 
> >  The next field, \field{supported_hash_types} only exists if the device supports
> > hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
> > set.
> > +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
> > VIRTIO_NET_F_HASH_TUNNEL is set.
> > 
> >  Field \field{supported_hash_types} contains the bitmask of supported hash
> > types.
> >  See \ref{sec:Device Types / Network Device / Device Operation / Processing of
> > Incoming Packets / Hash calculation for incoming packets / Supported/enabled
> > hash types} for details of supported hash types.
> > 
> > +The next field, \field{supported_tunnel_hash_types} only exists if the
> > +device supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is
> > set.
> > +
> > +Field \field{supported_tunnel_hash_types} contains the bitmask of supported
> > tunnel hash types.
> > +See \ref{sec:Device Types / Network Device / Device Operation / Processing
> > of Incoming Packets / Hash calculation for incoming packets /
> > Supported/enabled tunnel hash types} for details of supported tunnel hash
> > types.
> > +
> >  \devicenormative{\subsubsection}{Device configuration layout}{Device Types /
> > Network Device / Device configuration layout}
> > 
> >  The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000 inclusive,
> > @@ -3236,7 +3250,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Network Device
> >  negotiated.
> > 
> >  The device MUST set \field{rss_max_key_size} to at least 40, if it offers -
> > VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
> > +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
> > VIRTIO_NET_F_HASH_TUNNEL.
> > 
> >  The device MUST set \field{rss_max_indirection_table_length} to at least 128,
> > if it offers  VIRTIO_NET_F_RSS.
> > @@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device
> > Types / Network Device / Device O
> >          le16 csum_offset;
> >          le16 num_buffers;
> >          le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> > -        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> > +        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated,
> > and the upper 8 bits indicates the
> > +                                 encapsulation type if
> > + VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
> >          le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> >  };
> >  \end{lstlisting}
> > @@ -3838,11 +3853,15 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  \begin{itemize}
> >  \item The feature VIRTIO_NET_F_RSS was negotiated. The device uses the hash to determine the receive virtqueue to place incoming packets.
> >  \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device
> > reports the hash value and the hash type with the packet.
> > +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device
> > supports inner hash calculation. If additionally
> > +      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports
> > the encapsulation type as well.
> >  \end{itemize}
> > 
> >  If the feature VIRTIO_NET_F_RSS was negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_rss_config structure
> > as 'Enabled hash types' bitmask.
> > +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
> > device uses \field{hash_tunnel_types} of the
> > +	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
> >  \item The device uses a key as defined in \field{hash_key_data} and
> > \field{hash_key_length} of the virtio_net_rss_config structure (see
> > \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> > / Receive-side scaling (RSS) / Setting RSS parameters}).
> >  \end{itemize}
> > @@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >  If the feature VIRTIO_NET_F_RSS was not negotiated:
> >  \begin{itemize}
> >  \item The device uses \field{hash_types} of the virtio_net_hash_config
> > structure as 'Enabled hash types' bitmask.
> > +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
> > device uses \field{hash_tunnel_types} of the
> > +	virtio_net_hash_config structure as 'Enabled hash tunnel types'
> > bitmask.
> >  \item The device uses a key as defined in \field{hash_key_data} and
> > \field{hash_key_length} of the virtio_net_hash_config structure (see
> > \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> > / Automatic receive steering in multiqueue mode / Hash calculation}).
> >  \end{itemize}
> > 
> > -Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it
> > supports only one pair of virtqueues, it MUST support
> > +Note that if the device offers VIRTIO_NET_F_HASH_REPORT or
> > +VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of
> > +virtqueues, it MUST support
> >  at least one of commands of VIRTIO_NET_CTRL_MQ class to configure
> > reported hash parameters:
> >  \begin{itemize}
> >  \item If the device offers VIRTIO_NET_F_RSS, it MUST support
> > VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per @@ -3863,8 +3884,36 @@
> > \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
> > Network
> >   \ref{sec:Device Types / Network Device / Device Operation / Control
> > Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
> >  \end{itemize}
> > 
> > +\subparagraph{Tunnel/Encapsulated packet} \label{sec:Device Types /
> > +Network Device / Device Operation / Processing of Incoming Packets /
> > +Hash calculation for incoming packets / Tunnel/Encapsulated packet} A
> > +tunnel packet is encapsulated from the original packet based on the
> > +tunneling protocol (only a single level of encapsulation is currently
> > +supported). The encapsulated packet contains an outer header and an inner
> > header, and the device calculates the hash over either the inner header or the
> > outer header.
> > +
> > +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the
> > +corresponding encapsulation type is set in \field{hash_tunnel_types},
> > +the hash for a specific type of encapsulated packet is calculated over the inner
> > as opposed to outer header.
> To the outer header.
> 
> Here, you want to say that 
> When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received packet's outer header matches one of the supported hash_tunnel_types, the hash of the inner header is calculated.
> 
> > +Supported encapsulation types are listed in \ref{sec:Device Types /
> > +Network Device / Device Operation / Processing of Incoming Packets /
> > +Hash calculation for incoming packets / Supported/enabled hash tunnel
> > types}.
> > +
> > +If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and
> > VIRTIO_NET_F_HASH_REPORT
> > +are negotiated, and hash is calculated for an encapsulated  packet, the
> > +device reports the encapsulation type in addition to the hash value and
> > +hash type, regardless of whether the hash is calculated on the inner header or
> > the outer header.
> > +
> > +If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL
> > are
> > +negotiated but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device
> > +calculates the hash over the outer header, and \field{hash_report} reports the
> > hash type and encapsulation type.
> > +
> > +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]},
> > +\hyperref[intro:VXLAN]{[VXLAN]}, \hyperref[intro:GENEVE]{[GENEVE]},
> > \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
> > +
> >  \subparagraph{Supported/enabled hash types}  \label{sec:Device Types /
> > Network Device / Device Operation / Processing of Incoming Packets / Hash
> > calculation for incoming packets / Supported/enabled hash types}
> > +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
> > +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
> >  Hash types applicable for IPv4 packets:
> >  \begin{lstlisting}
> >  #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > @@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming
> > Packets}\label{sec:Device Types / Network
> >  #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
> >  \end{lstlisting}
> > 
> Lets please remove the below encoding.
> 
> > +\subparagraph{Supported/enabled tunnel hash types} \label{sec:Device
> > +Types / Network Device / Device Operation / Processing of Incoming
> > +Packets / Hash calculation for incoming packets / Supported/enabled
> > +tunnel hash types} If the feature VIRTIO_NET_F_HASH_TUNNEL is
> > +negotiated, the encapsulation hash type indicates that the hash is calculated
> > over the inner header of the encapsulated packet:
> > +Hash type applicable for inner payload of the gre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the vxlan-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the geneve-encapsulated
> > +packet \begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the ip-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
> > +\end{lstlisting}
> > +Hash type applicable for inner payload of the nvgre-encapsulated packet
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
> > +\end{lstlisting}
> > +
> >  \subparagraph{IPv4 packets}
> >  \label{sec:Device Types / Network Device / Device Operation / Processing of
> > Incoming Packets / Hash calculation for incoming packets / IPv4 packets}  The
> > device calculates the hash on IPv4 packets according to 'Enabled hash types'
> > bitmask as follows:
> > @@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming
> > Packets}\label{sec:Device Types / Network  (see \ref{sec:Device Types /
> > Network Device / Device Operation / Processing of Incoming Packets / Hash
> > calculation for incoming packets / IPv6 packets without extension header}).
> >  \end{itemize}
> > 
> > +\subparagraph{Inner hash calculation of an encapsulated packet} If the
> > +feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
> > +encapsulation hash type is set in \field{hash_tunnel_types}, the device
> > +calculates the hash on the inner header of an encapsulated packet (See
> > +\ref{sec:Device Types / Network Device / Device Operation / Processing
> > +of Incoming Packets / Hash calculation for incoming packets /
> > Tunnel/Encapsulated packet}).
> > +
> > +\subparagraph{Security risks between encapsulated packets and RSS}
> > +There may be potential security risks when encapsulated packets using
> s/when encapsulated/when encapsulating/
> 
> > +RSS to select queues for placement. When a user inside a tunnel tries
> > +to control the enqueuing of encapsulated packets, then the user can
> > +flood the device with invalid packets, and the flooded packets may be
> > +hashed into the same queue as packets in other normal tunnels, which causing
> > the queue to overflow.
> > 
> Invalid packets are confusing and the wording of "which causing" is not proper.
> There is some duplicate wording below too.
> 
> I think above and below risk can be summarized in bit simpler manner.
> 
> How about,
> 
> When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels.

"shared to receive" is not grammatical either :)

If you are talking about a security risk you need to explain
1- what is the threat, what configurations are affected.
2- what is the attack type: DOS, information leak, etc.
3- how to mitigate it

This text touches a bit on 1 and 2 but not in an orderly way.


> +
> > +This can pose several security risks:
> > +\begin{itemize}
> > +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to
> > queue
> > +       overflow, resulting in a large amount of packet loss.
> > +\item  The delay and retransmission of packets in the normal tunnels are
> > extremely increased.
> This is something very protocol specific and doesn't belong here.

I don't see how it's specific - many protocols have retransmission and
are affected by delays. "extremely increased" sounds ungrammatical to me
though.


> > +\item  The user can observe the traffic information and enqueue information
> > of other normal
> > +       tunnels, and conduct targeted DoS attacks.
> Once hash_report_tunnel_types is removed, this second attack is no longer applicable.
> Hence, please remove this too.


?
I don't get how removing a field helps DoS.

> > +\end{itemize}
> > +
> >  \paragraph{Hash reporting for incoming packets}  \label{sec:Device Types /
> > Network Device / Device Operation / Processing of Incoming Packets / Hash
> > reporting for incoming packets}
> > -
> > -If VIRTIO_NET_F_HASH_REPORT was negotiated and
> > - the device has calculated the hash for the packet, the device fills
> > \field{hash_report} with the report type of calculated hash -and
> > \field{hash_value} with the value of calculated hash.
> > -
> > -If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the -
> > hash was not calculated, the device sets \field{hash_report} to
> > VIRTIO_NET_HASH_REPORT_NONE.
> > -
> > -Possible values that the device can report in \field{hash_report} are defined
> > below.
> > +If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has
> > +calculated the hash for the packet, the device fills the lower 8 bits
> > +of \field{hash_report} with the report type of calculated hash, and
> > +\field{hash_value} with the value of calculated hash. Also, if
> > +VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to
> > fill the upper 8 bits of \field{hash_report} with the encapsulation type.
> > +
> > +If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
> > +hash was not calculated, the device sets the lower 8 bits of
> > +\field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
> > +
> > +If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the
> > +upper
> > +8 bits of \field{hash_report} with the encapsulation type for an
> > +encapsulated packet. Note that the upper 8 bits are all set to 0 for an
> > +unencapsulated packet, regardless of whether
> > VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
> > +
> > +Possible hash types that the device can report in \field{hash_report} are
> > defined below.
> >  They correspond to supported hash types defined in  \ref{sec:Device Types /
> > Network Device / Device Operation / Processing of Incoming Packets / Hash
> > calculation for incoming packets / Supported/enabled hash types}  as follows:
> > @@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming
> > Packets}\label{sec:Device Types / Network
> >  #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
> >  \end{lstlisting}
> > 
> > +The upper 8 bits of \field{hash_report} can report the encapsulation
> > +type to the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
> > +Possible encapsulation types that the device can report in \field{hash_report}
> > are defined below.
> > +They correspond to supported hash tunnel types defined in
> > +\ref{sec:Device Types / Network Device / Device Operation / Processing
> > +of Incoming Packets / Hash calculation for incoming packets /
> > Supported/enabled hash tunnel types} as follows:
> > +
> > +VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 <<
> > +(VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
> > +#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
> > +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
> > +#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
> > +#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
> > +\end{lstlisting}
> > +
> > +They correspond to supported hash types defined in \ref{sec:Device
> > +Types / Network Device / Device Operation / Processing of Incoming Packets /
> > Hash calculation for incoming packets / Supported/enabled hash types}.
> > +
> >  \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device /
> > Device Operation / Control Virtqueue}
> > 
> >  The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is
> > @@ -4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  \begin{lstlisting}
> >  struct virtio_net_hash_config {
> >      le32 hash_types;
> > +    le32 hash_tunnel_types;
> >      le16 reserved[4];
> >      u8 hash_key_length;
> >      u8 hash_key_data[hash_key_length];
> > @@ -4372,7 +4498,11 @@ \subsubsection{Control
> > Virtqueue}\label{sec:Device Types / Network Device / Devi  Field
> > \field{hash_types} contains a bitmask of allowed hash types as  defined in
> > \ref{sec:Device Types / Network Device / Device Operation / Processing of
> > Incoming Packets / Hash calculation for incoming packets / Supported/enabled
> > hash types}.
> > -Initially the device has all hash types disabled and reports only
> > VIRTIO_NET_HASH_REPORT_NONE.
> > +
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
> > +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
> > Operation / Processing of Incoming Packets / Hash calculation for incoming
> > packets / Supported/enabled hash tunnel types}.
> > +
> > +Initially the device has all hash types and hash tunnel types disabled and
> > reports only VIRTIO_NET_HASH_REPORT_NONE.
> > 
> >  Field \field{reserved} MUST contain zeroes. It is defined to make the structure
> > to match the layout of virtio_net_rss_config structure,  defined in
> > \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
> > / Receive-side scaling (RSS)}.
> > @@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  \begin{lstlisting}
> >  struct virtio_net_rss_config {
> >      le32 hash_types;
> > +    le32 hash_tunnel_types;
> This field is not needed as device config space advertisement for the support is enough.
> 
> If the intent is to enable hashing for the specific tunnel(s), an individual command is better.

A new command? I am not sure why we want that. Why not handle
tunnels like we do other protocols?

> Regardless, this new field cannot be in the middle of the new structure as it breaks backward compatibility.

absolutely.
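
The compatibility concern can be illustrated with a minimal C sketch. It assumes a simplified, fixed-size prefix of virtio_net_rss_config (the real structure continues with variable-length arrays), and the struct and function names below are hypothetical, chosen only for this illustration. It compares field offsets when the new field is placed in the middle versus after the existing fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified fixed-size prefix of the existing layout (assumption:
 * the variable-length tail of the real structure is left out). */
struct rss_prefix_old {
    uint32_t hash_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Hypothetical: new field placed in the middle, as in the v9 patch. */
struct rss_prefix_mid {
    uint32_t hash_types;
    uint32_t hash_tunnel_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Hypothetical: new field laid out after the existing fields. */
struct rss_prefix_tail {
    uint32_t hash_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
    uint32_t hash_tunnel_types;
};

/* Returns 1 if every pre-existing field keeps its offset in the
 * "middle" layout, 0 otherwise. */
int mid_insert_keeps_offsets(void)
{
    return offsetof(struct rss_prefix_mid, hash_types) ==
               offsetof(struct rss_prefix_old, hash_types) &&
           offsetof(struct rss_prefix_mid, indirection_table_mask) ==
               offsetof(struct rss_prefix_old, indirection_table_mask) &&
           offsetof(struct rss_prefix_mid, unclassified_queue) ==
               offsetof(struct rss_prefix_old, unclassified_queue);
}

/* Same check for the layout that places the new field last. */
int tail_append_keeps_offsets(void)
{
    return offsetof(struct rss_prefix_tail, hash_types) ==
               offsetof(struct rss_prefix_old, hash_types) &&
           offsetof(struct rss_prefix_tail, indirection_table_mask) ==
               offsetof(struct rss_prefix_old, indirection_table_mask) &&
           offsetof(struct rss_prefix_tail, unclassified_queue) ==
               offsetof(struct rss_prefix_old, unclassified_queue);
}
```

In this simplified model, a device that parses the old layout would read the wrong bytes for indirection_table_mask if the driver used the "middle" layout, while the layout with the field placed last leaves the old fields where they were.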

> >      le16 indirection_table_mask;
> >      le16 unclassified_queue;
> >      le16 indirection_table[indirection_table_length];
> > @@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device
> > Types / Network Device / Devi  defined in  \ref{sec:Device Types / Network
> > Device / Device Operation / Processing of Incoming Packets / Hash calculation
> > for incoming packets / Supported/enabled hash types}.
> > 
> > +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
> > +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
> > Operation / Processing of Incoming Packets / Hash calculation for incoming
> > packets / Supported/enabled hash tunnel types}.
> > +
> >  Field \field{indirection_table_mask} is a mask to be applied to  the calculated
> > hash to produce an index in the  \field{indirection_table} array.
> > diff --git a/introduction.tex b/introduction.tex index 287c5fc..69b95ae 100644
> > --- a/introduction.tex
> > +++ b/introduction.tex



* Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 16:50             ` Parav Pandit
@ 2023-02-21 17:13               ` Michael S. Tsirkin
  2023-02-21 17:40                 ` [virtio-comment] " Parav Pandit
  2023-02-21 17:17               ` [virtio-comment] " Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 17:13 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 04:50:56PM +0000, Parav Pandit wrote:
> 
> > From: Heng Qi <hengqi@linux.alibaba.com>
> > Sent: Tuesday, February 21, 2023 11:44 AM
> 
> > > Patch-1 to introduce the feature bit, description, and link to CVQ dependency.
> > > Patch-2 for its link in virtio_net_config structure and description.
> > > Patch-3 for new command touching control VQ pieces.
> > 
> > Yes, and you seem to have missed my other replies in this thread:), I rephrased
> Was it comment "And virtio_net_hash_config seems to suffice except for le16 reserved[4]." ?
> I don’t see these reserved fields in the current structure.
> 
> > it here:
> > virtio_net_hash_config seems to be reusable, as the v9 patch is doing, why
> > don't we re-use it?
> > The reason I can think of is to not expand the virtio_net_hash_config structure,
> > just set a separate structure for VIRTIO_NET_F_HASH_TUNNEL and include
> > hash_tunnel_types?
> 
> The part that I am missing is, how do to reuse virtio_net_hash_config and say ignore all the existing fields related to rss, but only consider hash_tunnel_types?

Like a union?  The answer is, don't. Just lay out fields
one after another.

-- 
MST



* Re: [virtio-comment] RE: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 16:50             ` Parav Pandit
  2023-02-21 17:13               ` Michael S. Tsirkin
@ 2023-02-21 17:17               ` Heng Qi
  2023-02-21 17:39                 ` Parav Pandit
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-21 17:17 UTC (permalink / raw)
  To: Parav Pandit, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo



On 2023/2/22 at 12:50 AM, Parav Pandit wrote:
>> From: Heng Qi <hengqi@linux.alibaba.com>
>> Sent: Tuesday, February 21, 2023 11:44 AM
>>> Patch-1 to introduce the feature bit, description, and link to CVQ dependency.
>>> Patch-2 for its link in virtio_net_config structure and description.
>>> Patch-3 for new command touching control VQ pieces.
>> Yes, and you seem to have missed my other replies in this thread:), I rephrased
> Was it comment "And virtio_net_hash_config seems to suffice except for le16 reserved[4]." ?
> I don’t see these reserved fields in the current structure.
>
>> it here:
>> virtio_net_hash_config seems to be reusable, as the v9 patch is doing, why
>> don't we re-use it?
>> The reason I can think of is to not expand the virtio_net_hash_config structure,
>> just set a separate structure for VIRTIO_NET_F_HASH_TUNNEL and include
>> hash_tunnel_types?
> The part that I am missing is, how do to reuse virtio_net_hash_config and say ignore all the existing fields related to rss, but only consider hash_tunnel_types?

Are you referring to such a command?

#define VIRTIO_NET_CTRL_HASH_TUNNEL_TYPE 1
struct virtio_net_tunnel_type_config {
      le32 hash_tunnel_types;
};
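
As a rough sketch of how a driver might fill the payload of such a dedicated command, the C fragment below assumes that the le32 field is serialized little-endian, reuses the tunnel-type bit values from the v9 patch, and masks the requested types with the device's advertised supported_tunnel_hash_types. The helper names put_le32 and build_tunnel_type_config are hypothetical, and the command value 1 is only the draft value from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as proposed in the v9 patch. */
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE     (1u << 0)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN   (1u << 1)
#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE  (1u << 2)

/* Draft command value from this message, not a settled number. */
#define VIRTIO_NET_CTRL_HASH_TUNNEL_TYPE 1

/* The le32 field is modeled as raw bytes so the wire order is explicit. */
struct virtio_net_tunnel_type_config {
    uint8_t hash_tunnel_types[4];
};

/* Hypothetical helper: store a 32-bit value in little-endian order. */
void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: enable only the requested types that the device
 * advertises as supported, and return the mask actually written. */
uint32_t build_tunnel_type_config(struct virtio_net_tunnel_type_config *cfg,
                                  uint32_t wanted, uint32_t supported)
{
    uint32_t enabled = wanted & supported;

    put_le32(cfg->hash_tunnel_types, enabled);
    return enabled;
}
```

With this shape the command carries nothing but the tunnel-type bitmask, so the driver does not have to resend the hash key or the other hash parameters.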

Thanks.



* RE: [virtio-comment] RE: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:17               ` [virtio-comment] " Heng Qi
@ 2023-02-21 17:39                 ` Parav Pandit
  0 siblings, 0 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 17:39 UTC (permalink / raw)
  To: Heng Qi, virtio-comment, virtio-dev
  Cc: Michael S . Tsirkin, Jason Wang, Yuri Benditovich, Cornelia Huck,
	Xuan Zhuo


> From: Heng Qi <hengqi@linux.alibaba.com>
> Sent: Tuesday, February 21, 2023 12:17 PM
> 
> On 2023/2/22 at 12:50 AM, Parav Pandit wrote:
> >> From: Heng Qi <hengqi@linux.alibaba.com>
> >> Sent: Tuesday, February 21, 2023 11:44 AM
> >>> Patch-1 to introduce the feature bit, description, and link to CVQ
> dependency.
> >>> Patch-2 for its link in virtio_net_config structure and description.
> >>> Patch-3 for new command touching control VQ pieces.
> >> Yes, and you seem to have missed my other replies in this thread:), I
> >> rephrased
> > Was it comment "And virtio_net_hash_config seems to suffice except for le16
> reserved[4]." ?
> > I don’t see these reserved fields in the current structure.
> >
> >> it here:
> >> virtio_net_hash_config seems to be reusable, as the v9 patch is
> >> doing, why don't we re-use it?
> >> The reason I can think of is to not expand the virtio_net_hash_config
> >> structure, just set a separate structure for VIRTIO_NET_F_HASH_TUNNEL
> >> and include hash_tunnel_types?
> > The part that I am missing is, how do to reuse virtio_net_hash_config and say
> ignore all the existing fields related to rss, but only consider
> hash_tunnel_types?
> 
> Are you referring to such a command?
> 
> #define VIRTIO_NET_CTRL_HASH_TUNNEL_TYPE 1
> struct virtio_net_tunnel_type_config {
>       le32 hash_tunnel_types;
> };
> 
> Thanks.

Yes.


* [virtio-comment] RE: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:13               ` Michael S. Tsirkin
@ 2023-02-21 17:40                 ` Parav Pandit
  2023-02-21 17:44                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 17:40 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 12:14 PM
> > The part that I am missing is, how do to reuse virtio_net_hash_config and say
> ignore all the existing fields related to rss, but only consider
> hash_tunnel_types?
> 
> Like a union?  The answer is, don't. Just lay out fields one after another.
> 
In that case the driver needs to fill in all the fields that are unrelated to hash_tunnel_types, and the device also needs to compare them with the previous config and ignore them.
That doesn't look like a good use of the existing commands, or of the software/firmware effort spent on it.
Shouldn't we have an explicit command for setting tunnel types?


* Re: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:40                 ` [virtio-comment] " Parav Pandit
@ 2023-02-21 17:44                   ` Michael S. Tsirkin
  2023-02-21 17:54                     ` Parav Pandit
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 17:44 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 05:40:51PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, February 21, 2023 12:14 PM
> > > The part that I am missing is, how do to reuse virtio_net_hash_config and say
> > ignore all the existing fields related to rss, but only consider
> > hash_tunnel_types?
> > 
> > Like a union?  The answer is, don't. Just lay out fields one after another.
> > 
> In that case driver needs to fill up all the fields which are not
> related to hash_tunnel_types and the device also needs to compare with
> the previous config and ignore it.  Doesn’t look like a good use of
> existing commands and sw/fw usage for it.  Shouldn’t we have the
> explicit command for setting tunnel types?

I don't know what's proposed at this point, this is too vague.
I feel that choosing which tunnels to hash on the inner header is no
different from choosing which transports to hash. If the device wants
to know what changed, it can compare. I expect that devices will
generally just apply the new config without caring what exactly changed.

-- 
MST



* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
  2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
  2023-02-21  4:20 ` Parav Pandit
@ 2023-02-21 17:50 ` Michael S. Tsirkin
  2023-02-22  3:22   ` Jason Wang
  2023-02-23 13:13 ` Michael S. Tsirkin
  2023-02-28 11:16 ` Michael S. Tsirkin
  4 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 17:50 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets using RSS to
> +select queues for placement. When a user inside a tunnel tries to control the
> +enqueuing of encapsulated packets, then the user can flood the device with invalid
> +packets, and the flooded packets may be hashed into the same queue as packets in
> +other normal tunnels, which causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> +       overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are extremely increased.
> +\item  The user can observe the traffic information and enqueue information of other normal
> +       tunnels, and conduct targeted DoS attacks.
> +\end{itemize}
> +

Hmm, with this all written out it sounds pretty severe.
At this point, with no way to mitigate it, I don't feel this is something
that e.g. Linux can enable. I am not going to nack the spec patch if
others find this somehow useful, e.g. for dpdk.
How about CCing e.g. the dpdk devs, or whoever else is going to use this,
and asking them for their opinion?


-- 
MST



* RE: [virtio-dev] RE: [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:44                   ` Michael S. Tsirkin
@ 2023-02-21 17:54                     ` Parav Pandit
  0 siblings, 0 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 17:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 12:44 PM
> 
> On Tue, Feb 21, 2023 at 05:40:51PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, February 21, 2023 12:14 PM
> > > > The part that I am missing is, how do to reuse
> > > > virtio_net_hash_config and say
> > > ignore all the existing fields related to rss, but only consider
> > > hash_tunnel_types?
> > >
> > > Like a union?  The answer is, don't. Just lay out fields one after another.
> > >
> > In that case driver needs to fill up all the fields which are not
> > related to hash_tunnel_types and the device also needs to compare with
> > the previous config and ignore it.  Doesn’t look like a good use of
> > existing commands and sw/fw usage for it.  Shouldn’t we have the
> > explicit command for setting tunnel types?
> 
> I don't know what's proposed at this point, this is too vague.
The proposal is to have a new command like the one Heng drafted in the latest email.

> I feel which tunnels to hash for inner header is not different from which
> transports to hash. If device wants to know what changes it can compare. I
> expect generally devices will just apply the new config without caring what
> changed exactly.
When a device is programmed, applying a new configuration often requires removing the previous configuration and re-applying the new one.
This results in wrong steering for tens of microseconds to milliseconds for no apparent reason.
Hence, a comparison is often needed for the best experience.
Both the comparison and the re-programming are overheads on the device and driver side, with no gain other than reusing part of the RSS structure.
Hence, a separate command is the more efficient choice.
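
To make the concern above concrete, here is a minimal C sketch of a device handling a combined command versus a separate one. All names (`example_rss_state`, `example_device`, `example_set_combined`, `example_set_tunnel_types`) and the table and key sizes are hypothetical and are not taken from the spec or the patch. The sketch only assumes that re-programming steering is the disruptive step the device wants to avoid.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define EXAMPLE_KEY_LEN   40
#define EXAMPLE_TABLE_LEN 128

/* Hypothetical snapshot of the RSS-related state a device keeps. */
struct example_rss_state {
    uint32_t hash_types;
    uint16_t indirection_table[EXAMPLE_TABLE_LEN];
    uint8_t  key[EXAMPLE_KEY_LEN];
};

struct example_device {
    struct example_rss_state rss;
    uint32_t hash_tunnel_types;
    unsigned steering_reprograms; /* counts disruptive re-programming */
};

/* Field-by-field comparison of two RSS snapshots. */
static bool example_rss_equal(const struct example_rss_state *a,
                              const struct example_rss_state *b)
{
    return a->hash_types == b->hash_types &&
           memcmp(a->indirection_table, b->indirection_table,
                  sizeof(a->indirection_table)) == 0 &&
           memcmp(a->key, b->key, sizeof(a->key)) == 0;
}

/* Combined command: RSS state and tunnel types arrive together, so the
 * device compares against the old state to avoid needless re-programming. */
static void example_set_combined(struct example_device *dev,
                                 const struct example_rss_state *rss,
                                 uint32_t tunnel_types)
{
    if (!example_rss_equal(&dev->rss, rss)) {
        dev->rss = *rss;
        dev->steering_reprograms++;
    }
    dev->hash_tunnel_types = tunnel_types;
}

/* Separate command: only the tunnel types are touched. */
static void example_set_tunnel_types(struct example_device *dev,
                                     uint32_t tunnel_types)
{
    dev->hash_tunnel_types = tunnel_types;
}
```

In this sketch the combined path needs the comparison to keep steering stable when only the tunnel types change, while the separate path never looks at the RSS state at all.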





^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:05   ` Michael S. Tsirkin
@ 2023-02-21 19:29     ` Parav Pandit
  2023-02-21 21:23       ` Michael S. Tsirkin
  2023-02-22  2:34       ` [virtio-dev] " Heng Qi
  0 siblings, 2 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 19:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 12:06 PM
> 
> On Tue, Feb 21, 2023 at 04:20:59AM +0000, Parav Pandit wrote:
> >
> > > From: Heng Qi <hengqi@linux.alibaba.com>
> > > Sent: Saturday, February 18, 2023 9:37 AM
> >
> > > If the tunnel is used to encapsulate the packets, the hash
> > > calculated using the
> > s/hash calculated/hash is calculated
> >
> > > outer header of the receive packets is always fixed for the same
> > > flow packets, i.e. they will be steered to the same receive queue.
> > >
> > A little descriptive commit message like below reads better to me.
> > Currently, when a received packet is an encapsulated packet meaning there
> is an outer and an inner header, virtio device is unable to calculate the hash for
> the inner header.
> > Due to this limitation, multiple different flows identified by the inner header
> for the same outer header result in selecting the same receive queue.
> > This effectively disables the RSS, resulting in poor receive performance.
> >
> > Hence, to overcome this limitation, a new feature is introduced using a
> feature bit VIRTIO_NET_F_HASH_TUNNEL.
> > This feature enables the device to advertise the capability to calculate the
> hash for the inner packet header.
> > Thereby regaining better RSS performance in presence of outer packet
> header.
> 
> I think this is a good description however Parav I think it is important to make
> contributors write their own commit messages so they know what is the reason
> for the proposed change. 
Sure. The contributor can rewrite it.

> What's good for the goose is good for the gander -
> contributors should explain why their change to spec is beneficial but reviewers
> should also explain why their changes to the patch are beneficial, and "reads
> better to me" does not cut it - it does not allow the contributor to improve with
> time.  It's more than about a single contribution, see?
> 
I provided an example template showing how to write a problem_description -> solution commit log.

At the beginning, I said "a little descriptive" to explain why it reads better.
But it seems even more verbosity is needed, even for a reviewer's suggestion.
I have not often seen other reviewers do this, but I have made a note of it now; at least I can improve from this feedback.

I imagined that the contributor would see the problem -> solution pattern in the example commit log and follow it in future patches.
Using the current patch as the example was the best way to show how to write it.

> In this case I would say the issue is that motivation for the change is never
> explained.

> > When a specific receive queue is shared to receive packets of multiple
> tunnels, there is no quality of service for packets of multiple tunnels.
> 
> "shared to receive" is not grammatical either :)
> 
"Shared by multiple tunnels" will make it grammatical?

> If you are talking about a security risk you need to explain
> 1- what is the threat, what configurations are affected.
> 2- what is the attack type: DOS, information leak, etc.
> 3- how to mitigate it
> 
> This text touches a bit on 1 and 2 but not in an orderly way.
> 
> 
It is best-effort based.

#3 is outside the scope of this patch set.

> > +
> > > +This can pose several security risks:
> > > +\begin{itemize}
> > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > +enqueued due to
> > > queue
> > > +       overflow, resulting in a large amount of packet loss.
> > > +\item  The delay and retransmission of packets in the normal
> > > +tunnels are
> > > extremely increased.
> > This is something very protocol specific and doesn't belong here.
> 
> I don't see how it's specific - many protocols have retransmission and are
> affected by delays. "extremely increased" sounds ungrammatical to me though.
> 
> 
I am not sure where you want to lead this discussion.
I am trying to help the spec and feature definition to be compact enough to progress.

It is specific to particular protocols, and the text somewhat arbitrarily concludes that there will be a large number of packet losses.
Maybe only one ICMP packet got dropped and the retransmit was just one packet.
Maybe it was TCP with selective retransmit enabled/disabled.

As far as the receive side is concerned, it should say that there is no QoS among different tunnels.
The user will figure out how to mitigate when such QoS is not available: either run in best-effort mode or mitigate differently.

> > > +\item  The user can observe the traffic information and enqueue
> > > +information
> > > of other normal
> > > +       tunnels, and conduct targeted DoS attacks.
> > Once hash_report_tunnel_types is removed, this second attack is no longer
> applicable.
> > Hence, please remove this too.
> 
> 
> ?
> I don't get how removing a field helps DoS.
> 
I meant that the "observe and enqueue" part for the tunnels is no longer applicable.

> \begin{lstlisting}  struct virtio_net_rss_config {
> > >      le32 hash_types;
> > > +    le32 hash_tunnel_types;
> > This field is not needed as device config space advertisement for the support
> is enough.
> >
> > If the intent is to enable hashing for the specific tunnel(s), an individual
> command is better.
> 
> new command? I am not sure why we want that. why not handle tunnels like
> we do other protocols?

I didn't follow.
We probably discussed in another thread that to set M bits, it is wise to avoid setting N other bits just to keep the command happy, where N >>> M and these N have a very strong relation in hw resource setup and packet steering.
Any examples of 'other protocols'?


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 19:29     ` Parav Pandit
@ 2023-02-21 21:23       ` Michael S. Tsirkin
  2023-02-21 21:36         ` Parav Pandit
  2023-02-22  2:34       ` [virtio-dev] " Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 21:23 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 07:29:20PM +0000, Parav Pandit wrote:
> > > When a specific receive queue is shared to receive packets of multiple
> > tunnels, there is no quality of service for packets of multiple tunnels.
> > 
> > "shared to receive" is not grammatical either :)
> > 
> "Shared by multiple tunnels" will make it grammatical?

I think so, yes.

> > If you are talking about a security risk you need to explain
> > 1- what is the threat, what configurations are affected.
> > 2- what is the attack type: DOS, information leak, etc.
> > 3- how to mitigate it
> > 
> > This text touches a bit on 1 and 2 but not in an orderly way.
> > 
> > 
> it is best effort based.
> 
> #3 is outside the scope of this patch set.

Scope is from Greek for "target". It's what we are aiming for.
If we document a security risk then I would say yes we should aim
to provide not just problems but solutions too.

> > > +
> > > > +This can pose several security risks:
> > > > +\begin{itemize}
> > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > +enqueued due to
> > > > queue
> > > > +       overflow, resulting in a large amount of packet loss.
> > > > +\item  The delay and retransmission of packets in the normal
> > > > +tunnels are
> > > > extremely increased.
> > > This is something very protocol specific and doesn't belong here.
> > 
> > I don't see how it's specific - many protocols have retransmission and are
> > affected by delays. "extremely increased" sounds ungrammatical to me though.
> > 
> > 
> I am not sure where you want to lead this discussion.

I just disagree that documenting timing effects does not belong in the
spec.

> I am trying to help the spec and feature definition to be compact enough to progress.
> 
> It is specific to a protocol(s) and somehow arbitrarily concluded with a large number of packet losses.
> Maybe only one ICMP packet got dropped and retransmit was just one packet.
> Maybe it was TCP with selective retransmit enabled/disabled.
> 
> As far as receive side is concerned, it should say that there is no QoS among different tunnels.
> The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently.

So you are saying either live with the problem (this is best effort yes?)
or find your own solutions? Such as?


> > > > +\item  The user can observe the traffic information and enqueue
> > > > +information
> > > > of other normal
> > > > +       tunnels, and conduct targeted DoS attacks.
> > > Once hash_report_tunnel_types is removed, this second attack is no longer
> > applicable.
> > > Hence, please remove this too.
> > 
> > 
> > ?
> > I don't get how removing a field helps DoS.
> > 
> I meant for the "observe and enqueue" part of the tunnel as not applicable. 

Sorry still don't get it :( I don't know what is the "observe and enqueue" part of the tunnel
and what is not applicable. But maybe Heng Qi does.

> > \begin{lstlisting}  struct virtio_net_rss_config {
> > > >      le32 hash_types;
> > > > +    le32 hash_tunnel_types;
> > > This field is not needed as device config space advertisement for the support
> > is enough.
> > >
> > > If the intent is to enable hashing for the specific tunnel(s), an individual
> > command is better.
> > 
> > new command? I am not sure why we want that. why not handle tunnels like
> > we do other protocols?
> 
> I didn't follow.
> We probably discussed in another thread that to set M bits, it is wise to avoid setting N other bits just to keep the command happy, where N >>> M and these N have a very strong relation in hw resource setup and packet steering.
> Any examples of 'other protocols'?

#define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)

this kind of thing.

I don't see how a tunnel is different fundamentally. Why does it need
its own field?
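
For illustration, the single-field approach could look like the C sketch below. The first three defines repeat the existing hash type bits listed above; the two `EXAMPLE_HASH_TYPE_*_INNER` bits, their bit positions, and the helper functions are purely hypothetical and are not defined by the spec or by this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Existing hash type bits, as listed in this thread. */
#define VIRTIO_NET_HASH_TYPE_IPv4   (1u << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4  (1u << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4  (1u << 2)

/* Hypothetical tunnel bits sharing the same 32-bit field.
 * The positions are arbitrary and for illustration only. */
#define EXAMPLE_HASH_TYPE_VXLAN_INNER (1u << 16)
#define EXAMPLE_HASH_TYPE_GRE_INNER   (1u << 17)

#define EXAMPLE_HASH_TYPE_INNER_MASK \
    (EXAMPLE_HASH_TYPE_VXLAN_INNER | EXAMPLE_HASH_TYPE_GRE_INNER)

/* True if the given bit is set in the hash_types field. */
static bool example_hash_type_enabled(uint32_t hash_types, uint32_t bit)
{
    return (hash_types & bit) != 0;
}

/* True if any inner-header (tunnel) hashing was requested at all. */
static bool example_inner_hash_requested(uint32_t hash_types)
{
    return (hash_types & EXAMPLE_HASH_TYPE_INNER_MASK) != 0;
}
```

With this layout a device would still be able to tell whether any tunnel parsing is wanted by testing one mask, without a second field.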

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 21:23       ` Michael S. Tsirkin
@ 2023-02-21 21:36         ` Parav Pandit
  2023-02-21 21:46           ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 21:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 4:24 PM
> 
> On Tue, Feb 21, 2023 at 07:29:20PM +0000, Parav Pandit wrote:
> > > > When a specific receive queue is shared to receive packets of
> > > > multiple
> > > tunnels, there is no quality of service for packets of multiple tunnels.
> > >
> > > "shared to receive" is not grammatical either :)
> > >
> > "Shared by multiple tunnels" will make it grammatical?
> 
> I think so, yes.
> 
> > > If you are talking about a security risk you need to explain
> > > 1- what is the threat, what configurations are affected.
> > > 2- what is the attack type: DOS, information leak, etc.
> > > 3- how to mitigate it
> > >
> > > This text touches a bit on 1 and 2 but not in an orderly way.
> > >
> > >
> > it is best effort based.
> >
> > #3 is outside the scope of this patch set.
> 
> Scope is from Greek for "target". It's what we are aiming for.
> If we document a security risk then I would say yes we should aim to provide
> not just problems but solutions too.
> 
> > > > +
> > > > > +This can pose several security risks:
> > > > > +\begin{itemize}
> > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > +enqueued due to
> > > > > queue
> > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > +\item  The delay and retransmission of packets in the normal
> > > > > +tunnels are
> > > > > extremely increased.
> > > > This is something very protocol specific and doesn't belong here.
> > >
> > > I don't see how it's specific - many protocols have retransmission
> > > > and are affected by delays. "extremely increased" sounds ungrammatical to
> me though.
> > >
> > >
> > I am not sure where you want to lead this discussion.
> 
> I just disagree that documenting timing effects does not belong in the spec.
> 
> > I am trying to help the spec and feature definition to be compact enough to
> progress.
> >
> > It is specific to a protocol(s) and somehow arbitrarily concluded with a large
> number of packet losses.
> > Maybe only one ICMP packet got dropped and retransmit was just one
> packet.
> > Maybe it was TCP with selective retransmit enabled/disabled.
> >
> > As far as receive side is concerned, it should say that there is no QoS among
> different tunnels.
> > The user will figure out how to mitigate when such QoS is not available.
> Either to run in best-effort mode or mitigate differently.
> 
> So you are saying either live with the problem (this is best effort yes?) 
Yes to best effort usage.

> 
> 
> > > > > +\item  The user can observe the traffic information and enqueue
> > > > > +information
> > > > > of other normal
> > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > Once hash_report_tunnel_types is removed, this second attack is no
> > > > longer
> > > applicable.
> > > > Hence, please remove this too.
> > >
> > >
> > > ?
> > > I don't get how removing a field helps DoS.
> > >
> > I meant for the "observe and enqueue" part of the tunnel as not applicable.
> 
> Sorry still don't get it :( I don't know what is the "observe and enqueue" part of
> the tunnel and what is not applicable. But maybe Heng Qi does.
> 
The tunnel type (vxlan, gre, etc.) is not placed in the virtio_net_hdr.
This way the virtio_net_hdr doesn't leak such information to upper layer drivers, since they cannot observe it.

> > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > >      le32 hash_types;
> > > > > +    le32 hash_tunnel_types;
> > > > This field is not needed as device config space advertisement for
> > > > the support
> > > is enough.
> > > >
> > > > If the intent is to enable hashing for the specific tunnel(s), an
> > > > individual
> > > command is better.
> > >
> > > new command? I am not sure why we want that. why not handle tunnels
> > > like we do other protocols?
> >
> > I didn't follow.
> > We probably discussed in another thread that to set M bits, it is wise to avoid
> setting N other bits just to keep the command happy, where N >>> M and these
> N have a very strong relation in hw resource setup and packet steering.
> > Any examples of 'other protocols'?
> 
> #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> #define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
> #define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)
> 
> this kind of thing.
> 
> I don't see how a tunnel is different fundamentally. Why does it need its own
> field?

The driver is in control and enables tunnel-based inner hash acceleration only when it is needed.
This way certain data path hw parsers can be enabled/disabled.
Without this, it will always be enabled even if nothing uses it.
The device has scope to optimize this flow.


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 21:36         ` Parav Pandit
@ 2023-02-21 21:46           ` Michael S. Tsirkin
  2023-02-21 22:32             ` Parav Pandit
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 21:46 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 09:36:06PM +0000, Parav Pandit wrote:
> > So you are saying either live with the problem (this is best effort yes?) 
> Yes to best effort usage.

Surely something can be done to mitigate? How about randomizing the
key, for example? That's after just a minute of thinking. I am guessing
more can be done.
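
One possible reading of the key-randomization idea, sketched in C: the driver draws a fresh hash key from the OS entropy source before programming it, so that someone on the network cannot easily predict which receive queue a given flow lands on. The function name, the 40-byte key length, and the use of `/dev/urandom` (a POSIX-style assumption) are all illustrative and not part of the spec or the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical key length; the real length is whatever the driver
 * configures for the device. */
#define EXAMPLE_HASH_KEY_LEN 40

/* Fill key[0..len) with random bytes read from /dev/urandom.
 * Returns 0 on success and -1 on failure. */
static int example_randomize_hash_key(uint8_t *key, size_t len)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(key, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

The resulting key would then be passed to the device through whatever hash or RSS configuration command the driver already uses.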


> > 
> > 
> > > > > > +\item  The user can observe the traffic information and enqueue
> > > > > > +information
> > > > > > of other normal
> > > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > Once hash_report_tunnel_types is removed, this second attack is no
> > > > > longer
> > > > applicable.
> > > > > Hence, please remove this too.
> > > >
> > > >
> > > > ?
> > > > I don't get how removing a field helps DoS.
> > > >
> > > I meant for the "observe and enqueue" part of the tunnel as not applicable.
> > 
> > Sorry still don't get it :( I don't know what is the "observe and enqueue" part of
> > the tunnel and what is not applicable. But maybe Heng Qi does.
> > 
> Tunnel type such as vxlan/gre etc is not placed in the virtio_net_hdr.
> This way the net_hdr doesn't leak such information to upper layer drivers as it cannot observe it.

What is this information that the driver can't observe? It sees all the packets
after all; we are not stripping tunneling headers.
I also don't really know what upper layer drivers are - layering of drivers
is certainly not covered in the spec for now, so I am not sure
what you mean by that.  The risk I mentioned is leaking the
information *on the network*.




> > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > >      le32 hash_types;
> > > > > > +    le32 hash_tunnel_types;
> > > > > This field is not needed as device config space advertisement for
> > > > > the support
> > > > is enough.
> > > > >
> > > > > If the intent is to enable hashing for the specific tunnel(s), an
> > > > > individual
> > > > command is better.
> > > >
> > > > new command? I am not sure why we want that. why not handle tunnels
> > > > like we do other protocols?
> > >
> > > I didn't follow.
> > > We probably discussed in another thread that to set M bits, it is wise to avoid
> > setting N other bits just to keep the command happy, where N >>> M and these
> > N have a very strong relation in hw resource setup and packet steering.
> > > Any examples of 'other protocols'?
> > 
> > #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > #define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
> > #define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)
> > 
> > this kind of thing.
> > 
> > I don't see how a tunnel is different fundamentally. Why does it need its own
> > field?
> 
> Driver is in control to enable/disable tunnel based inner hash acceleration only when its needed.
> This way certain data path hw parsers can be enabled/disabled.
> Without this it will be always enabled even if there may not be any user of it.
> Device has scope to optimize this flow.

I feel you misunderstand the question. Or maybe I misunderstand what you
are proposing.  So tunnels need their own bits. But why a separate field
and not just more bits along the existing ones?

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 21:46           ` Michael S. Tsirkin
@ 2023-02-21 22:32             ` Parav Pandit
  2023-02-21 23:18               ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-21 22:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 4:46 PM
> 
> What is this information driver can't observe? It sees all the packets after all,
> we are not stripping tunneling headers.
Just the tunnel type.
If/when the tunnel header is stripped, it gets complicated, because the tunnel type is still present in the virtio_net_hdr whenever the hash_report_tunnel feature bit is negotiated.

> I also don't really know what are upper layer drivers - for sure layering of
> drivers is not covered in the spec for now so I am not sure what do you mean by
> that.  The risk I mentioned is leaking the information *on the network*.
> 
Got it.

> 
> 
> 
> > > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > > >      le32 hash_types;
> > > > > > > +    le32 hash_tunnel_types;
> > > > > > This field is not needed as device config space advertisement
> > > > > > for the support
> > > > > is enough.
> > > > > >
> > > > > > If the intent is to enable hashing for the specific tunnel(s),
> > > > > > an individual
> > > > > command is better.
> > > > >
> > > > > new command? I am not sure why we want that. why not handle
> > > > > tunnels like we do other protocols?
> > > >
> > > > I didn't follow.
> > > > We probably discussed in another thread that to set M bits, it is
> > > > wise to avoid
> > > setting N other bits just to keep the command happy, where N >>> M
> > > and these N have a very strong relation in hw resource setup and packet
> steering.
> > > > Any examples of 'other protocols'?
> > >
> > > #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > > #define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
> > > #define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)
> > >
> > > this kind of thing.
> > >
> > > I don't see how a tunnel is different fundamentally. Why does it
> > > need its own field?
> >
> > Driver is in control to enable/disable tunnel based inner hash acceleration
> only when its needed.
> > This way certain data path hw parsers can be enabled/disabled.
> > Without this it will be always enabled even if there may not be any user of it.
> > Device has scope to optimize this flow.
> 
> I feel you misunderstand the question. Or maybe I misunderstand what you are
> proposing.  So tunnels need their own bits. But why a separate field and not just
> more bits along the existing ones?

Because the hashing does not cover the outer header contents.

We may still not be discussing the same thing.
So let me refresh the context.

The question under discussion was the following.
Scenario:
1. The device advertises the ability to hash on the inner packet header.
2. The device prefers that the driver enable it only when it needs to use this extra packet parser in hardware.

There are 3 options.
a. Because the feature is negotiated, it is enabled for all the tunnel types.
Pros:
1. No need to extend the cvq cmd.
Cons:
1. The device parser is always enabled, even if the driver never uses it. This may result in inferior rx performance.

b. Since the feature is useful only in the narrow case of a sw-based vxlan etc. driver, it is better not to enable the hw for it by default.
Hence, have a knob to explicitly enable it in hw.
So have a cvq command.
b.1 Should it be combined with the existing command?
Cons:
a. When the driver wants to enable hash on inner, it needs to supply the exact same RSS config as before. Sw overhead with no gain.
b. The device needs to parse the new command value, compare it with the old config, drop the unchanged RSS config, and just enable the inner hashing hw parser.
Or destroy the old rss config and re-apply it. This results in weird behavior for a short interval with no apparent gain.

b.2 Should it be on its own command?
Pros:
a. The device and driver don't need to bother about b.1.a and b.1.b.
b. Still benefits from not always enabling the hw parser, as this is not a common case.
c. Has the ability to enable when needed.
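
The driver-side difference between b.1 and b.2 can be sketched in C as follows. Both payload layouts, all names, and the table and key sizes are hypothetical and only meant to show the shape of the trade-off; they are not the structures from the spec or the patch, and host byte order is used instead of little-endian fields for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define EXAMPLE_KEY_LEN   40
#define EXAMPLE_TABLE_LEN 128

/* b.1: hypothetical combined payload, RSS config plus tunnel types. */
struct example_combined_cmd {
    uint32_t hash_types;
    uint32_t hash_tunnel_types;
    uint16_t indirection_table[EXAMPLE_TABLE_LEN];
    uint8_t  key[EXAMPLE_KEY_LEN];
};

/* b.2: hypothetical dedicated payload, tunnel types only. */
struct example_tunnel_cmd {
    uint32_t hash_tunnel_types;
};

/* b.1: the driver must replay its cached RSS config unchanged just to
 * flip the tunnel bits. Returns the payload size in bytes. */
static size_t example_build_combined(struct example_combined_cmd *out,
                                     const struct example_combined_cmd *cached,
                                     uint32_t tunnel_types)
{
    *out = *cached;
    out->hash_tunnel_types = tunnel_types;
    return sizeof(*out);
}

/* b.2: the driver sends only the field it wants to change. */
static size_t example_build_tunnel(struct example_tunnel_cmd *out,
                                   uint32_t tunnel_types)
{
    out->hash_tunnel_types = tunnel_types;
    return sizeof(*out);
}
```

Under these assumed sizes the b.1 payload is hundreds of bytes and requires the driver to keep a cached copy of its RSS config, while the b.2 payload is a single 32-bit field.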


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 22:32             ` Parav Pandit
@ 2023-02-21 23:18               ` Michael S. Tsirkin
  2023-02-22  1:41                 ` Parav Pandit
  2023-02-22  2:51                 ` [virtio-dev] " Heng Qi
  0 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-21 23:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 21, 2023 at 10:32:11PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, February 21, 2023 4:46 PM
> > 
> > What is this information driver can't observe? It sees all the packets after all,
> > we are not stripping tunneling headers.
> Just the tunnel type.
> If/when that tunnel header is stripped, it gets complicated where tunnel type is still present in the virtio_net_hdr because hash_report_tunnel feature bit is negotiated.

Whoever strips off the tunnel has, I imagine, to strip off the virtio net hdr
too - everything else in it, such as gso type, refers to the outer packet.

> > I also don't really know what are upper layer drivers - for sure layering of
> > drivers is not covered in the spec for now so I am not sure what do you mean by
> > that.  The risk I mentioned is leaking the information *on the network*.
> > 
> Got it.
> 
> > 
> > 
> > 
> > > > > > \begin{lstlisting}  struct virtio_net_rss_config {
> > > > > > > >      le32 hash_types;
> > > > > > > > +    le32 hash_tunnel_types;
> > > > > > > This field is not needed as device config space advertisement
> > > > > > > for the support
> > > > > > is enough.
> > > > > > >
> > > > > > > If the intent is to enable hashing for the specific tunnel(s),
> > > > > > > an individual
> > > > > > command is better.
> > > > > >
> > > > > > new command? I am not sure why we want that. why not handle
> > > > > > tunnels like we do other protocols?
> > > > >
> > > > > I didn't follow.
> > > > > We probably discussed in another thread that to set M bits, it is
> > > > > wise to avoid
> > > > setting N other bits just to keep the command happy, where N >>> M
> > > > and these N have a very strong relation in hw resource setup and packet
> > steering.
> > > > > Any examples of 'other protocols'?
> > > >
> > > > #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
> > > > #define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
> > > > #define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)
> > > >
> > > > this kind of thing.
> > > >
> > > > I don't see how a tunnel is different fundamentally. Why does it
> > > > need its own field?
> > >
> > > Driver is in control to enable/disable tunnel based inner hash acceleration
> > only when its needed.
> > > This way certain data path hw parsers can be enabled/disabled.
> > > Without this it will be always enabled even if there may not be any user of it.
> > > Device has scope to optimize this flow.
> > 
> > I feel you misunderstand the question. Or maybe I misunderstand what you are
> > proposing.  So tunnels need their own bits. But why a separate field and not just
> > more bits along the existing ones?
> 
> Because the hashing is not covering the outer header contents.
> 
> We may be still not discussing the same.
> So let me refresh the context.
> 
> The question of discussion was,
> Scenario:
> 1. device advertises the ability to hash on the inner packet header.
> 2. device prefers that driver enable it only when it needs to use this extra packet parser in hardware.
> 
> There are 3 options.
> a. Because the feature is negotiated, it means it is enabled for all the tunnel types.
> Pros:
> 1. No need to extend cvq cmd.
> Cons:
> 1. device parser is always enabled, and the driver never uses it. This may result in inferior rx performance.
> 
> b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, better not to enable hw for it.
> Hence, have the knob to explicitly enable in hw.
> So have the cvq command.
> b.1 should it be combined with the existing command?
> Cons:
> a. when the driver wants to enable hash on inner, it needs to supply the exact same RSS config as before. Sw overhead with no gain.
> b. device needs to parse new command value, compare with old config, and drop the RSS config, just enable inner hashing hw parser.
> Or destroy the old rss config and re-apply. This results in weird behavior for the short interval with no apparent gain.
>
> b.2 should it be on its own command?
> Pros:
> a. device and driver doesn't need to bother about b.1.a and b.1.b.
> b. still benefits from not always enabling hw parser, as this is not a common case.
> c. has the ability to enable when needed.

I prefer b.1. With reporting of the tunnel type gone I don't see a
fundamental difference between hashing over tunneling types and other
protocol types we support.  It's just a flag telling device over which
bits to calculate the hash. We don't have a separate command for hashing
of TCPv6, why have it for vxlan?  Extending with more HASH_TYPE makes
total sense to me, seems to fit better with the existing design, and will
make the patch smaller.


-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 23:18               ` Michael S. Tsirkin
@ 2023-02-22  1:41                 ` Parav Pandit
  2023-02-22  2:51                 ` [virtio-dev] " Heng Qi
  1 sibling, 0 replies; 105+ messages in thread
From: Parav Pandit @ 2023-02-22  1:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, February 21, 2023 6:18 PM

> > The question of discussion was,
> > Scenario:
> > 1. device advertises the ability to hash on the inner packet header.
> > 2. device prefers that driver enable it only when it needs to use this extra
> packet parser in hardware.
> >
> > There are 3 options.
> > a. Because the feature is negotiated, it means it is enabled for all the tunnel
> types.
> > Pros:
> > 1. No need to extend cvq cmd.
> > Cons:
> > 1. device parser is always enabled, and the driver never uses it. This may
> result in inferior rx performance.
> >
> > b. Since the feature is useful in a narrow case of sw-based vxlan etc driver,
> better not to enable hw for it.
> > Hence, have the knob to explicitly enable in hw.
> > So have the cvq command.
> > b.1 should it be combined with the existing command?
> > Cons:
> > a. when the driver wants to enable hash on inner, it needs to supply the exact
> same RSS config as before. Sw overhead with no gain.
> > b. device needs to parse new command value, compare with old config, and
> drop the RSS config, just enable inner hashing hw parser.
> > Or destroy the old rss config and re-apply. This results in weird behavior for
> the short interval with no apparent gain.
> >
> > b.2 should it be on its own command?
> > Pros:
> > a. device and driver doesn't need to bother about b.1.a and b.1.b.
> > b. still benefits from not always enabling hw parser, as this is not a common
> case.
> > c. has the ability to enable when needed.
> 
> I prefer b.1. With reporting of the tunnel type gone I don't see a fundamental
> difference between hashing over tunneling types and other protocol types we
> support.  It's just a flag telling device over which bits to calculate the hash. We
> don't have a separate command for hashing of TCPv6, why have it for vxlan?
With b.1, always enabling the hw for multi-level packet processing is not optimal for actual device implementations.
The difference is hashing on one level of headers vs. hashing on a second level.
And the new hash type values have zero use in sw.

> Extending with more HASH_TYPE makes total sense to me, seems to fit better
> with the existing design and will make patch smaller.


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 19:29     ` Parav Pandit
  2023-02-21 21:23       ` Michael S. Tsirkin
@ 2023-02-22  2:34       ` Heng Qi
  2023-02-22  6:21         ` Michael S. Tsirkin
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-22  2:34 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/22 3:29 AM, Parav Pandit wrote:
>
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, February 21, 2023 12:06 PM
>>
>> On Tue, Feb 21, 2023 at 04:20:59AM +0000, Parav Pandit wrote:
>>>> From: Heng Qi <hengqi@linux.alibaba.com>
>>>> Sent: Saturday, February 18, 2023 9:37 AM
>>>> If the tunnel is used to encapsulate the packets, the hash
>>>> calculated using the
>>> s/hash calculated/hash is calculated
>>>
>>>> outer header of the receive packets is always fixed for the same
>>>> flow packets, i.e. they will be steered to the same receive queue.
>>>>
>>> A little descriptive commit message like below reads better to me.
>>> Currently, when a received packet is an encapsulated packet meaning there
>> is an outer and an inner header, virtio device is unable to calculate the hash for
>> the inner header.
>>> Due to this limitation, multiple different flows identified by the inner header
>> for the same outer header result in selecting the same receive queue.
>>> This effectively disables the RSS, resulting in poor receive performance.
>>>
>>> Hence, to overcome this limitation, a new feature is introduced using a
>> feature bit VIRTIO_NET_F_HASH_TUNNEL.
>>> This feature enables the device to advertise the capability to calculate the
>> hash for the inner packet header.
>>> Thereby regaining better RSS performance in presence of outer packet
>> header.
>>
>> I think this is a good description however Parav I think it is important to make
>> contributors write their own commit messages so they know what is the reason
>> for the proposed change.
> Sure. Contributor can rewrite it.
>
>> What's good for the goose is good for the gander -
>> contributors should explain why their change to spec is beneficial but reviewers
>> should also explain why their changes to the patch are beneficial, and "reads
>> better to me" does not cut it - it does not allow the contributor to improve with
>> time.  It's more than about a single contribution, see?
>>
> I provided an example template on how to write problem_description -> solution commit log.
>
> At the beginning, I said "little descriptive" to explain why it reads better.
> But it seems, even more verbosity is needed even for the reviewer to suggest.
> I didn't see this often happening by other reviewers, but I make a note of it now, at least I can improve from this feedback.
>
> I imagined that the contributor would see the pattern as problem->solution in the example commit log and follow in the future patches.

Yes, we describe commits in the "as is -> problem -> solution" pattern, 
and it's good to refer to some better examples for clarity.

> Giving example of current patch was best to describe how to write it.
>
>> In this case I would say the issue is that motivation for the change is never
>> explained.
>>> When a specific receive queue is shared to receive packets of multiple
>> tunnels, there is no quality of service for packets of multiple tunnels.
>>
>> "shared to receive" is not grammatical either :)
>>
> "Shared by multiple tunnels" will make it grammatical?
>
>> If you are talking about a security risk you need to explain
>> 1- what is the threat, what configurations are affected.
>> 2- what is the attack type: DOS, information leak, etc.
>> 3- how to mitigate it
>>
>> This text touches a bit on 1 and 2 but not in an orderly way.
>>
>>
> it is best effort based.
>
> #3 is outside the scope of this patch set.
>
>>> +
>>>> +This can pose several security risks:
>>>> +\begin{itemize}
>>>> +\item  Encapsulated packets in the normal tunnels cannot be
>>>> +enqueued due to
>>>> queue
>>>> +       overflow, resulting in a large amount of packet loss.
>>>> +\item  The delay and retransmission of packets in the normal
>>>> +tunnels are
>>>> extremely increased.
>>> This is something very protocol specific and doesn't belong here.
>> I don't see how it's specific - many protocols have retransmission and are
>> affected by delays. "extremely increased" sounds ungrammatical to me though.
>>
>>
> I am not sure where you want to lead this discussion.
> I am trying to help the spec and feature definition to be compact enough to progress.

Thanks!

>
> It is specific to a protocol(s) and somehow arbitrarily concluded with a large number of packet losses.
> Maybe only one ICMP packet got dropped and retransmit was just one packet.
> Maybe it was TCP with selective retransmit enabled/disabled.
>
> As far as receive side is concerned, it should say that there is no QoS among different tunnels.
> The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently.

Yes, our cloud security and cloud network teams will configure and use 
the inner hash on dpdk. In fact, I discussed the security issues 
between tunnels with them,
and I will quote their solutions to tunnel attacks below. But this is a 
problem between tunnels, not one introduced by the inner hash.
I don't think we need to focus too much on this, but I'll do my best to 
describe the security issues between tunnels in v10.

"
This is not a problem with the inner hash, it is a general problem with 
the outer hash.
I talked to our people who work on cloud security (they are 
also among those asking for the inner hash),
and one tunnel attacking another is a common problem.

For example, take a tunnel t1, a tunnel t2, a tunnel endpoint VTEP0 
and a VM; the vni id of t1 is id1, and the vni id of t2 is id2.

In this case, whether the inner hash or the outer hash is used, the 
traffic of tunnel t1 and tunnel t2 will reach the VM through VTEP0 
(whether it is a single queue or multiple queues),
and may be placed on the same queue, causing the queue to overflow.

# Solutions:
1. Some current forwarding tools such as DPDK have good forwarding 
performance, and it is difficult to fill up the queue;
2. or switch the attack traffic to the attack clusters;
3. or connect the traffic of different tunnels to different network card 
ports or network devices.
4..
"
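
To illustrate the t1/t2 example above, here is a minimal C sketch. It is
an illustration under stated assumptions, not spec text: example_mix() is
a toy hash standing in for whatever hash the device really uses, all
example_* names are hypothetical, and the outer header is assumed to be
identical for every packet of a tunnel.

```c
#include <stdint.h>

/* Toy 32-bit mixer; an assumption standing in for the device's real hash. */
static uint32_t example_mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9E3779B1u;
    return h ^ (h >> 15);
}

/* Hypothetical flow tuple: addresses and ports only. */
struct example_flow {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Hash one flow tuple with the toy mixer. */
static uint32_t example_flow_hash(const struct example_flow *f)
{
    uint32_t h = 0;
    h = example_mix(h, f->src_ip);
    h = example_mix(h, f->dst_ip);
    h = example_mix(h, ((uint32_t)f->src_port << 16) | f->dst_port);
    return h;
}

/* Hypothetical encapsulated packet: outer (VTEP) and inner (tenant) tuples. */
struct example_encap_pkt {
    struct example_flow outer;
    struct example_flow inner;
};

/* Pick a receive queue from either the outer or the inner tuple. */
static unsigned example_select_queue(const struct example_encap_pkt *p,
                                     int use_inner, unsigned num_queues)
{
    const struct example_flow *f = use_inner ? &p->inner : &p->outer;
    return example_flow_hash(f) % num_queues;
}
```

With use_inner == 0, every packet that shares an outer header selects the
same queue. With use_inner == 1, the outer header is ignored, so packets
from t1 and t2 that carry the same inner tuple also select the same queue.
Either way, traffic from two tunnels can end up sharing a queue.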

>
>>>> +\item  The user can observe the traffic information and enqueue
>>>> +information
>>>> of other normal
>>>> +       tunnels, and conduct targeted DoS attacks.
>>> Once hash_report_tunnel_types is removed, this second attack is no longer
>> applicable.
>>> Hence, please remove this too.
>>
>> ?
>> I don't get how removing a field helps DoS.
>>
> I meant for the "observe and enqueue" part of the tunnel as not applicable.
>
>> \begin{lstlisting}  struct virtio_net_rss_config {
>>>>       le32 hash_types;
>>>> +    le32 hash_tunnel_types;
>>> This field is not needed as device config space advertisement for the support
>> is enough.
>>> If the intent is to enable hashing for the specific tunnel(s), an individual
>> command is better.
>>
>> new command? I am not sure why we want that. why not handle tunnels like
>> we do other protocols?
> I didn't follow.
> We probably discussed in another thread that to set M bits, it is wise to avoid setting N other bits just to keep the command happy, where N >>> M and these N have a very strong relation in hw resource setup and packet steering.
> Any examples of 'other protocols'?
>



* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 23:18               ` Michael S. Tsirkin
  2023-02-22  1:41                 ` Parav Pandit
@ 2023-02-22  2:51                 ` Heng Qi
  1 sibling, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-22  2:51 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: virtio-comment, virtio-dev, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/22 7:18 AM, Michael S. Tsirkin wrote:
> On Tue, Feb 21, 2023 at 10:32:11PM +0000, Parav Pandit wrote:
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Tuesday, February 21, 2023 4:46 PM
>>>
>>> What is this information driver can't observe? It sees all the packets after all,
>>> we are not stripping tunneling headers.
>> Just the tunnel type.
>> If/when that tunnel header is stripped, it gets complicated where tunnel type is still present in the virtio_net_hdr because hash_report_tunnel feature bit is negotiated.
> whoever strips off the tunnel has, I imagine, to strip off the virtio net hdr
> too - everything else in it such as gso type refers to the outer packet.
>
>>> I also don't really know what are upper layer drivers - for sure layering of
>>> drivers is not covered in the spec for now so I am not sure what do you mean by
>>> that.  The risk I mentioned is leaking the information *on the network*.
>>>
>> Got it.
>>
>>>
>>>
>>>>>>> \begin{lstlisting}  struct virtio_net_rss_config {
>>>>>>>>>       le32 hash_types;
>>>>>>>>> +    le32 hash_tunnel_types;
>>>>>>>> This field is not needed as device config space advertisement
>>>>>>>> for the support
>>>>>>> is enough.
>>>>>>>> If the intent is to enable hashing for the specific tunnel(s),
>>>>>>>> an individual
>>>>>>> command is better.
>>>>>>>
>>>>>>> new command? I am not sure why we want that. why not handle
>>>>>>> tunnels like we do other protocols?
>>>>>> I didn't follow.
>>>>>> We probably discussed in another thread that to set M bits, it is
>>>>>> wise to avoid
>>>>> setting N other bits just to keep the command happy, where N >>> M
>>>>> and these N have a very strong relation in hw resource setup and packet
>>> steering.
>>>>>> Any examples of 'other protocols'?
>>>>> #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>>>>> #define VIRTIO_NET_HASH_TYPE_TCPv4             (1 << 1)
>>>>> #define VIRTIO_NET_HASH_TYPE_UDPv4             (1 << 2)
>>>>>
>>>>> this kind of thing.
>>>>>
>>>>> I don't see how a tunnel is different fundamentally. Why does it
>>>>> need its own field?
>>>> Driver is in control to enable/disable tunnel based inner hash acceleration
>>> only when it's needed.
>>>> This way certain data path hw parsers can be enabled/disabled.
>>>> Without this it will be always enabled even if there may not be any user of it.
>>>> Device has scope to optimize this flow.
>>> I feel you misunderstand the question. Or maybe I misunderstand what you are
>>> proposing.  So tunnels need their own bits. But why a separate field and not just
>>> more bits along the existing ones?
>> Because the hashing is not covering the outer header contents.
>>
>> We may be still not discussing the same.
>> So let me refresh the context.
>>
>> The question of discussion was,
>> Scenario:
>> 1. device advertises the ability to hash on the inner packet header.
>> 2. device prefers that driver enable it only when it needs to use this extra packet parser in hardware.
>>
>> There are 3 options.
>> a. Because the feature is negotiated, it means it is enabled for all the tunnel types.
>> Pros:
>> 1. No need to extend cvq cmd.
>> Cons:
>> 1. device parser is always enabled, and the driver never uses it. This may result in inferior rx performance.
>>
>> b. Since the feature is useful in a narrow case of sw-based vxlan etc driver, better not to enable hw for it.
>> Hence, have the knob to explicitly enable in hw.
>> So have the cvq command.
>> b.1 should it be combined with the existing command?
>> Cons:
>> a. when the driver wants to enable hash on inner, it needs to supply the exact same RSS config as before. Sw overhead with no gain.
>> b. device needs to parse new command value, compare with old config, and drop the RSS config, just enable inner hashing hw parser.
>> Or destroy the old rss config and re-apply. This results in weird behavior for the short interval with no apparent gain.
>>
>> b.2 should it be on its own command?
>> Pros:
>> a. device and driver doesn't need to bother about b.1.a and b.1.b.
>> b. still benefits from not always enabling hw parser, as this is not a common case.
>> c. has the ability to enable when needed.
> I prefer b.1. With reporting of the tunnel type gone I don't see a
> fundamental difference between hashing over tunneling types and other
> protocol types we support.  It's just a flag telling device over which
> bits to calculate the hash. We don't have a separate command for hashing
> of TCPv6, why have it for vxlan?  Extending with more HASH_TYPE makes
> total sense to me, seems to fit better with the existing design and will
> make patch smaller.

+1.

Configuring the *tunnel hash types* through commands is infrequent, 
and when configuring the *hash types*,
the hash key and indirection table are not required either.
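
To make the shape under discussion concrete, here is a minimal C sketch.
It is not spec text. The three VIRTIO_NET_HASH_TYPE_* values are the ones
quoted above and the hash_tunnel_types field follows the v9 diff; the
EXAMPLE_HASH_TUNNEL_VXLAN bit, the example_* struct and function names,
and the host-endian uint32_t fields (the spec uses le32) are hypothetical
simplifications.

```c
#include <stdint.h>

/* Existing hash type bits, as quoted earlier in this thread. */
#define VIRTIO_NET_HASH_TYPE_IPv4   (1u << 0)
#define VIRTIO_NET_HASH_TYPE_TCPv4  (1u << 1)
#define VIRTIO_NET_HASH_TYPE_UDPv4  (1u << 2)

/* Hypothetical tunnel bit, for illustration only; not a value from the spec. */
#define EXAMPLE_HASH_TUNNEL_VXLAN   (1u << 0)

/* Simplified stand-in for the hash-related part of the config.
 * The hash key and indirection table are deliberately left out. */
struct example_rss_hash_cfg {
    uint32_t hash_types;         /* which header fields feed the hash */
    uint32_t hash_tunnel_types;  /* v9 proposal: tunnels hashed on inner headers */
};

/* Set tunnel bits without touching the existing hash_types mask. */
static void example_enable_inner_hash(struct example_rss_hash_cfg *cfg,
                                      uint32_t tunnel_bits)
{
    cfg->hash_tunnel_types |= tunnel_bits;
}

/* Nonzero when at least one tunnel type is enabled, i.e. when a device
 * would have to run its inner-header parser. */
static int example_needs_inner_parser(const struct example_rss_hash_cfg *cfg)
{
    return cfg->hash_tunnel_types != 0;
}
```

A driver that never sets a tunnel bit leaves example_needs_inner_parser()
at zero, which is the case Parav wants the device to be able to optimize
for.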

>
>



* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21 17:50 ` Michael S. Tsirkin
@ 2023-02-22  3:22   ` Jason Wang
  2023-02-22  6:46     ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-02-22  3:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo


On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>> +\subparagraph{Security risks between encapsulated packets and RSS}
>> +There may be potential security risks when encapsulated packets using RSS to
>> +select queues for placement. When a user inside a tunnel tries to control the


What do you mean by "user" here? Is it a remote or local one?


>> +enqueuing of encapsulated packets, then the user can flood the device with invaild
>> +packets, and the flooded packets may be hashed into the same queue as packets in
>> +other normal tunnels, which causing the queue to overflow.
>> +
>> +This can pose several security risks:
>> +\begin{itemize}
>> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
>> +       overflow, resulting in a large amount of packet loss.
>> +\item  The delay and retransmission of packets in the normal tunnels are extremely increased.
>> +\item  The user can observe the traffic information and enqueue information of other normal
>> +       tunnels, and conduct targeted DoS attacks.
>> +\end{\itemize}
>> +
> Hmm with this all written out it sounds pretty severe.


I think we first need to understand whether or not this is a problem that we 
need to solve at the spec level:

1) does anything make encapsulated packets different, i.e. why can't we hit 
this problem without encapsulation?

2) whether or not this is an implementation detail that the spec doesn't 
need to care about (or how it is solved in real NICs)

Thanks


> At this point with no ways to mitigate, I don't feel this is something
> e.g. Linux can enable.  I am not going to nack the spec patch if
> others  find this somehow useful e.g. for dpdk.
> How about CC e.g. dpdk devs or whoever else is going to use this
> and asking them for the opinion?
>
>



* Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  2:34       ` [virtio-dev] " Heng Qi
@ 2023-02-22  6:21         ` Michael S. Tsirkin
  2023-02-22  7:03           ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-22  6:21 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:
> > The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently.
> 
> Yes, our cloud security and cloud network team will configure and use inner
> hash on dpdk.

Sounds good. More practical for dpdk than Linux.
Is there a chance that when the interface is close
to final, but before the vote, you post a patch to the dpdk list and
get some acks from the maintainers, cc virtio-dev? This way we won't
merge something that will then go unused.
That would be best - do you have a prototype?

> In fact I discussed with them the security issues between
> tunnels,
> and I will quote their solutions to tunnel attacks below, but this is a
> problem between the tunnels, not the introduction of inner hash.
> I don't think we need to focus too much on this, but I'll do my best to
> describe the security issues between tunnels in v10.
> 
> "
> This is not a problem with the inner hash, it is a general problem with the
> outer hash.
> I communicated with our people who are doing cloud security (they are also
> one of the demanders of inner hash),
> and it is a common problem for one tunnel to attack another tunnel.
> 
> For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
> the vni id of t1 is id1, and the vni id of t2 is id2; a VM.
> 
> At this time, regardless of the inner hash or the outer hash, the traffic of
> tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
> single queue or multiple queues),
> and may be placed on the same queue to cause queue overflow.

Do note (and explain in spec?) that with just an outer hash and RSS it
is possible to configure the tunnels to use distinct queues. That is
impossible with this interface, but arguably it only works for a small
number of tunnels anyway.

> # Solutions:

More like mitigations.

> 1. Some current forwarding tools such as DPDK have good forwarding
> performance, and it is difficult to fill up the queue;

Oh that's a good point. If the driver is generally faster than the device
and queues stay away from filling up, there's no DoS.
I'd add this to the spec.

> 2. or switch the attack traffic to the attack clusters;

What is that?

> 3. or connect the traffic of different tunnels to different network card
> ports or network devices.

Not sure how this is relevant. These have a distinct outer MAC - with this,
why do we need a tunnel?

> 4..
> "
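
As a rough illustration of the point about a fast driver above, here is a
toy C model of a single receive queue. It is a sketch under stated
assumptions only: fixed per-tick arrival and drain rates, and a
hypothetical example_count_drops() helper that corresponds to nothing in
the spec.

```c
#include <stdint.h>

/* Count packets dropped over `ticks` intervals for a queue holding at most
 * `capacity` packets. Each tick the device enqueues `arrive_per_tick`
 * packets, then the driver dequeues up to `drain_per_tick`. */
static uint64_t example_count_drops(uint32_t capacity,
                                    uint32_t arrive_per_tick,
                                    uint32_t drain_per_tick,
                                    uint32_t ticks)
{
    uint64_t backlog = 0, drops = 0;

    for (uint32_t t = 0; t < ticks; t++) {
        backlog += arrive_per_tick;          /* device enqueues */
        if (backlog > capacity) {            /* queue full: excess is dropped */
            drops += backlog - capacity;
            backlog = capacity;
        }
        /* driver dequeues, never more than what is queued */
        backlog -= backlog < drain_per_tick ? backlog : drain_per_tick;
    }
    return drops;
}
```

In this model, as long as the driver drains at least as fast as packets
arrive and a single tick's arrivals fit in the queue, the drop count stays
at zero; once arrivals outpace the drain rate the backlog grows until
packets are dropped.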



* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  3:22   ` Jason Wang
@ 2023-02-22  6:46     ` Heng Qi
  2023-02-22 11:30       ` Michael S. Tsirkin
  2023-02-23  2:50       ` Jason Wang
  0 siblings, 2 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-22  6:46 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo

Hi, Jason. Long time no see. :)

On 2023/2/22 11:22 AM, Jason Wang wrote:
>
> On 2023/2/22 01:50, Michael S. Tsirkin wrote:
>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>> +There may be potential security risks when encapsulated packets 
>>> using RSS to
>>> +select queues for placement. When a user inside a tunnel tries to 
>>> control the
>
>
> What do you mean by "user" here? Is it a remote or local one?
>

I mean a remote attacker who is not under the control of the tunnel owner.

Thanks.

>
>>> +enqueuing of encapsulated packets, then the user can flood the 
>>> device with invaild
>>> +packets, and the flooded packets may be hashed into the same queue 
>>> as packets in
>>> +other normal tunnels, which causing the queue to overflow.
>>> +
>>> +This can pose several security risks:
>>> +\begin{itemize}
>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>> enqueued due to queue
>>> +       overflow, resulting in a large amount of packet loss.
>>> +\item  The delay and retransmission of packets in the normal 
>>> tunnels are extremely increased.
>>> +\item  The user can observe the traffic information and enqueue 
>>> information of other normal
>>> +       tunnels, and conduct targeted DoS attacks.
>>> +\end{\itemize}
>>> +
>> Hmm with this all written out it sounds pretty severe.
>
>
> I think we need first understand whether or not it's a problem that we 
> need to solve at spec level:
>
> 1) anything make encapsulated packets different or why we can't hit 
> this problem without encapsulation
>
> 2) whether or not it's the implementation details that the spec 
> doesn't need to care (or how it is solved in real NIC)
>
> Thanks
>
>
>> At this point with no ways to mitigate, I don't feel this is something
>> e.g. Linux can enable.  I am not going to nack the spec patch if
>> others  find this somehow useful e.g. for dpdk.
>> How about CC e.g. dpdk devs or whoever else is going to use this
>> and asking them for the opinion?
>>
>>



* Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  6:21         ` Michael S. Tsirkin
@ 2023-02-22  7:03           ` Heng Qi
  2023-02-22 11:29             ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-22  7:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/2/22 2:21 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:
>>> The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently.
>> Yes, our cloud security and cloud network team will configure and use inner
>> hash on dpdk.
> Sounds good. More practical for dpdk than Linux.
> Is there a chance that when the interface is close
> to be final, but before the vote, you post a patch to the dpdk list and
> get some acks from the maintainers, cc virtio-dev. This way we won't
> merge something that will then go unused?
> That would be best - do you have a prototype?

Not yet, dpdk and the business team are waiting for our virtio 
specification, and
they have stated as a business team that their implementation on dpdk 
will not necessarily be open sourced to the community.😅

>
>> In fact I discussed with them the security issues between
>> tunnels,
>> and I will quote their solutions to tunnel attacks below, but this is a
>> problem between the tunnels, not the introduction of inner hash.
>> I don't think we need to focus too much on this, but I'll do my best to
>> describe the security issues between tunnels in v10.
>>
>> "
>> This is not a problem with the inner hash, it is a general problem with the
>> outer hash.
>> I communicated with our people who are doing cloud security (they are also
>> one of the demanders of inner hash),
>> and it is a common problem for one tunnel to attack another tunnel.
>>
>> For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
>> the vni id of t1 is id1, and the vni id of t2 is id2; a VM.
>>
>> At this time, regardless of the inner hash or the outer hash, the traffic of
>> tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
>> single queue or multiple queues),
>> and may be placed on the same queue to cause queue overflow.
> Do note (and explain in spec?) that with just an outer hash and RSS it
> is possible to configure the tunnels to use distinct queues. Impossible
> with this interface but arguably only works for a small number of
> tunnels anyway.
>
>> # Solutions:
> More like mitigations.

Yes, you are right.

>
>> 1. Some current forwarding tools such as DPDK have good forwarding
>> performance, and it is difficult to fill up the queue;
> Oh that's a good point. If driver is generally faster than the device
> and queues stay away from filling up there's no DoS.
> I'd add this to the spec.

Ok.

>
>> 2. or switch the attack traffic to the attack clusters;
> What is that?

This is done by a monitoring component outside the tunnel, which they 
mentioned is also an important mitigation
for preventing DoS between tunnels. For example, the monitoring component 
cuts off, limits or redirects the abnormal traffic of the tunnel.

>
>> 3. or connect the traffic of different tunnels to different network card
>> ports or network devices.
> Not sure how this is relevant. These a distinct outer MAC - with this
> why do we need a tunnel?
>
>> 4..
>> "



* Re: [virtio-dev] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  7:03           ` Heng Qi
@ 2023-02-22 11:29             ` Michael S. Tsirkin
  0 siblings, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-22 11:29 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Feb 22, 2023 at 03:03:32PM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/22 2:21 PM, Michael S. Tsirkin wrote:
> > On Wed, Feb 22, 2023 at 10:34:39AM +0800, Heng Qi wrote:
> > > > The user will figure out how to mitigate when such QoS is not available. Either to run in best-effort mode or mitigate differently.
> > > Yes, our cloud security and cloud network team will configure and use inner
> > > hash on dpdk.
> > Sounds good. More practical for dpdk than Linux.
> > Is there a chance that when the interface is close
> > to be final, but before the vote, you post a patch to the dpdk list and
> > get some acks from the maintainers, cc virtio-dev. This way we won't
> > merge something that will then go unused?
> > That would be best - do you have a prototype?
> 
> Not yet, dpdk and the business team are waiting for our virtio
> specification, and
> they have stated as a business team that their implementation on dpdk will
> not necessarily be open sourced to the community.😅

Ugh so no open source implementations at all :(


> > 
> > > In fact I discussed with them the security issues between
> > > tunnels,
> > > and I will quote their solutions to tunnel attacks below, but this is a
> > > problem between the tunnels, not the introduction of inner hash.
> > > I don't think we need to focus too much on this, but I'll do my best to
> > > describe the security issues between tunnels in v10.
> > > 
> > > "
> > > This is not a problem with the inner hash, it is a general problem with the
> > > outer hash.
> > > I communicated with our people who are doing cloud security (they are also
> > > one of the demanders of inner hash),
> > > and it is a common problem for one tunnel to attack another tunnel.
> > > 
> > > For example, there is a tunnel t1; a tunnel t2; a tunnel endpoint VTEP0, and
> > > the vni id of t1 is id1, and the vni id of t2 is id2; a VM.
> > > 
> > > At this time, regardless of the inner hash or the outer hash, the traffic of
> > > tunnel t1 and tunnel t2 will reach the VM through VTEP0 (whether it is a
> > > single queue or multiple queues),
> > > and may be placed on the same queue to cause queue overflow.
> > Do note (and explain in spec?) that with just an outer hash and RSS it
> > is possible to configure the tunnels to use distinct queues. Impossible
> > with this interface but arguably only works for a small number of
> > tunnels anyway.
> > 
> > > # Solutions:
> > More like mitigations.
> 
> Yes, you are right.
> 
> > 
> > > 1. Some current forwarding tools such as DPDK have good forwarding
> > > performance, and it is difficult to fill up the queue;
> > Oh that's a good point. If driver is generally faster than the device
> > and queues stay away from filling up there's no DoS.
> > I'd add this to the spec.
> 
> Ok.
> 
> > 
> > > 2. or switch the attack traffic to the attack clusters;
> > What is that?
> 
> This is done by the monitoring part outside the tunnel, which is also an
> important mitigation method they mentioned
> to prevent DoS between tunnels. For example, the monitoring part cuts off,
> limits or redirects the abnormal traffic of the tunnel.

This has to be outside the device though right?
Before traffic arrives at the device.

> > 
> > > 3. or connect the traffic of different tunnels to different network card
> > > ports or network devices.
> > Not sure how this is relevant. These a distinct outer MAC - with this
> > why do we need a tunnel?
> > 
> > > 4..
> > > "



* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  6:46     ` Heng Qi
@ 2023-02-22 11:30       ` Michael S. Tsirkin
  2023-02-23  2:50       ` Jason Wang
  1 sibling, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-22 11:30 UTC (permalink / raw)
  To: Heng Qi
  Cc: Jason Wang, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Feb 22, 2023 at 02:46:51PM +0800, Heng Qi wrote:
> Hi, Jason. Long time no see. :)
> 
> On 2023/2/22 11:22 AM, Jason Wang wrote:
> > 
> > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > +There may be potential security risks when encapsulated packets
> > > > using RSS to
> > > > +select queues for placement. When a user inside a tunnel tries
> > > > to control the
> > 
> > 
> > What do you mean by "user" here? Is it a remote or local one?
> > 
> 
> I mean a remote attacker who is not under the control of the tunnel owner.
> 
> Thanks.

OK let's just say "remote attacker" then.

> > 
> > > > +enqueuing of encapsulated packets, then the user can flood the
> > > > device with invaild
> > > > +packets, and the flooded packets may be hashed into the same
> > > > queue as packets in
> > > > +other normal tunnels, which causing the queue to overflow.
> > > > +
> > > > +This can pose several security risks:
> > > > +\begin{itemize}
> > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > enqueued due to queue
> > > > +       overflow, resulting in a large amount of packet loss.
> > > > +\item  The delay and retransmission of packets in the normal
> > > > tunnels are extremely increased.
> > > > +\item  The user can observe the traffic information and enqueue
> > > > information of other normal
> > > > +       tunnels, and conduct targeted DoS attacks.
> > > > +\end{\itemize}
> > > > +
> > > Hmm with this all written out it sounds pretty severe.
> > 
> > 
> > I think we need first understand whether or not it's a problem that we
> > need to solve at spec level:
> > 
> > 1) anything make encapsulated packets different or why we can't hit this
> > problem without encapsulation
> > 
> > 2) whether or not it's the implementation details that the spec doesn't
> > need to care (or how it is solved in real NIC)
> > 
> > Thanks
> > 
> > 
> > > At this point with no ways to mitigate, I don't feel this is something
> > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > others  find this somehow useful e.g. for dpdk.
> > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > and asking them for the opinion?
> > > 
> > > 



* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-22  6:46     ` Heng Qi
  2023-02-22 11:30       ` Michael S. Tsirkin
@ 2023-02-23  2:50       ` Jason Wang
  2023-02-23  4:41         ` [virtio-dev] " Heng Qi
  2023-02-23 13:03         ` Michael S. Tsirkin
  1 sibling, 2 replies; 105+ messages in thread
From: Jason Wang @ 2023-02-23  2:50 UTC (permalink / raw)
  To: Heng Qi, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo

Hi:

On 2023/2/22 14:46, Heng Qi wrote:
> Hi, Jason. Long time no see. :)
>
> On 2023/2/22 11:22 AM, Jason Wang wrote:
>>
>> On 2023/2/22 01:50, Michael S. Tsirkin wrote:
>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>>> +There may be potential security risks when encapsulated packets 
>>>> using RSS to
>>>> +select queues for placement. When a user inside a tunnel tries to 
>>>> control the
>>
>>
>> What do you mean by "user" here? Is it a remote or local one?
>>
>
> I mean a remote attacker who is not under the control of the tunnel 
> owner.


Does anything make the tunnel case different? I think this can happen even
without a tunnel (and even with a single queue).

How to mitigate those attackers seems more like an implementation detail,
which might require fair queuing or other QoS techniques that have been
well studied.

It seems out of the scope of the spec (unless we want driver-manageable
QoS).

Thanks


>
> Thanks.
>
>>
>>>> +enqueuing of encapsulated packets, then the user can flood the 
>>>> device with invaild
>>>> +packets, and the flooded packets may be hashed into the same queue 
>>>> as packets in
>>>> +other normal tunnels, which causing the queue to overflow.
>>>> +
>>>> +This can pose several security risks:
>>>> +\begin{itemize}
>>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>>> enqueued due to queue
>>>> +       overflow, resulting in a large amount of packet loss.
>>>> +\item  The delay and retransmission of packets in the normal 
>>>> tunnels are extremely increased.
>>>> +\item  The user can observe the traffic information and enqueue 
>>>> information of other normal
>>>> +       tunnels, and conduct targeted DoS attacks.
>>>> +\end{\itemize}
>>>> +
>>> Hmm with this all written out it sounds pretty severe.
>>
>>
>> I think we need first understand whether or not it's a problem that 
>> we need to solve at spec level:
>>
>> 1) anything make encapsulated packets different or why we can't hit 
>> this problem without encapsulation
>>
>> 2) whether or not it's the implementation details that the spec 
>> doesn't need to care (or how it is solved in real NIC)
>>
>> Thanks
>>
>>
>>> At this point with no ways to mitigate, I don't feel this is something
>>> e.g. Linux can enable.  I am not going to nack the spec patch if
>>> others  find this somehow useful e.g. for dpdk.
>>> How about CC e.g. dpdk devs or whoever else is going to use this
>>> and asking them for the opinion?
>>>
>>>
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23  2:50       ` Jason Wang
@ 2023-02-23  4:41         ` Heng Qi
  2023-02-24  2:45           ` Jason Wang
  2023-02-23 13:03         ` Michael S. Tsirkin
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-23  4:41 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/23 10:50 AM, Jason Wang wrote:
> Hi:
>
> On 2023/2/22 14:46, Heng Qi wrote:
>> Hi, Jason. Long time no see. :)
>>
>> On 2023/2/22 11:22 AM, Jason Wang wrote:
>>>
>>> On 2023/2/22 01:50, Michael S. Tsirkin wrote:
>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>>>> +There may be potential security risks when encapsulated packets 
>>>>> using RSS to
>>>>> +select queues for placement. When a user inside a tunnel tries to 
>>>>> control the
>>>
>>>
>>> What do you mean by "user" here? Is it a remote or local one?
>>>
>>
>> I mean a remote attacker who is not under the control of the tunnel 
>> owner.
>
>
> Anything may the tunnel different? I think this can happen even 
> without tunnel (and even with single queue).

I agree.

>
> How to mitigate those attackers seems more like a implementation 
> details where might require fair queuing or other QOS technology which 
> has been well studied.

I am also not sure whether the spec needs to focus on this point. As far
as I can see, protection against tunnel DoS is mostly provided outside
the device, but it seems okay to include some reminders about the attack.

Thanks.

>
> It seems out of the scope of the spec (unless we want to let driver 
> manageable QOS).
>
> Thanks
>
>
>>
>> Thanks.
>>
>>>
>>>>> +enqueuing of encapsulated packets, then the user can flood the 
>>>>> device with invaild
>>>>> +packets, and the flooded packets may be hashed into the same 
>>>>> queue as packets in
>>>>> +other normal tunnels, which causing the queue to overflow.
>>>>> +
>>>>> +This can pose several security risks:
>>>>> +\begin{itemize}
>>>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>>>> enqueued due to queue
>>>>> +       overflow, resulting in a large amount of packet loss.
>>>>> +\item  The delay and retransmission of packets in the normal 
>>>>> tunnels are extremely increased.
>>>>> +\item  The user can observe the traffic information and enqueue 
>>>>> information of other normal
>>>>> +       tunnels, and conduct targeted DoS attacks.
>>>>> +\end{\itemize}
>>>>> +
>>>> Hmm with this all written out it sounds pretty severe.
>>>
>>>
>>> I think we need first understand whether or not it's a problem that 
>>> we need to solve at spec level:
>>>
>>> 1) anything make encapsulated packets different or why we can't hit 
>>> this problem without encapsulation
>>>
>>> 2) whether or not it's the implementation details that the spec 
>>> doesn't need to care (or how it is solved in real NIC)
>>>
>>> Thanks
>>>
>>>
>>>> At this point with no ways to mitigate, I don't feel this is something
>>>> e.g. Linux can enable.  I am not going to nack the spec patch if
>>>> others  find this somehow useful e.g. for dpdk.
>>>> How about CC e.g. dpdk devs or whoever else is going to use this
>>>> and asking them for the opinion?
>>>>
>>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23  2:50       ` Jason Wang
  2023-02-23  4:41         ` [virtio-dev] " Heng Qi
@ 2023-02-23 13:03         ` Michael S. Tsirkin
  2023-02-24  2:26           ` Jason Wang
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-23 13:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> Hi:
> 
> On 2023/2/22 14:46, Heng Qi wrote:
> > Hi, Jason. Long time no see. :)
> >
> > On 2023/2/22 11:22 AM, Jason Wang wrote:
> > >
> > > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > +There may be potential security risks when encapsulated
> > > > > packets using RSS to
> > > > > +select queues for placement. When a user inside a tunnel
> > > > > tries to control the
> > > 
> > > 
> > > What do you mean by "user" here? Is it a remote or local one?
> > > 
> > 
> > I mean a remote attacker who is not under the control of the tunnel
> > owner.
> 
> 
> Anything may the tunnel different? I think this can happen even without
> tunnel (and even with single queue).

I think you are missing the fact that a tunnel is normally a
security boundary: users within the tunnel cannot control
what is happening outside.
The feature breaks the encapsulation somewhat.

For example, without tunneling it is possible
to create a special "bad guy queue" and direct specific tunnels
there by playing with the key and the indirection table.

> How to mitigate those attackers seems more like a implementation details
> where might require fair queuing or other QOS technology which has been well
> studied.
> 
> It seems out of the scope of the spec (unless we want to let driver
> manageable QOS).
> 
> Thanks
> 
> 
> > 
> > Thanks.
> > 
> > > 
> > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > the device with invaild
> > > > > +packets, and the flooded packets may be hashed into the
> > > > > same queue as packets in
> > > > > +other normal tunnels, which causing the queue to overflow.
> > > > > +
> > > > > +This can pose several security risks:
> > > > > +\begin{itemize}
> > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > enqueued due to queue
> > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > +\item  The delay and retransmission of packets in the
> > > > > normal tunnels are extremely increased.
> > > > > +\item  The user can observe the traffic information and
> > > > > enqueue information of other normal
> > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > +\end{\itemize}
> > > > > +
> > > > Hmm with this all written out it sounds pretty severe.
> > > 
> > > 
> > > I think we need first understand whether or not it's a problem that
> > > we need to solve at spec level:
> > > 
> > > 1) anything make encapsulated packets different or why we can't hit
> > > this problem without encapsulation
> > > 
> > > 2) whether or not it's the implementation details that the spec
> > > doesn't need to care (or how it is solved in real NIC)
> > > 
> > > Thanks
> > > 
> > > 
> > > > At this point with no ways to mitigate, I don't feel this is something
> > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > others  find this somehow useful e.g. for dpdk.
> > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > and asking them for the opinion?
> > > > 
> > > > 
> > 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
                   ` (2 preceding siblings ...)
  2023-02-21 17:50 ` Michael S. Tsirkin
@ 2023-02-23 13:13 ` Michael S. Tsirkin
  2023-02-23 14:40   ` [virtio-comment] " Parav Pandit
  2023-02-24  4:42   ` Heng Qi
  2023-02-28 11:16 ` Michael S. Tsirkin
  4 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-23 13:13 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> +\subparagraph{Security risks between encapsulated packets and RSS}
> +There may be potential security risks when encapsulated packets using RSS to
> +select queues for placement.

Is this just with RSS? I assume hash calculation is also used for
something like queueing so there's a similar risk even just
with hash reporting, no?


> When a user inside a tunnel tries to control the
> +enqueuing of encapsulated packets, then the user can flood the device with invaild
> +packets, and the flooded packets may be hashed into the same queue as packets in
> +other normal tunnels, which causing the queue to overflow.
> +
> +This can pose several security risks:
> +\begin{itemize}
> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> +       overflow, resulting in a large amount of packet loss.
> +\item  The delay and retransmission of packets in the normal tunnels are extremely increased.
> +\item  The user can observe the traffic information and enqueue information of other normal
> +       tunnels, and conduct targeted DoS attacks.
> +\end{\itemize}
> +


So for RSS specifically, we brainstormed with Amnon (Cc'd) and came
up with an idea: RSS indirection table entries are 16 bits wide, but
only 15 bits are used to identify an RX queue.
We can use the remaining bit as a "tunnel bit" to signal whether to use the
inner or the outer hash for queue selection.

The lookup will work like this then:

calculate outer hash
if (rss[outer hash] & tunnel bit)
then
	calculate inner hash
	return rss[inner hash] & ~tunnel bit
else
	return rss[outer hash]


This fixes the security issue, returning us to the
status quo: specific tunnels can be directed to separate queues.
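
A minimal C sketch of the lookup above, for illustration only. The names
RSS_TUNNEL_BIT, RSS_QUEUE_MASK and select_rx_queue() are hypothetical, and
putting the tunnel bit in the top bit of the 16-bit entry is an assumption:
the proposal only says that one bit is spare. Both hashes are passed in
precomputed to keep the sketch short, whereas the pseudocode computes the
inner hash only when the tunnel bit is set.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bit 15 is the "tunnel bit", bits 0..14 select the RX queue. */
#define RSS_TUNNEL_BIT 0x8000u
#define RSS_QUEUE_MASK 0x7fffu

/*
 * rss:     indirection table with 16-bit entries
 * rss_len: number of entries, assumed to be a power of two
 * Returns the RX queue index selected for the packet.
 */
static uint16_t select_rx_queue(const uint16_t *rss, size_t rss_len,
                                uint32_t outer_hash, uint32_t inner_hash)
{
    uint16_t entry = rss[outer_hash & (rss_len - 1)];

    if (entry & RSS_TUNNEL_BIT) {
        /* This outer flow opted in: redo the lookup with the inner hash. */
        entry = rss[inner_hash & (rss_len - 1)];
        return (uint16_t)(entry & RSS_QUEUE_MASK);
    }

    /* Tunnel bit clear: the outer hash alone picks the queue. */
    return entry;
}
```

With this layout a driver can leave the tunnel bit clear on the entries of
tunnels it does not trust, so their inner headers never influence queue
selection.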


This is for RSS.


For hash reporting, the indirection table is not used.
Maybe it is enough to signal to the driver that the inner hash was used.
We do need that signalling though.

My question would be whether it's practical to implement in hardware.

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-comment] RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-23 13:13 ` Michael S. Tsirkin
@ 2023-02-23 14:40   ` Parav Pandit
  2023-02-24  8:13     ` Michael S. Tsirkin
  2023-02-24  4:42   ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-23 14:40 UTC (permalink / raw)
  To: Michael S. Tsirkin, Heng Qi
  Cc: virtio-comment, virtio-dev, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo, ailan



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, February 23, 2023 8:14 AM
> 
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:


> So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
> an idea: RSS indirection table entries are 16 bit but only 15 bits are used to
> identify an RX queue.
> We can use the remaining bit as a "tunnel bit" to signal whether to use the
> inner or the outer hash for queue selection.
>
I further brainstormed internally with Saeed and Rony on this.

The inner hash is only needed for GRE, IPIP, etc.
For VXLAN and NVGRE, the Linux kernel transmit side puts entropy into the source port of the outer header.
It derives that source port from the inner header.
Refer to [1] as one example.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
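
To illustrate the point about outer source port entropy, here is a minimal
C sketch of how a transmit path might fold an inner flow hash into the
outer UDP source port. The function name outer_src_port(), the port range
used in the example and the scaling step are assumptions for illustration;
they are not copied from the kernel source in [1].

```c
#include <stdint.h>

/*
 * Map a 32-bit hash of the inner headers into the port range [min, max).
 * Packets of the same inner flow get the same outer source port, so a
 * receiver hashing only the outer header can still spread inner flows.
 */
static uint16_t outer_src_port(uint32_t inner_hash, uint16_t min, uint16_t max)
{
    uint32_t span = (uint32_t)(max - min);

    /* Scale the hash into [0, span) without a modulo operation. */
    return (uint16_t)(min + (uint16_t)(((uint64_t)inner_hash * span) >> 32));
}
```

For example, outer_src_port(h, 32768, 61000) always returns a port of at
least 32768 and below 61000, and two inner flows with different hashes
usually end up with different outer source ports.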

> The lookup will work like this then:
> 
> calculate outer hash
> if (rss[outer hash] & tunnel bit)
By tunnel bit, you mean a tunneled packet, right?

> then
> 	calculate inner hash
> 	return rss[inner hash] & ~tunnel bit
Why end with the tunnel bit?

> else
> 	return rss[outer hash]
> 
> 
> this fixes the security issue returning us back to status quo : specific tunnels can
> be directed to separate queues.
>
The number of tunnels is far higher than the number of queues when a paravirtualized driver does the decap.
 
> 
> This is for RSS.
> 
> 
> For hash reporting indirection table is not used.
> Maybe it is enough to signal to driver that inner hash was used.
> We do need that signalling though.
> 
> My question would be whether it's practical to implement in hardware.

In the above example, having the hardware calculate two hashes is difficult and brings little gain.
Calculating the hash on either the inner or the outer header makes sense.

Signaling whether the hash was calculated on the inner or the outer header is fine, because the hardware then reports exactly what it did.

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23 13:03         ` Michael S. Tsirkin
@ 2023-02-24  2:26           ` Jason Wang
  2023-02-24  8:06             ` [virtio-dev] " Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-02-24  2:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > Hi:
> >
> > On 2023/2/22 14:46, Heng Qi wrote:
> > > Hi, Jason. Long time no see. :)
> > >
> > > On 2023/2/22 11:22 AM, Jason Wang wrote:
> > > >
> > > > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > > +There may be potential security risks when encapsulated
> > > > > > packets using RSS to
> > > > > > +select queues for placement. When a user inside a tunnel
> > > > > > tries to control the
> > > >
> > > >
> > > > What do you mean by "user" here? Is it a remote or local one?
> > > >
> > >
> > > I mean a remote attacker who is not under the control of the tunnel
> > > owner.
> >
> >
> > Anything may the tunnel different? I think this can happen even without
> > tunnel (and even with single queue).
>
> I think you are missing the fact that tunnel is normally a
> security boundary: users within the tunnel can not control
> what is happening outside.
> The feature breaks the encapsulation somewhat.

I'm not sure I understand. If we allow hashing based on the inner
packet, is that what you mean by controlling what happens outside?
It doesn't differ much from the case where no tunnel is used. It's
impossible to control what a remote user sends, and if any NIC
behaviour depends on the packet content, that behaviour is to some
extent under the control of the remote user.

Since we only care about the device driver interface, what we can do
is probably:

1) allow the driver to disable the inner hash when it spots a
potential (D)DoS. Fair queueing in the device looks like a must,
but that should be an implementation detail.
2) hash based on both the outer and the inner headers

>
> For example without tunneling it is possible
> to create a special "bad guy queue" and direct specific tunnels
> there by playing with key and indirection table.

Does anything make tunneling different? We can still do this via the
inner header hash, or at least we can disable the inner hash if we see
a remote DoS.

Thanks

>
> > How to mitigate those attackers seems more like a implementation details
> > where might require fair queuing or other QOS technology which has been well
> > studied.
> >
> > It seems out of the scope of the spec (unless we want to let driver
> > manageable QOS).
> >
> > Thanks
> >
> >
> > >
> > > Thanks.
> > >
> > > >
> > > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > > the device with invaild
> > > > > > +packets, and the flooded packets may be hashed into the
> > > > > > same queue as packets in
> > > > > > +other normal tunnels, which causing the queue to overflow.
> > > > > > +
> > > > > > +This can pose several security risks:
> > > > > > +\begin{itemize}
> > > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > > enqueued due to queue
> > > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > > +\item  The delay and retransmission of packets in the
> > > > > > normal tunnels are extremely increased.
> > > > > > +\item  The user can observe the traffic information and
> > > > > > enqueue information of other normal
> > > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > > +\end{\itemize}
> > > > > > +
> > > > > Hmm with this all written out it sounds pretty severe.
> > > >
> > > >
> > > > I think we need first understand whether or not it's a problem that
> > > > we need to solve at spec level:
> > > >
> > > > 1) anything make encapsulated packets different or why we can't hit
> > > > this problem without encapsulation
> > > >
> > > > 2) whether or not it's the implementation details that the spec
> > > > doesn't need to care (or how it is solved in real NIC)
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > At this point with no ways to mitigate, I don't feel this is something
> > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > others  find this somehow useful e.g. for dpdk.
> > > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > > and asking them for the opinion?
> > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23  4:41         ` [virtio-dev] " Heng Qi
@ 2023-02-24  2:45           ` Jason Wang
  2023-02-24  4:47             ` [virtio-comment] " Heng Qi
  2023-02-24  8:07             ` Michael S. Tsirkin
  0 siblings, 2 replies; 105+ messages in thread
From: Jason Wang @ 2023-02-24  2:45 UTC (permalink / raw)
  To: Heng Qi, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo


On 2023/2/23 12:41, Heng Qi wrote:
>
>
> On 2023/2/23 10:50 AM, Jason Wang wrote:
>> Hi:
>>
>> On 2023/2/22 14:46, Heng Qi wrote:
>>> Hi, Jason. Long time no see. :)
>>>
>>> On 2023/2/22 11:22 AM, Jason Wang wrote:
>>>>
>>>> On 2023/2/22 01:50, Michael S. Tsirkin wrote:
>>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>>>>> +There may be potential security risks when encapsulated packets 
>>>>>> using RSS to
>>>>>> +select queues for placement. When a user inside a tunnel tries 
>>>>>> to control the
>>>>
>>>>
>>>> What do you mean by "user" here? Is it a remote or local one?
>>>>
>>>
>>> I mean a remote attacker who is not under the control of the tunnel 
>>> owner.
>>
>>
>> Anything may the tunnel different? I think this can happen even 
>> without tunnel (and even with single queue).
>
> I agree.
>
>>
>> How to mitigate those attackers seems more like a implementation 
>> details where might require fair queuing or other QOS technology 
>> which has been well studied.
>
> I am also not sure whether this point needs to be focused on in the 
> spec, and I see that the protection against tunnel DoS is more 
> protected outside the device,
> but it seems to be okay to give some attack reminders.


Maybe it's sufficient to say that the device should ensure fairness
among different flows when queuing packets?

Thanks


>
> Thanks.
>
>>
>> It seems out of the scope of the spec (unless we want to let driver 
>> manageable QOS).
>>
>> Thanks
>>
>>
>>>
>>> Thanks.
>>>
>>>>
>>>>>> +enqueuing of encapsulated packets, then the user can flood the 
>>>>>> device with invaild
>>>>>> +packets, and the flooded packets may be hashed into the same 
>>>>>> queue as packets in
>>>>>> +other normal tunnels, which causing the queue to overflow.
>>>>>> +
>>>>>> +This can pose several security risks:
>>>>>> +\begin{itemize}
>>>>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>>>>> enqueued due to queue
>>>>>> +       overflow, resulting in a large amount of packet loss.
>>>>>> +\item  The delay and retransmission of packets in the normal 
>>>>>> tunnels are extremely increased.
>>>>>> +\item  The user can observe the traffic information and enqueue 
>>>>>> information of other normal
>>>>>> +       tunnels, and conduct targeted DoS attacks.
>>>>>> +\end{\itemize}
>>>>>> +
>>>>> Hmm with this all written out it sounds pretty severe.
>>>>
>>>>
>>>> I think we need first understand whether or not it's a problem that 
>>>> we need to solve at spec level:
>>>>
>>>> 1) anything make encapsulated packets different or why we can't hit 
>>>> this problem without encapsulation
>>>>
>>>> 2) whether or not it's the implementation details that the spec 
>>>> doesn't need to care (or how it is solved in real NIC)
>>>>
>>>> Thanks
>>>>
>>>>
>>>>> At this point with no ways to mitigate, I don't feel this is 
>>>>> something
>>>>> e.g. Linux can enable.  I am not going to nack the spec patch if
>>>>> others  find this somehow useful e.g. for dpdk.
>>>>> How about CC e.g. dpdk devs or whoever else is going to use this
>>>>> and asking them for the opinion?
>>>>>
>>>>>
>>>
>>
>>
>




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23 13:13 ` Michael S. Tsirkin
  2023-02-23 14:40   ` [virtio-comment] " Parav Pandit
@ 2023-02-24  4:42   ` Heng Qi
  2023-02-24  8:04     ` Michael S. Tsirkin
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-24  4:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan



On 2023/2/23 9:13 PM, Michael S. Tsirkin wrote:
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>> +\subparagraph{Security risks between encapsulated packets and RSS}
>> +There may be potential security risks when encapsulated packets using RSS to
>> +select queues for placement.
> Is this just with RSS? I assume hash calculation is also used for
> something like queueing so there's a similar risk even just
> with hash reporting, no?

I don't understand why it would be risky to merely report the hash when
it is not used for queuing. We don't even report whether the hash comes
from the inner or the outer header now, because hash_report_tunnel no
longer exists.

>
>> When a user inside a tunnel tries to control the
>> +enqueuing of encapsulated packets, then the user can flood the device with invaild
>> +packets, and the flooded packets may be hashed into the same queue as packets in
>> +other normal tunnels, which causing the queue to overflow.
>> +
>> +This can pose several security risks:
>> +\begin{itemize}
>> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
>> +       overflow, resulting in a large amount of packet loss.
>> +\item  The delay and retransmission of packets in the normal tunnels are extremely increased.
>> +\item  The user can observe the traffic information and enqueue information of other normal
>> +       tunnels, and conduct targeted DoS attacks.
>> +\end{\itemize}
>> +
>
> So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
> up with an idea: RSS indirection table entries are 16 bit but
> only 15 bits are used to identify an RX queue.
> We can use the remaining bit as a "tunnel bit" to signal whether to use the
> inner or the outer hash for queue selection.
>
> The lookup will work like this then:
>
> calculate outer hash
> if (rss[outer hash] & tunnel bit)

How would a single tunnel bit distinguish between multiple tunnel types?
Also, I don't think it is reasonable to use the indirection table to
decide whether the inner hash is used. The inner hash is only the
ability to calculate the hash, and it does not involve the indirection
table.

Thanks.

> then
> 	calculate inner hash
> 	return rss[inner hash] & ~tunnel bit
> else
> 	return rss[outer hash]
>
>
> this fixes the security issue returning us back to
> status quo : specific tunnels can be directed to separate queues.
>
>
> This is for RSS.
>
>
> For hash reporting indirection table is not used.
> Maybe it is enough to signal to driver that inner hash was used.
> We do need that signalling though.
>
> My question would be whether it's practical to implement in hardware.
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  2:45           ` Jason Wang
@ 2023-02-24  4:47             ` Heng Qi
  2023-02-24  8:07             ` Michael S. Tsirkin
  1 sibling, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-24  4:47 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo, ailan



On 2023/2/24 10:45 AM, Jason Wang wrote:
>
> On 2023/2/23 12:41, Heng Qi wrote:
>>
>>
>> On 2023/2/23 10:50 AM, Jason Wang wrote:
>>> Hi:
>>>
>>> On 2023/2/22 14:46, Heng Qi wrote:
>>>> Hi, Jason. Long time no see. :)
>>>>
>>>> On 2023/2/22 11:22 AM, Jason Wang wrote:
>>>>>
>>>>> On 2023/2/22 01:50, Michael S. Tsirkin wrote:
>>>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>>>> +\subparagraph{Security risks between encapsulated packets and RSS}
>>>>>>> +There may be potential security risks when encapsulated packets 
>>>>>>> using RSS to
>>>>>>> +select queues for placement. When a user inside a tunnel tries 
>>>>>>> to control the
>>>>>
>>>>>
>>>>> What do you mean by "user" here? Is it a remote or local one?
>>>>>
>>>>
>>>> I mean a remote attacker who is not under the control of the tunnel 
>>>> owner.
>>>
>>>
>>> Anything may the tunnel different? I think this can happen even 
>>> without tunnel (and even with single queue).
>>
>> I agree.
>>
>>>
>>> How to mitigate those attackers seems more like a implementation 
>>> details where might require fair queuing or other QOS technology 
>>> which has been well studied.
>>
>> I am also not sure whether this point needs to be focused on in the 
>> spec, and I see that the protection against tunnel DoS is more 
>> protected outside the device,
>> but it seems to be okay to give some attack reminders.
>
>
> Maybe it's sufficient to say the device should make sure the fairness 
> among different flows when queuing packets?

Yes, maybe we can say either that the device does not guarantee QoS or
that it needs to guarantee enqueue fairness between flows.

Thanks.

>
> Thanks
>
>
>>
>> Thanks.
>>
>>>
>>> It seems out of the scope of the spec (unless we want to let driver 
>>> manageable QOS).
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> Thanks.
>>>>
>>>>>
>>>>>>> +enqueuing of encapsulated packets, then the user can flood the 
>>>>>>> device with invaild
>>>>>>> +packets, and the flooded packets may be hashed into the same 
>>>>>>> queue as packets in
>>>>>>> +other normal tunnels, which causing the queue to overflow.
>>>>>>> +
>>>>>>> +This can pose several security risks:
>>>>>>> +\begin{itemize}
>>>>>>> +\item  Encapsulated packets in the normal tunnels cannot be 
>>>>>>> enqueued due to queue
>>>>>>> +       overflow, resulting in a large amount of packet loss.
>>>>>>> +\item  The delay and retransmission of packets in the normal 
>>>>>>> tunnels are extremely increased.
>>>>>>> +\item  The user can observe the traffic information and enqueue 
>>>>>>> information of other normal
>>>>>>> +       tunnels, and conduct targeted DoS attacks.
>>>>>>> +\end{\itemize}
>>>>>>> +
>>>>>> Hmm with this all written out it sounds pretty severe.
>>>>>
>>>>>
>>>>> I think we need first understand whether or not it's a problem 
>>>>> that we need to solve at spec level:
>>>>>
>>>>> 1) anything make encapsulated packets different or why we can't 
>>>>> hit this problem without encapsulation
>>>>>
>>>>> 2) whether or not it's the implementation details that the spec 
>>>>> doesn't need to care (or how it is solved in real NIC)
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>> At this point with no ways to mitigate, I don't feel this is 
>>>>>> something
>>>>>> e.g. Linux can enable.  I am not going to nack the spec patch if
>>>>>> others  find this somehow useful e.g. for dpdk.
>>>>>> How about CC e.g. dpdk devs or whoever else is going to use this
>>>>>> and asking them for the opinion?
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  4:42   ` Heng Qi
@ 2023-02-24  8:04     ` Michael S. Tsirkin
  0 siblings, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-24  8:04 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan

On Fri, Feb 24, 2023 at 12:42:40PM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/23 at 9:13 PM, Michael S. Tsirkin wrote:
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > +There may be potential security risks when RSS is used to select queues
> > > +for placing encapsulated packets.
> > Is this just with RSS? I assume hash calculation is also used for
> > something like queueing so there's a similar risk even just
> > with hash reporting, no?
> 
> I don't understand why it would be risky to just report the hash when it is
> not used for queuing. We don't even report whether the hash comes from the
> inner or the outer header now, because there is no hash_report_tunnel any more.

Well what is the hash used for? Presumably it's used for queueing within
driver, no? Collisions there then have exactly the same effect as queue
collisions in the device.
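
To illustrate the point about driver-side queueing, here is a minimal Python
sketch. It is not taken from any driver: it assumes a hypothetical driver that
spreads received packets over a fixed number of bounded software backlogs,
indexed by the hash the device reports. The class name `Backlogs`, the
`enqueue` method, and the capacity are illustrative assumptions.

```python
from collections import deque


class Backlogs:
    """Hypothetical driver-side software queues keyed by the reported hash."""

    def __init__(self, count, limit):
        self.queues = [deque() for _ in range(count)]
        self.limit = limit  # assumed per-backlog capacity

    def enqueue(self, reported_hash, packet):
        """Place a packet by reported hash; return False if it is dropped."""
        q = self.queues[reported_hash % len(self.queues)]
        if len(q) >= self.limit:
            return False  # backlog full: the packet is lost
        q.append(packet)
        return True


backlogs = Backlogs(count=4, limit=8)

# A sender floods packets whose reported hash maps to backlog 1 (5 % 4 == 1).
flood_accepted = sum(backlogs.enqueue(5, "flood") for _ in range(100))

# A legitimate flow whose hash collides with the flood is now dropped,
# while a flow that maps to another backlog is unaffected.
victim_ok = backlogs.enqueue(9, "victim")    # 9 % 4 == 1, same backlog
bystander_ok = backlogs.enqueue(2, "other")  # 2 % 4 == 2, different backlog
```

In this model the collision happens entirely in software, which is why hash
reporting alone can have the same effect as a queue collision in the device.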

> > 
> > > When a user inside a tunnel tries to control the
> > > +enqueuing of encapsulated packets, then the user can flood the device with invalid
> > > +packets, and the flooded packets may be hashed into the same queue as packets in
> > > +other normal tunnels, causing the queue to overflow.
> > > +
> > > +This can pose several security risks:
> > > +\begin{itemize}
> > > +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> > > +       overflow, resulting in a large amount of packet loss.
> > > +\item  The delay and retransmission of packets in the normal tunnels increase significantly.
> > > +\item  The user can observe the traffic information and enqueue information of other normal
> > > +       tunnels, and conduct targeted DoS attacks.
> > > +\end{itemize}
> > > +
> > 
> > > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came
> > > up with an idea: RSS indirection table entries are 16 bits, but
> > > only 15 bits are used to identify an RX queue.
> > > We can use the remaining bit as a "tunnel bit" to signal whether to use the
> > > inner or the outer hash for queue selection.
> > 
> > The lookup will work like this then:
> > 
> > calculate outer hash
> > if (rss[outer hash] & tunnel bit)
> 
> How would a tunnel bit distinguish between multiple tunnel types? Also, I
> think it is not reasonable to use the indirection table to decide whether to
> switch to the inner hash. The inner hash feature is only the ability to
> calculate the hash, and does not involve the indirection table.
> 
> Thanks.
> 
> > then
> > 	calculate inner hash
> > 	return rss[inner hash] & ~tunnel bit
> > else
> > 	return rss[outer hash]
> > 
> > 
> > this fixes the security issue returning us back to
> > status quo : specific tunnels can be directed to separate queues.
> > 
> > 
> > This is for RSS.
> > 
> > 
> > For hash reporting indirection table is not used.
> > Maybe it is enough to signal to driver that inner hash was used.
> > We do need that signalling though.
> > 
> > My question would be whether it's practical to implement in hardware.
> > 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  2:26           ` Jason Wang
@ 2023-02-24  8:06             ` Michael S. Tsirkin
  2023-02-27  4:07               ` Jason Wang
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-24  8:06 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Fri, Feb 24, 2023 at 10:26:30AM +0800, Jason Wang wrote:
> On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > > On 2023/2/22 14:46, Heng Qi wrote:
> > > > > Hi, Jason. Long time no see. :)
> > > > >
> > > > > On 2023/2/22 at 11:22 AM, Jason Wang wrote:
> > > > > >
> > > > > > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > > > > +There may be potential security risks when RSS is used to select queues for
> > > > > > > > +placing encapsulated packets. When a user inside a tunnel tries to control the
> > > > >
> > > > >
> > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > >
> > > >
> > > > I mean a remote attacker who is not under the control of the tunnel
> > > > owner.
> > >
> > >
> > > > Does anything make the tunnel different? I think this can happen even without
> > > > a tunnel (and even with a single queue).
> >
> > I think you are missing the fact that tunnel is normally a
> > security boundary: users within the tunnel can not control
> > what is happening outside.
> > The feature breaks the encapsulation somewhat.
> 
> I'm not sure I understand. If we allow hashing based on the inner
> packet, is that what you mean by affecting what happens outside? It
> doesn't differ much from the case where the tunnel is not used. It's
> impossible to prevent what a remote user is trying to send, and if
> there's a NIC behaviour that depends on the packet content, the
> behaviour of the NIC is somehow under the control of the remote user.
> 
> Since we only care about the device driver interface, what we can do
> is probably:
> 
> 1) allow the driver to disable the inner hash when it spots a
> potential (D)DoS. In the device, fair queueing looks like a must,
> but it should be an implementation detail.

this breaks rss

> 2) hash based on both outer and inner

this might help a bit

> >
> > For example without tunneling it is possible
> > to create a special "bad guy queue" and direct specific tunnels
> > there by playing with key and indirection table.
> 
> Does anything make tunneling different? We can still do this via the
> inner header hash, or at least we can disable the inner hash if we see
> a remote DoS.
> 
> Thanks

the difference is that tunneling is used for security/partitioning.

> >
> > > How to mitigate those attacks seems more like an implementation detail,
> > > which might require fair queuing or other QoS technology that has been
> > > well studied.
> > >
> > > It seems out of the scope of the spec (unless we want to allow
> > > driver-manageable QoS).
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > > > > +enqueuing of encapsulated packets, then the user can flood the device with invalid
> > > > > > > > +packets, and the flooded packets may be hashed into the same queue as packets in
> > > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > +
> > > > > > > +This can pose several security risks:
> > > > > > > +\begin{itemize}
> > > > > > > > +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> > > > > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > > > > +\item  The delay and retransmission of packets in the normal tunnels increase significantly.
> > > > > > > > +\item  The user can observe the traffic information and enqueue information of other normal
> > > > > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > > > > +\end{itemize}
> > > > > > > +
> > > > > > Hmm with this all written out it sounds pretty severe.
> > > > >
> > > > >
> > > > > > I think we first need to understand whether or not it's a problem that
> > > > > > we need to solve at the spec level:
> > > > > >
> > > > > > 1) does anything make encapsulated packets different, or why can't we
> > > > > > hit this problem without encapsulation?
> > > > > >
> > > > > > 2) is it an implementation detail that the spec doesn't need to care
> > > > > > about (or how is it solved in a real NIC)?
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > > > At this point, with no way to mitigate it, I don't feel this is something
> > > > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > > > others find this somehow useful, e.g. for dpdk.
> > > > > > > How about CC'ing e.g. dpdk devs or whoever else is going to use this
> > > > > > > and asking them for their opinion?
> > > > > >
> > > > > >
> > > >
> >




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  2:45           ` Jason Wang
  2023-02-24  4:47             ` [virtio-comment] " Heng Qi
@ 2023-02-24  8:07             ` Michael S. Tsirkin
  1 sibling, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-24  8:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Fri, Feb 24, 2023 at 10:45:13AM +0800, Jason Wang wrote:
> 
> On 2023/2/23 12:41, Heng Qi wrote:
> >
> >
> > On 2023/2/23 at 10:50 AM, Jason Wang wrote:
> > > Hi:
> > >
> > > On 2023/2/22 14:46, Heng Qi wrote:
> > > > Hi, Jason. Long time no see. :)
> > > >
> > > > On 2023/2/22 at 11:22 AM, Jason Wang wrote:
> > > > >
> > > > > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > > > +There may be potential security risks when RSS is used to select queues for
> > > > > > > +placing encapsulated packets. When a user inside a tunnel tries to control the
> > > > > 
> > > > > 
> > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > > 
> > > > 
> > > > I mean a remote attacker who is not under the control of the
> > > > tunnel owner.
> > > 
> > > 
> > > Does anything make the tunnel different? I think this can happen even
> > > without a tunnel (and even with a single queue).
> > 
> > I agree.
> > 
> > > 
> > > How to mitigate those attacks seems more like an implementation
> > > detail, which might require fair queuing or other QoS technology
> > > that has been well studied.
> > 
> > I am also not sure whether the spec needs to focus on this point. As I
> > see it, protection against tunnel DoS is mostly provided outside the device,
> > but it seems okay to include some warnings about the attack.
> 
> 
> Maybe it's sufficient to say the device should ensure fairness among
> different flows when queuing packets?
> 
> Thanks

that isn't really achievable.

> 
> > 
> > Thanks.
> > 
> > > 
> > > It seems out of the scope of the spec (unless we want to allow
> > > driver-manageable QoS).
> > > 
> > > Thanks
> > > 
> > > 
> > > > 
> > > > Thanks.
> > > > 
> > > > > 
> > > > > > > +enqueuing of encapsulated packets, then the user can flood the device with invalid
> > > > > > > +packets, and the flooded packets may be hashed into the same queue as packets in
> > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > +
> > > > > > > +This can pose several security risks:
> > > > > > > +\begin{itemize}
> > > > > > > +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to queue
> > > > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > > > +\item  The delay and retransmission of packets in the normal tunnels increase significantly.
> > > > > > > +\item  The user can observe the traffic information and enqueue information of other normal
> > > > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > > > +\end{itemize}
> > > > > > > +
> > > > > > Hmm with this all written out it sounds pretty severe.
> > > > > 
> > > > > 
> > > > > I think we first need to understand whether or not it's a
> > > > > problem that we need to solve at the spec level:
> > > > >
> > > > > 1) does anything make encapsulated packets different, or why
> > > > > can't we hit this problem without encapsulation?
> > > > >
> > > > > 2) is it an implementation detail that the spec doesn't need
> > > > > to care about (or how is it solved in a real NIC)?
> > > > > 
> > > > > Thanks
> > > > > 
> > > > > 
> > > > > > At this point, with no way to mitigate it, I don't feel
> > > > > > this is something
> > > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > > others find this somehow useful, e.g. for dpdk.
> > > > > > How about CC'ing e.g. dpdk devs or whoever else is going to use this
> > > > > > and asking them for their opinion?
> > > > > > 
> > > > > > 
> > > > 
> > > 
> > > 
> > 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-23 14:40   ` [virtio-comment] " Parav Pandit
@ 2023-02-24  8:13     ` Michael S. Tsirkin
  2023-02-24 14:38       ` [virtio-dev] " Heng Qi
  2023-02-27  0:29       ` Parav Pandit
  0 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-24  8:13 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan

On Thu, Feb 23, 2023 at 02:40:46PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, February 23, 2023 8:14 AM
> > 
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> 
> 
> > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
> > an idea: RSS indirection table entries are 16 bits, but only 15 bits are used to
> > identify an RX queue.
> > We can use the remaining bit as a "tunnel bit" to signal whether to use the
> > inner or the outer hash for queue selection.
> >
> I further brainstormed internally with Saeed and Rony on this.
> 
> The inner hash is only needed for GRE, IPIP etc.
> For VXLAN and NVGRE, the Linux kernel transmit side puts entropy into the source port of the outer header.
> It does that based on the inner header.
> Refer to [1] as one example.
> 
> [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922

But I think hash was requested for RSS with dpdk, no?

> > The lookup will work like this then:
> > 
> > calculate outer hash
> > if (rss[outer hash] & tunnel bit)
> Tunnel bit, you mean tunneled packet, right?

this idea stores a bit in the indirection table
which signals which of the hashes to use for rss

> > then
> > 	calculate inner hash
> > 	return rss[inner hash] & ~tunnel bit
> Why to end with a tunnel bit?


this just clears the bit so we end up with a vq number.

> > else
> > 	return rss[outer hash]
> > 
> > 
> > this fixes the security issue returning us back to status quo : specific tunnels can
> > be directed to separate queues.
> >
> The number of tunnels is far higher than the number of queues with para virt driver doing decap.

True. This seeks to get us back to where we are before the feature:
driver can send specific outer hashes to specific queues.
outer hash collisions remain a problem.
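
The lookup discussed above can be sketched as follows. This is a minimal
Python model, not spec text: using the top bit of each 16-bit entry as the
tunnel bit, the names `TUNNEL_BIT`, `QUEUE_MASK`, and `select_queue`, and
indexing the table by the hash modulo its length are all assumptions made for
illustration.

```python
TUNNEL_BIT = 0x8000  # assumed: top bit of a 16-bit indirection table entry
QUEUE_MASK = 0x7FFF  # the remaining 15 bits identify the RX queue


def select_queue(table, outer_hash, inner_hash):
    """Return the RX queue number for a packet.

    A device would only need to compute inner_hash when the first lookup
    finds the tunnel bit set; it is passed in here to keep the model simple.
    """
    entry = table[outer_hash % len(table)]
    if entry & TUNNEL_BIT:
        # The entry asks for a second lookup keyed by the inner hash.
        entry = table[inner_hash % len(table)]
    # Clear the tunnel bit so only the queue number remains.
    return entry & QUEUE_MASK


# Entries 0 and 1 steer by outer hash; entries 2 and 3 request the inner hash.
table = [0, 1, TUNNEL_BIT | 2, TUNNEL_BIT | 3]

plain = select_queue(table, outer_hash=1, inner_hash=99)    # uses table[1]
tunneled = select_queue(table, outer_hash=2, inner_hash=3)  # uses table[3]
```

With this layout the driver decides, per outer-hash bucket, whether the inner
headers are allowed to influence queue selection at all.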


> > 
> > This is for RSS.
> > 
> > 
> > For hash reporting indirection table is not used.
> > Maybe it is enough to signal to driver that inner hash was used.
> > We do need that signalling though.
> > 
> > My question would be whether it's practical to implement in hardware.
> 
> In the above example, having hw calculate two hashes is difficult and brings little gain.
> Calculating on either the inner or the outer header makes sense.
>
> Signaling whether the hash was calculated on the inner or outer header is fine, because hw tells exactly what it did.

This, in a sense, is what reporting hash tunnel type did.
Do you now think we need it?

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  8:13     ` Michael S. Tsirkin
@ 2023-02-24 14:38       ` Heng Qi
  2023-02-24 17:10         ` Michael S. Tsirkin
  2023-02-27  0:29       ` Parav Pandit
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-24 14:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: virtio-comment, virtio-dev, Jason Wang, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo, ailan



On 2023/2/24 at 4:13 PM, Michael S. Tsirkin wrote:
> On Thu, Feb 23, 2023 at 02:40:46PM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Thursday, February 23, 2023 8:14 AM
>>>
>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>
>>> So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
>>> an idea: RSS indirection table entries are 16 bits, but only 15 bits are used to
>>> identify an RX queue.
>>> We can use the remaining bit as a "tunnel bit" to signal whether to use the
>>> inner or the outer hash for queue selection.
>>>
>> I further brainstormed internally with Saeed and Rony on this.
>>
>> The inner hash is only needed for GRE, IPIP etc.
>> For VXLAN and NVGRE, the Linux kernel transmit side puts entropy into the source port of the outer header.
>> It does that based on the inner header.
>> Refer to [1] as one example.
>>
>> [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
> But I think hash was requested for RSS with dpdk, no?

I think so; at least, the first customer to use the feature will
probably be dpdk. :)


>
>>> The lookup will work like this then:
>>>
>>> calculate outer hash
>>> if (rss[outer hash] & tunnel bit)
>> Tunnel bit, you mean tunneled packet, right?
> this idea stores a bit in the indirection table
> which signals which of the hashes to use for rss

This gives the inner hash the ability to select a queue and place
packets into it (that is, in parallel to RSS),
which seems to differ from our discussion before v9. 🙁

Thanks.

>
>>> then
>>> 	calculate inner hash
>>> 	return rss[inner hash] & ~tunnel bit
>> Why to end with a tunnel bit?
>
> this just clears the bit so we end up with a vq number.
>
>>> else
>>> 	return rss[outer hash]
>>>
>>>
>>> this fixes the security issue returning us back to status quo : specific tunnels can
>>> be directed to separate queues.
>>>
>> The number of tunnels is far higher than the number of queues with para virt driver doing decap.
> True. This seeks to get us back to where we are before the feature:
> driver can send specific outer hashes to specific queues.
> outer hash collisions remain a problem.
>
>
>>> This is for RSS.
>>>
>>>
>>> For hash reporting indirection table is not used.
>>> Maybe it is enough to signal to driver that inner hash was used.
>>> We do need that signalling though.
>>>
>>> My question would be whether it's practical to implement in hardware.
>> In the above example, having hw calculate two hashes is difficult and brings little gain.
>> Calculating on either the inner or the outer header makes sense.
>>
>> Signaling whether the hash was calculated on the inner or outer header is fine, because hw tells exactly what it did.
> This, in a sense, is what reporting hash tunnel type did.
> Do you now think we need it?
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24 14:38       ` [virtio-dev] " Heng Qi
@ 2023-02-24 17:10         ` Michael S. Tsirkin
  2023-02-24 17:10           ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-24 17:10 UTC (permalink / raw)
  To: Heng Qi
  Cc: Parav Pandit, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan

On Fri, Feb 24, 2023 at 10:38:37PM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/24 at 4:13 PM, Michael S. Tsirkin wrote:
> > On Thu, Feb 23, 2023 at 02:40:46PM +0000, Parav Pandit wrote:
> > > 
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Thursday, February 23, 2023 8:14 AM
> > > > 
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > 
> > > > So for RSS specifically, we brain-stormed with Amnon (Cc'd) and came up with
> > > > an idea: RSS indirection table entries are 16 bits, but only 15 bits are used to
> > > > identify an RX queue.
> > > > We can use the remaining bit as a "tunnel bit" to signal whether to use the
> > > > inner or the outer hash for queue selection.
> > > > 
> > > I further brainstormed internally with Saeed and Rony on this.
> > > 
> > > The inner hash is only needed for GRE, IPIP etc.
> > > For VXLAN and NVGRE, the Linux kernel transmit side puts entropy into the source port of the outer header.
> > > It does that based on the inner header.
> > > Refer to [1] as one example.
> > > 
> > > [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
> > But I think hash was requested for RSS with dpdk, no?
> 
> I think so; at least, the first customer to use the feature will
> probably be dpdk. :)
> 
> 
> > 
> > > > The lookup will work like this then:
> > > > 
> > > > calculate outer hash
> > > > if (rss[outer hash] & tunnel bit)
> > > Tunnel bit, you mean tunneled packet, right?
> > this idea stores a bit in the indirection table
> > which signals which of the hashes to use for rss
> 
> This gives the inner hash the ability to select a queue and place
> packets into it (that is, in parallel to RSS),
> which seems to differ from our discussion before v9. 🙁
> 
> Thanks.

Not exactly. The idea is that we start with outer hash.
Based on that we use rss table to decide whether to use the inner hash.

Given that Parav claims it's difficult to implement in
hardware I'm not insisting this idea is included
in the patchset. We can add it later.


> > 
> > > > then
> > > > 	calculate inner hash
> > > > 	return rss[inner hash] & ~tunnel bit
> > > Why to end with a tunnel bit?
> > 
> > this just clears the bit so we end up with a vq number.
> > 
> > > > else
> > > > 	return rss[outer hash]
> > > > 
> > > > 
> > > > this fixes the security issue returning us back to status quo : specific tunnels can
> > > > be directed to separate queues.
> > > > 
> > > The number of tunnels is far higher than the number of queues with para virt driver doing decap.
> > True. This seeks to get us back to where we are before the feature:
> > driver can send specific outer hashes to specific queues.
> > outer hash collisions remain a problem.
> > 
> > 
> > > > This is for RSS.
> > > > 
> > > > 
> > > > For hash reporting indirection table is not used.
> > > > Maybe it is enough to signal to driver that inner hash was used.
> > > > We do need that signalling though.
> > > > 
> > > > My question would be whether it's practical to implement in hardware.
> > > In the above example, having hw calculate two hashes is difficult and brings little gain.
> > > Calculating on either the inner or the outer header makes sense.
> > >
> > > Signaling whether the hash was calculated on the inner or outer header is fine, because hw tells exactly what it did.
> > This, in a sense, is what reporting hash tunnel type did.
> > Do you now think we need it?
> > 


^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  8:13     ` Michael S. Tsirkin
  2023-02-24 14:38       ` [virtio-dev] " Heng Qi
@ 2023-02-27  0:29       ` Parav Pandit
  2023-02-27  0:29         ` [virtio-dev] " Parav Pandit
  1 sibling, 1 reply; 105+ messages in thread
From: Parav Pandit @ 2023-02-27  0:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo, ailan


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Friday, February 24, 2023 3:13 AM

[..]
> > The inner hash is only needed for GRE, IPIP etc.
> > For VXLAN and NVGRE, the Linux kernel transmit side puts entropy into the
> > source port of the outer header.
> > It does that based on the inner header.
> > Refer to [1] as one example.
> >
> > [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/geneve.c#L922
> 
> But I think hash was requested for RSS with dpdk, no?
> 
Yes, but if the source inserts the entropy (dpdk or kernel), a UDP-based tunnel can live with the outer header hash.
IP-over-IP and GRE tunnels need it, and benefit if queues do not overflow or the processing is fast enough, as Heng explained.
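
As a rough illustration of source-inserted entropy, the sketch below derives
an outer UDP source port from a hash of the inner flow, so that a receiver
hashing only the outer header still spreads inner flows across queues. It is
loosely modeled on the idea of scaling a 32-bit flow hash into a port range
and is not the kernel's code: CRC32 stands in for the real flow hash, and the
port range and the function names `inner_flow_hash` and `outer_udp_src_port`
are assumptions.

```python
import socket
import struct
import zlib


def inner_flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in 32-bit hash over the inner 4-tuple (CRC32, illustration only)."""
    key = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
    key += struct.pack("!HH", src_port, dst_port)
    return zlib.crc32(key) & 0xFFFFFFFF


def outer_udp_src_port(flow_hash, low=49152, high=65536):
    """Scale a 32-bit hash into the assumed source-port range [low, high)."""
    return low + ((flow_hash * (high - low)) >> 32)


# Every packet of one inner flow gets the same outer source port, so the
# flow stays on one queue even when only the outer header is hashed.
flow = ("10.0.0.1", "10.0.0.2", 12345, 80)
port = outer_udp_src_port(inner_flow_hash(*flow))
```

IP-over-IP and plain GRE have no outer UDP header to carry such a value, which
is why those tunnel types are the ones that depend on hashing the inner header.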


> > > The lookup will work like this then:
> > >
> > > calculate outer hash
> > > if (rss[outer hash] & tunnel bit)
> > Tunnel bit, you mean tunneled packet, right?
> 
> this idea stores a bit in the indirection table which signals which of the hashes
> to use for rss
> 
> > > then
> > > 	calculate inner hash
> > > 	return rss[inner hash] & ~tunnel bit
> > Why to end with a tunnel bit?
> 
> 
> this just clears the bit so we end up with a vq number.
> 
> > > else
> > > 	return rss[outer hash]
> > >
> > >

The above scheme partitions the RSS indirection table into two parts:
1. one for tunnel processing
2. a second one without it (this one uses the outer hash, as today)

When #1 is done in your example, it is without hierarchy.
So the inner hash can still result in a collision in the same VQ, as before.
Say there are VQs 0, 1, 2, 3.
The indirection table is set up so that the entries for 0 and 1 have the tunnel bit set,
and the entries for 2 and 3 have the tunnel bit cleared.
RSS on the outer hash finds the bit set, and the inner hash of two different tunnels still maps to a single VQ.
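
The collision in this example can be reproduced with a small Python model.
The hash values are made up, and the top-bit encoding of the tunnel bit, the
modulo indexing, and the name `lookup` are assumptions for illustration only.

```python
TUNNEL_BIT = 0x8000  # assumed flag bit in a 16-bit indirection table entry

# Entries for VQ 0 and 1 carry the tunnel bit; entries for VQ 2 and 3 do not.
table = [TUNNEL_BIT | 0, TUNNEL_BIT | 1, 2, 3]


def lookup(outer_hash, inner_hash):
    """Two-step lookup: the outer hash first, then the inner hash if flagged."""
    entry = table[outer_hash % len(table)]
    if entry & TUNNEL_BIT:
        entry = table[inner_hash % len(table)]
    return entry & ~TUNNEL_BIT


# Two different tunnels: their outer hashes hit different flagged entries,
# but their inner hashes both index entry 2 (6 % 4 == 10 % 4 == 2).
tunnel_a_vq = lookup(outer_hash=0, inner_hash=6)
tunnel_b_vq = lookup(outer_hash=1, inner_hash=10)
```

Both tunnels end up on VQ 2: the second lookup has no per-tunnel hierarchy, so
nothing keeps inner flows of separate tunnels apart.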

> > > this fixes the security issue returning us back to status quo :
> > > specific tunnels can be directed to separate queues.
> > >
> > The number of tunnels is far higher than the number of queues with para virt
> > driver doing decap.
> 
> True. This seeks to get us back to where we are before the feature:
> driver can send specific outer hashes to specific queues.
> outer hash collisions remain a problem.
> 
So far the mlx5 device has hashed on the inner header for non-UDP tunnels.

Steering packets to specific queues is done by flow programming to the specific RQs, which is supported for both outer and inner headers.
ethtool --config-nfc has had it for a long time too; such flow steering is due for virtio-net as well.
It is orthogonal to RSS.

> 
> > >
> > > This is for RSS.
> > >
> > >
> > > For hash reporting indirection table is not used.
> > > Maybe it is enough to signal to driver that inner hash was used.
> > > We do need that signalling though.
> > >
> > > My question would be whether it's practical to implement in hardware.
> >
> > In the above example, having hw calculate two hashes is difficult and brings little gain.
> > Calculating on either the inner or the outer header makes sense.
> >
> > Signaling whether the hash was calculated on the inner or outer header is fine,
> > because hw tells exactly what it did.
> 
> This, in a sense, is what reporting hash tunnel type did.
> Do you now think we need it?

I don't see any software consumer of it.


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-24  8:06             ` [virtio-dev] " Michael S. Tsirkin
@ 2023-02-27  4:07               ` Jason Wang
  2023-02-27  4:07                 ` [virtio-dev] " Jason Wang
  2023-02-27  7:39                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 105+ messages in thread
From: Jason Wang @ 2023-02-27  4:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Fri, Feb 24, 2023 at 4:06 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Feb 24, 2023 at 10:26:30AM +0800, Jason Wang wrote:
> > On Thu, Feb 23, 2023 at 9:03 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Feb 23, 2023 at 10:50:48AM +0800, Jason Wang wrote:
> > > > Hi:
> > > >
> > > > On 2023/2/22 14:46, Heng Qi wrote:
> > > > > Hi, Jason. Long time no see. :)
> > > > >
> > > > > On 2023/2/22 11:22 AM, Jason Wang wrote:
> > > > > >
> > > > > > On 2023/2/22 01:50, Michael S. Tsirkin wrote:
> > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > +\subparagraph{Security risks between encapsulated packets and RSS}
> > > > > > > > +There may be potential security risks when encapsulated
> > > > > > > > packets using RSS to
> > > > > > > > +select queues for placement. When a user inside a tunnel
> > > > > > > > tries to control the
> > > > > >
> > > > > >
> > > > > > What do you mean by "user" here? Is it a remote or local one?
> > > > > >
> > > > >
> > > > > I mean a remote attacker who is not under the control of the tunnel
> > > > > owner.
> > > >
> > > >
> > > > Anything may the tunnel different? I think this can happen even without
> > > > tunnel (and even with single queue).
> > >
> > > I think you are missing the fact that tunnel is normally a
> > > security boundary: users within the tunnel can not control
> > > what is happening outside.
> > > The feature breaks the encapsulation somewhat.
> >
> > I'm not sure I understand here, if we allow hash based on the inner
> > packet, is it something that you meant the things that are happening
> > outside? It doesn't differ too much from the case where the tunnel is
> > not used. It's impossible to prevent what a remote user is trying to
> > send, and if there's a NIC behaviour that depends on the packet
> > content, the behaviour of the NIC is somehow under the control of the
> > remote user.
> >
> > Since we only care about the device driver interface, what we can do
> > is probably:
> >
> > 1) allow the driver to disable the inner hash when it spots a
> > potential (D)DOS. And in the device, a fair queueing looks like a must
> > but it should be the implementation details.
>
> this breaks rss

Not sure I get this. What I want to say is that the issue described
here is not something that can be addressed at the level of hashing or
RSS. It needs to be handled before the packet arrives at any hash
filters in the RX pipeline.

There's probably no need to mention it now, considering we haven't
defined a full RX pipeline (and probably don't need to).

>
> > 2) hash based on both outer and inner
>
> this might help a bit
>
> > >
> > > For example without tunneling it is possible
> > > to create a special "bad guy queue" and direct specific tunnels
> > > there by playing with key and indirection table.
> >
> > Anything makes the tunneling different? We can still do this via the
> > inner header hash, or at least we can disable the inner hash if we see
> > a remote DOS.
> >
> > Thanks
>
> the difference is that tunneling is used for security/partitioning.

The problem is that we don't/can't support all types of tunnels. It
should be no different from an old virtio-net device without tunnel
hashing.

Btw, this kind of 1:1 hash feature seems neither scalable nor flexible.
It requires endless extension of bits/fields. Modern NICs allow the
user to customize the hash calculation; for virtio-net we could allow an
eBPF program to classify the packets. That seems more flexible
and scalable, with almost no maintenance burden in the spec (only
bytecode is required, with no need for fancy features/interactions like
maps), and it is easy to migrate, etc.

Prototyping is also easy: tun/tap has had an eBPF classifier for years.

Thanks

>
> > >
> > > > How to mitigate those attackers seems more like a implementation details
> > > > where might require fair queuing or other QOS technology which has been well
> > > > studied.
> > > >
> > > > It seems out of the scope of the spec (unless we want to let driver
> > > > manageable QOS).
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > > > > +enqueuing of encapsulated packets, then the user can flood
> > > > > > > > the device with invalid
> > > > > > > > +packets, and the flooded packets may be hashed into the
> > > > > > > > same queue as packets in
> > > > > > > > +other normal tunnels, causing the queue to overflow.
> > > > > > > > +
> > > > > > > > +This can pose several security risks:
> > > > > > > > +\begin{itemize}
> > > > > > > > +\item  Encapsulated packets in the normal tunnels cannot be
> > > > > > > > enqueued due to queue
> > > > > > > > +       overflow, resulting in a large amount of packet loss.
> > > > > > > > +\item  The delay and retransmission of packets in the
> > > > > > > > normal tunnels are extremely increased.
> > > > > > > > +\item  The user can observe the traffic information and
> > > > > > > > enqueue information of other normal
> > > > > > > > +       tunnels, and conduct targeted DoS attacks.
> > > > > > > > +\end{itemize}
> > > > > > > > +
> > > > > > > Hmm with this all written out it sounds pretty severe.
> > > > > >
> > > > > >
> > > > > > I think we need first understand whether or not it's a problem that
> > > > > > we need to solve at spec level:
> > > > > >
> > > > > > 1) anything make encapsulated packets different or why we can't hit
> > > > > > this problem without encapsulation
> > > > > >
> > > > > > 2) whether or not it's the implementation details that the spec
> > > > > > doesn't need to care (or how it is solved in real NIC)
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > > > At this point with no ways to mitigate, I don't feel this is something
> > > > > > > e.g. Linux can enable.  I am not going to nack the spec patch if
> > > > > > > others  find this somehow useful e.g. for dpdk.
> > > > > > > How about CC e.g. dpdk devs or whoever else is going to use this
> > > > > > > and asking them for the opinion?
> > > > > > >
> > > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27  4:07               ` Jason Wang
  2023-02-27  4:07                 ` [virtio-dev] " Jason Wang
@ 2023-02-27  7:39                 ` Michael S. Tsirkin
  2023-02-27  7:39                   ` [virtio-dev] " Michael S. Tsirkin
  2023-02-27  8:35                   ` Jason Wang
  1 sibling, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-27  7:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> Btw, this kind of 1:1 hash features seems not scalable and flexible.
> It requires an endless extension on bits/fields. Modern NICs allow the
> user to customize the hash calculation, for virtio-net we can allow to
> use eBPF program to classify the packets. It seems to be more flexible
> and scalable and there's almost no maintain burden in the spec (only
> bytecode is required, no need any fancy features/interactions like
> maps), easy to be migrated etc.
> 
> Prototype is also easy, tun/tap had an eBPF classifier for years.
> 
> Thanks

Yea BPF offload would be great to have. We have been discussing it for
years though - security issues keep blocking it. *Maybe* it's finally
going to be there but I'm not going to block this work waiting for BPF
offload.  And easily migrated is what BPF is not.

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27  7:39                 ` Michael S. Tsirkin
  2023-02-27  7:39                   ` [virtio-dev] " Michael S. Tsirkin
@ 2023-02-27  8:35                   ` Jason Wang
  2023-02-27  8:35                     ` [virtio-dev] " Jason Wang
                                       ` (2 more replies)
  1 sibling, 3 replies; 105+ messages in thread
From: Jason Wang @ 2023-02-27  8:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > It requires an endless extension on bits/fields. Modern NICs allow the
> > user to customize the hash calculation, for virtio-net we can allow to
> > use eBPF program to classify the packets. It seems to be more flexible
> > and scalable and there's almost no maintain burden in the spec (only
> > bytecode is required, no need any fancy features/interactions like
> > maps), easy to be migrated etc.
> >
> > Prototype is also easy, tun/tap had an eBPF classifier for years.
> >
> > Thanks
>
> Yea BPF offload would be great to have. We have been discussing it for
> years though - security issues keep blocking it. *Maybe* it's finally
> going to be there but I'm not going to block this work waiting for BPF
> offload.  And easily migrated is what BPF is not.

Just to make sure we're on the same page: I meant finding a way to
allow the driver/user to fully customize what it wants to
hash/classify. Similar technologies based on private solutions
have been used by some vendors, which allow the user to customize the
classifier [1].

eBPF looks like a good open-source candidate for this (there
could be others). But there are many kinds of eBPF programs that
could be offloaded. A famous one is XDP, which requires many features
beyond the bytecode/VM, like map access and tail calls. Starting from
such a complicated type is hard. Instead, we can start from a simple
type, namely the eBPF classifier. All it needs is to pass the
bytecode to the device; the device can choose to run it or compile it
into something it understands for classifying. We don't need maps, tail
calls or other features. We don't need to worry about security
because of its simplicity: the eBPF program is only in charge of
classification, with no other interactions with the driver, and packet
modification is prohibited. The feature is limited to the
VM/bytecode abstraction itself.
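
To make the restricted model concrete, here is a minimal sketch in Python rather than eBPF. It assumes the classifier is a pure function over a read-only view of the packet, and that the device reduces its result to a queue index. The names classify_inner_ipv4 and device_select_queue are hypothetical, and CRC32 merely stands in for whatever hash a real program would compute.

```python
import zlib

def classify_inner_ipv4(pkt):
    """Hypothetical classifier: derive a 32-bit value from IPv4 addresses,
    using the inner header when the packet is IPv4-in-IPv4 (protocol 4)."""
    outer_ihl = (pkt[0] & 0x0F) * 4        # outer IPv4 header length in bytes
    if pkt[9] != 4:                        # not IPIP: fall back to outer src/dst
        return zlib.crc32(pkt[12:20])
    inner = pkt[outer_ihl:]                # inner IPv4 header starts here
    return zlib.crc32(inner[12:20])        # inner src/dst addresses

def device_select_queue(classifier, packet, num_queues):
    """Device side: run the classifier on a read-only copy of the packet,
    so it can inspect but never modify it, then pick a queue."""
    view = memoryview(bytes(packet))       # memoryview over bytes is read-only
    return classifier(view) % num_queues
```

The classifier has no maps, no tail calls and no state, so everything it does is visible in its return value. An attempt to write to the view raises TypeError.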

What's more, it's a good first step to achieve full eBPF offloading in
the future.

Thanks

[1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27  8:35                   ` Jason Wang
  2023-02-27  8:35                     ` [virtio-dev] " Jason Wang
@ 2023-02-27 12:38                     ` Heng Qi
  2023-02-27 12:38                       ` [virtio-dev] " Heng Qi
  2023-02-27 17:49                     ` Michael S. Tsirkin
  2 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-27 12:38 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/27 4:35 PM, Jason Wang wrote:
> On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
>>> Btw, this kind of 1:1 hash features seems not scalable and flexible.
>>> It requires an endless extension on bits/fields. Modern NICs allow the
>>> user to customize the hash calculation, for virtio-net we can allow to
>>> use eBPF program to classify the packets. It seems to be more flexible
>>> and scalable and there's almost no maintain burden in the spec (only
>>> bytecode is required, no need any fancy features/interactions like
>>> maps), easy to be migrated etc.
>>>
>>> Prototype is also easy, tun/tap had an eBPF classifier for years.
>>>
>>> Thanks
>> Yea BPF offload would be great to have. We have been discussing it for
>> years though - security issues keep blocking it. *Maybe* it's finally
>> going to be there but I'm not going to block this work waiting for BPF
>> offload.  And easily migrated is what BPF is not.
> Just to make sure we're at the same page. I meant to find a way to
> allow the driver/user to fully customize what it wants to
> hash/classify. Similar technologies which is based on private solution
> has been used by some vendors, which allow user to customize the
> classifier[1]
>
> ePBF looks like a good open-source solution candidate for this (there
> could be others). But there could be many kinds of eBPF programs that
> could be offloaded. One famous one is XDP which requires many features
> other than the bytecode/VM like map access, tailcall. Starting from
> such a complicated type is hard. Instead, we can start from a simple
> type, that is the eBPF classifier. All it needs is to pass the
> bytecode to the device, the device can choose to run it or compile it
> to what it can understand for classifying. We don't need maps, tail
> calls and other features. We don't need to worry about the security
> because of its simplicity: the eBPF program is only in charge of doing
> classification, no other interactions with the driver and packet
> modification is prohibited. The feature is limited only to the
> VM/bytecode abstraction itself.

Since the eBPF instruction set is not complicated, some devices
already support eBPF offloading. What troubles them is how to
standardize which standard and optional subsystem interfaces each
device implements.
There are two reasons:
#1, eBPF is developing rapidly, so it is difficult for them to ensure
backward compatibility of the interface.
#2, for different device types such as network and blk devices, it is
unclear which interfaces must be implemented so that the same eBPF
program runs correctly on each of them.
Also, an eBPF program is not isolated: it is expected to interact with
userspace programs, and there are few examples demonstrating
how to use them effectively.

Maybe we can take advantage of the virtio spec to get there first.

Thanks.

>
> What's more, it's a good first step to achieve full eBPF offloading in
> the future.
>
> Thanks
>
> [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
>
>> --
>> MST
>>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27 12:38                     ` Heng Qi
@ 2023-02-27 12:38                       ` Heng Qi
  0 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-02-27 12:38 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/27 4:35 PM, Jason Wang wrote:
> On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
>>> Btw, this kind of 1:1 hash features seems not scalable and flexible.
>>> It requires an endless extension on bits/fields. Modern NICs allow the
>>> user to customize the hash calculation, for virtio-net we can allow to
>>> use eBPF program to classify the packets. It seems to be more flexible
>>> and scalable and there's almost no maintain burden in the spec (only
>>> bytecode is required, no need any fancy features/interactions like
>>> maps), easy to be migrated etc.
>>>
>>> Prototype is also easy, tun/tap had an eBPF classifier for years.
>>>
>>> Thanks
>> Yea BPF offload would be great to have. We have been discussing it for
>> years though - security issues keep blocking it. *Maybe* it's finally
>> going to be there but I'm not going to block this work waiting for BPF
>> offload.  And easily migrated is what BPF is not.
> Just to make sure we're at the same page. I meant to find a way to
> allow the driver/user to fully customize what it wants to
> hash/classify. Similar technologies which is based on private solution
> has been used by some vendors, which allow user to customize the
> classifier[1]
>
> ePBF looks like a good open-source solution candidate for this (there
> could be others). But there could be many kinds of eBPF programs that
> could be offloaded. One famous one is XDP which requires many features
> other than the bytecode/VM like map access, tailcall. Starting from
> such a complicated type is hard. Instead, we can start from a simple
> type, that is the eBPF classifier. All it needs is to pass the
> bytecode to the device, the device can choose to run it or compile it
> to what it can understand for classifying. We don't need maps, tail
> calls and other features. We don't need to worry about the security
> because of its simplicity: the eBPF program is only in charge of doing
> classification, no other interactions with the driver and packet
> modification is prohibited. The feature is limited only to the
> VM/bytecode abstraction itself.

Since the eBPF instruction set is not complicated, some devices already
support eBPF offload. What troubles them is how to standardize which
required and optional subsystem interfaces each device implements.
There are two reasons:
#1, eBPF is developing rapidly, so it is difficult for them to ensure
backward compatibility of the interface.
#2, across device types such as network and blk devices, it is unclear
which interfaces must be implemented so that the same eBPF program runs
correctly on each of them.
Also, an eBPF program is not isolated: it is expected to interact with
userspace programs, and few examples demonstrate how to use them
effectively.

Maybe we can take advantage of the virtio spec to get there first.

Thanks.

>
> What's more, it's a good first step to achieve full eBPF offloading in
> the future.
>
> Thanks
>
> [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
>
>> --
>> MST
>>



^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27  8:35                   ` Jason Wang
  2023-02-27  8:35                     ` [virtio-dev] " Jason Wang
  2023-02-27 12:38                     ` Heng Qi
@ 2023-02-27 17:49                     ` Michael S. Tsirkin
  2023-02-27 17:49                       ` [virtio-dev] " Michael S. Tsirkin
  2023-02-28  3:04                       ` Jason Wang
  2 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-27 17:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > user to customize the hash calculation, for virtio-net we can allow to
> > > use eBPF program to classify the packets. It seems to be more flexible
> > > and scalable and there's almost no maintain burden in the spec (only
> > > bytecode is required, no need any fancy features/interactions like
> > > maps), easy to be migrated etc.
> > >
> > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > >
> > > Thanks
> >
> > Yea BPF offload would be great to have. We have been discussing it for
> > years though - security issues keep blocking it. *Maybe* it's finally
> > going to be there but I'm not going to block this work waiting for BPF
> > offload.  And easily migrated is what BPF is not.
> 
> Just to make sure we're at the same page. I meant to find a way to
> allow the driver/user to fully customize what it wants to
> hash/classify. Similar technologies which is based on private solution
> has been used by some vendors, which allow user to customize the
> classifier[1]
> 
> ePBF looks like a good open-source solution candidate for this (there
> could be others). But there could be many kinds of eBPF programs that
> could be offloaded. One famous one is XDP which requires many features
> other than the bytecode/VM like map access, tailcall. Starting from
> such a complicated type is hard. Instead, we can start from a simple
> type, that is the eBPF classifier. All it needs is to pass the
> bytecode to the device, the device can choose to run it or compile it
> to what it can understand for classifying. We don't need maps, tail
> calls and other features.

Until people start asking exactly for maps because they want
state for their classifier? And it makes sense - if you want
e.g. load balancing you need stats, which need maps.
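
To make the point about state concrete, here is a rough Python model
(not eBPF, and not taken from the patch or from any existing API; all
names are hypothetical). A stateless classifier can only map packet
fields to a queue, while a load balancer has to remember per-queue
counts between packets, which is the role a map would play in eBPF:

```python
import zlib

NUM_QUEUES = 4  # assumed queue count, for illustration only


def stateless_classify(flow_key: bytes) -> int:
    """Pick a queue from packet fields alone; nothing is remembered."""
    return zlib.crc32(flow_key) % NUM_QUEUES


class LeastLoadedBalancer:
    """Stateful variant: needs storage that survives across packets."""

    def __init__(self, num_queues: int) -> None:
        self.counts = [0] * num_queues   # per-queue stats ("map" #1)
        self.flow_to_queue = {}          # flow pinning table ("map" #2)

    def classify(self, flow_key: bytes) -> int:
        queue = self.flow_to_queue.get(flow_key)
        if queue is None:
            # New flow: choose the queue with the fewest packets so far.
            queue = min(range(len(self.counts)), key=self.counts.__getitem__)
            self.flow_to_queue[flow_key] = queue
        self.counts[queue] += 1
        return queue
```

The first function needs only the bytecode; the second cannot be
expressed without persistent storage.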

> We don't need to worry about the security
> because of its simplicity: the eBPF program is only in charge of doing
> classification, no other interactions with the driver and packet
> modification is prohibited. The feature is limited only to the
> VM/bytecode abstraction itself.
> 
> What's more, it's a good first step to achieve full eBPF offloading in
> the future.
> 
> Thanks
> 
> [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html

Dave seems to have nacked this approach, no?

> >
> > --
> > MST
> >


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-27 17:49                     ` Michael S. Tsirkin
  2023-02-27 17:49                       ` [virtio-dev] " Michael S. Tsirkin
@ 2023-02-28  3:04                       ` Jason Wang
  2023-02-28  3:04                         ` [virtio-dev] " Jason Wang
                                           ` (2 more replies)
  1 sibling, 3 replies; 105+ messages in thread
From: Jason Wang @ 2023-02-28  3:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > and scalable and there's almost no maintain burden in the spec (only
> > > > bytecode is required, no need any fancy features/interactions like
> > > > maps), easy to be migrated etc.
> > > >
> > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > >
> > > > Thanks
> > >
> > > Yea BPF offload would be great to have. We have been discussing it for
> > > years though - security issues keep blocking it. *Maybe* it's finally
> > > going to be there but I'm not going to block this work waiting for BPF
> > > offload.  And easily migrated is what BPF is not.
> >
> > Just to make sure we're at the same page. I meant to find a way to
> > allow the driver/user to fully customize what it wants to
> > hash/classify. Similar technologies which is based on private solution
> > has been used by some vendors, which allow user to customize the
> > classifier[1]
> >
> > ePBF looks like a good open-source solution candidate for this (there
> > could be others). But there could be many kinds of eBPF programs that
> > could be offloaded. One famous one is XDP which requires many features
> > other than the bytecode/VM like map access, tailcall. Starting from
> > such a complicated type is hard. Instead, we can start from a simple
> > type, that is the eBPF classifier. All it needs is to pass the
> > bytecode to the device, the device can choose to run it or compile it
> > to what it can understand for classifying. We don't need maps, tail
> > calls and other features.
>
> Until people start asking exactly for maps because they want
> state for their classifier?

Yes, but let's compare eBPF without maps against the static feature
proposed here: even then, eBPF is much more scalable and flexible.
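
As a rough illustration of what a map-free classifier would compute,
here is a Python model. A real classifier would be eBPF bytecode passed
to the device; nothing below comes from the patch or from an existing
API. It assumes untagged Ethernet, IPv4, and VXLAN on UDP port 4789.
classify() and build_vxlan() are hypothetical names, and CRC32 merely
stands in for whatever hash a device would use:

```python
import struct
import zlib

VXLAN_PORT = 4789
ETH_LEN, UDP_LEN, VXLAN_LEN = 14, 8, 8


def _ipv4_fields(pkt: bytes, off: int):
    """Return (src, dst, proto, l4_offset) for the IPv4 header at off."""
    ihl = (pkt[off] & 0x0F) * 4
    return pkt[off + 12:off + 16], pkt[off + 16:off + 20], pkt[off + 9], off + ihl


def classify(pkt: bytes, num_queues: int) -> int:
    """Hash the inner 4-tuple of a VXLAN packet, else the outer one."""
    src, dst, proto, l4 = _ipv4_fields(pkt, ETH_LEN)
    sport, dport = struct.unpack_from("!HH", pkt, l4)
    if proto == 17 and dport == VXLAN_PORT:
        # Skip outer UDP, the VXLAN header and the inner Ethernet header.
        inner_ip = l4 + UDP_LEN + VXLAN_LEN + ETH_LEN
        src, dst, proto, l4 = _ipv4_fields(pkt, inner_ip)
        sport, dport = struct.unpack_from("!HH", pkt, l4)
    key = src + dst + struct.pack("!HH", sport, dport)
    return zlib.crc32(key) % num_queues


def _ipv4(src: bytes, dst: bytes, proto: int) -> bytes:
    # Minimal 20-byte header; length and checksum fields are left zero.
    return bytes([0x45, 0]) + bytes(7) + bytes([proto]) + bytes(2) + src + dst


def build_vxlan(inner_src, inner_dst, sport, dport, outer_sport=12345) -> bytes:
    """Build a test packet: outer Eth/IPv4/UDP/VXLAN + inner Eth/IPv4/TCP ports."""
    eth = bytes(12) + b"\x08\x00"
    outer = eth + _ipv4(b"\x0a\x00\x00\x01", b"\x0a\x00\x00\x02", 17)
    outer += struct.pack("!HHHH", outer_sport, VXLAN_PORT, 0, 0)
    outer += bytes([0x08]) + bytes(7)  # VXLAN header, VNI 0
    inner = eth + _ipv4(inner_src, inner_dst, 6) + struct.pack("!HH", sport, dport)
    return outer + inner
```

Two packets of the same inner flow land on the same queue even when the
outer UDP source port differs, and nothing is kept between packets.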

> And it makes sense - if you want
> e.g. load balancing you need stats which needs maps.

Yes, but we know it's possible to have that (through the XDP offload).
This is impossible with the approach proposed here.

>
> > We don't need to worry about the security
> > because of its simplicity: the eBPF program is only in charge of doing
> > classification, no other interactions with the driver and packet
> > modification is prohibited. The feature is limited only to the
> > VM/bytecode abstraction itself.
> >
> > What's more, it's a good first step to achieve full eBPF offloading in
> > the future.
> >
> > Thanks
> >
> > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
>
> Dave seems to have nacked this approach, no?

I may be missing something, but looking at the kernel commits, there are
a few patches that support that.

E.g.:

commit c7648810961682b9388be2dd041df06915647445
Author: Tony Nguyen <anthony.l.nguyen@intel.com>
Date:   Mon Sep 9 06:47:44 2019 -0700

    ice: Implement Dynamic Device Personalization (DDP) download

And it has been used by DPDK drivers.

Thanks

>
> > >
> > > --
> > > MST
> > >
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28  3:04                       ` Jason Wang
  2023-02-28  3:04                         ` [virtio-dev] " Jason Wang
@ 2023-02-28  8:52                         ` Michael S. Tsirkin
  2023-02-28  8:52                           ` [virtio-dev] " Michael S. Tsirkin
  2023-02-28  9:56                           ` Heng Qi
  2023-02-28 11:04                         ` Michael S. Tsirkin
  2 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-28  8:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > bytecode is required, no need any fancy features/interactions like
> > > > > maps), easy to be migrated etc.
> > > > >
> > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > >
> > > > > Thanks
> > > >
> > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > going to be there but I'm not going to block this work waiting for BPF
> > > > offload.  And easily migrated is what BPF is not.
> > >
> > > Just to make sure we're at the same page. I meant to find a way to
> > > allow the driver/user to fully customize what it wants to
> > > hash/classify. Similar technologies which is based on private solution
> > > has been used by some vendors, which allow user to customize the
> > > classifier[1]
> > >
> > > ePBF looks like a good open-source solution candidate for this (there
> > > could be others). But there could be many kinds of eBPF programs that
> > > could be offloaded. One famous one is XDP which requires many features
> > > other than the bytecode/VM like map access, tailcall. Starting from
> > > such a complicated type is hard. Instead, we can start from a simple
> > > type, that is the eBPF classifier. All it needs is to pass the
> > > bytecode to the device, the device can choose to run it or compile it
> > > to what it can understand for classifying. We don't need maps, tail
> > > calls and other features.
> >
> > Until people start asking exactly for maps because they want
> > state for their classifier?
> 
> Yes, but let's compare the eBPF without maps with the static feature
> proposed here. It is much more scalable and flexible.
> 
> > And it makes sense - if you want
> > e.g. load balancing you need stats which needs maps.
> 
> Yes, but we know it's possible to have that (through the XDP offload).
> This is impossible with the approach proposed here.

I'm not actually objecting. And at least we then don't need to
worry about leaking info - it's not virtio leaking info,
it's the BPF program. I wonder what Heng Qi thinks.
Heng Qi, would it work for your scenario?

> >
> > > We don't need to worry about the security
> > > because of its simplicity: the eBPF program is only in charge of doing
> > > classification, no other interactions with the driver and packet
> > > modification is prohibited. The feature is limited only to the
> > > VM/bytecode abstraction itself.
> > >
> > > What's more, it's a good first step to achieve full eBPF offloading in
> > > the future.
> > >
> > > Thanks
> > >
> > > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> >
> > Dave seems to have nacked this approach, no?
> 
> I may miss something but looking at kernel commit, there are few
> patches to support that:
> 
> E.g
> 
> commit c7648810961682b9388be2dd041df06915647445
> Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> Date:   Mon Sep 9 06:47:44 2019 -0700
> 
>     ice: Implement Dynamic Device Personalization (DDP) download
> 
> And it has been used by DPDK drivers.
> 
> Thanks

If we are talking about netdev then this discussion has to take place on netdev.
If it's dpdk this is more believable.

> >
> > > >
> > > > --
> > > > MST
> > > >
> >


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28  8:52                         ` Michael S. Tsirkin
  2023-02-28  8:52                           ` [virtio-dev] " Michael S. Tsirkin
@ 2023-02-28  9:56                           ` Heng Qi
  2023-02-28  9:56                             ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-02-28  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: virtio-comment, virtio-dev, Parav Pandit, Yuri Benditovich,
	Cornelia Huck, Xuan Zhuo



On 2023/2/28 at 4:52 PM, Michael S. Tsirkin wrote:
> On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
>> On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>>> On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
>>>> On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>> On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
>>>>>> Btw, this kind of 1:1 hash features seems not scalable and flexible.
>>>>>> It requires an endless extension on bits/fields. Modern NICs allow the
>>>>>> user to customize the hash calculation, for virtio-net we can allow to
>>>>>> use eBPF program to classify the packets. It seems to be more flexible
>>>>>> and scalable and there's almost no maintain burden in the spec (only
>>>>>> bytecode is required, no need any fancy features/interactions like
>>>>>> maps), easy to be migrated etc.
>>>>>>
>>>>>> Prototype is also easy, tun/tap had an eBPF classifier for years.
>>>>>>
>>>>>> Thanks
>>>>> Yea BPF offload would be great to have. We have been discussing it for
>>>>> years though - security issues keep blocking it. *Maybe* it's finally
>>>>> going to be there but I'm not going to block this work waiting for BPF
>>>>> offload.  And easily migrated is what BPF is not.
>>>> Just to make sure we're at the same page. I meant to find a way to
>>>> allow the driver/user to fully customize what it wants to
>>>> hash/classify. Similar technologies which is based on private solution
>>>> has been used by some vendors, which allow user to customize the
>>>> classifier[1]
>>>>
>>>> ePBF looks like a good open-source solution candidate for this (there
>>>> could be others). But there could be many kinds of eBPF programs that
>>>> could be offloaded. One famous one is XDP which requires many features
>>>> other than the bytecode/VM like map access, tailcall. Starting from
>>>> such a complicated type is hard. Instead, we can start from a simple
>>>> type, that is the eBPF classifier. All it needs is to pass the
>>>> bytecode to the device, the device can choose to run it or compile it
>>>> to what it can understand for classifying. We don't need maps, tail
>>>> calls and other features.
>>> Until people start asking exactly for maps because they want
>>> state for their classifier?
>> Yes, but let's compare the eBPF without maps with the static feature
>> proposed here. It is much more scalable and flexible.
>>
>>> And it makes sense - if you want
>>> e.g. load balancing you need stats which needs maps.
>> Yes, but we know it's possible to have that (through the XDP offload).
>> This is impossible with the approach proposed here.
> I'm not actually objecting. And at least we then don't need to
> worry about leaking info - it's not virtio leaking info
> it's the bpf program. I wonder what does Heng Qi think.
> Heng Qi would it work for your scenario?

We are positive about eBPF, which looks adequate for our scenario.
However, offloading it currently has some problems: the interfaces are
imperfect, unstable, and user-unfriendly, and eBPF code may consume a
lot of device resources. Device support for eBPF will also take time.
Also, eBPF offload does not conflict with other solutions, e.g. we
still have RSS.

Our goal is to get this patch accepted first. We have not yet collected
internal requirements for eBPF offloading, but it is indeed a good
direction.

Thanks.

>
>>>> We don't need to worry about the security
>>>> because of its simplicity: the eBPF program is only in charge of doing
>>>> classification, no other interactions with the driver and packet
>>>> modification is prohibited. The feature is limited only to the
>>>> VM/bytecode abstraction itself.
>>>>
>>>> What's more, it's a good first step to achieve full eBPF offloading in
>>>> the future.
>>>>
>>>> Thanks
>>>>
>>>> [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
>>> Dave seems to have nacked this approach, no?
>> I may miss something but looking at kernel commit, there are few
>> patches to support that:
>>
>> E.g
>>
>> commit c7648810961682b9388be2dd041df06915647445
>> Author: Tony Nguyen <anthony.l.nguyen@intel.com>
>> Date:   Mon Sep 9 06:47:44 2019 -0700
>>
>>      ice: Implement Dynamic Device Personalization (DDP) download
>>
>> And it has been used by DPDK drivers.
>>
>> Thanks
> If we are talking about netdev then this discussion has to take place on netdev.
> If it's dpdk this is more believable.
>
>>>>> --
>>>>> MST
>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28  3:04                       ` Jason Wang
  2023-02-28  3:04                         ` [virtio-dev] " Jason Wang
  2023-02-28  8:52                         ` Michael S. Tsirkin
@ 2023-02-28 11:04                         ` Michael S. Tsirkin
  2023-02-28 11:04                           ` [virtio-dev] " Michael S. Tsirkin
  2023-03-01  2:36                           ` Jason Wang
  2 siblings, 2 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-28 11:04 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > bytecode is required, no need any fancy features/interactions like
> > > > > maps), easy to be migrated etc.
> > > > >
> > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > >
> > > > > Thanks
> > > >
> > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > going to be there but I'm not going to block this work waiting for BPF
> > > > offload.  And easily migrated is what BPF is not.
> > >
> > > Just to make sure we're at the same page. I meant to find a way to
> > > allow the driver/user to fully customize what it wants to
> > > hash/classify. Similar technologies which is based on private solution
> > > has been used by some vendors, which allow user to customize the
> > > classifier[1]
> > >
> > > > eBPF looks like a good open-source solution candidate for this (there
> > > could be others). But there could be many kinds of eBPF programs that
> > > could be offloaded. One famous one is XDP which requires many features
> > > other than the bytecode/VM like map access, tailcall. Starting from
> > > such a complicated type is hard. Instead, we can start from a simple
> > > type, that is the eBPF classifier. All it needs is to pass the
> > > bytecode to the device, the device can choose to run it or compile it
> > > to what it can understand for classifying. We don't need maps, tail
> > > calls and other features.
> >
> > Until people start asking exactly for maps because they want
> > state for their classifier?
> 
> Yes, but let's compare the eBPF without maps with the static feature
> proposed here. It is much more scalable and flexible.

I looked for examples of RSS implemented in BPF and only found this one:
https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
It seems to use maps.


> > And it makes sense - if you want
> > e.g. load balancing you need stats which needs maps.
> 
> Yes, but we know it's possible to have that (through the XDP offload).

Not without a lot more work to make XDP offload happen.

> This is impossible with the approach proposed here.
> 
> >
> > > We don't need to worry about the security
> > > because of its simplicity: the eBPF program is only in charge of doing
> > > classification, no other interactions with the driver and packet
> > > modification is prohibited. The feature is limited only to the
> > > VM/bytecode abstraction itself.
> > >
> > > What's more, it's a good first step to achieve full eBPF offloading in
> > > the future.
> > >
> > > Thanks
> > >
> > > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> >
> > Dave seems to have nacked this approach, no?
> 
> I may miss something but looking at kernel commit, there are few
> patches to support that:
> 
> E.g
> 
> commit c7648810961682b9388be2dd041df06915647445
> Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> Date:   Mon Sep 9 06:47:44 2019 -0700
> 
>     ice: Implement Dynamic Device Personalization (DDP) download
> 
> And it has been used by DPDK drivers.
> 
> Thanks
> 
> >
> > > >
> > > > --
> > > > MST
> > > >
> >


* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
                   ` (3 preceding siblings ...)
  2023-02-23 13:13 ` Michael S. Tsirkin
@ 2023-02-28 11:16 ` Michael S. Tsirkin
  2023-02-28 11:16   ` [virtio-dev] " Michael S. Tsirkin
                     ` (3 more replies)
  4 siblings, 4 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-02-28 11:16 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> If the tunnel is used to encapsulate the packets, the hash calculated
> using the outer header of the receive packets is always fixed for the
> same flow packets, i.e. they will be steered to the same receive queue.

Wait a second. How is this true? Doesn't everyone put the
inner header hash in the outer source port to solve this?
For example, the Geneve spec says:

   it is necessary for entropy from encapsulated packets to be
   exposed in the tunnel header.  The most common technique for this is
   to use the UDP source port

The same goes for VXLAN; I did not check further.

So what is the problem? And which tunnel types actually suffer from
it?
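
For illustration, the source-port technique quoted above can be sketched
as follows. This is only a minimal sketch and is not taken from any
tunnel implementation: the function name outer_udp_source_port, the use
of CRC32 as the flow hash, and the 49152-65535 port range are all
assumptions made for the example.

```python
import ipaddress
import struct
import zlib

# Assumed port range for the outer UDP source port (dynamic/private ports).
PORT_LO, PORT_HI = 49152, 65535


def outer_udp_source_port(src_ip, dst_ip, proto, sport, dport):
    """Map an inner flow's 5-tuple to an outer UDP source port.

    CRC32 stands in for whatever flow hash the encapsulating endpoint
    really uses; it is an assumption made for this sketch.
    """
    flow = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", proto, sport, dport)
    )
    # Fold the 32-bit hash into the allowed port range.
    return PORT_LO + zlib.crc32(flow) % (PORT_HI - PORT_LO + 1)


if __name__ == "__main__":
    # Packets of the same inner flow always get the same outer source port.
    print(outer_udp_source_port("10.0.0.1", "10.0.0.2", 6, 40000, 443))
```

With this range the outer header can carry at most 16384 distinct
values, however many inner flows exist.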

-- 
MST


* Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28 11:04                         ` Michael S. Tsirkin
  2023-02-28 11:04                           ` [virtio-dev] " Michael S. Tsirkin
@ 2023-03-01  2:36                           ` Jason Wang
  2023-03-01  2:36                             ` [virtio-dev] " Jason Wang
  2023-03-01 10:36                             ` Michael S. Tsirkin
  1 sibling, 2 replies; 105+ messages in thread
From: Jason Wang @ 2023-03-01  2:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > maps), easy to be migrated etc.
> > > > > >
> > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > offload.  And easily migrated is what BPF is not.
> > > >
> > > > Just to make sure we're at the same page. I meant to find a way to
> > > > allow the driver/user to fully customize what it wants to
> > > > hash/classify. Similar technologies which is based on private solution
> > > > has been used by some vendors, which allow user to customize the
> > > > classifier[1]
> > > >
> > > > > eBPF looks like a good open-source solution candidate for this (there
> > > > could be others). But there could be many kinds of eBPF programs that
> > > > could be offloaded. One famous one is XDP which requires many features
> > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > such a complicated type is hard. Instead, we can start from a simple
> > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > bytecode to the device, the device can choose to run it or compile it
> > > > to what it can understand for classifying. We don't need maps, tail
> > > > calls and other features.
> > >
> > > Until people start asking exactly for maps because they want
> > > state for their classifier?
> >
> > Yes, but let's compare the eBPF without maps with the static feature
> > proposed here. It is much more scalable and flexible.
>
> I looked for some examples of RSS using BPF and only found this:
> https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> seems to use maps.

Yes, and this is also how we emulate RSS with TUN/TAP, via its steering
eBPF support. It uses maps because it has to emulate not only the hash
but also the indirection. If we only replace the hash function with the
eBPF program and reuse the RSS indirection table, we don't need maps.
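
The split described here can be sketched as below: the program only
supplies the hash, and the existing RSS indirection table turns it into
a queue. This is a minimal sketch; select_queue and hash_fn are
hypothetical names, hash_fn merely stands in for the offloaded eBPF
program, and the power-of-two table size is an assumption.

```python
def select_queue(packet, hash_fn, indirection_table):
    """Pick a receive queue for `packet`.

    hash_fn is a stand-in for the eBPF hashing program; the indirection
    table is the one RSS already has, so the program needs no map.
    """
    size = len(indirection_table)
    if size == 0 or size & (size - 1):
        raise ValueError("indirection table size must be a power of two")
    hash_value = hash_fn(packet) & 0xFFFFFFFF
    # The low bits of the hash index the table (mask = size - 1).
    return indirection_table[hash_value & (size - 1)]


if __name__ == "__main__":
    table = [0, 1, 2, 3, 3, 2, 1, 0]
    # A constant hash function, just to show the lookup.
    print(select_queue(b"\x00" * 64, lambda pkt: 0x51CCC17D, table))
```

All per-flow state stays in the indirection table that the driver
already programs, which is why the hashing program itself can remain
stateless.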

>
>
> > > And it makes sense - if you want
> > > e.g. load balancing you need stats which needs maps.
> >
> > Yes, but we know it's possible to have that (through the XDP offload).
>
> Not without a lot more work to make xdp offload happen.
>

Yes, that's why a simple eBPF RSS hashing program looks much easier.

Thanks

> > This is impossible with the approach proposed here.
> >
> > >
> > > > We don't need to worry about the security
> > > > because of its simplicity: the eBPF program is only in charge of doing
> > > > classification, no other interactions with the driver and packet
> > > > modification is prohibited. The feature is limited only to the
> > > > VM/bytecode abstraction itself.
> > > >
> > > > What's more, it's a good first step to achieve full eBPF offloading in
> > > > the future.
> > > >
> > > > Thanks
> > > >
> > > > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> > >
> > > Dave seems to have nacked this approach, no?
> >
> > I may miss something but looking at kernel commit, there are few
> > patches to support that:
> >
> > E.g
> >
> > commit c7648810961682b9388be2dd041df06915647445
> > Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> > Date:   Mon Sep 9 06:47:44 2019 -0700
> >
> >     ice: Implement Dynamic Device Personalization (DDP) download
> >
> > And it has been used by DPDK drivers.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>


* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28 11:16 ` Michael S. Tsirkin
  2023-02-28 11:16   ` [virtio-dev] " Michael S. Tsirkin
@ 2023-03-01  2:56   ` Heng Qi
  2023-03-01  2:56     ` Heng Qi
  2023-03-08 14:39     ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
  2023-03-01  3:30   ` [virtio-comment] " Heng Qi
  2023-03-09 12:28   ` [virtio-dev] " Heng Qi
  3 siblings, 2 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-01  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>> If the tunnel is used to encapsulate the packets, the hash calculated
>> using the outer header of the receive packets is always fixed for the
>> same flow packets, i.e. they will be steered to the same receive queue.
> Wait a second. How is this true? Does not everyone stick the
> inner header hash in the outer source port to solve this?

Yes, you are right. That's what we did before the inner header hash, but 
it has a performance penalty, which I'll explain below.

> For example geneve spec says:
>
>     it is necessary for entropy from encapsulated packets to be
>     exposed in the tunnel header.  The most common technique for this is
>     to use the UDP source port

The endpoint of the tunnel is called the gateway (with DPDK running on
top of it).

1. Without the inner header hash, entropy can be inserted into the UDP
source port of the tunnel's outer header, and the tunnel packet is then
handed over to the host. The host has to set aside some of its CPUs to
parse the outer headers (without dropping them) and calculate the inner
hash for the inner payloads, and then use the inner hash to forward the
packets to the remaining CPUs, which are responsible for processing
them.
1) In this process the host CPUs are split into two groups. One group
only acts as a forwarding node that parses the outer header, and its
CPU utilization is low. The other group handles the packets.
2) The entropy of the outer UDP source port is not sufficient, i.e. the
packets are not spread widely across the queues.

2. With the inner header hash, the gateway parses the outer header
itself and uses the inner 5-tuple to calculate the inner hash.
The tunneled packet is then handed over to the host.
1) All host CPUs are used to process data packets; none have to be set
aside to forward packets and parse the outer header.
2) The entropy of the original 5-tuple is sufficient, and the packets
are spread widely across the queues.
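
As a rough illustration of case 2, the hash over the inner 5-tuple could
be computed as below. This is a minimal sketch, assuming the device
applies the same Toeplitz function that RSS uses to the inner IPv4/TCP
fields; the names toeplitz_hash and inner_flow_hash and the sample key
are assumptions made for the example, not part of the patch.

```python
import ipaddress
import struct

# A commonly used 40-byte sample RSS key (assumption: a real driver
# configures its own key).
SAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key, data):
    """32-bit Toeplitz hash of `data` under `key`."""
    if len(key) < len(data) + 4:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # XOR in the 32-bit key window starting at this bit offset.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def inner_flow_hash(src_ip, dst_ip, sport, dport, key=SAMPLE_KEY):
    """Hash the inner addresses and ports, all in network byte order."""
    data = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!HH", sport, dport)
    )
    return toeplitz_hash(key, data)


if __name__ == "__main__":
    print(hex(inner_flow_hash("66.9.149.187", "161.142.100.80", 2794, 1766)))
```

Because the input covers both inner addresses and both inner ports, the
result is a full 32-bit value per inner flow rather than what fits into
a single 16-bit outer source port.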

Thanks.

>
> same goes for vxlan did not check further.
>
> so what is the problem?  and which tunnel types actually suffer from the
> problem?
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28 11:16 ` Michael S. Tsirkin
  2023-02-28 11:16   ` [virtio-dev] " Michael S. Tsirkin
  2023-03-01  2:56   ` Heng Qi
@ 2023-03-01  3:30   ` Heng Qi
  2023-03-01  3:30     ` [virtio-dev] " Heng Qi
  2023-03-01 11:07     ` Michael S. Tsirkin
  2023-03-09 12:28   ` [virtio-dev] " Heng Qi
  3 siblings, 2 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-01  3:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>> If the tunnel is used to encapsulate the packets, the hash calculated
>> using the outer header of the receive packets is always fixed for the
>> same flow packets, i.e. they will be steered to the same receive queue.
> Wait a second. How is this true? Does not everyone stick the
> inner header hash in the outer source port to solve this?
> For example geneve spec says:
>
>     it is necessary for entropy from encapsulated packets to be
>     exposed in the tunnel header.  The most common technique for this is
>     to use the UDP source port
>
> same goes for vxlan did not check further.
>
> so what is the problem?  and which tunnel types actually suffer from the
> problem?


In fact, for protocols such as GRE, there is no outer transport
header.

Thanks.

>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-01  2:36                           ` Jason Wang
  2023-03-01  2:36                             ` [virtio-dev] " Jason Wang
@ 2023-03-01 10:36                             ` Michael S. Tsirkin
  2023-03-02  2:57                               ` Jason Wang
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-01 10:36 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > Btw, this kind of 1:1 hash feature seems neither scalable nor flexible.
> > > > > > > It requires endless extension of bits/fields. Modern NICs allow the
> > > > > > > user to customize the hash calculation; for virtio-net we can allow
> > > > > > > an eBPF program to classify the packets. That seems more flexible
> > > > > > > and scalable, with almost no maintenance burden in the spec (only
> > > > > > > bytecode is required, with no need for fancy features/interactions like
> > > > > > > maps), and it is easy to migrate etc.
> > > > > > >
> > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > > offload.  And easily migrated is what BPF is not.
> > > > >
> > > > > Just to make sure we're on the same page. I meant to find a way to
> > > > > allow the driver/user to fully customize what it wants to
> > > > > hash/classify. Similar technologies based on private solutions
> > > > > have been used by some vendors, which allow the user to customize the
> > > > > classifier[1]
> > > > >
> > > > > eBPF looks like a good open-source solution candidate for this (there
> > > > > could be others). But there could be many kinds of eBPF programs that
> > > > > could be offloaded. One famous one is XDP, which requires many features
> > > > > other than the bytecode/VM, like map access and tail calls. Starting from
> > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > bytecode to the device; the device can choose to run it or compile it
> > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > calls and other features.
> > > >
> > > > Until people start asking exactly for maps because they want
> > > > state for their classifier?
> > >
> > > Yes, but let's compare the eBPF without maps with the static feature
> > > proposed here. It is much more scalable and flexible.
> >
> > I looked for some examples of RSS using BPF and only found this:
> > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > seems to use maps.
> 
> Yes and this is also the way we emulate RSS with TUN/TAP via steering
> eBPF support for TUN/TAP. The reason is that it needs to emulate not
> only the hash but also the indirection. If we only replace the hash
> function with the eBPF program but reuse the RSS indirection table, we
> don't need maps.

How? Add a special helper?

> >
> >
> > > > And it makes sense - if you want
> > > > e.g. load balancing you need stats which needs maps.
> > >
> > > Yes, but we know it's possible to have that (through the XDP offload).
> >
> > Not without a lot more work to make xdp offload happen.
> >
> 
> Yes, that's why a simple eBPF RSS hashing program looks much easier.
> 
> Thanks

Notice that at this point this is no longer a generic BPF - you
are using a special helper. For tunnels I would imagine two tables
could easily turn out to be useful. Then what? Another table?
If yes then I can't say I like where this is going ...
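
For reference, the scheme under discussion, where only the hash function is
replaced and the existing RSS indirection table is reused, can be sketched
roughly as below. This is an illustration only: hash_fn_t, struct rss_state,
select_queue and byte_sum_hash are hypothetical names, not an existing virtio
or BPF interface, and the table length is assumed to be a power of two so that
the low bits of the hash can index it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classifier: any function mapping packet bytes to a hash. */
typedef uint32_t (*hash_fn_t)(const uint8_t *pkt, size_t len);

/* Hypothetical RSS state; table_len is assumed to be a power of two. */
struct rss_state {
    const uint16_t *indirection_table;
    uint16_t table_len;
};

/*
 * The pluggable part is only the hash; queue selection still goes
 * through the indirection table, so the classifier itself needs no map.
 */
uint16_t select_queue(const struct rss_state *rss, hash_fn_t hash,
                      const uint8_t *pkt, size_t len)
{
    uint32_t h = hash(pkt, len);

    return rss->indirection_table[h & (uint32_t)(rss->table_len - 1u)];
}

/* Trivial example classifier: sums the packet bytes. */
uint32_t byte_sum_hash(const uint8_t *pkt, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum += pkt[i];
    return sum;
}
```

With a four-entry table {0, 1, 2, 3} and a packet whose bytes sum to 6,
select_queue() picks entry 6 & 3 = 2. Whether the classifier can stay this
simple once tunnels are involved is exactly the open question above.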

> > > This is impossible with the approach proposed here.
> > >
> > > >
> > > > > We don't need to worry about the security
> > > > > because of its simplicity: the eBPF program is only in charge of doing
> > > > > classification, no other interactions with the driver and packet
> > > > > modification is prohibited. The feature is limited only to the
> > > > > VM/bytecode abstraction itself.
> > > > >
> > > > > What's more, it's a good first step to achieve full eBPF offloading in
> > > > > the future.
> > > > >
> > > > > Thanks
> > > > >
> > > > > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> > > >
> > > > Dave seems to have nacked this approach, no?
> > >
> > > I may be missing something, but looking at the kernel commits, there are
> > > a few patches to support that:
> > >
> > > E.g
> > >
> > > commit c7648810961682b9388be2dd041df06915647445
> > > Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > Date:   Mon Sep 9 06:47:44 2019 -0700
> > >
> > >     ice: Implement Dynamic Device Personalization (DDP) download
> > >
> > > And it has been used by DPDK drivers.
> > >
> > > Thanks
> > >
> > > >
> > > > > >
> > > > > > --
> > > > > > MST
> > > > > >
> > > >
> >



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-01  3:30   ` [virtio-comment] " Heng Qi
  2023-03-01  3:30     ` [virtio-dev] " Heng Qi
@ 2023-03-01 11:07     ` Michael S. Tsirkin
  2023-03-01 15:10       ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-01 11:07 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 01, 2023 at 11:30:37AM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > using the outer header of the receive packets is always fixed for the
> > > same flow packets, i.e. they will be steered to the same receive queue.
> > Wait a second. How is this true? Does not everyone stick the
> > inner header hash in the outer source port to solve this?
> > For example geneve spec says:
> > 
> >     it is necessary for entropy from encapsulated packets to be
> >     exposed in the tunnel header.  The most common technique for this is
> >     to use the UDP source port
> > 
> > same goes for vxlan did not check further.
> > 
> > so what is the problem?  and which tunnel types actually suffer from the
> > problem?
> 
> 
> In fact, similar to protocols such as GRE, there is no outer transport
> header.
> 
> Thanks.


Sorry I don't understand the answer. What is similar to what?
By GRE you mean NVGRE? That has FlowID for this purpose.
Only 8 bits - is this the issue? Not enough entropy?




-- 
MST



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-21  4:20 ` Parav Pandit
  2023-02-21  6:14   ` [virtio-comment] " Heng Qi
  2023-02-21 17:05   ` Michael S. Tsirkin
@ 2023-03-01 14:32   ` Heng Qi
  2 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-01 14:32 UTC (permalink / raw)
  To: Parav Pandit, Michael S . Tsirkin, Jason Wang
  Cc: Yuri Benditovich, Cornelia Huck, Xuan Zhuo, virtio-comment, virtio-dev



On 2023/2/21 12:20 PM, Parav Pandit wrote:
>> From: Heng Qi <hengqi@linux.alibaba.com>
>> Sent: Saturday, February 18, 2023 9:37 AM
>> If the tunnel is used to encapsulate the packets, the hash calculated using the
> s/hash calculated/hash is calculated
>
>> outer header of the receive packets is always fixed for the same flow packets,
>> i.e. they will be steered to the same receive queue.
>>
> A little descriptive commit message like below reads better to me.
>
> Currently, when a received packet is an encapsulated packet meaning there is an outer and an inner header, virtio device is unable to calculate the hash for the inner header.
> Due to this limitation, multiple different flows identified by the inner header for the same outer header result in selecting the same receive queue.
> This effectively disables the RSS, resulting in poor receive performance.
>
> Hence, to overcome this limitation, a new feature is introduced using a feature bit VIRTIO_NET_F_HASH_TUNNEL.
> This feature enables the device to advertise the capability to calculate the hash for the inner packet header.
> Thereby regaining better RSS performance in presence of outer packet header.
>
>> We add a feature bit VIRTIO_NET_F_HASH_TUNNEL and related bitmasks in
>> \field{hash_tunnel_types}, which instructs the device to calculate the hash
>> using the inner headers of tunnel-encapsulated packets. Note that
>> VIRTIO_NET_F_HASH_TUNNEL only indicates the ability of the inner header
>> hash, and does not give the device the ability to use the hash value to select a
>> receiving queue to place the packet.
>>
>> Also, a feature bit VIRTIO_NET_F_HASH_REPORT_TUNNEL is added to report
>> an encapsulation type, and the feature depends on
>> VIRTIO_NET_F_HASH_REPORT.
> As we discussed, the tunnel type alone is not useful to the software, neither as an individual field nor merged with some other field.
> Hence, please remove this feature bit. HASH_TUNNEL is good enough.
> Please remove the references to it at more places below.

If there is no \field{hash_report_tunnel} in the virtio net hdr, it seems
we can instead place tunnel types
such as VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN in \field{hash_types}.
Combined with the
VIRTIO_NET_F_HASH_TUNNEL feature, this satisfies the migration requirement
and is simpler,
i.e. we no longer seem to need \field{hash_tunnel_types}.

What do you think?

>
>> It only means that the encapsulation type can be reported, it cannot instruct
>> the device to calculate the hash.
>>
>> +\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash
>> +	for tunnel-encapsulated packets.
>> +
>> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL(52)] Device can report an
>> encapsulation type.
>> +
> Please remove this.
>
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications
>> coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3145,8 @@ \subsubsection{Feature bit
>> requirements}\label{sec:Device Types / Network Device
>> \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or
>> VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_HASH_TUNNEL] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_HASH_REPORT_TUNNEL] Requires
>> VIRTIO_NET_F_HASH_REPORT.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types /
>> Network Device / Feature bits / Legacy Interface: Feature bits} @@ -3199,20
>> +3206,27 @@ \subsection{Device configuration layout}\label{sec:Device Types
>> / Network Device
>>           u8 rss_max_key_size;
>>           le16 rss_max_indirection_table_length;
>>           le32 supported_hash_types;
>> +        le32 supported_tunnel_hash_types;
>>   };
>>   \end{lstlisting}
>> -The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS
>> or VIRTIO_NET_F_HASH_REPORT is set.
>> +The following field, \field{rss_max_key_size} only exists if VIRTIO_NET_F_RSS,
>> VIRTIO_NET_F_HASH_REPORT or VIRTIO_NET_F_HASH_TUNNEL is set.
>>   It specifies the maximum supported length of RSS key in bytes.
>>
>>   The following field, \field{rss_max_indirection_table_length} only exists if
>> VIRTIO_NET_F_RSS is set.
>>   It specifies the maximum number of 16-bit entries in RSS indirection table.
>>
>>   The next field, \field{supported_hash_types} only exists if the device supports
>> hash calculation, -i.e. if VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT is
>> set.
>> +i.e. if VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
>> VIRTIO_NET_F_HASH_TUNNEL is set.
>>
>>   Field \field{supported_hash_types} contains the bitmask of supported hash
>> types.
>>   See \ref{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
>> hash types} for details of supported hash types.
>>
>> +The next field, \field{supported_tunnel_hash_types} only exists if the
>> +device supports inner hash calculation, i.e. if VIRTIO_NET_F_HASH_TUNNEL is
>> set.
>> +
>> +Field \field{supported_tunnel_hash_types} contains the bitmask of supported
>> tunnel hash types.
>> +See \ref{sec:Device Types / Network Device / Device Operation / Processing
>> of Incoming Packets / Hash calculation for incoming packets /
>> Supported/enabled tunnel hash types} for details of supported tunnel hash
>> types.
>> +
>>   \devicenormative{\subsubsection}{Device configuration layout}{Device Types /
>> Network Device / Device configuration layout}
>>
>>   The device MUST set \field{max_virtqueue_pairs} to between 1 and 0x8000
>> inclusive, @@ -3236,7 +3250,7 @@ \subsection{Device configuration
>> layout}\label{sec:Device Types / Network Device  negotiated.
>>
>>   The device MUST set \field{rss_max_key_size} to at least 40, if it offers -
>> VIRTIO_NET_F_RSS or VIRTIO_NET_F_HASH_REPORT.
>> +VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT or
>> VIRTIO_NET_F_HASH_TUNNEL.
>>
>>   The device MUST set \field{rss_max_indirection_table_length} to at least 128,
>> if it offers  VIRTIO_NET_F_RSS.
>> @@ -3385,7 +3399,8 @@ \subsection{Device Operation}\label{sec:Device
>> Types / Network Device / Device O
>>           le16 csum_offset;
>>           le16 num_buffers;
>>           le32 hash_value;        (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>> -        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
>> +        le16 hash_report;       (Only if VIRTIO_NET_F_HASH_REPORT negotiated,
>> and the upper 8 bits indicates the
>> +                                 encapsulation type if
>> + VIRTIO_NET_F_HASH_REPORT_TUNNEL negotiated, otherwise reserved)
>>           le16 padding_reserved;  (Only if VIRTIO_NET_F_HASH_REPORT
>> negotiated)  };  \end{lstlisting} @@ -3838,11 +3853,15 @@
>> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
>> Network  \begin{itemize}  \item The feature VIRTIO_NET_F_RSS was
>> negotiated. The device uses the hash to determine the receive virtqueue to
>> place incoming packets.
>>   \item The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device
>> reports the hash value and the hash type with the packet.
>> +\item The feature VIRTIO_NET_F_HASH_TUNNEL was negotiated. The device
>> supports inner hash calculation. If additionally
>> +      VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device reports
>> the encapsulation type as well.
>>   \end{itemize}
>>
>>   If the feature VIRTIO_NET_F_RSS was negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_rss_config structure
>> as 'Enabled hash types' bitmask.
>> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
>> device uses \field{hash_tunnel_types} of the
>> +	virtio_net_rss_config structure as 'Enabled hash tunnel types' bitmask.
>>   \item The device uses a key as defined in \field{hash_key_data} and
>> \field{hash_key_length} of the virtio_net_rss_config structure (see
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Receive-side scaling (RSS) / Setting RSS parameters}).
>>   \end{itemize}
>> @@ -3850,11 +3869,13 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network  If the feature VIRTIO_NET_F_RSS
>> was not negotiated:
>>   \begin{itemize}
>>   \item The device uses \field{hash_types} of the virtio_net_hash_config
>> structure as 'Enabled hash types' bitmask.
>> +	If additionally VIRTIO_NET_F_HASH_TUNNEL was negotiated, the
>> device uses \field{hash_tunnel_types} of the
>> +	virtio_net_hash_config structure as 'Enabled hash tunnel types'
>> bitmask.
>>   \item The device uses a key as defined in \field{hash_key_data} and
>> \field{hash_key_length} of the virtio_net_hash_config structure (see
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Automatic receive steering in multiqueue mode / Hash calculation}).
>>   \end{itemize}
>>
>> -Note that if the device offers VIRTIO_NET_F_HASH_REPORT, even if it
>> supports only one pair of virtqueues, it MUST support
>> +Note that if the device offers VIRTIO_NET_F_HASH_REPORT or
>> +VIRTIO_NET_F_HASH_TUNNEL, even if it supports only one pair of
>> +virtqueues, it MUST support
>>   at least one of commands of VIRTIO_NET_CTRL_MQ class to configure
>> reported hash parameters:
>>   \begin{itemize}
>>   \item If the device offers VIRTIO_NET_F_RSS, it MUST support
>> VIRTIO_NET_CTRL_MQ_RSS_CONFIG command per @@ -3863,8 +3884,36 @@
>> \subsubsection{Processing of Incoming Packets}\label{sec:Device Types /
>> Network
>>    \ref{sec:Device Types / Network Device / Device Operation / Control
>> Virtqueue / Automatic receive steering in multiqueue mode / Hash calculation}.
>>   \end{itemize}
>>
>> +\subparagraph{Tunnel/Encapsulated packet} \label{sec:Device Types /
>> +Network Device / Device Operation / Processing of Incoming Packets /
>> +Hash calculation for incoming packets / Tunnel/Encapsulated packet} A
>> +tunnel packet is encapsulated from the original packet based on the
>> +tunneling protocol (only a single level of encapsulation is currently
>> +supported). The encapsulated packet contains an outer header and an inner
>> header, and the device calculates the hash over either the inner header or the
>> outer header.
>> +
>> +When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the
>> +corresponding encapsulation type is set in \field{hash_tunnel_types},
>> +the hash for a specific type of encapsulated packet is calculated over the inner
>> as opposed to outer header.
> To the outer header.
>
> Here, you want to say that
> When the feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and a received packet's outer header matches one of the supported hash_tunnel_types, the hash of the inner header is calculated.
>
>> +Supported encapsulation types are listed in \ref{sec:Device Types /
>> +Network Device / Device Operation / Processing of Incoming Packets /
>> +Hash calculation for incoming packets / Supported/enabled hash tunnel
>> types}.
>> +
>> +If both VIRTIO_NET_F_HASH_REPORT_TUNNEL and
>> VIRTIO_NET_F_HASH_REPORT
>> +are negotiated, and hash is calculated for an encapsulated  packet, the
>> +device reports the encapsulation type in addition to the hash value and
>> +hash type, regardless of whether the hash is calculated on the inner header or
>> the outer header.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_HASH_REPORT_TUNNEL
>> are
>> +negotiated but VIRTIO_NET_F_HASH_TUNNEL is not negotiated, the device
>> +calculates the hash over the outer header, and \field{hash_report} reports the
>> hash type and encapsulation type.
>> +
>> +Some encapsulated packet types: \hyperref[intro:GRE]{[GRE]},
>> +\hyperref[intro:VXLAN]{[VXLAN]}, \hyperref[intro:GENEVE]{[GENEVE]},
>> \hyperref[intro:IPIP]{[IPIP]} and \hyperref[intro:NVGRE]{[NVGRE]}.
>> +
>>   \subparagraph{Supported/enabled hash types}  \label{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / Supported/enabled hash types}
>> +This paragraph relies on definitions from \hyperref[intro:IP]{[IP]},
>> +\hyperref[intro:UDP]{[UDP]} and \hyperref[intro:TCP]{[TCP]}.
>>   Hash types applicable for IPv4 packets:
>>   \begin{lstlisting}
>>   #define VIRTIO_NET_HASH_TYPE_IPv4              (1 << 0)
>> @@ -3884,6 +3933,32 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network
>>   #define VIRTIO_NET_HASH_TYPE_UDP_EX            (1 << 8)
>>   \end{lstlisting}
>>
> Lets please remove the below encoding.
>
>> +\subparagraph{Supported/enabled tunnel hash types} \label{sec:Device
>> +Types / Network Device / Device Operation / Processing of Incoming
>> +Packets / Hash calculation for incoming packets / Supported/enabled
>> +tunnel hash types} If the feature VIRTIO_NET_F_HASH_TUNNEL is
>> +negotiated, the encapsulation hash type indicates that the hash is calculated
>> over the inner header of the encapsulated packet:
>> +Hash type applicable for inner payload of the gre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GRE         (1 << 0)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the vxlan-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_VXLAN       (1 << 1)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the geneve-encapsulated
>> +packet \begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_GENEVE      (1 << 2)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the ip-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_IPIP        (1 << 3)
>> +\end{lstlisting}
>> +Hash type applicable for inner payload of the nvgre-encapsulated packet
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_TYPE_NVGRE       (1 << 4)
>> +\end{lstlisting}
>> +
>>   \subparagraph{IPv4 packets}
>>   \label{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / IPv4 packets}  The
>> device calculates the hash on IPv4 packets according to 'Enabled hash types'
>> bitmask as follows:
>> @@ -3975,17 +4050,47 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network  (see \ref{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / IPv6 packets without extension header}).
>>   \end{itemize}
>>
>> +\subparagraph{Inner hash calculation of an encapsulated packet} If the
>> +feature VIRTIO_NET_F_HASH_TUNNEL is negotiated and the corresponding
>> +encapsulation hash type is set in \field{hash_tunnel_types}, the device
>> +calculates the hash on the inner header of an encapsulated packet (See
>> +\ref{sec:Device Types / Network Device / Device Operation / Processing
>> +of Incoming Packets / Hash calculation for incoming packets /
>> Tunnel/Encapsulated packet}).
>> +
>> +\subparagraph{Security risks between encapsulated packets and RSS}
>> +There may be potential security risks when encapsulated packets using
> s/when encapsulated/when encapsulating/
>
>> +RSS to select queues for placement. When a user inside a tunnel tries
>> +to control the enqueuing of encapsulated packets, then the user can
>> +flood the device with invalid packets, and the flooded packets may be
>> +hashed into the same queue as packets in other normal tunnels, which causing
>> the queue to overflow.
>>
> Invalid packets are confusing and the wording of "which causing" is not proper.
> There is some duplicate wording below too.
>
> I think the risks above and below can be summarized in a bit simpler manner.
>
> How about,
>
> When a specific receive queue is shared to receive packets of multiple tunnels, there is no quality of service for packets of multiple tunnels.
>
> +
>> +This can pose several security risks:
>> +\begin{itemize}
>> +\item  Encapsulated packets in the normal tunnels cannot be enqueued due to
>> queue
>> +       overflow, resulting in a large amount of packet loss.
>> +\item  The delay and retransmission of packets in the normal tunnels are
>> extremely increased.
> This is something very protocol specific and doesn't belong here.
>
>> +\item  The user can observe the traffic information and enqueue information
>> of other normal
>> +       tunnels, and conduct targeted DoS attacks.
> Once hash_report_tunnel_types is removed, this second attack is no longer applicable.
> Hence, please remove this too.
>
>> +\end{itemize}
>> +
>>   \paragraph{Hash reporting for incoming packets}  \label{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> reporting for incoming packets}
>> -
>> -If VIRTIO_NET_F_HASH_REPORT was negotiated and
>> - the device has calculated the hash for the packet, the device fills
>> \field{hash_report} with the report type of calculated hash -and
>> \field{hash_value} with the value of calculated hash.
>> -
>> -If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the -
>> hash was not calculated, the device sets \field{hash_report} to
>> VIRTIO_NET_HASH_REPORT_NONE.
>> -
>> -Possible values that the device can report in \field{hash_report} are defined
>> below.
>> +If VIRTIO_NET_F_HASH_REPORT was negotiated and the device has
>> +calculated the hash for the packet, the device fills the lower 8 bits
>> +of \field{hash_report} with the report type of calculated hash, and
>> +\field{hash_value} with the value of calculated hash. Also, if
>> +VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device needs to
>> fill the upper 8 bits of \field{hash_report} with the encapsulation type.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT was negotiated but due to any reason the
>> +hash was not calculated, the device sets the lower 8 bits of
>> +\field{hash_report} to VIRTIO_NET_HASH_REPORT_NONE.
>> +
>> +If VIRTIO_NET_F_HASH_REPORT_TUNNEL was negotiated, the device fills the
>> +upper
>> +8 bits of \field{hash_report} with the encapsulation type for an
>> +encapsulated packet. Note that the upper 8 bits are all set to 0 for an
>> +unencapsulated packet, regardless of whether
>> VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated or not.
>> +
>> +Possible hash types that the device can report in \field{hash_report} are
>> defined below.
>>   They correspond to supported hash types defined in  \ref{sec:Device Types /
>> Network Device / Device Operation / Processing of Incoming Packets / Hash
>> calculation for incoming packets / Supported/enabled hash types}  as follows:
>> @@ -4005,6 +4110,26 @@ \subsubsection{Processing of Incoming
>> Packets}\label{sec:Device Types / Network
>>   #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
>>   \end{lstlisting}
>>
>> +The upper 8 bits of \field{hash_report} can report the encapsulation
>> +type to the driver if VIRTIO_NET_F_HASH_REPORT_TUNNEL is negotiated.
>> +Possible encapsulation types that the device can report in \field{hash_report}
>> are defined below.
>> +They correspond to supported hash tunnel types defined in
>> +\ref{sec:Device Types / Network Device / Device Operation / Processing
>> +of Incoming Packets / Hash calculation for incoming packets /
>> Supported/enabled hash tunnel types} as follows:
>> +
>> +VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 <<
>> +(VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
>> +#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260
>> +\end{lstlisting}
>> +
>> +They correspond to supported hash types defined in \ref{sec:Device
>> +Types / Network Device / Device Operation / Processing of Incoming Packets /
>> Hash calculation for incoming packets / Supported/enabled hash types}.
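
For reference, the \field{hash_report} split and the report-to-bitmask
mapping quoted above can be sketched in C roughly as follows. This is only
an illustration under stated assumptions: the helper names are hypothetical
and not part of the patch, \field{hash_report} is taken to be a 16-bit
field, and the quoted text does not spell out how the 256..260 report values
are packed into the upper 8 bits, so the helpers are kept independent.

```c
#include <stdint.h>

/* Report values exactly as listed in the quoted patch. */
#define VIRTIO_NET_HASH_TUNNEL_REPORT_GRE      256
#define VIRTIO_NET_HASH_TUNNEL_REPORT_VXLAN    257
#define VIRTIO_NET_HASH_TUNNEL_REPORT_GENEVE   258
#define VIRTIO_NET_HASH_TUNNEL_REPORT_IPIP     259
#define VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE    260

/*
 * Hypothetical helper implementing the quoted rule
 *   VIRTIO_NET_HASH_TUNNEL_TYPE_XXX = 1 << (VIRTIO_NET_HASH_TUNNEL_REPORT_XXX - 256)
 * Returns 0 for values outside the listed range.
 */
static inline uint32_t tunnel_report_to_type_bit(uint16_t report)
{
    if (report < VIRTIO_NET_HASH_TUNNEL_REPORT_GRE ||
        report > VIRTIO_NET_HASH_TUNNEL_REPORT_NVGRE)
        return 0;
    return UINT32_C(1) << (report - 256);
}

/* Lower 8 bits of hash_report: the hash report type (hypothetical helper). */
static inline uint8_t hash_report_type(uint16_t hash_report)
{
    return (uint8_t)(hash_report & 0xff);
}

/* Upper 8 bits of hash_report: the encapsulation type (hypothetical helper). */
static inline uint8_t hash_report_encap(uint16_t hash_report)
{
    return (uint8_t)(hash_report >> 8);
}
```

With these helpers, a driver-side consumer would read the hash type and the
encapsulation type from the same 16-bit field without one disturbing the
other, and an unencapsulated packet would simply show 0 in the upper byte.
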
>> +
>>   \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device /
>> Device Operation / Control Virtqueue}
>>
>>   The driver uses the control virtqueue (if VIRTIO_NET_F_CTRL_VQ is @@ -
>> 4364,6 +4489,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types
>> / Network Device / Devi  \begin{lstlisting}  struct virtio_net_hash_config {
>>       le32 hash_types;
>> +    le32 hash_tunnel_types;
>>       le16 reserved[4];
>>       u8 hash_key_length;
>>       u8 hash_key_data[hash_key_length];
>> @@ -4372,7 +4498,11 @@ \subsubsection{Control
>> Virtqueue}\label{sec:Device Types / Network Device / Devi  Field
>> \field{hash_types} contains a bitmask of allowed hash types as  defined in
>> \ref{sec:Device Types / Network Device / Device Operation / Processing of
>> Incoming Packets / Hash calculation for incoming packets / Supported/enabled
>> hash types}.
>> -Initially the device has all hash types disabled and reports only
>> VIRTIO_NET_HASH_REPORT_NONE.
>> +
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
>> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
>> Operation / Processing of Incoming Packets / Hash calculation for incoming
>> packets / Supported/enabled hash tunnel types}.
>> +
>> +Initially the device has all hash types and hash tunnel types disabled and
>> reports only VIRTIO_NET_HASH_REPORT_NONE.
>>
>>   Field \field{reserved} MUST contain zeroes. It is defined to make the structure
>> to match the layout of virtio_net_rss_config structure,  defined in
>> \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue
>> / Receive-side scaling (RSS)}.
>> @@ -4390,6 +4520,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device
>> Types / Network Device / Devi  \begin{lstlisting}  struct virtio_net_rss_config {
>>       le32 hash_types;
>> +    le32 hash_tunnel_types;
> This field is not needed, as advertising the support in the device config space is enough.
>
> If the intent is to enable hashing for the specific tunnel(s), an individual command is better.
>
> Regardless, this new field cannot be in the middle of the new structure as it breaks backward compatibility.
>
>>       le16 indirection_table_mask;
>>       le16 unclassified_queue;
>>       le16 indirection_table[indirection_table_length];
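
The backward-compatibility concern raised above can be made concrete with a
small layout check. This is a sketch only: the struct names are hypothetical,
only the fixed-size leading fields are modelled, and plain uint32_t/uint16_t
stand in for le32/le16 (endianness does not affect offsets).

```c
#include <stddef.h>
#include <stdint.h>

/* Leading fields of the RSS config before the patch (hypothetical name). */
struct rss_head_old {
    uint32_t hash_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Same fields with hash_tunnel_types inserted in the middle, as in v9. */
struct rss_head_v9 {
    uint32_t hash_types;
    uint32_t hash_tunnel_types;
    uint16_t indirection_table_mask;
    uint16_t unclassified_queue;
};

/* Offset at which an existing driver writes indirection_table_mask. */
static inline size_t mask_offset_old(void)
{
    return offsetof(struct rss_head_old, indirection_table_mask);
}

/* Offset at which a device using the v9 layout would read it. */
static inline size_t mask_offset_v9(void)
{
    return offsetof(struct rss_head_v9, indirection_table_mask);
}
```

An existing driver places the mask at offset 4, while a device parsing the
v9 layout would look for it at offset 8, so every field after hash_types is
misread unless both sides agree on the layout through feature negotiation.
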
>> @@ -4402,6 +4533,9 @@ \subsubsection{Control Virtqueue}\label{sec:Device
>> Types / Network Device / Devi  defined in  \ref{sec:Device Types / Network
>> Device / Device Operation / Processing of Incoming Packets / Hash calculation
>> for incoming packets / Supported/enabled hash types}.
>>
>> +Field \field{hash_tunnel_types} contains a bitmask of allowed hash
>> +tunnel types as defined in \ref{sec:Device Types / Network Device / Device
>> Operation / Processing of Incoming Packets / Hash calculation for incoming
>> packets / Supported/enabled hash tunnel types}.
>> +
>>   Field \field{indirection_table_mask} is a mask to be applied to  the calculated
>> hash to produce an index in the  \field{indirection_table} array.
>> diff --git a/introduction.tex b/introduction.tex index 287c5fc..69b95ae 100644
>> --- a/introduction.tex
>> +++ b/introduction.tex


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-01 11:07     ` Michael S. Tsirkin
@ 2023-03-01 15:10       ` Heng Qi
  0 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-01 15:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/3/1 at 7:07 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 01, 2023 at 11:30:37AM +0800, Heng Qi wrote:
>>
>> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>> If the tunnel is used to encapsulate the packets, the hash calculated
>>>> using the outer header of the receive packets is always fixed for the
>>>> same flow packets, i.e. they will be steered to the same receive queue.
>>> Wait a second. How is this true? Does not everyone stick the
>>> inner header hash in the outer source port to solve this?
>>> For example geneve spec says:
>>>
>>>      it is necessary for entropy from encapsulated packets to be
>>>      exposed in the tunnel header.  The most common technique for this is
>>>      to use the UDP source port
>>>
>>> same goes for vxlan did not check further.
>>>
>>> so what is the problem?  and which tunnel types actually suffer from the
>>> problem?
>>
>> In fact, similar to protocols such as GRE, there is no outer transport
>> header.
>>
>> Thanks.
>
> Sorry I don't understand the answer. What is similar to what?
> By GRE you mean NVGRE? That has FlowID for this purpose.
> Only 8 bit - is this the issue? Not enough entropy?

Sorry, I almost missed this email. 😮

Did you miss the reply in the other thread:
"
The end point of the tunnel is called the gateway (with DPDK on top of it).

1. When there is no inner header hash, entropy can be inserted into the
UDP source port of the outer header of the tunnel, and then the tunnel
packet is handed over to the host. The host needs to set aside part of
its CPUs to parse the outer headers (without dropping them) and calculate
the inner hash for the inner payloads, and then use the inner hash to
forward the packets to another part of the CPUs that is responsible for
processing.
1) During this process, the CPUs on the host are divided into two parts.
One part is used as a forwarding node to parse the outer header, and its
CPU utilization is low. The other part handles packets.
2) The entropy of the outer UDP source port is not enough, that is,
packets are not widely distributed across the queues.

2. When there is an inner header hash, the gateway directly parses the
outer header and uses the inner 5-tuple to calculate the inner hash.
The tunneled packet is then handed over to the host.
1) All the CPUs of the host are used to process data packets, and there
is no need to use some CPUs to forward and parse the outer header.
2) The entropy of the original 5-tuple is sufficient, and packets are
widely distributed across the queues.
"

In this thread, I mean protocols such as Generic Routing Encapsulation
(GRE)[1], which have IPv4 as the *Delivery Header*.
Compared with VXLAN, which increases entropy through the outer UDP source
port, GRE has less entropy.

[1] https://www.rfc-editor.org/rfc/rfc2784.html
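
To illustrate the entropy point, here is a minimal sketch in C. It is built
on assumptions: FNV-1a is swapped in for the device's real hash function
purely for brevity, all function names are hypothetical, and the outer GRE
hash is assumed to see only the outer IPv4 source and destination addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a short stand-in for the device's real hash. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = UINT32_C(2166136261);
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= UINT32_C(16777619);
    }
    return h;
}

/* Serialize fields byte by byte so struct padding never enters the hash. */
static void put_be16(uint8_t *b, uint16_t v)
{
    b[0] = (uint8_t)(v >> 8);
    b[1] = (uint8_t)v;
}

static void put_be32(uint8_t *b, uint32_t v)
{
    put_be16(b, (uint16_t)(v >> 16));
    put_be16(b + 2, (uint16_t)v);
}

/* Outer hash for GRE over IPv4: no outer transport header, so no ports. */
static uint32_t hash_outer_gre(uint32_t outer_src, uint32_t outer_dst)
{
    uint8_t b[8];
    put_be32(b, outer_src);
    put_be32(b + 4, outer_dst);
    return fnv1a(b, sizeof b);
}

/* Inner hash over the 5-tuple of the encapsulated packet. */
static uint32_t hash_inner_5tuple(uint32_t src, uint32_t dst,
                                  uint16_t sport, uint16_t dport,
                                  uint8_t proto)
{
    uint8_t b[13];
    put_be32(b, src);
    put_be32(b + 4, dst);
    put_be16(b + 8, sport);
    put_be16(b + 10, dport);
    b[12] = proto;
    return fnv1a(b, sizeof b);
}
```

Every inner flow carried between the same two tunnel end points feeds
identical input to hash_outer_gre(), so all of them get one hash value and
one queue, while hash_inner_5tuple() changes as soon as any inner field does.
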

Thanks.





^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-01 10:36                             ` Michael S. Tsirkin
@ 2023-03-02  2:57                               ` Jason Wang
  2023-03-02  7:42                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-03-02  2:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > > > maps), easy to be migrated etc.
> > > > > > > >
> > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > >
> > > > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > >
> > > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > > allow the driver/user to fully customize what it wants to
> > > > > > hash/classify. Similar technologies which is based on private solution
> > > > > > has been used by some vendors, which allow user to customize the
> > > > > > classifier[1]
> > > > > >
> > > > > > ePBF looks like a good open-source solution candidate for this (there
> > > > > > could be others). But there could be many kinds of eBPF programs that
> > > > > > could be offloaded. One famous one is XDP which requires many features
> > > > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > bytecode to the device, the device can choose to run it or compile it
> > > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > > calls and other features.
> > > > >
> > > > > Until people start asking exactly for maps because they want
> > > > > state for their classifier?
> > > >
> > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > proposed here. It is much more scalable and flexible.
> > >
> > > I looked for some examples of RSS using BPF and only found this:
> > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > seems to use maps.
> >
> > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > only the hash but also the indirection. If we only replace the hash
> > function with the eBPF program but reuse the RSS indirection table, we
> > don't need maps.
>
> How? Add a special helper?

We can let the eBPF program return the hash:

[eBPF hashing] -> hash value -> [indirection table lookup]

Note that if we don't consider future full eBPF offloading, we can
start with classical BPF.
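
The pipeline above can be sketched in plain C as follows. This is an
illustration under assumptions, not a proposed interface: the function
pointer stands in for the offloaded (e)BPF program, all names are
hypothetical, and falling back to an unclassified queue when no program is
loaded is an assumption made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the offloaded program: takes a packet, returns a hash. */
typedef uint32_t (*hash_prog_fn)(const uint8_t *pkt, size_t len);

/*
 * [hash program] -> hash value -> [indirection table lookup]
 * The mask is applied to the hash to produce an index into the table,
 * and the table entry is the receive queue.
 */
static uint16_t select_rx_queue(hash_prog_fn prog,
                                const uint8_t *pkt, size_t len,
                                const uint16_t *indirection_table,
                                uint16_t indirection_table_mask,
                                uint16_t unclassified_queue)
{
    if (prog == NULL)
        return unclassified_queue; /* assumption: no program, no hash */

    uint32_t hash = prog(pkt, len);
    return indirection_table[hash & indirection_table_mask];
}

/* Toy program for demonstration only: the "hash" is the first byte. */
static uint32_t toy_hash_prog(const uint8_t *pkt, size_t len)
{
    return len ? pkt[0] : 0;
}
```

The program never touches the table, so no map is needed on its side: the
existing RSS indirection table is reused unchanged.
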

Thanks

>
> > >
> > >
> > > > > And it makes sense - if you want
> > > > > e.g. load balancing you need stats which needs maps.
> > > >
> > > > Yes, but we know it's possible to have that (through the XDP offload).
> > >
> > > Not without a lot more work to make xdp offload happen.
> > >
> >
> > Yes, that's why a simple eBPF RSS hashing program looks much more easier.
> >
> > Thanks
>
> Notice that at this point this is no longer a generic BPF - you
> are using a special helper. For tunnels I would imagine two tables
> could easily turn out to be useful. Then what? Another table?
> If yes then I can't say I like where this is going ...
>
> > > > This is impossible with the approach proposed here.
> > > >
> > > > >
> > > > > > We don't need to worry about the security
> > > > > > because of its simplicity: the eBPF program is only in charge of doing
> > > > > > classification, no other interactions with the driver and packet
> > > > > > modification is prohibited. The feature is limited only to the
> > > > > > VM/bytecode abstraction itself.
> > > > > >
> > > > > > What's more, it's a good first step to achieve full eBPF offloading in
> > > > > > the future.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > [1] https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
> > > > >
> > > > > Dave seems to have nacked this approach, no?
> > > >
> > > > I may miss something but looking at kernel commit, there are few
> > > > patches to support that:
> > > >
> > > > E.g
> > > >
> > > > commit c7648810961682b9388be2dd041df06915647445
> > > > Author: Tony Nguyen <anthony.l.nguyen@intel.com>
> > > > Date:   Mon Sep 9 06:47:44 2019 -0700
> > > >
> > > >     ice: Implement Dynamic Device Personalization (DDP) download
> > > >
> > > > And it has been used by DPDK drivers.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > > >
> > > > > > > --
> > > > > > > MST
> > > > > > >
> > > > >
> > >
>




^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  2:57                               ` Jason Wang
@ 2023-03-02  7:42                                 ` Michael S. Tsirkin
  2023-03-02  7:57                                   ` Jason Wang
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-02  7:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 02, 2023 at 10:57:12AM +0800, Jason Wang wrote:
> On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > > > > maps), easy to be migrated etc.
> > > > > > > > >
> > > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > > >
> > > > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > > > allow the driver/user to fully customize what it wants to
> > > > > > > hash/classify. Similar technologies which is based on private solution
> > > > > > > has been used by some vendors, which allow user to customize the
> > > > > > > classifier[1]
> > > > > > >
> > > > > > > ePBF looks like a good open-source solution candidate for this (there
> > > > > > > could be others). But there could be many kinds of eBPF programs that
> > > > > > > could be offloaded. One famous one is XDP which requires many features
> > > > > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > > bytecode to the device, the device can choose to run it or compile it
> > > > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > > > calls and other features.
> > > > > >
> > > > > > Until people start asking exactly for maps because they want
> > > > > > state for their classifier?
> > > > >
> > > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > > proposed here. It is much more scalable and flexible.
> > > >
> > > > I looked for some examples of RSS using BPF and only found this:
> > > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > > seems to use maps.
> > >
> > > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > > only the hash but also the indirection. If we only replace the hash
> > > function with the eBPF program but reuse the RSS indirection table, we
> > > don't need maps.
> >
> > How? Add a special helper?
> 
> We can let the eBPF program return the hash:
> 
> [eBPF hashing] -> hash value -> [indirection table lookup]
> 
> Note that if we don't consider future full eBPF offloading, we can
> start with classical BPF.
> 
> Thanks

So again this is a custom thing, not a standard use of BPF.
Normally the value returned is pass/drop.





^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  7:42                                 ` Michael S. Tsirkin
@ 2023-03-02  7:57                                   ` Jason Wang
  2023-03-02  8:09                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-03-02  7:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 2, 2023 at 3:42 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Mar 02, 2023 at 10:57:12AM +0800, Jason Wang wrote:
> > On Wed, Mar 1, 2023 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Mar 01, 2023 at 10:36:41AM +0800, Jason Wang wrote:
> > > > On Tue, Feb 28, 2023 at 7:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Tue, Feb 28, 2023 at 11:04:26AM +0800, Jason Wang wrote:
> > > > > > On Tue, Feb 28, 2023 at 1:49 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Feb 27, 2023 at 04:35:09PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Feb 27, 2023 at 3:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Feb 27, 2023 at 12:07:17PM +0800, Jason Wang wrote:
> > > > > > > > > > Btw, this kind of 1:1 hash features seems not scalable and flexible.
> > > > > > > > > > It requires an endless extension on bits/fields. Modern NICs allow the
> > > > > > > > > > user to customize the hash calculation, for virtio-net we can allow to
> > > > > > > > > > use eBPF program to classify the packets. It seems to be more flexible
> > > > > > > > > > and scalable and there's almost no maintain burden in the spec (only
> > > > > > > > > > bytecode is required, no need any fancy features/interactions like
> > > > > > > > > > maps), easy to be migrated etc.
> > > > > > > > > >
> > > > > > > > > > Prototype is also easy, tun/tap had an eBPF classifier for years.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > Yea BPF offload would be great to have. We have been discussing it for
> > > > > > > > > years though - security issues keep blocking it. *Maybe* it's finally
> > > > > > > > > going to be there but I'm not going to block this work waiting for BPF
> > > > > > > > > offload.  And easily migrated is what BPF is not.
> > > > > > > >
> > > > > > > > Just to make sure we're at the same page. I meant to find a way to
> > > > > > > > allow the driver/user to fully customize what it wants to
> > > > > > > > hash/classify. Similar technologies which is based on private solution
> > > > > > > > has been used by some vendors, which allow user to customize the
> > > > > > > > classifier[1]
> > > > > > > >
> > > > > > > > ePBF looks like a good open-source solution candidate for this (there
> > > > > > > > could be others). But there could be many kinds of eBPF programs that
> > > > > > > > could be offloaded. One famous one is XDP which requires many features
> > > > > > > > other than the bytecode/VM like map access, tailcall. Starting from
> > > > > > > > such a complicated type is hard. Instead, we can start from a simple
> > > > > > > > type, that is the eBPF classifier. All it needs is to pass the
> > > > > > > > bytecode to the device, the device can choose to run it or compile it
> > > > > > > > to what it can understand for classifying. We don't need maps, tail
> > > > > > > > calls and other features.
> > > > > > >
> > > > > > > Until people start asking exactly for maps because they want
> > > > > > > state for their classifier?
> > > > > >
> > > > > > Yes, but let's compare the eBPF without maps with the static feature
> > > > > > proposed here. It is much more scalable and flexible.
> > > > >
> > > > > I looked for some examples of RSS using BPF and only found this:
> > > > > https://github.com/Netronome/bpf-samples/blob/master/programmable_rss/rss_user.c
> > > > > seems to use maps.
> > > >
> > > > Yes and this is also the way we emulate RSS with TUN/TAP via steering
> > > > eBPF support for TUN/TAP. The reason is that it needs to emulate not
> > > > only the hash but also the indirection. If we only replace the hash
> > > > function with the eBPF program but reuse the RSS indirection table, we
> > > > don't need maps.
> > >
> > > How? Add a special helper?
> >
> > We can let the eBPF program return the hash:
> >
> > [eBPF hashing] -> hash value -> [indirection table lookup]
> >
> > Note that if we don't consider future full eBPF offloading, we can
> > start with classical BPF.
> >
> > Thanks
>
> So again this is a custom thing not a standard use of BPF.
> Normally value returned is pass/drop.

AFAIK there's no standard here. The semantics of the return value are
determined by the context of the (e)BPF program.

The kernel already uses eBPF programs for hashing and classifying, in
various program types other than XDP/socket filter (pass/drop).

Thanks





^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  7:57                                   ` Jason Wang
@ 2023-03-02  8:09                                     ` Michael S. Tsirkin
  2023-03-02  8:15                                       ` Jason Wang
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-02  8:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> Kernel had already used the eBPF program for hashing, classifying
> various types of eBPF program other than XDP/socket filter
> (pass/drop).
> 
> Thanks

Where is it used for hashing?

-- 
MST




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  8:09                                     ` Michael S. Tsirkin
@ 2023-03-02  8:15                                       ` Jason Wang
  2023-03-02  8:41                                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-03-02  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > Kernel had already used the eBPF program for hashing, classifying
> > various types of eBPF program other than XDP/socket filter
> > (pass/drop).
> >
> > Thanks
>
> where is it used for hashing?

I can see it is used by team/lb:

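static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
                                    struct sk_buff *skb)
{
        struct bpf_prog *fp;
        uint32_t lhash;
        unsigned char *c;

        /* Fetch the currently attached BPF program, if any. */
        fp = rcu_dereference_bh(lb_priv->fp);
        if (unlikely(!fp))
                return 0;
        /* The program's return value is used as the hash. */
        lhash = bpf_prog_run(fp, skb);
        /* Fold the 32-bit hash into 8 bits by XOR-ing its bytes. */
        c = (char *) &lhash;
        return c[0] ^ c[1] ^ c[2] ^ c[3];
}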

But the point is that the return value is determined by the prog type
(or the context).
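
To make the folding step concrete, here is a minimal userspace sketch of
what the last two lines of lb_get_skb_hash() do with the program's return
value. It is an illustration only, not kernel code; fold_hash32() is a
hypothetical name introduced for this example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Userspace illustration (assumed helper, not part of the kernel):
 * fold a 32-bit value into 8 bits by XOR-ing its four bytes.
 * XOR is commutative, so the result does not depend on endianness.
 */
static unsigned int fold_hash32(uint32_t lhash)
{
        unsigned char c[4];

        memcpy(c, &lhash, sizeof(c));
        return c[0] ^ c[1] ^ c[2] ^ c[3];
}
```

Whatever the BPF program returns is treated as a 32-bit hash here, so the
meaning of the return value depends entirely on where the program is
attached.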

Thanks

>
> --
> MST
>
>
>




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  8:15                                       ` Jason Wang
@ 2023-03-02  8:41                                         ` Michael S. Tsirkin
  2023-03-02  8:59                                           ` Jason Wang
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-02  8:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > Kernel had already used the eBPF program for hashing, classifying
> > > various types of eBPF program other than XDP/socket filter
> > > (pass/drop).
> > >
> > > Thanks
> >
> > where is it used for hashing?
> 
> I can see it is used by team/lb:
> 
> static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
>                                     struct sk_buff *skb)
> {
>         struct bpf_prog *fp;
>         uint32_t lhash;
>         unsigned char *c;
> 
>         fp = rcu_dereference_bh(lb_priv->fp);
>         if (unlikely(!fp))
>                 return 0;
>         lhash = bpf_prog_run(fp, skb);
>         c = (char *) &lhash;
>         return c[0] ^ c[1] ^ c[2] ^ c[3];
> }
> 
> But the point is that the return value is determined by the prog type
> (or the context).
> 
> Thanks

OK so assuming we do this, how will users program this exactly?
Given this is not standard, which tools will be used to attach such
a program to the device?


> >
> > --
> > MST
> >
> >
> >




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  8:41                                         ` Michael S. Tsirkin
@ 2023-03-02  8:59                                           ` Jason Wang
  2023-03-02  9:46                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Jason Wang @ 2023-03-02  8:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > > Kernel had already used the eBPF program for hashing, classifying
> > > > various types of eBPF program other than XDP/socket filter
> > > > (pass/drop).
> > > >
> > > > Thanks
> > >
> > > where is it used for hashing?
> >
> > I can see it is used by team/lb:
> >
> > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
> >                                     struct sk_buff *skb)
> > {
> >         struct bpf_prog *fp;
> >         uint32_t lhash;
> >         unsigned char *c;
> >
> >         fp = rcu_dereference_bh(lb_priv->fp);
> >         if (unlikely(!fp))
> >                 return 0;
> >         lhash = bpf_prog_run(fp, skb);
> >         c = (char *) &lhash;
> >         return c[0] ^ c[1] ^ c[2] ^ c[3];
> > }
> >
> > But the point is that the return value is determined by the prog type
> > (or the context).
> >
> > Thanks
>
> OK so assuming we do this, how will users program this exactly?

For DPDK users, it could be integrated with the PMD.
For kernel users, it probably requires a virtio-specific netlink or char device.

> Given this is not standard, which tools will be used to attach such
> a program to the device?

vDPA tool?

Thanks

>
>
> > >
> > > --
> > > MST
> > >
> > >
> > >
>




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-02  8:59                                           ` Jason Wang
@ 2023-03-02  9:46                                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-02  9:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Heng Qi, virtio-comment, virtio-dev, Parav Pandit,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 02, 2023 at 04:59:46PM +0800, Jason Wang wrote:
> On Thu, Mar 2, 2023 at 4:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Mar 02, 2023 at 04:15:39PM +0800, Jason Wang wrote:
> > > On Thu, Mar 2, 2023 at 4:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Thu, Mar 02, 2023 at 03:57:10PM +0800, Jason Wang wrote:
> > > > > Kernel had already used the eBPF program for hashing, classifying
> > > > > various types of eBPF program other than XDP/socket filter
> > > > > (pass/drop).
> > > > >
> > > > > Thanks
> > > >
> > > > where is it used for hashing?
> > >
> > > I can see it is used by team/lb:
> > >
> > > static unsigned int lb_get_skb_hash(struct lb_priv *lb_priv,
> > >                                     struct sk_buff *skb)
> > > {
> > >         struct bpf_prog *fp;
> > >         uint32_t lhash;
> > >         unsigned char *c;
> > >
> > >         fp = rcu_dereference_bh(lb_priv->fp);
> > >         if (unlikely(!fp))
> > >                 return 0;
> > >         lhash = bpf_prog_run(fp, skb);
> > >         c = (char *) &lhash;
> > >         return c[0] ^ c[1] ^ c[2] ^ c[3];
> > > }
> > >
> > > But the point is that the return value is determined by the prog type
> > > (or the context).
> > >
> > > Thanks
> >
> > OK so assuming we do this, how will users program this exactly?
> 
> For DPDK users, it could be integrated with the PMD.
> For kernel users, it probably requires a virtio-specific netlink or char device.
> 
> > Given this is not standard, which tools will be used to attach such
> > a program to the device?
> 
> vDPA tool?
> 
> Thanks

Ugh.  I think I'd like ethtool to work.




> >
> >
> > > >
> > > > --
> > > > MST
> > > >
> > > >
> > > >
> >




^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-01  2:56   ` Heng Qi
  2023-03-01  2:56     ` Heng Qi
@ 2023-03-08 14:39     ` Michael S. Tsirkin
  2023-03-09  4:55       ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-08 14:39 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> 
> 
> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > using the outer header of the receive packets is always fixed for the
> > > same flow packets, i.e. they will be steered to the same receive queue.
> > Wait a second. How is this true? Does not everyone stick the
> > inner header hash in the outer source port to solve this?
> 
> Yes, you are right. That's what we did before the inner header hash, but it
> has a performance penalty, which I'll explain below.
> 
> > For example geneve spec says:
> > 
> >     it is necessary for entropy from encapsulated packets to be
> >     exposed in the tunnel header.  The most common technique for this is
> >     to use the UDP source port
> 
> The end point of the tunnel called the gateway (with DPDK on top of it).
> 
> 1. When there is no inner header hash, entropy can be inserted into the udp
> src port of the outer header of the tunnel,
> and then the tunnel packet is handed over to the host. The host needs to
> take out a part of the CPUs to parse the outer headers (but not drop them)
> to calculate the inner hash for the inner payloads,
> and then use the inner
> hash to forward them to another part of the CPUs that are responsible for
> processing.

I don't get this part. Leave inner hashes to the guest inside the
tunnel, why is your host doing this?

> 1). During this process, the CPUs on the host is divided into two parts, one
> part is used as a forwarding node to parse the outer header,
>      and the CPU utilization is low. Another part handles packets.

Some overhead is clearly involved in *sending* packets -
to calculate the hash and stick it in the port number.
This is, however, a separate problem and if you want to
solve it then my suggestion would be to teach the *transmit*
side about GRE offloads, so it can fill the source port in the card.
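
For reference, the technique the Geneve text describes (and which RFC 7348
recommends for VXLAN, with the source port taken from the dynamic range
49152-65535) can be sketched roughly as below. This is a minimal
illustration under stated assumptions, not any driver's or device's code:
outer_udp_src_port() and the folding step are hypothetical choices made
for this example.

```c
#include <stdint.h>

#define EPHEMERAL_PORT_MIN 49152u
#define EPHEMERAL_PORT_MAX 65535u

/*
 * Hypothetical helper: map a 32-bit inner flow hash onto the dynamic
 * port range, so that the outer UDP source port carries flow entropy.
 */
static uint16_t outer_udp_src_port(uint32_t inner_hash)
{
        uint32_t range = EPHEMERAL_PORT_MAX - EPHEMERAL_PORT_MIN + 1; /* 16384 */

        /* Fold the upper half into the lower half before reducing. */
        inner_hash ^= inner_hash >> 16;
        return (uint16_t)(EPHEMERAL_PORT_MIN + (inner_hash % range));
}
```

With this range the port can take 16384 distinct values per tunnel
endpoint pair.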

> 2). The entropy of the source udp src port is not enough, that is, the queue
> is not widely distributed.

How is it not enough? 16 bits are enough to cover all vqs ...

> 2. When there is an inner header hash, the gateway will directly help parse
> the outer header, and use the inner 5 tuples to calculate the inner hash.
> The tunneled packet is then handed over to the host.
> 1) All the CPUs of the host are used to process data packets, and there is
> no need to use some CPUs to forward and parse the outer header.

You really have to parse the outer header anyway,
otherwise there's no tunneling.
Unless you want to teach virtio to implement tunneling
in hardware, which is something I'd find easier to
get behind.

> 2) The entropy of the original quintuple is sufficient, and the queue is
> widely distributed.

It's exactly the same entropy, why would it be better? In fact you
are taking out the outer hash entropy making things worse.

> 
> Thanks.
> > 
> > same goes for vxlan did not check further.
> > 
> > so what is the problem?  and which tunnel types actually suffer from the
> > problem?
> > 
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-08 14:39     ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
@ 2023-03-09  4:55       ` Heng Qi
  2023-03-09 19:36         ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-03-09  4:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
>>
>> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>> If the tunnel is used to encapsulate the packets, the hash calculated
>>>> using the outer header of the receive packets is always fixed for the
>>>> same flow packets, i.e. they will be steered to the same receive queue.
>>> Wait a second. How is this true? Does not everyone stick the
>>> inner header hash in the outer source port to solve this?
>> Yes, you are right. That's what we did before the inner header hash, but it
>> has a performance penalty, which I'll explain below.
>>
>>> For example geneve spec says:
>>>
>>>      it is necessary for entropy from encapsulated packets to be
>>>      exposed in the tunnel header.  The most common technique for this is
>>>      to use the UDP source port
>> The end point of the tunnel called the gateway (with DPDK on top of it).
>>
>> 1. When there is no inner header hash, entropy can be inserted into the udp
>> src port of the outer header of the tunnel,
>> and then the tunnel packet is handed over to the host. The host needs to
>> take out a part of the CPUs to parse the outer headers (but not drop them)
>> to calculate the inner hash for the inner payloads,
>> and then use the inner
>> hash to forward them to another part of the CPUs that are responsible for
>> processing.
> I don't get this part. Leave inner hashes to the guest inside the
> tunnel, why is your host doing this?

Assume that one flow is either a unidirectional flow a->b or a
bidirectional flow a->b and b->a. Such a flow may be reordered when
processed by the gateway (DPDK):

1. In unidirectional mode, if the flow is switched to another gateway
   for some reason, the outer IP address changes, so without an inner
   hash the flow may be processed by different CPUs after it reaches the
   host. The host therefore first uses the forwarding CPUs to compute
   the inner hash, and then uses that hash to ensure the flow is
   processed by the same CPU.
2. In bidirectional mode, the a->b flow may go through gateway 1 and the
   b->a flow through gateway 2. To ensure that both directions are
   processed by the same CPU, we still need the forwarding CPUs to
   compute the real inner hash (here, the hash key needs to be replaced
   with a symmetric hash key).

>
>> 1). During this process, the CPUs on the host is divided into two parts, one
>> part is used as a forwarding node to parse the outer header,
>>       and the CPU utilization is low. Another part handles packets.
> Some overhead is clearly involved in *sending* packets -
> to calculate the hash and stick it in the port number.
> This is, however, a separate problem and if you want to
> solve it then my suggestion would be to teach the *transmit*
> side about GRE offloads, so it can fill the source port in the card.
>
>> 2). The entropy of the source udp src port is not enough, that is, the queue
>> is not widely distributed.
> how isn't it enough? 16 bit is enough to cover all vqs ...

A 5-tuple brings more entropy than a single port, doesn't it? In fact,
the business team reports that the inner hash of the physical network
card they use distributes better than the outer-header UDP port number
we modify now, but they did not give me the data.

>> 2. When there is an inner header hash, the gateway will directly help parse
>> the outer header, and use the inner 5 tuples to calculate the inner hash.
>> The tunneled packet is then handed over to the host.
>> 1) All the CPUs of the host are used to process data packets, and there is
>> no need to use some CPUs to forward and parse the outer header.
> You really have to parse the outer header anyway,
> otherwise there's no tunneling.
> Unless you want to teach virtio to implement tunneling
> in hardware, which is something I'd find it easier to
> get behind.

There is no need to parse the outer header twice, because we use shared
memory.

>> 2) The entropy of the original quintuple is sufficient, and the queue is
>> widely distributed.
> It's exactly the same entropy, why would it be better? In fact you
> are taking out the outer hash entropy making things worse.

I don't follow: why would the entropy of the inner 5-tuple and the
outer tunnel header be the same? Multiple streams share the same outer
header.

Thanks.
>
>> Thanks.
>>> same goes for vxlan did not check further.
>>>
>>> so what is the problem?  and which tunnel types actually suffer from the
>>> problem?
>>>
>>
>




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-02-28 11:16 ` Michael S. Tsirkin
                     ` (2 preceding siblings ...)
  2023-03-01  3:30   ` [virtio-comment] " Heng Qi
@ 2023-03-09 12:28   ` Heng Qi
  3 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-09 12:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>> If the tunnel is used to encapsulate the packets, the hash calculated
>> using the outer header of the receive packets is always fixed for the
>> same flow packets, i.e. they will be steered to the same receive queue.
> Wait a second. How is this true? Does not everyone stick the
> inner header hash in the outer source port to solve this?
> For example geneve spec says:
>
>     it is necessary for entropy from encapsulated packets to be
>     exposed in the tunnel header.  The most common technique for this is
>     to use the UDP source port
>
> same goes for vxlan did not check further.
>
> so what is the problem?  and which tunnel types actually suffer from the
> problem?
>

Inner hashing can at least spread tunnel flows that have no outer
transport header, such as GRE, across multiple queues, which is
beneficial to us.

For tunnel flows with an outer transport header, such as VXLAN, flows
can already be hashed to different queues by setting different outer UDP
ports. This does not conflict with inner hashing, which can also be used
for this purpose.

For the same flow, packets in the receive and transmit directions may
pass through different tunnels, which causes the same flow to be hashed
to different queues. In this case, we have to calculate a symmetric hash
over the inner header (an inner symmetric hash, which is one kind of
inner hash), so that both directions of the flow are hashed to the same
queue.

Symmetric hashing ignores the order of the source and destination fields
when calculating the hash; that is, the hash values calculated for
(a1, a2, a3, a4) and (a2, a1, a4, a3) are the same.
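
As an illustration of the symmetry property only, here is a minimal
sketch. It orders the two endpoints before hashing, which is a different
technique from the symmetric hash key mentioned earlier in this thread,
and it uses FNV-1a as a stand-in for the real RSS hash. struct flow_key,
fnv1a_u32() and symmetric_flow_hash() are hypothetical names introduced
for this example; (a1, a2, a3, a4) corresponds to (src_ip, dst_ip,
src_port, dst_port).

```c
#include <stdint.h>

/* Simplified IPv4 flow key; field names are illustrative only. */
struct flow_key {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
};

/* FNV-1a over the four bytes of v; a stand-in for the real RSS hash. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
        for (int i = 0; i < 4; i++) {
                h ^= (v >> (8 * i)) & 0xffu;
                h *= 16777619u;
        }
        return h;
}

/*
 * Order the two endpoints before hashing, so that a->b and b->a
 * produce the same hash input and therefore the same hash.
 */
static uint32_t symmetric_flow_hash(const struct flow_key *k)
{
        uint32_t ip_a = k->src_ip, ip_b = k->dst_ip;
        uint16_t port_a = k->src_port, port_b = k->dst_port;
        uint32_t h = 2166136261u; /* FNV offset basis */

        if (ip_a > ip_b || (ip_a == ip_b && port_a > port_b)) {
                uint32_t tmp_ip = ip_a;
                uint16_t tmp_port = port_a;

                ip_a = ip_b;
                ip_b = tmp_ip;
                port_a = port_b;
                port_b = tmp_port;
        }
        h = fnv1a_u32(h, ip_a);
        h = fnv1a_u32(h, ip_b);
        h = fnv1a_u32(h, ((uint32_t)port_a << 16) | port_b);
        return h;
}
```

Both directions of a flow then land on the same queue regardless of which
tunnel carried them, because only the inner fields are hashed.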

Thanks.





^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-09  4:55       ` Heng Qi
@ 2023-03-09 19:36         ` Michael S. Tsirkin
  2023-03-11  3:23           ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-09 19:36 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
> > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > 
> > > On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > using the outer header of the receive packets is always fixed for the
> > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > Wait a second. How is this true? Does not everyone stick the
> > > > inner header hash in the outer source port to solve this?
> > > Yes, you are right. That's what we did before the inner header hash, but it
> > > has a performance penalty, which I'll explain below.
> > > 
> > > > For example geneve spec says:
> > > > 
> > > >      it is necessary for entropy from encapsulated packets to be
> > > >      exposed in the tunnel header.  The most common technique for this is
> > > >      to use the UDP source port
> > > The end point of the tunnel called the gateway (with DPDK on top of it).
> > > 
> > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > src port of the outer header of the tunnel,
> > > and then the tunnel packet is handed over to the host. The host needs to
> > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > to calculate the inner hash for the inner payloads,
> > > and then use the inner
> > > hash to forward them to another part of the CPUs that are responsible for
> > > processing.
> > I don't get this part. Leave inner hashes to the guest inside the
> > tunnel, why is your host doing this?
> 
> Assuming that the same flow includes a unidirectional flow a->b, or a
> bidirectional flow a->b and b->a,
> such flow may be out of order when processed by the gateway(DPDK):
> 
> 1. In unidirectional mode, if the same flow is switched to another gateway
> for some reason, resulting in different outer IP address,
>     then this flow may be processed by different CPUs after reaching the
> host if there is no inner hash. So after the host receives the
>     flow, first use the forwarding CPUs to parse the inner hash, and then
> use the hash to ensure that the flow is processed by the
>     same CPU.
> 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> go to gateway 2. In order to ensure that the same flow is
>     processed by the same CPU, we still need the forwarding CPUs to parse
> the real inner hash(here, the hash key needs to be replaced with a symmetric
> hash key).

Oh, interesting. What are those gateways, and how come there's an
expectation that you can change their addresses and topology
completely seamlessly without any reordering whatsoever?
Isn't a network topology change kind of guaranteed to change ordering
sometimes?


> > 
> > > 1). During this process, the CPUs on the host is divided into two parts, one
> > > part is used as a forwarding node to parse the outer header,
> > >       and the CPU utilization is low. Another part handles packets.
> > Some overhead is clearly involved in *sending* packets -
> > to calculate the hash and stick it in the port number.
> > This is, however, a separate problem and if you want to
> > solve it then my suggestion would be to teach the *transmit*
> > side about GRE offloads, so it can fill the source port in the card.
> > 
> > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > is not widely distributed.
> > how isn't it enough? 16 bit is enough to cover all vqs ...
> 
> A 5-tuple brings more entropy than a single port, doesn't it?

But you don't need more for RSS, the indirection table is not
that large.

> In fact, the
> inner hash of the physical network card used by
> the business team is indeed better than the udp port number of the outer
> header we modify now, but they did not give me the data.

Admittedly, our hash value is 32 bits.

> > > 2. When there is an inner header hash, the gateway will directly help parse
> > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > The tunneled packet is then handed over to the host.
> > > 1) All the CPUs of the host are used to process data packets, and there is
> > > no need to use some CPUs to forward and parse the outer header.
> > You really have to parse the outer header anyway,
> > otherwise there's no tunneling.
> > Unless you want to teach virtio to implement tunneling
> > in hardware, which is something I'd find it easier to
> > get behind.
> 
> There is no need to parse the outer header twice, because we use shared
> memory.

Shared with what? You need the outer header to identify the tunnel.

> > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > widely distributed.
> > It's exactly the same entropy, why would it be better? In fact you
> > are taking out the outer hash entropy making things worse.
> 
> I don't get the point, why the entropy of the inner 5-tuple and the outer
> tunnel header is the same,
> multiple streams have the same outer header.
> 
> Thanks.

Well, our hash is 32 bits; the source port is just 16 bits.
So yes, it's more entropy, but RSS can't use more than 16 bits.
Why do you need so many? Do you have more than 64k CPUs to offload to?
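
To illustrate why the extra hash bits do not help queue selection, here
is a minimal sketch of an RSS-style lookup, assuming a power-of-two
indirection table indexed by the low bits of the hash.
rss_select_queue() and its parameters are hypothetical names for this
example, not taken from the spec or from any driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick a receive queue by masking the hash with
 * (table_len - 1) and looking the result up in the indirection table.
 * Only log2(table_len) bits of the hash influence the result.
 */
static uint16_t rss_select_queue(uint32_t hash, const uint16_t *table,
                                 size_t table_len)
{
        /* The mask trick requires a non-zero power-of-two length. */
        assert(table_len != 0 && (table_len & (table_len - 1)) == 0);
        return table[hash & (uint32_t)(table_len - 1)];
}
```

Two hashes that differ only in bits above the mask select the same table
entry, so a wider hash does not by itself spread flows any further.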


> > 
> > > Thanks.
> > > > same goes for vxlan did not check further.
> > > > 
> > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > problem?
> > > > 
> > > 
> > 




^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-09 19:36         ` Michael S. Tsirkin
@ 2023-03-11  3:23           ` Heng Qi
  2023-03-15 11:58             ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-03-11  3:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo




On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
> On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
>>
>> On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
>>>> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>>> If the tunnel is used to encapsulate the packets, the hash calculated
>>>>>> using the outer header of the receive packets is always fixed for the
>>>>>> same flow packets, i.e. they will be steered to the same receive queue.
>>>>> Wait a second. How is this true? Does not everyone stick the
>>>>> inner header hash in the outer source port to solve this?
>>>> Yes, you are right. That's what we did before the inner header hash, but it
>>>> has a performance penalty, which I'll explain below.
>>>>
>>>>> For example geneve spec says:
>>>>>
>>>>>       it is necessary for entropy from encapsulated packets to be
>>>>>       exposed in the tunnel header.  The most common technique for this is
>>>>>       to use the UDP source port
>>>> The end point of the tunnel called the gateway (with DPDK on top of it).
>>>>
>>>> 1. When there is no inner header hash, entropy can be inserted into the udp
>>>> src port of the outer header of the tunnel,
>>>> and then the tunnel packet is handed over to the host. The host needs to
>>>> take out a part of the CPUs to parse the outer headers (but not drop them)
>>>> to calculate the inner hash for the inner payloads,
>>>> and then use the inner
>>>> hash to forward them to another part of the CPUs that are responsible for
>>>> processing.
>>> I don't get this part. Leave inner hashes to the guest inside the
>>> tunnel, why is your host doing this?


Let's set some details aside and look again at two scenarios: VXLAN and
GENEVE (Scenario 1) and GRE (Scenario 2).

1. In Scenario 1, inner symmetric hashing improves the processing
performance of a single flow.

Even though client1 and client2 communicate bidirectionally over the same
flow, the two directions may pass through, and be encapsulated by,
different tunnels. The outer headers then differ, so the same flow is
hashed to different queues and processed by different CPUs.

To keep both directions on one CPU, we need to parse the inner header and
compute a symmetric hash over it using a special RSS key.

Sorry for not mentioning the inner symmetric hash earlier; I left it out
to avoid introducing more concepts, but it is indeed a kind of inner hash.

2. In Scenario 2 (GRE), there is no outer transport header, so flows
between multiple communication pairs that are encapsulated by the same
tunnel are all hashed to the same queue. To fix this, we need inner
hashing so that RSS performs well: by parsing the inner header and
calculating the hash over it, different flows can be hashed to different
queues.
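
To make Scenario 2 concrete, here is a rough Python sketch (not taken from
the patch). It assumes a made-up four-queue setup and illustrative
addresses, and it uses zlib.crc32 purely as a stand-in for whatever hash
the device really computes. It only shows that hashing a constant outer
IP pair can select one queue, while hashing the inner 5-tuple has a
chance to spread flows.

```python
import zlib

NUM_QUEUES = 4  # hypothetical queue count, for illustration only


def pick_queue(fields: tuple) -> int:
    """Hash a tuple of header fields and map the result to a receive queue."""
    digest = zlib.crc32(repr(fields).encode())  # stand-in hash, not Toeplitz
    return digest % NUM_QUEUES


# One basic GRE tunnel: every packet carries the same outer IP pair and
# there is no outer transport header that could add entropy.
outer = ("192.0.2.1", "192.0.2.2")

# 64 distinct inner flows: (src IP, dst IP, protocol, src port, dst port).
inner_flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 80) for i in range(64)]

# Outer-header hashing sees the same input for every packet.
outer_queues = {pick_queue(outer) for _ in inner_flows}
# Inner-header hashing sees a different input per flow.
inner_queues = {pick_queue(flow) for flow in inner_flows}

print("queues used with outer-header hash:", sorted(outer_queues))
print("queues used with inner-header hash:", sorted(inner_queues))
```

The outer-header set always has exactly one element; how evenly the inner
flows spread depends on the hash function and the traffic.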

Thanks.



>> Assuming that the same flow includes a unidirectional flow a->b, or a
>> bidirectional flow a->b and b->a,
>> such flow may be out of order when processed by the gateway(DPDK):
>>
>> 1. In unidirectional mode, if the same flow is switched to another gateway
>> for some reason, resulting in different outer IP address,
>>      then this flow may be processed by different CPUs after reaching the
>> host if there is no inner hash. So after the host receives the
>>      flow, first use the forwarding CPUs to parse the inner hash, and then
>> use the hash to ensure that the flow is processed by the
>>      same CPU.
>> 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
>> go to gateway 2. In order to ensure that the same flow is
>>      processed by the same CPU, we still need the forwarding CPUs to parse
>> the real inner hash(here, the hash key needs to be replaced with a symmetric
>> hash key).
> Oh interesting. What are those gateways, how come there's expectation
> that you can change their addresses and topology
> completely seamlessly without any reordering whatsoever?
> Isn't network topology change kind of guaranteed to change ordering
> sometimes?
>
>
>>>> 1). During this process, the CPUs on the host is divided into two parts, one
>>>> part is used as a forwarding node to parse the outer header,
>>>>        and the CPU utilization is low. Another part handles packets.
>>> Some overhead is clearly involved in *sending* packets -
>>> to calculate the hash and stick it in the port number.
>>> This is, however, a separate problem and if you want to
>>> solve it then my suggestion would be to teach the *transmit*
>>> side about GRE offloads, so it can fill the source port in the card.
>>>
>>>> 2). The entropy of the source udp src port is not enough, that is, the queue
>>>> is not widely distributed.
>>> how isn't it enough? 16 bit is enough to cover all vqs ...
>> A 5-tuple brings more entropy than a single port, doesn't it?
> But you don't need more for RSS, the indirection table is not
> that large.
>
>> In fact, the
>> inner hash of the physical network card used by
>> the business team is indeed better than the udp port number of the outer
>> header we modify now, but they did not give me the data.
> Admittedly, our hash value is 32 bit.
>
>>>> 2. When there is an inner header hash, the gateway will directly help parse
>>>> the outer header, and use the inner 5 tuples to calculate the inner hash.
>>>> The tunneled packet is then handed over to the host.
>>>> 1) All the CPUs of the host are used to process data packets, and there is
>>>> no need to use some CPUs to forward and parse the outer header.
>>> You really have to parse the outer header anyway,
>>> otherwise there's no tunneling.
>>> Unless you want to teach virtio to implement tunneling
>>> in hardware, which is something I'd find it easier to
>>> get behind.
>> There is no need to parse the outer header twice, because we use shared
>> memory.
> shared with what? you need the outer header to identify the tunnel.
>
>>>> 2) The entropy of the original quintuple is sufficient, and the queue is
>>>> widely distributed.
>>> It's exactly the same entropy, why would it be better? In fact you
>>> are taking out the outer hash entropy making things worse.
>> I don't get the point, why the entropy of the inner 5-tuple and the outer
>> tunnel header is the same,
>> multiple streams have the same outer header.
>>
>> Thanks.
> well our hash is 32 bit. source port is just 16 bit.
> so yes it's more entropy but RSS can't use more than 16 bit.
> why do you need so many? you have more than 64k CPUs to offload to?
>
>
>>>> Thanks.
>>>>> same goes for vxlan did not check further.
>>>>>
>>>>> so what is the problem?  and which tunnel types actually suffer from the
>>>>> problem?
>>>>>
>>>> This publicly archived list offers a means to provide input to the
>>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>>
>>>> In order to verify user consent to the Feedback License terms and
>>>> to minimize spam in the list archive, subscription is required
>>>> before posting.
>>>>
>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>>> List help: virtio-comment-help@lists.oasis-open.org
>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>>> Committee: https://www.oasis-open.org/committees/virtio/
>>>> Join OASIS: https://www.oasis-open.org/join/
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-11  3:23           ` Heng Qi
@ 2023-03-15 11:58             ` Michael S. Tsirkin
  2023-03-15 12:55               ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-15 11:58 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> 
> 
> 
> On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
> > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > >
> > > On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > inner header hash in the outer source port to solve this?
> > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > has a performance penalty, which I'll explain below.
> > > > > 
> > > > > > For example geneve spec says:
> > > > > > 
> > > > > >       it is necessary for entropy from encapsulated packets to be
> > > > > >       exposed in the tunnel header.  The most common technique for this is
> > > > > >       to use the UDP source port
> > > > > The end point of the tunnel called the gateway (with DPDK on top of it).
> > > > > 
> > > > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > > > src port of the outer header of the tunnel,
> > > > > and then the tunnel packet is handed over to the host. The host needs to
> > > > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > > > to calculate the inner hash for the inner payloads,
> > > > > and then use the inner
> > > > > hash to forward them to another part of the CPUs that are responsible for
> > > > > processing.
> > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > tunnel, why is your host doing this?
> 
> 
> Let's simplify some details and take a fresh look at two different
> scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> 
> 1. In Scenario1, we can improve the processing performance of the same flow
> by implementing inner symmetric hashing.
> 
> This is because even though client1 and client2 communicate bidirectionally
> through the same flow, their data may pass
> 
> through and be encapsulated by different tunnels, resulting in the same flow
> being hashed to different queues and processed by different CPUs.
> 
> To ensure consistency and optimized processing, we need to parse out the
> inner header and compute a symmetric hash on it using a special rss key.
> 
> Sorry for not mentioning the inner symmetric hash before, in order to
> prevent the introduction of more concepts, but it is indeed a kind of inner
> hash.

If parts of a flow go through different tunnels, won't this cause
reordering at the network level? Why is it so important to prevent it at
the NIC then? Or, since you are stressing symmetric hash, are you
talking about the TX and RX sides going through different tunnels?


> 2. In Scenario2 with GRE, the lack of outer transport headers means that
> flows between multiple communication pairs encapsulated by the same tunnel
> 
> will all be hashed to the same queue. To address this, we need to implement
> inner hashing to improve the performance of RSS. By parsing and calculating
> 
> the inner hash, different flows can be hashed to different queues.
> 
> Thanks.
> 
> 

Well, 2 is at least inexact: there's a flowID there. It's just 8 bits,
so not sufficient if there are more than 256 queues. Still, 256 queues
is quite a lot. Are you trying to solve for configurations with
more than 256 queues then?
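
As a quick check of the arithmetic behind this kind of argument: an n-bit
field can take at most 2^n distinct values, so on its own it can spread
traffic over at most 2^n queues. The helper names and the queue counts
below are hypothetical.

```python
def distinct_values(bits: int) -> int:
    """Number of distinct values an entropy field of the given width can take."""
    return 1 << bits


def max_queues_covered(bits: int, num_queues: int) -> int:
    """Upper bound on how many queues a field of this width can select."""
    return min(distinct_values(bits), num_queues)


flow_id_bits = 8    # width of the flow ID field discussed above
udp_port_bits = 16  # width of a UDP source port

for queues in (64, 256, 1024):  # hypothetical queue counts
    print(
        queues,
        "queues: 8-bit field reaches at most",
        max_queues_covered(flow_id_bits, queues),
        "and 16-bit field reaches at most",
        max_queues_covered(udp_port_bits, queues),
    )
```

This is only an upper bound: how many queues are actually hit depends on
how well the sender fills the field.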


> > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > bidirectional flow a->b and b->a,
> > > such flow may be out of order when processed by the gateway(DPDK):
> > > 
> > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > for some reason, resulting in different outer IP address,
> > >      then this flow may be processed by different CPUs after reaching the
> > > host if there is no inner hash. So after the host receives the
> > >      flow, first use the forwarding CPUs to parse the inner hash, and then
> > > use the hash to ensure that the flow is processed by the
> > >      same CPU.
> > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > go to gateway 2. In order to ensure that the same flow is
> > >      processed by the same CPU, we still need the forwarding CPUs to parse
> > > the real inner hash(here, the hash key needs to be replaced with a symmetric
> > > hash key).
> > Oh interesting. What are those gateways, how come there's expectation
> > that you can change their addresses and topology
> > completely seamlessly without any reordering whatsoever?
> > Isn't network topology change kind of guaranteed to change ordering
> > sometimes?
> > 
> > 
> > > > > 1). During this process, the CPUs on the host is divided into two parts, one
> > > > > part is used as a forwarding node to parse the outer header,
> > > > >        and the CPU utilization is low. Another part handles packets.
> > > > Some overhead is clearly involved in *sending* packets -
> > > > to calculate the hash and stick it in the port number.
> > > > This is, however, a separate problem and if you want to
> > > > solve it then my suggestion would be to teach the *transmit*
> > > > side about GRE offloads, so it can fill the source port in the card.
> > > > 
> > > > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > > > is not widely distributed.
> > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > A 5-tuple brings more entropy than a single port, doesn't it?
> > But you don't need more for RSS, the indirection table is not
> > that large.
> > 
> > > In fact, the
> > > inner hash of the physical network card used by
> > > the business team is indeed better than the udp port number of the outer
> > > header we modify now, but they did not give me the data.
> > Admittedly, our hash value is 32 bit.
> > 
> > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > The tunneled packet is then handed over to the host.
> > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > no need to use some CPUs to forward and parse the outer header.
> > > > You really have to parse the outer header anyway,
> > > > otherwise there's no tunneling.
> > > > Unless you want to teach virtio to implement tunneling
> > > > in hardware, which is something I'd find it easier to
> > > > get behind.
> > > There is no need to parse the outer header twice, because we use shared
> > > memory.
> > shared with what? you need the outer header to identify the tunnel.
> > 
> > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > widely distributed.
> > > > It's exactly the same entropy, why would it be better? In fact you
> > > > are taking out the outer hash entropy making things worse.
> > > I don't get the point, why the entropy of the inner 5-tuple and the outer
> > > tunnel header is the same,
> > > multiple streams have the same outer header.
> > > 
> > > Thanks.
> > well our hash is 32 bit. source port is just 16 bit.
> > so yes it's more entropy but RSS can't use more than 16 bit.
> > why do you need so many? you have more than 64k CPUs to offload to?
> > 
> > 
> > > > > Thanks.
> > > > > > same goes for vxlan did not check further.
> > > > > > 
> > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > problem?
> > > > > > 



* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-15 11:58             ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
@ 2023-03-15 12:55               ` Heng Qi
  2023-03-15 14:57                 ` Michael S. Tsirkin
  2023-03-20 19:48                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-15 12:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/3/15 at 7:58 PM, Michael S. Tsirkin wrote:
> On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
>>
>>
>> On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
>>> On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
>>>> On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
>>>>>> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
>>>>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>>>>> If the tunnel is used to encapsulate the packets, the hash calculated
>>>>>>>> using the outer header of the receive packets is always fixed for the
>>>>>>>> same flow packets, i.e. they will be steered to the same receive queue.
>>>>>>> Wait a second. How is this true? Does not everyone stick the
>>>>>>> inner header hash in the outer source port to solve this?
>>>>>> Yes, you are right. That's what we did before the inner header hash, but it
>>>>>> has a performance penalty, which I'll explain below.
>>>>>>
>>>>>>> For example geneve spec says:
>>>>>>>
>>>>>>>        it is necessary for entropy from encapsulated packets to be
>>>>>>>        exposed in the tunnel header.  The most common technique for this is
>>>>>>>        to use the UDP source port
>>>>>> The end point of the tunnel called the gateway (with DPDK on top of it).
>>>>>>
>>>>>> 1. When there is no inner header hash, entropy can be inserted into the udp
>>>>>> src port of the outer header of the tunnel,
>>>>>> and then the tunnel packet is handed over to the host. The host needs to
>>>>>> take out a part of the CPUs to parse the outer headers (but not drop them)
>>>>>> to calculate the inner hash for the inner payloads,
>>>>>> and then use the inner
>>>>>> hash to forward them to another part of the CPUs that are responsible for
>>>>>> processing.
>>>>> I don't get this part. Leave inner hashes to the guest inside the
>>>>> tunnel, why is your host doing this?
>>
>> Let's simplify some details and take a fresh look at two different
>> scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
>>
>> 1. In Scenario1, we can improve the processing performance of the same flow
>> by implementing inner symmetric hashing.
>>
>> This is because even though client1 and client2 communicate bidirectionally
>> through the same flow, their data may pass
>>
>> through and be encapsulated by different tunnels, resulting in the same flow
>> being hashed to different queues and processed by different CPUs.
>>
>> To ensure consistency and optimized processing, we need to parse out the
>> inner header and compute a symmetric hash on it using a special rss key.
>>
>> Sorry for not mentioning the inner symmetric hash before, in order to
>> prevent the introduction of more concepts, but it is indeed a kind of inner
>> hash.
> If parts of a flow go through different tunnels won't this cause
> reordering at the network level? Why is it so important to prevent it at
> the nic then?  Or, since you are stressing symmetric hash, are you
> talking about TX and RX side going through different tunnels?

Yes, the directions client1->client2 and client2->client1 may go through
different tunnels.
With inner symmetric hashing, the same CPU processes both directions of
the same flow, which improves performance.
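
A minimal sketch of what a symmetric inner hash can look like, assuming a
Toeplitz hash over the inner IPv4 addresses and ports. The key used here
(the two bytes 0x6d 0x5a repeated) is one commonly cited symmetric choice
and is only an assumption; it is not necessarily the "special rss key"
used in our setup. Because the key repeats every 16 bits, the 32-bit key
window at bit i equals the window at bit i+16, so swapping the two
addresses (32 bits apart) or the two ports (16 bits apart) leaves the
hash unchanged.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the key window at every set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # 32-bit window of the key starting at bit i (MSB first).
            shift = key_bits - 32 - i
            result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


# Assumed symmetric key: 40 bytes with a 16-bit period.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20


def flow_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Serialize the inner IPv4 addresses and ports in network byte order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


# The two directions of one inner flow (illustrative addresses and ports).
forward = flow_bytes("10.0.0.1", "10.0.1.1", 40000, 80)
reverse = flow_bytes("10.0.1.1", "10.0.0.1", 80, 40000)

hash_forward = toeplitz_hash(SYMMETRIC_KEY, forward)
hash_reverse = toeplitz_hash(SYMMETRIC_KEY, reverse)
print(hex(hash_forward), hex(hash_reverse))
```

Both directions get the same hash value, so they select the same queue no
matter which tunnel carried them.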

>
>
>> 2. In Scenario2 with GRE, the lack of outer transport headers means that
>> flows between multiple communication pairs encapsulated by the same tunnel
>>
>> will all be hashed to the same queue. To address this, we need to implement
>> inner hashing to improve the performance of RSS. By parsing and calculating
>>
>> the inner hash, different flows can be hashed to different queues.
>>
>> Thanks.
>>
>>
> Well 2 is at least inexact, there's flowID there. It's just 8 bit

We use only the most basic GRE header fields (not NVGRE), without even
the optional fields.
There is no flow ID in the basic GRE header; are you perhaps referring to
NVGRE?

Thanks.

> so not sufficient if there are more than 256 queues. Still, 256 queues
> is quite a lot. Are you trying to solve for configurations with
> more than 256 queues then?
>
>
>>>> Assuming that the same flow includes a unidirectional flow a->b, or a
>>>> bidirectional flow a->b and b->a,
>>>> such flow may be out of order when processed by the gateway(DPDK):
>>>>
>>>> 1. In unidirectional mode, if the same flow is switched to another gateway
>>>> for some reason, resulting in different outer IP address,
>>>>       then this flow may be processed by different CPUs after reaching the
>>>> host if there is no inner hash. So after the host receives the
>>>>       flow, first use the forwarding CPUs to parse the inner hash, and then
>>>> use the hash to ensure that the flow is processed by the
>>>>       same CPU.
>>>> 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
>>>> go to gateway 2. In order to ensure that the same flow is
>>>>       processed by the same CPU, we still need the forwarding CPUs to parse
>>>> the real inner hash(here, the hash key needs to be replaced with a symmetric
>>>> hash key).
>>> Oh interesting. What are those gateways, how come there's expectation
>>> that you can change their addresses and topology
>>> completely seamlessly without any reordering whatsoever?
>>> Isn't network topology change kind of guaranteed to change ordering
>>> sometimes?
>>>
>>>
>>>>>> 1). During this process, the CPUs on the host is divided into two parts, one
>>>>>> part is used as a forwarding node to parse the outer header,
>>>>>>         and the CPU utilization is low. Another part handles packets.
>>>>> Some overhead is clearly involved in *sending* packets -
>>>>> to calculate the hash and stick it in the port number.
>>>>> This is, however, a separate problem and if you want to
>>>>> solve it then my suggestion would be to teach the *transmit*
>>>>> side about GRE offloads, so it can fill the source port in the card.
>>>>>
>>>>>> 2). The entropy of the source udp src port is not enough, that is, the queue
>>>>>> is not widely distributed.
>>>>> how isn't it enough? 16 bit is enough to cover all vqs ...
>>>> A 5-tuple brings more entropy than a single port, doesn't it?
>>> But you don't need more for RSS, the indirection table is not
>>> that large.
>>>
>>>> In fact, the
>>>> inner hash of the physical network card used by
>>>> the business team is indeed better than the udp port number of the outer
>>>> header we modify now, but they did not give me the data.
>>> Admittedly, our hash value is 32 bit.
>>>
>>>>>> 2. When there is an inner header hash, the gateway will directly help parse
>>>>>> the outer header, and use the inner 5 tuples to calculate the inner hash.
>>>>>> The tunneled packet is then handed over to the host.
>>>>>> 1) All the CPUs of the host are used to process data packets, and there is
>>>>>> no need to use some CPUs to forward and parse the outer header.
>>>>> You really have to parse the outer header anyway,
>>>>> otherwise there's no tunneling.
>>>>> Unless you want to teach virtio to implement tunneling
>>>>> in hardware, which is something I'd find it easier to
>>>>> get behind.
>>>> There is no need to parse the outer header twice, because we use shared
>>>> memory.
>>> shared with what? you need the outer header to identify the tunnel.
>>>
>>>>>> 2) The entropy of the original quintuple is sufficient, and the queue is
>>>>>> widely distributed.
>>>>> It's exactly the same entropy, why would it be better? In fact you
>>>>> are taking out the outer hash entropy making things worse.
>>>> I don't get the point, why the entropy of the inner 5-tuple and the outer
>>>> tunnel header is the same,
>>>> multiple streams have the same outer header.
>>>>
>>>> Thanks.
>>> well our hash is 32 bit. source port is just 16 bit.
>>> so yes it's more entropy but RSS can't use more than 16 bit.
>>> why do you need so many? you have more than 64k CPUs to offload to?
>>>
>>>
>>>>>> Thanks.
>>>>>>> same goes for vxlan did not check further.
>>>>>>>
>>>>>>> so what is the problem?  and which tunnel types actually suffer from the
>>>>>>> problem?
>>>>>>>



* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-15 12:55               ` Heng Qi
@ 2023-03-15 14:57                 ` Michael S. Tsirkin
  2023-03-16 13:17                   ` Heng Qi
  2023-03-20 19:48                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-15 14:57 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/15 at 7:58 PM, Michael S. Tsirkin wrote:
> > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > >
> > >
> > > On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > 
> > > > > > > > For example geneve spec says:
> > > > > > > > 
> > > > > > > >        it is necessary for entropy from encapsulated packets to be
> > > > > > > >        exposed in the tunnel header.  The most common technique for this is
> > > > > > > >        to use the UDP source port
> > > > > > > The end point of the tunnel called the gateway (with DPDK on top of it).
> > > > > > > 
> > > > > > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > > > > > src port of the outer header of the tunnel,
> > > > > > > and then the tunnel packet is handed over to the host. The host needs to
> > > > > > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > and then use the inner
> > > > > > > hash to forward them to another part of the CPUs that are responsible for
> > > > > > > processing.
> > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > tunnel, why is your host doing this?
> > > 
> > > Let's simplify some details and take a fresh look at two different
> > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > 
> > > 1. In Scenario1, we can improve the processing performance of the same flow
> > > by implementing inner symmetric hashing.
> > > 
> > > This is because even though client1 and client2 communicate bidirectionally
> > > through the same flow, their data may pass
> > > 
> > > through and be encapsulated by different tunnels, resulting in the same flow
> > > being hashed to different queues and processed by different CPUs.
> > > 
> > > To ensure consistency and optimized processing, we need to parse out the
> > > inner header and compute a symmetric hash on it using a special rss key.
> > > 
> > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > prevent the introduction of more concepts, but it is indeed a kind of inner
> > > hash.
> > If parts of a flow go through different tunnels won't this cause
> > reordering at the network level? Why is it so important to prevent it at
> > the nic then?  Or, since you are stressing symmetric hash, are you
> > talking about TX and RX side going through different tunnels?
> 
> Yes, the directions client1->client2 and client2->client1 may go through
> different tunnels.
> Using inner symmetric hashing lets the same CPU process both
> directions of the same flow, which improves performance.

Well sure but ... are you just doing forwarding or inner processing too?
If forwarding why do you care about matching TX and RX queues? If e2e
processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...
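
For illustration only, the idea of storing the incoming hash in the flow
and reusing it on TX can be sketched as below. This is a minimal Python
sketch under stated assumptions, not how Linux or any driver implements it:
FlowTable, flow_key, on_receive and tx_queue are hypothetical names, and
the modulo stands in for whatever queue-selection step a real stack uses.

```python
def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent key: order the two endpoints (hypothetical helper)."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


class FlowTable:
    """Remembers the hash seen on RX so TX for the same flow can reuse it."""

    def __init__(self, num_queues):
        self.num_queues = num_queues
        self.rx_hash = {}  # flow key -> hash reported on receive

    def on_receive(self, pkt_tuple, device_hash):
        # Record the hash that came with the received packet.
        self.rx_hash[flow_key(*pkt_tuple)] = device_hash

    def tx_queue(self, pkt_tuple):
        # Reuse the stored hash for the reverse direction; None if unknown flow.
        h = self.rx_hash.get(flow_key(*pkt_tuple))
        if h is None:
            return None
        return h % self.num_queues


ft = FlowTable(num_queues=4)
ft.on_receive(("10.0.0.1", "10.0.0.2", 1234, 80, 6), 0xDEADBEEF)
reply_queue = ft.tx_queue(("10.0.0.2", "10.0.0.1", 80, 1234, 6))
```

With this approach no symmetric key is needed on the device: the reply
direction picks its queue from the hash stored when the flow was first seen.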



> > 
> > 
> > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > flows between multiple communication pairs encapsulated by the same tunnel
> > > will all be hashed to the same queue. To address this, we need to implement
> > > inner hashing to improve the performance of RSS. By parsing and calculating
> > > the inner hash, different flows can be hashed to different queues.
> > > 
> > > Thanks.
> > > 
> > > 
> > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> 
> We use the most basic GRE header fields (not NVGRE), not even optional
> fields.
> There is also no flow id in the GRE header, should you be referring to
> NVGRE?
> 
> Thanks.
> 
> > so not sufficient if there are more than 512 queues. Still 512 queues
> > is quite a lot. Are you trying to solve for configurations with
> > more than 512 queues then?
> > 
> > 
> > > > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > > > bidirectional flow a->b and b->a,
> > > > > such flow may be out of order when processed by the gateway(DPDK):
> > > > > 
> > > > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > > > for some reason, resulting in different outer IP address,
> > > > >       then this flow may be processed by different CPUs after reaching the
> > > > > host if there is no inner hash. So after the host receives the
> > > > >       flow, first use the forwarding CPUs to parse the inner hash, and then
> > > > > use the hash to ensure that the flow is processed by the
> > > > >       same CPU.
> > > > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > > > go to gateway 2. In order to ensure that the same flow is
> > > > >       processed by the same CPU, we still need the forwarding CPUs to parse
> > > > > the real inner hash(here, the hash key needs to be replaced with a symmetric
> > > > > hash key).
> > > > Oh interesting. What are those gateways, how come there's expectation
> > > > that you can change their addresses and topology
> > > > completely seamlessly without any reordering whatsoever?
> > > > Isn't network topology change kind of guaranteed to change ordering
> > > > sometimes?
> > > > 
> > > > 
> > > > > > > 1). During this process, the CPUs on the host are divided into two parts, one
> > > > > > > part is used as a forwarding node to parse the outer header,
> > > > > > >         and the CPU utilization is low. Another part handles packets.
> > > > > > Some overhead is clearly involved in *sending* packets -
> > > > > > to calculate the hash and stick it in the port number.
> > > > > > This is, however, a separate problem and if you want to
> > > > > > solve it then my suggestion would be to teach the *transmit*
> > > > > > side about GRE offloads, so it can fill the source port in the card.
> > > > > > 
> > > > > > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > > > > > is not widely distributed.
> > > > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > > > A 5-tuple brings more entropy than a single port, doesn't it?
> > > > But you don't need more for RSS, the indirection table is not
> > > > that large.
> > > > 
> > > > > In fact, the
> > > > > inner hash of the physical network card used by
> > > > > the business team is indeed better than the udp port number of the outer
> > > > > header we modify now, but they did not give me the data.
> > > > Admittedly, our hash value is 32 bit.
> > > > 
> > > > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > > > The tunneled packet is then handed over to the host.
> > > > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > > > no need to use some CPUs to forward and parse the outer header.
> > > > > > You really have to parse the outer header anyway,
> > > > > > otherwise there's no tunneling.
> > > > > > Unless you want to teach virtio to implement tunneling
> > > > > > in hardware, which is something I'd find it easier to
> > > > > > get behind.
> > > > > There is no need to parse the outer header twice, because we use shared
> > > > > memory.
> > > > shared with what? you need the outer header to identify the tunnel.
> > > > 
> > > > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > > > widely distributed.
> > > > > > It's exactly the same entropy, why would it be better? In fact you
> > > > > > are taking out the outer hash entropy making things worse.
> > > > > I don't get the point, why the entropy of the inner 5-tuple and the outer
> > > > > tunnel header is the same,
> > > > > multiple streams have the same outer header.
> > > > > 
> > > > > Thanks.
> > > > well our hash is 32 bit. source port is just 16 bit.
> > > > so yes it's more entropy but RSS can't use more than 16 bit.
> > > > why do you need so many? you have more than 64k CPUs to offload to?
> > > > 
> > > > 
> > > > > > > Thanks.
> > > > > > > > same goes for vxlan did not check further.
> > > > > > > > 
> > > > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > > > problem?
> > > > > > > > 
> > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > 
> > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > before posting.
> > > > > > > 
> > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org




^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-15 14:57                 ` Michael S. Tsirkin
@ 2023-03-16 13:17                   ` Heng Qi
  2023-03-20 19:45                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-03-16 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > 
> > 
> > On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > 
> > > > 
> > > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > 
> > > > > > > > > For example geneve spec says:
> > > > > > > > > 
> > > > > > > > >        it is necessary for entropy from encapsulated packets to be
> > > > > > > > >        exposed in the tunnel header.  The most common technique for this is
> > > > > > > > >        to use the UDP source port
> > > > > > > > The end point of the tunnel is called the gateway (with DPDK on top of it).
> > > > > > > > 
> > > > > > > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > and then the tunnel packet is handed over to the host. The host needs to
> > > > > > > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > and then use the inner
> > > > > > > > hash to forward them to another part of the CPUs that are responsible for
> > > > > > > > processing.
> > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > tunnel, why is your host doing this?
> > > > 
> > > > Let's simplify some details and take a fresh look at two different
> > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > 
> > > > 1. In Scenario1, we can improve the processing performance of the same flow
> > > > by implementing inner symmetric hashing.
> > > > 
> > > > This is because even though client1 and client2 communicate bidirectionally
> > > > through the same flow, their data may pass through and be encapsulated by
> > > > different tunnels, resulting in the same flow being hashed to different
> > > > queues and processed by different CPUs.
> > > > 
> > > > To ensure consistency and optimized processing, we need to parse out the
> > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > 
> > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > prevent the introduction of more concepts, but it is indeed a kind of inner
> > > > hash.
> > > If parts of a flow go through different tunnels won't this cause
> > > reordering at the network level? Why is it so important to prevent it at
> > > the nic then?  Or, since you are stressing symmetric hash, are you
> > > talking about TX and RX side going through different tunnels?
> > 
> > Yes, the directions client1->client2 and client2->client1 may go through
> > different tunnels.
> > Using inner symmetric hashing lets the same CPU process both
> > directions of the same flow, which improves performance.
> 
> Well sure but ... are you just doing forwarding or inner processing too?

When there is an inner hash, there is no forwarding anymore.

> If forwarding why do you care about matching TX and RX queues? If e2e

In fact, we are just matching on the same rx queue. The network topology
is roughly as follows. The processing host will receive the packets
sent from client1 and client2 respectively, then make some action judgments,
and return them to client2 and client1 respectively.

client1                   client2
   |                         |
   |      __________         |
   +----->| tunnel |<--------+
          |--------|
             |  |
             |  |
             |  |
             v  v
       +-----------------+
       | processing host |
       +-----------------+

Thanks.
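
To make the "same rx queue" point concrete, here is a minimal sketch of a
symmetric Toeplitz hash over the inner IPv4 addresses and ports. It is an
illustration under assumptions, not the device's implementation: the key is
the repeating 16-bit pattern 0x6d5a, one commonly cited way to make Toeplitz
symmetric (the thread itself only says "a special rss key"); toeplitz_hash,
inner_tuple_bytes and rx_queue are hypothetical names; and the modulo stands
in for the RSS indirection table.

```python
import socket
import struct

# Assumed symmetric key: a 16-bit pattern repeated to 40 bytes.
SYM_KEY = bytes([0x6D, 0x5A]) * 20


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR the 32-bit key window at every set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def inner_tuple_bytes(src_ip, dst_ip, src_port, dst_port) -> bytes:
    """Hash input in the usual order: src addr, dst addr, src port, dst port."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def rx_queue(src_ip, dst_ip, src_port, dst_port, num_queues=8) -> int:
    h = toeplitz_hash(SYM_KEY, inner_tuple_bytes(src_ip, dst_ip, src_port, dst_port))
    return h % num_queues


# client1 -> client2 and client2 -> client1 on the same inner flow.
q_fwd = rx_queue("192.168.1.10", "192.168.1.20", 40000, 443)
q_rev = rx_queue("192.168.1.20", "192.168.1.10", 443, 40000)
```

Because the key repeats every 16 bits, swapping the two addresses (a 32-bit
offset) or the two ports (a 16-bit offset) selects identical key windows, so
both directions produce the same hash and land on the same rx queue
regardless of which tunnel carried them.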

> processing can't you just store the incoming hash in the flow and reuse
> on TX? This is what Linux is doing...
> 
> 
> 
> > > 
> > > 
> > > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > > flows between multiple communication pairs encapsulated by the same tunnel
> > > > will all be hashed to the same queue. To address this, we need to implement
> > > > inner hashing to improve the performance of RSS. By parsing and calculating
> > > > the inner hash, different flows can be hashed to different queues.
> > > > 
> > > > Thanks.
> > > > 
> > > > 
> > > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> > 
> > We use the most basic GRE header fields (not NVGRE), not even optional
> > fields.
> > There is also no flow id in the GRE header, should you be referring to
> > NVGRE?
> > 
> > Thanks.
> > 
> > > so not sufficient if there are more than 512 queues. Still 512 queues
> > > is quite a lot. Are you trying to solve for configurations with
> > > more than 512 queues then?
> > > 
> > > 
> > > > > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > > > > bidirectional flow a->b and b->a,
> > > > > > such flow may be out of order when processed by the gateway(DPDK):
> > > > > > 
> > > > > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > > > > for some reason, resulting in different outer IP address,
> > > > > >       then this flow may be processed by different CPUs after reaching the
> > > > > > host if there is no inner hash. So after the host receives the
> > > > > >       flow, first use the forwarding CPUs to parse the inner hash, and then
> > > > > > use the hash to ensure that the flow is processed by the
> > > > > >       same CPU.
> > > > > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > > > > go to gateway 2. In order to ensure that the same flow is
> > > > > >       processed by the same CPU, we still need the forwarding CPUs to parse
> > > > > > the real inner hash(here, the hash key needs to be replaced with a symmetric
> > > > > > hash key).
> > > > > Oh interesting. What are those gateways, how come there's expectation
> > > > > that you can change their addresses and topology
> > > > > completely seamlessly without any reordering whatsoever?
> > > > > Isn't network topology change kind of guaranteed to change ordering
> > > > > sometimes?
> > > > > 
> > > > > 
> > > > > > > > 1). During this process, the CPUs on the host are divided into two parts, one
> > > > > > > > part is used as a forwarding node to parse the outer header,
> > > > > > > >         and the CPU utilization is low. Another part handles packets.
> > > > > > > Some overhead is clearly involved in *sending* packets -
> > > > > > > to calculate the hash and stick it in the port number.
> > > > > > > This is, however, a separate problem and if you want to
> > > > > > > solve it then my suggestion would be to teach the *transmit*
> > > > > > > side about GRE offloads, so it can fill the source port in the card.
> > > > > > > 
> > > > > > > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > > > > > > is not widely distributed.
> > > > > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > > > > A 5-tuple brings more entropy than a single port, doesn't it?
> > > > > But you don't need more for RSS, the indirection table is not
> > > > > that large.
> > > > > 
> > > > > > In fact, the
> > > > > > inner hash of the physical network card used by
> > > > > > the business team is indeed better than the udp port number of the outer
> > > > > > header we modify now, but they did not give me the data.
> > > > > Admittedly, our hash value is 32 bit.
> > > > > 
> > > > > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > > > > The tunneled packet is then handed over to the host.
> > > > > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > > > > no need to use some CPUs to forward and parse the outer header.
> > > > > > > You really have to parse the outer header anyway,
> > > > > > > otherwise there's no tunneling.
> > > > > > > Unless you want to teach virtio to implement tunneling
> > > > > > > in hardware, which is something I'd find it easier to
> > > > > > > get behind.
> > > > > > There is no need to parse the outer header twice, because we use shared
> > > > > > memory.
> > > > > shared with what? you need the outer header to identify the tunnel.
> > > > > 
> > > > > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > > > > widely distributed.
> > > > > > > It's exactly the same entropy, why would it be better? In fact you
> > > > > > > are taking out the outer hash entropy making things worse.
> > > > > > I don't get the point, why the entropy of the inner 5-tuple and the outer
> > > > > > tunnel header is the same,
> > > > > > multiple streams have the same outer header.
> > > > > > 
> > > > > > Thanks.
> > > > > well our hash is 32 bit. source port is just 16 bit.
> > > > > so yes it's more entropy but RSS can't use more than 16 bit.
> > > > > why do you need so many? you have more than 64k CPUs to offload to?
> > > > > 
> > > > > 
> > > > > > > > Thanks.
> > > > > > > > > same goes for vxlan did not check further.
> > > > > > > > > 
> > > > > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > > > > problem?
> > > > > > > > > 



^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-16 13:17                   ` Heng Qi
@ 2023-03-20 19:45                     ` Michael S. Tsirkin
  2023-03-30 12:10                       ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-20 19:45 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:
> On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > > 
> > > 
> > > On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > > 
> > > > > 
> > > > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > > 
> > > > > > > > > > For example geneve spec says:
> > > > > > > > > > 
> > > > > > > > > >        it is necessary for entropy from encapsulated packets to be
> > > > > > > > > >        exposed in the tunnel header.  The most common technique for this is
> > > > > > > > > >        to use the UDP source port
> > > > > > > > > The end point of the tunnel is called the gateway (with DPDK on top of it).
> > > > > > > > > 
> > > > > > > > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > > and then the tunnel packet is handed over to the host. The host needs to
> > > > > > > > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > > and then use the inner
> > > > > > > > > hash to forward them to another part of the CPUs that are responsible for
> > > > > > > > > processing.
> > > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > > tunnel, why is your host doing this?
> > > > > 
> > > > > Let's simplify some details and take a fresh look at two different
> > > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > > 
> > > > > 1. In Scenario1, we can improve the processing performance of the same flow
> > > > > by implementing inner symmetric hashing.
> > > > > 
> > > > > This is because even though client1 and client2 communicate bidirectionally
> > > > > through the same flow, their data may pass through and be encapsulated by
> > > > > different tunnels, resulting in the same flow being hashed to different
> > > > > queues and processed by different CPUs.
> > > > > 
> > > > > To ensure consistency and optimized processing, we need to parse out the
> > > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > > 
> > > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > > prevent the introduction of more concepts, but it is indeed a kind of inner
> > > > > hash.
> > > > If parts of a flow go through different tunnels won't this cause
> > > > reordering at the network level? Why is it so important to prevent it at
> > > > the nic then?  Or, since you are stressing symmetric hash, are you
> > > > talking about TX and RX side going through different tunnels?
> > > 
> > > Yes, the directions client1->client2 and client2->client1 may go through
> > > different tunnels.
> > > Using inner symmetric hashing lets the same CPU process both
> > > directions of the same flow, which improves performance.
> > 
> > Well sure but ... are you just doing forwarding or inner processing too?
> 
> When there is an inner hash, there is no forwarding anymore.
> 
> > If forwarding why do you care about matching TX and RX queues? If e2e
> 
> In fact, we are just matching on the same rx queue. The network topology
> is roughly as follows. The processing host will receive the packets
> sent from client1 and client2 respectively, then make some action judgments,
> and return them to client2 and client1 respectively.
> 
> client1                   client2
>    |                         |
>    |      __________         |
>    +----->| tunnel |<--------+
>           |--------|
>              |  |
>              |  |
>              |  |
>              v  v
>        +-----------------+
>        | processing host |
>        +-----------------+
> 
> Thanks.

monitoring host would be a better term

> > processing can't you just store the incoming hash in the flow and reuse
> > on TX? This is what Linux is doing...
> > 
> > 
> > 
> > > > 
> > > > 
> > > > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > > > flows between multiple communication pairs encapsulated by the same tunnel
> > > > > will all be hashed to the same queue. To address this, we need to implement
> > > > > inner hashing to improve the performance of RSS. By parsing and calculating
> > > > > the inner hash, different flows can be hashed to different queues.
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > 
> > > > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> > > 
> > > We use the most basic GRE header fields (not NVGRE), not even optional
> > > fields.
> > > There is also no flow id in the GRE header, should you be referring to
> > > NVGRE?
> > > 
> > > Thanks.
> > > 
> > > > so not sufficient if there are more than 512 queues. Still 512 queues
> > > > is quite a lot. Are you trying to solve for configurations with
> > > > more than 512 queues then?
> > > > 
> > > > 
> > > > > > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > > > > > bidirectional flow a->b and b->a,
> > > > > > > such flow may be out of order when processed by the gateway(DPDK):
> > > > > > > 
> > > > > > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > > > > > for some reason, resulting in different outer IP address,
> > > > > > >       then this flow may be processed by different CPUs after reaching the
> > > > > > > host if there is no inner hash. So after the host receives the
> > > > > > >       flow, first use the forwarding CPUs to parse the inner hash, and then
> > > > > > > use the hash to ensure that the flow is processed by the
> > > > > > >       same CPU.
> > > > > > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > > > > > go to gateway 2. In order to ensure that the same flow is
> > > > > > >       processed by the same CPU, we still need the forwarding CPUs to parse
> > > > > > > the real inner hash(here, the hash key needs to be replaced with a symmetric
> > > > > > > hash key).
> > > > > > Oh interesting. What are those gateways, how come there's expectation
> > > > > > that you can change their addresses and topology
> > > > > > completely seamlessly without any reordering whatsoever?
> > > > > > Isn't network topology change kind of guaranteed to change ordering
> > > > > > sometimes?
> > > > > > 
> > > > > > 
> > > > > > > > > 1). During this process, the CPUs on the host are divided into two parts, one
> > > > > > > > > part is used as a forwarding node to parse the outer header,
> > > > > > > > >         and the CPU utilization is low. Another part handles packets.
> > > > > > > > Some overhead is clearly involved in *sending* packets -
> > > > > > > > to calculate the hash and stick it in the port number.
> > > > > > > > This is, however, a separate problem and if you want to
> > > > > > > > solve it then my suggestion would be to teach the *transmit*
> > > > > > > > side about GRE offloads, so it can fill the source port in the card.
> > > > > > > > 
> > > > > > > > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > > > > > > > is not widely distributed.
> > > > > > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > > > > > A 5-tuple brings more entropy than a single port, doesn't it?
> > > > > > But you don't need more for RSS, the indirection table is not
> > > > > > that large.
> > > > > > 
> > > > > > > In fact, the
> > > > > > > inner hash of the physical network card used by
> > > > > > > the business team is indeed better than the udp port number of the outer
> > > > > > > header we modify now, but they did not give me the data.
> > > > > > > Admittedly, our hash value is 32 bit.
> > > > > > 
> > > > > > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > > > > > The tunneled packet is then handed over to the host.
> > > > > > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > > > > > no need to use some CPUs to forward and parse the outer header.
> > > > > > > > You really have to parse the outer header anyway,
> > > > > > > > otherwise there's no tunneling.
> > > > > > > > Unless you want to teach virtio to implement tunneling
> > > > > > > > in hardware, which is something I'd find it easier to
> > > > > > > > get behind.
> > > > > > > There is no need to parse the outer header twice, because we use shared
> > > > > > > memory.
> > > > > > shared with what? you need the outer header to identify the tunnel.
> > > > > > 
> > > > > > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > > > > > widely distributed.
> > > > > > > > It's exactly the same entropy, why would it be better? In fact you
> > > > > > > > are taking out the outer hash entropy making things worse.
> > > > > > > I don't get the point, why the entropy of the inner 5-tuple and the outer
> > > > > > > tunnel header is the same,
> > > > > > > multiple streams have the same outer header.
> > > > > > > 
> > > > > > > Thanks.
> > > > > > well our hash is 32 bit. source port is just 16 bit.
> > > > > > so yes it's more entropy but RSS can't use more than 16 bit.
> > > > > > why do you need so many? you have more than 64k CPUs to offload to?
> > > > > > 
> > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > > same goes for vxlan did not check further.
> > > > > > > > > > 
> > > > > > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > > > > > problem?
> > > > > > > > > > 
> > > > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > > > 
> > > > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > > > before posting.
> > > > > > > > > 
> > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-15 12:55               ` Heng Qi
  2023-03-15 14:57                 ` Michael S. Tsirkin
@ 2023-03-20 19:48                 ` Michael S. Tsirkin
  2023-03-30 12:37                   ` Heng Qi
  1 sibling, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-03-20 19:48 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> We use the most basic GRE header fields (not NVGRE), not even optional
> fields.

I'd say yes, the most convincing usecase is with legacy GRE.
Given that, do you need the rest of protocols there?
We can start with just legacy GRE (think about including IPv6 or not).
Given how narrow this usecase is, I'd be fine with focusing
just on this, and addressing more protocols down the road
with something programmable like BPF. WDYT?

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-20 19:45                     ` Michael S. Tsirkin
@ 2023-03-30 12:10                       ` Heng Qi
  0 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-03-30 12:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/3/21 at 3:45 AM, Michael S. Tsirkin wrote:
> On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:
>> On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
>>> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
>>>>
>>>> On 2023/3/15 at 7:58 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
>>>>>>
>>>>>> On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
>>>>>>> On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
>>>>>>>> On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
>>>>>>>>>> On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
>>>>>>>>>>>> If the tunnel is used to encapsulate the packets, the hash calculated
>>>>>>>>>>>> using the outer header of the receive packets is always fixed for the
>>>>>>>>>>>> same flow packets, i.e. they will be steered to the same receive queue.
>>>>>>>>>>> Wait a second. How is this true? Does not everyone stick the
>>>>>>>>>>> inner header hash in the outer source port to solve this?
>>>>>>>>>> Yes, you are right. That's what we did before the inner header hash, but it
>>>>>>>>>> has a performance penalty, which I'll explain below.
>>>>>>>>>>
>>>>>>>>>>> For example geneve spec says:
>>>>>>>>>>>
>>>>>>>>>>>         it is necessary for entropy from encapsulated packets to be
>>>>>>>>>>>         exposed in the tunnel header.  The most common technique for this is
>>>>>>>>>>>         to use the UDP source port
>>>>>>>>>> The end point of the tunnel is called the gateway (with DPDK on top of it).
>>>>>>>>>>
>>>>>>>>>> 1. When there is no inner header hash, entropy can be inserted into the udp
>>>>>>>>>> src port of the outer header of the tunnel,
>>>>>>>>>> and then the tunnel packet is handed over to the host. The host needs to
>>>>>>>>>> take out a part of the CPUs to parse the outer headers (but not drop them)
>>>>>>>>>> to calculate the inner hash for the inner payloads,
>>>>>>>>>> and then use the inner
>>>>>>>>>> hash to forward them to another part of the CPUs that are responsible for
>>>>>>>>>> processing.
>>>>>>>>> I don't get this part. Leave inner hashes to the guest inside the
>>>>>>>>> tunnel, why is your host doing this?
>>>>>> Let's simplify some details and take a fresh look at two different
>>>>>> scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
>>>>>>
>>>>>> 1. In Scenario1, we can improve the processing performance of the same flow
>>>>>> by implementing inner symmetric hashing.
>>>>>>
>>>>>> This is because even though client1 and client2 communicate bidirectionally
>>>>>> through the same flow, their data may pass
>>>>>>
>>>>>> through and be encapsulated by different tunnels, resulting in the same flow
>>>>>> being hashed to different queues and processed by different CPUs.
>>>>>>
>>>>>> To ensure consistency and optimized processing, we need to parse out the
>>>>>> inner header and compute a symmetric hash on it using a special rss key.
>>>>>>
>>>>>> Sorry for not mentioning the inner symmetric hash before, in order to
>>>>>> prevent the introduction of more concepts, but it is indeed a kind of inner
>>>>>> hash.
>>>>> If parts of a flow go through different tunnels won't this cause
>>>>> reordering at the network level? Why is it so important to prevent it at
>>>>> the nic then?  Or, since you are stressing symmetric hash, are you
>>>>> talking about TX and RX side going through different tunnels?
>>>> Yes, the directions client1->client2 and client2->client1 may go through
>>>> different tunnels.
>>>> Using inner symmetric hashing can satisfy the same CPU to process two
>>>> directions of the same flow to improve performance.
>>> Well sure but ... are you just doing forwarding or inner processing too?
>> When there is an inner hash, there is no forwarding anymore.
>>
>>> If forwarding why do you care about matching TX and RX queues? If e2e
>> In fact, we are just matching on the same rx queue. The network topology
>> is roughly as follows. The processing host will receive the packets
>> sent from client1 and client2 respectively, then make some action judgments,
>> and return them to client2 and client1 respectively.
>>
>> client1                   client2
>>     |                         |
>>     |      __________         |
>>     +----->| tunnel |<--------+
>>            |--------|
>>               |  |
>>               |  |
>>               |  |
>>               v  v
>>         +-----------------+
>>         | processing host |
>>         +-----------------+
>>
>> Thanks.
> monitoring host would be a better term

Sure.

I'm so sorry I didn't realize I missed this until I checked my emails. 😮 :(


>
>>> processing can't you just store the incoming hash in the flow and reuse
>>> on TX? This is what Linux is doing...
>>>
>>>
>>>
>>>>>
>>>>>> 2. In Scenario2 with GRE, the lack of outer transport headers means that
>>>>>> flows between multiple communication pairs encapsulated by the same tunnel
>>>>>>
>>>>>> will all be hashed to the same queue. To address this, we need to implement
>>>>>> inner hashing to improve the performance of RSS. By parsing and calculating
>>>>>>
>>>>>> the inner hash, different flows can be hashed to different queues.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>> Well 2 is at least inexact, there's flowID there. It's just 8 bit
>>>> We use the most basic GRE header fields (not NVGRE), not even optional
>>>> fields.
>>>> There is also no flow id in the GRE header, should you be referring to
>>>> NVGRE?
>>>>
>>>> Thanks.
>>>>
>>>>> so not sufficient if there are more than 512 queues. Still 512 queues
>>>>> is quite a lot. Are you trying to solve for configurations with
>>>>> more than 512 queues then?
>>>>>
>>>>>
>>>>>>>> Assuming that the same flow includes a unidirectional flow a->b, or a
>>>>>>>> bidirectional flow a->b and b->a,
>>>>>>>> such flow may be out of order when processed by the gateway(DPDK):
>>>>>>>>
>>>>>>>> 1. In unidirectional mode, if the same flow is switched to another gateway
>>>>>>>> for some reason, resulting in different outer IP address,
>>>>>>>>        then this flow may be processed by different CPUs after reaching the
>>>>>>>> host if there is no inner hash. So after the host receives the
>>>>>>>>        flow, first use the forwarding CPUs to parse the inner hash, and then
>>>>>>>> use the hash to ensure that the flow is processed by the
>>>>>>>>        same CPU.
>>>>>>>> 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
>>>>>>>> go to gateway 2. In order to ensure that the same flow is
>>>>>>>>        processed by the same CPU, we still need the forwarding CPUs to parse
>>>>>>>> the real inner hash(here, the hash key needs to be replaced with a symmetric
>>>>>>>> hash key).
>>>>>>> Oh interesting. What are those gateways, how come there's expectation
>>>>>>> that you can change their addresses and topology
>>>>>>> completely seamlessly without any reordering whatsoever?
>>>>>>> Isn't network topology change kind of guaranteed to change ordering
>>>>>>> sometimes?
>>>>>>>
>>>>>>>
>>>>>>>>>> 1). During this process, the CPUs on the host is divided into two parts, one
>>>>>>>>>> part is used as a forwarding node to parse the outer header,
>>>>>>>>>>          and the CPU utilization is low. Another part handles packets.
>>>>>>>>> Some overhead is clearly involved in *sending* packets -
>>>>>>>>> to calculate the hash and stick it in the port number.
>>>>>>>>> This is, however, a separate problem and if you want to
>>>>>>>>> solve it then my suggestion would be to teach the *transmit*
>>>>>>>>> side about GRE offloads, so it can fill the source port in the card.
>>>>>>>>>
>>>>>>>>>> 2). The entropy of the source udp src port is not enough, that is, the queue
>>>>>>>>>> is not widely distributed.
>>>>>>>>> how isn't it enough? 16 bit is enough to cover all vqs ...
>>>>>>>> A 5-tuple brings more entropy than a single port, doesn't it?
>>>>>>> But you don't need more for RSS, the indirection table is not
>>>>>>> that large.
>>>>>>>
>>>>>>>> In fact, the
>>>>>>>> inner hash of the physical network card used by
>>>>>>>> the business team is indeed better than the udp port number of the outer
>>>>>>>> header we modify now, but they did not give me the data.
>>>>>>> Admittedly, our hash value is 32 bit.
>>>>>>>
>>>>>>>>>> 2. When there is an inner header hash, the gateway will directly help parse
>>>>>>>>>> the outer header, and use the inner 5 tuples to calculate the inner hash.
>>>>>>>>>> The tunneled packet is then handed over to the host.
>>>>>>>>>> 1) All the CPUs of the host are used to process data packets, and there is
>>>>>>>>>> no need to use some CPUs to forward and parse the outer header.
>>>>>>>>> You really have to parse the outer header anyway,
>>>>>>>>> otherwise there's no tunneling.
>>>>>>>>> Unless you want to teach virtio to implement tunneling
>>>>>>>>> in hardware, which is something I'd find it easier to
>>>>>>>>> get behind.
>>>>>>>> There is no need to parse the outer header twice, because we use shared
>>>>>>>> memory.
>>>>>>> shared with what? you need the outer header to identify the tunnel.
>>>>>>>
>>>>>>>>>> 2) The entropy of the original quintuple is sufficient, and the queue is
>>>>>>>>>> widely distributed.
>>>>>>>>> It's exactly the same entropy, why would it be better? In fact you
>>>>>>>>> are taking out the outer hash entropy making things worse.
>>>>>>>> I don't get the point, why the entropy of the inner 5-tuple and the outer
>>>>>>>> tunnel header is the same,
>>>>>>>> multiple streams have the same outer header.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>> well our hash is 32 bit. source port is just 16 bit.
>>>>>>> so yes it's more entropy but RSS can't use more than 16 bit.
>>>>>>> why do you need so many? you have more than 64k CPUs to offload to?
>>>>>>>
>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>> same goes for vxlan did not check further.
>>>>>>>>>>>
>>>>>>>>>>> so what is the problem?  and which tunnel types actually suffer from the
>>>>>>>>>>> problem?
>>>>>>>>>>>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-20 19:48                 ` Michael S. Tsirkin
@ 2023-03-30 12:37                   ` Heng Qi
  2023-04-08 10:29                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 105+ messages in thread
From: Heng Qi @ 2023-03-30 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/3/21 at 3:48 AM, Michael S. Tsirkin wrote:
> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
>> We use the most basic GRE header fields (not NVGRE), not even optional
>> fields.
> I'd say yes, the most convincing usecase is with legacy GRE.

Yes. But we still have a strong need for VXLAN and GENEVE to do 
symmetric hashing. Please consider this.

> Given that, do you need the rest of protocols there?

I surveyed the tunneling protocols currently used for overlay networks 
and compared their respective RFCs.

They are:

1. GRE_rfc2784: This protocol is only specified for IPv4 and used as 
either the payload or delivery protocol.
     link: https://datatracker.ietf.org/doc/rfc2784/

2. GRE_rfc2890: This protocol describes extensions by which two fields, 
Key and Sequence Number, can be optionally carried in the GRE Header.
     link: https://www.rfc-editor.org/rfc/rfc2890

3. GRE_rfc7676: IPv6 Support for Generic Routing Encapsulation (GRE). 
This protocol is specified for IPv6 and used as either the payload or 
delivery protocol.
     Note that this does not change the GRE header format or any 
behaviors specified by RFC 2784 or RFC 2890.
     link: https://datatracker.ietf.org/doc/rfc7676/

4. GRE-in-UDP: GRE-in-UDP Encapsulation. This specifies a method of 
encapsulating network protocol packets within GRE and UDP headers.
     This GRE-in-UDP encapsulation allows the UDP source port field to 
be used as an entropy field. This protocol is specified for IPv4 and 
IPv6, and used as either the payload or delivery protocol.
     link: https://www.rfc-editor.org/rfc/rfc8086

5. VXLAN: Virtual eXtensible Local Area Network.
     link: https://datatracker.ietf.org/doc/rfc7348/

6. VXLAN-GPE: Generic Protocol Extension for VXLAN. This protocol 
describes extending Virtual eXtensible Local Area Network (VXLAN) via 
changes to the VXLAN header.
     link: https://www.ietf.org/archive/id/draft-ietf-nvo3-vxlan-gpe-12.txt

7. GENEVE: Generic Network Virtualization Encapsulation.
     link: https://datatracker.ietf.org/doc/rfc8926/

8. IPIP: IP Encapsulation within IP.
     link: https://www.rfc-editor.org/rfc/rfc2003

9. NVGRE: Network Virtualization Using Generic Routing Encapsulation
     link: https://www.rfc-editor.org/rfc/rfc7637.html

10. STT: Stateless Transport Tunneling. STT is particularly useful when 
some tunnel endpoints are in end-systems, as it utilizes the 
capabilities of the network interface card to improve performance.
       link: https://www.ietf.org/archive/id/draft-davie-stt-08.txt
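
To make the difference between items 1 and 2 concrete, here is a minimal 
Python sketch (not part of the patch) of how a parser could find where the 
inner packet starts behind a version-0 GRE header. The flag masks follow 
RFC 2784 and RFC 2890; the function name gre_inner_offset is made up for 
this illustration, and the deprecated RFC 1701 bits are simply ignored.

```python
import struct

GRE_FLAG_C = 0x8000        # Checksum Present (RFC 2784)
GRE_FLAG_K = 0x2000        # Key Present (RFC 2890)
GRE_FLAG_S = 0x1000        # Sequence Number Present (RFC 2890)
GRE_VERSION_MASK = 0x0007  # low three bits of the first 16-bit word


def gre_inner_offset(gre: bytes):
    """Return (protocol_type, offset) where offset is the start of the
    encapsulated packet relative to the start of the GRE header.

    Hypothetical helper for illustration only.
    """
    if len(gre) < 4:
        raise ValueError("truncated GRE header")
    flags, proto = struct.unpack_from("!HH", gre, 0)
    if flags & GRE_VERSION_MASK:
        raise ValueError("not a version-0 GRE header")
    offset = 4                 # base header: flags/version + protocol type
    if flags & GRE_FLAG_C:
        offset += 4            # Checksum (2 bytes) + Reserved1 (2 bytes)
    if flags & GRE_FLAG_K:
        offset += 4            # Key
    if flags & GRE_FLAG_S:
        offset += 4            # Sequence Number
    if len(gre) < offset:
        raise ValueError("truncated GRE optional fields")
    return proto, offset
```

With the basic header we use (no optional fields) the inner packet always 
starts at offset 4; with the RFC 2890 Key and Sequence Number fields it 
starts at offset 12.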

Among them, GRE_rfc2784, VXLAN and GENEVE are our internal requirements 
for inner header hashing.
GRE_rfc2784 requires RSS hashing to different queues.
For the monitoring scenario I mentioned, VXLAN or GRE_rfc2890 also needs 
to use inner symmetric hashing.

I understand you want this feature to support only GRE_rfc2784, since 
it is the most convincing case for RSS.
But while RSS hashes different flows to different queues, it also needs 
to hash the same flow to the same queue.
So inner symmetric hashing does not distort the role of RSS, and I 
believe that for modern protocols like VXLAN and others, it is still a 
common requirement for other vendors using virtio devices.
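
As an illustration of what "inner symmetric hashing" means here, below is 
a minimal Python sketch. It is not taken from the patch. It assumes a 
Toeplitz hash over the inner IPv4 4-tuple and a key made of a repeating 
16-bit pattern (0x6d5a is used purely as an example value); the helper 
names are made up. Because the key repeats every 16 bits, swapping the 
32-bit addresses and the 16-bit ports does not change the result.

```python
import socket
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Plain Toeplitz hash: for every set input bit, XOR in the 32-bit
    window of the key that starts at that bit position."""
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def ipv4_tuple(src: str, dst: str, sport: int, dport: int) -> bytes:
    """Hash input laid out as src addr, dst addr, src port, dst port."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!HH", sport, dport))


# Example key only: a 40-byte key with a 16-bit period.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

fwd = toeplitz_hash(SYMMETRIC_KEY, ipv4_tuple("10.0.0.1", "10.0.0.2", 1234, 80))
rev = toeplitz_hash(SYMMETRIC_KEY, ipv4_tuple("10.0.0.2", "10.0.0.1", 80, 1234))
print(fwd == rev)  # both directions of the inner flow hash identically
```

So client1->client2 and client2->client1 land on the same queue even when 
they arrive through different tunnels.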

So, can we make this feature support all the protocols I have listed 
above, so that vendors can choose to support the protocols they want? 
This would also avoid, as much as possible, having to add new tunnel 
protocols in the near future.

Do you think it's ok?

Again: I'm so sorry I didn't realize I missed this until I checked my 
emails. 🙁😮

> We can start with just legacy GRE (think about including IPv6 or not).
> Given how narrow this usecase is, I'd be fine with focusing
> just on this, and addressing more protocols down the road
> with something programmable like BPF. WDYT?
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-03-30 12:37                   ` Heng Qi
@ 2023-04-08 10:29                     ` Michael S. Tsirkin
  2023-04-10 13:26                       ` Heng Qi
  0 siblings, 1 reply; 105+ messages in thread
From: Michael S. Tsirkin @ 2023-04-08 10:29 UTC (permalink / raw)
  To: Heng Qi
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo

On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/21 at 3:48 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > > We use the most basic GRE header fields (not NVGRE), not even optional
> > > fields.
> > I'd say yes, the most convincing usecase is with legacy GRE.
> 
> Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric
> hashing. Please consider this.

Using a specific key seems fragile though in that a different one is
needed for e.g. ipv4 and ipv6.  An issue with VXLAN and GENEVE, yes?
Will support for XOR hashing address this sufficiently or is that not
acceptable to you? Or alternatively a modified Toeplitz, e.g. this
https://inbox.dpdk.org/dev/20190731123040.GG4512@6wind.com/
suggests Mellanox supports that. WDYT?
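
To show roughly what is meant by XOR hashing, here is a minimal Python 
sketch. This is only one possible reading, not a proposal for the spec 
text: the function name and the folding to 32 bits are assumptions made 
for the example. The point is that no key is involved, so the same code 
handles ipv4 and ipv6, and swapping source and destination gives the same 
value.

```python
import ipaddress


def xor_symmetric_hash(src: str, dst: str, sport: int, dport: int) -> int:
    """Keyless symmetric hash: XOR the two addresses and the two ports,
    then fold the address part down to 32 bits.

    Hypothetical helper for illustration only.
    """
    x = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    h = 0
    while x:                      # fold 128-bit IPv6 values into 32 bits
        h ^= x & 0xFFFFFFFF
        x >>= 32
    return h ^ (sport ^ dport)    # ports are 16 bit, result stays 32 bit


# Same value in both directions, for IPv4 and IPv6 alike.
print(xor_symmetric_hash("10.0.0.1", "10.0.0.2", 1234, 80)
      == xor_symmetric_hash("10.0.0.2", "10.0.0.1", 80, 1234))
print(xor_symmetric_hash("2001:db8::1", "2001:db8::2", 1234, 80)
      == xor_symmetric_hash("2001:db8::2", "2001:db8::1", 80, 1234))
```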

-- 
MST


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
  2023-04-08 10:29                     ` Michael S. Tsirkin
@ 2023-04-10 13:26                       ` Heng Qi
  0 siblings, 0 replies; 105+ messages in thread
From: Heng Qi @ 2023-04-10 13:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, virtio-dev, Parav Pandit, Jason Wang,
	Yuri Benditovich, Cornelia Huck, Xuan Zhuo



On 2023/4/8 at 6:29 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 30, 2023 at 08:37:21PM +0800, Heng Qi wrote:
>>
>> On 2023/3/21 at 3:48 AM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
>>>> We use the most basic GRE header fields (not NVGRE), not even optional
>>>> fields.
>>> I'd say yes, the most convincing usecase is with legacy GRE.
>> Yes. But we still have a strong need for VXLAN and GENEVE to do symmetric
>> hashing. Please consider this.
> Using a specific key seems fragile though in that a different one is
> needed for e.g. ipv4 and ipv6.  An issue with VXLAN and GENEVE, yes?

Yes.

> Will support for XOR hashing address this sufficiently or is that not
> acceptable to you? Or alternatively a modified Toeplitz, e.g. this

This is a very good suggestion. I want to follow up on this work, as I 
have said in other threads.

Thanks.

> https://inbox.dpdk.org/dev/20190731123040.GG4512@6wind.com/
> suggests Mellanox supports that. WDYT?
>


^ permalink raw reply	[flat|nested] 105+ messages in thread

end of thread, other threads:[~2023-04-10 13:26 UTC | newest]

Thread overview: 105+ messages
2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-02-20 16:12   ` Michael S. Tsirkin
2023-02-21  4:20 ` Parav Pandit
2023-02-21  6:14   ` [virtio-comment] " Heng Qi
2023-02-21 12:47     ` Parav Pandit
2023-02-21 13:34       ` Heng Qi
2023-02-21 15:32         ` Parav Pandit
2023-02-21 16:44           ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-02-21 16:50             ` Parav Pandit
2023-02-21 17:13               ` Michael S. Tsirkin
2023-02-21 17:40                 ` [virtio-comment] " Parav Pandit
2023-02-21 17:44                   ` Michael S. Tsirkin
2023-02-21 17:54                     ` Parav Pandit
2023-02-21 17:17               ` [virtio-comment] " Heng Qi
2023-02-21 17:39                 ` Parav Pandit
2023-02-21 13:37       ` Heng Qi
2023-02-21 17:05   ` Michael S. Tsirkin
2023-02-21 19:29     ` Parav Pandit
2023-02-21 21:23       ` Michael S. Tsirkin
2023-02-21 21:36         ` Parav Pandit
2023-02-21 21:46           ` Michael S. Tsirkin
2023-02-21 22:32             ` Parav Pandit
2023-02-21 23:18               ` Michael S. Tsirkin
2023-02-22  1:41                 ` Parav Pandit
2023-02-22  2:51                 ` [virtio-dev] " Heng Qi
2023-02-22  2:34       ` [virtio-dev] " Heng Qi
2023-02-22  6:21         ` Michael S. Tsirkin
2023-02-22  7:03           ` Heng Qi
2023-02-22 11:29             ` Michael S. Tsirkin
2023-03-01 14:32   ` [virtio-dev] " Heng Qi
2023-02-21 17:50 ` Michael S. Tsirkin
2023-02-22  3:22   ` Jason Wang
2023-02-22  6:46     ` Heng Qi
2023-02-22 11:30       ` Michael S. Tsirkin
2023-02-23  2:50       ` Jason Wang
2023-02-23  4:41         ` [virtio-dev] " Heng Qi
2023-02-24  2:45           ` Jason Wang
2023-02-24  4:47             ` [virtio-comment] " Heng Qi
2023-02-24  8:07             ` Michael S. Tsirkin
2023-02-23 13:03         ` Michael S. Tsirkin
2023-02-24  2:26           ` Jason Wang
2023-02-24  8:06             ` [virtio-dev] " Michael S. Tsirkin
2023-02-27  4:07               ` Jason Wang
2023-02-27  4:07                 ` [virtio-dev] " Jason Wang
2023-02-27  7:39                 ` Michael S. Tsirkin
2023-02-27  7:39                   ` [virtio-dev] " Michael S. Tsirkin
2023-02-27  8:35                   ` Jason Wang
2023-02-27  8:35                     ` [virtio-dev] " Jason Wang
2023-02-27 12:38                     ` Heng Qi
2023-02-27 12:38                       ` [virtio-dev] " Heng Qi
2023-02-27 17:49                     ` Michael S. Tsirkin
2023-02-27 17:49                       ` [virtio-dev] " Michael S. Tsirkin
2023-02-28  3:04                       ` Jason Wang
2023-02-28  3:04                         ` [virtio-dev] " Jason Wang
2023-02-28  8:52                         ` Michael S. Tsirkin
2023-02-28  8:52                           ` [virtio-dev] " Michael S. Tsirkin
2023-02-28  9:56                           ` Heng Qi
2023-02-28  9:56                             ` Heng Qi
2023-02-28 11:04                         ` Michael S. Tsirkin
2023-02-28 11:04                           ` [virtio-dev] " Michael S. Tsirkin
2023-03-01  2:36                           ` Jason Wang
2023-03-01  2:36                             ` [virtio-dev] " Jason Wang
2023-03-01 10:36                             ` Michael S. Tsirkin
2023-03-02  2:57                               ` Jason Wang
2023-03-02  7:42                                 ` Michael S. Tsirkin
2023-03-02  7:57                                   ` Jason Wang
2023-03-02  8:09                                     ` Michael S. Tsirkin
2023-03-02  8:15                                       ` Jason Wang
2023-03-02  8:41                                         ` Michael S. Tsirkin
2023-03-02  8:59                                           ` Jason Wang
2023-03-02  9:46                                             ` Michael S. Tsirkin
2023-02-23 13:13 ` Michael S. Tsirkin
2023-02-23 14:40   ` [virtio-comment] " Parav Pandit
2023-02-24  8:13     ` Michael S. Tsirkin
2023-02-24 14:38       ` [virtio-dev] " Heng Qi
2023-02-24 17:10         ` Michael S. Tsirkin
2023-02-24 17:10           ` Michael S. Tsirkin
2023-02-27  0:29       ` Parav Pandit
2023-02-27  0:29         ` [virtio-dev] " Parav Pandit
2023-02-24  4:42   ` Heng Qi
2023-02-24  8:04     ` Michael S. Tsirkin
2023-02-28 11:16 ` Michael S. Tsirkin
2023-02-28 11:16   ` [virtio-dev] " Michael S. Tsirkin
2023-03-01  2:56   ` Heng Qi
2023-03-01  2:56     ` Heng Qi
2023-03-08 14:39     ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
2023-03-09  4:55       ` Heng Qi
2023-03-09 19:36         ` Michael S. Tsirkin
2023-03-11  3:23           ` Heng Qi
2023-03-15 11:58             ` [virtio-dev] Re: [virtio-comment] " Michael S. Tsirkin
2023-03-15 12:55               ` Heng Qi
2023-03-15 14:57                 ` Michael S. Tsirkin
2023-03-16 13:17                   ` Heng Qi
2023-03-20 19:45                     ` Michael S. Tsirkin
2023-03-30 12:10                       ` Heng Qi
2023-03-20 19:48                 ` Michael S. Tsirkin
2023-03-30 12:37                   ` Heng Qi
2023-04-08 10:29                     ` Michael S. Tsirkin
2023-04-10 13:26                       ` Heng Qi
2023-03-01  3:30   ` [virtio-comment] " Heng Qi
2023-03-01  3:30     ` [virtio-dev] " Heng Qi
2023-03-01 11:07     ` Michael S. Tsirkin
2023-03-01 15:10       ` Heng Qi
2023-03-09 12:28   ` [virtio-dev] " Heng Qi
