[PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing
@ 2016-01-07 13:05 Paul Durrant
  2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paul Durrant @ 2016-01-07 13:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Paul Durrant

This series documents changes needed to support toeplitz hashing in a
backend, configurable by the frontend.

Patch #1 is a clean-up patch that clarifies the guest transmit and
receive side packet formats.

Patch #2 documents a new 'control ring' for passing bulk data between
frontend and backend. This is needed for passing the hash mapping table
and hash key. It also documents messages to allow a frontend to configure
toeplitz hashing.

Patch #3 documents a new extra info segment that can be used for passing
hash values along with packets on both the transmit and receive side.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately
  2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
@ 2016-01-07 13:05 ` Paul Durrant
  2016-01-08 15:24   ` Ian Campbell
  2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
  2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
  2 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-07 13:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Paul Durrant,
	Jan Beulich

Currently there is no documented wire format for guest receive-side
packets but the location of the 'wire format' comment block suggests
it is the same as transmit-side. This is almost true but there is a
subtle difference in the use of the 'size' field for the first fragment.

For clarity this patch creates separate comment blocks for receive
and transmit side packet wire formats, tries to be more clear about the
distinction between 'fragments' and 'extras', and documents the subtlety
concerning the size field of the first fragment.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/include/public/io/netif.h | 39 ++++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index e103cf3..1790ea0 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,22 +151,22 @@
  */
 
 /*
- * This is the 'wire' format for packets:
- *  Request 1: netif_tx_request_t -- NETTXF_* (any flags)
- * [Request 2: netif_extra_info_t] (only if request 1 has
- *                                  NETTXF_extra_info)
- * [Request 3: netif_extra_info_t] (only if request 2 has
- *                                  XEN_NETIF_EXTRA_MORE)
- *  Request 4: netif_tx_request_t -- NETTXF_more_data
- *  Request 5: netif_tx_request_t -- NETTXF_more_data
- *  ...
- *  Request N: netif_tx_request_t -- 0
- */
-
-/*
  * Guest transmit
  * ==============
  *
+ * This is the 'wire' format for packets:
+ *  Fragment 1: netif_tx_request_t  - flags = NETTXF_*
+ *                                    size = total packet size
+ * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
+ *                                     NETTXF_extra_info)
+ * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
+ *                                     XEN_NETIF_EXTRA_MORE)
+ *  ...
+ *  Fragment N: netif_tx_request_t  - (only if fragment N-1 flags include
+ *                                     NETTXF_more_data)
+ *                                    flags = 0
+ *                                    size = fragment size
+ *
  * Ring slot size is 12 octets, however not all request/response
  * structs use the full size.
  *
@@ -202,6 +202,19 @@
  * Guest receive
  * =============
  *
+ * This is the 'wire' format for packets:
+ *  Fragment 1: netif_rx_request_t  - flags = NETRXF_*
+ *                                    size = fragment size
+ * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
+ *                                     NETRXF_extra_info)
+ * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
+ *                                     XEN_NETIF_EXTRA_MORE)
+ *  ...
+ *  Fragment N: netif_rx_request_t  - (only if fragment N-1 flags include
+ *                                     NETRXF_more_data)
+ *                                    flags = 0
+ *                                    size = fragment size
+ *
  * Ring slot size is 8 octets.
  *
  * rx request (netif_rx_request_t)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
  2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
@ 2016-01-07 13:05 ` Paul Durrant
  2016-01-08 15:53   ` Ian Campbell
  2016-01-08 16:07   ` David Vrabel
  2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
  2 siblings, 2 replies; 19+ messages in thread
From: Paul Durrant @ 2016-01-07 13:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Paul Durrant,
	Jan Beulich

This patch documents a new shared ring between frontend and backend that
can be used to pass bulk out-of-band data, such as that required to
implement toeplitz hashing in the backend such that it is configurable by
the frontend.

The patch then goes on to document the messages passed over the control
ring that can be used to configure toeplitz hashing.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
v4:
 - Fix netif_ctrl_response_t definition to match specification

v3:
 - Fix commit comment

v2:
 - Use a balanced fix-sized message ring for the control ring
   (bulk data now passed by grant reference).
---
 xen/include/public/io/netif.h | 264 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 264 insertions(+)

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index 1790ea0..06e0b61 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,6 +151,270 @@
  */
 
 /*
+ * Control ring
+ * ============
+ *
+ * Some features, such as toeplitz hashing (detailed below), require a
+ * significant amount of out-of-band data to be passed from frontend to
+ * backend. Use of xenstore is not suitable for large quantities of data
+ * because of quota limitations and so a dedicated 'control ring' is used.
+ * The ability of the backend to use a control ring is advertised by
+ * setting:
+ *
+ * /local/domain/X/backend/<domid>/<vif>/feature-control-ring = "1"
+ *
+ * The frontend provides a control ring to the backend by setting:
+ *
+ * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
+ * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
+ *
+ * where <gref> is the grant reference of the shared page used to
+ * implement the control ring and <port> is an event channel to be used
+ * as a mailbox interrupt, before the frontend moves into the connected
+ * state.
+ *
+ * The control ring uses a fixed request/response message size and is
+ * balanced (i.e. one request to one response), so operationally it is much
+ * the same as a tramsmit or receive ring.
+ */
+
+/*
+ * Toeplitz hash types
+ * ===================
+ *
+ * For the purposes of the definitions below, 'Packet[]' is an array of
+ * octets containing an IP packet without options, 'Array[X..Y]' means a
+ * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
+ * used to indicate concatenation of arrays.
+ */
+
+/*
+ * A hash calculated over an IP version 4 header as follows:
+ *
+ * Buffer[0..8] = Packet[12..15] + Packet[16..19]
+ * Result = ToeplitzHash(Buffer, 8)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4     0
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4)
+
+/*
+ * A hash calculated over an IP version 4 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..12] = Packet[12..15] + Packet[16..19] +
+ *                 Packet[20..21] + Packet[22..23]
+ * Result = ToeplitzHash(Buffer, 12)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP 1
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP)
+
+/*
+ * A hash calculated over an IP version 6 header as follows:
+ *
+ * Buffer[0..32] = Packet[8..23] + Packet[24..39]
+ * Result = ToeplitzHash(Buffer, 32)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6     2
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6      (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4)
+
+/*
+ * A hash calculated over an IP version 6 header and TCP header as
+ * follows:
+ *
+ * Buffer[0..36] = Packet[8..23] + Packet[24..39] +
+ *                 Packet[40..41] + Packet[42..43]
+ * Result = ToeplitzHash(Buffer, 36)
+ */
+#define _NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP 3
+#define NETIF_CTRL_TOEPLITZ_FLAG_IPV6_TCP  (1 << _NETIF_CTRL_TOEPLITZ_FLAG_IPV4_TCP)
+
+/*
+ * Control requests (netif_ctrl_request_t)
+ * =======================================
+ *
+ * All requests have the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |    id     |   type    |         data[0]       |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |         data[1]       |
+ * +-----+-----+-----+-----+
+ *
+ * id: the request identifier, echoed in response.
+ * type: the type of request (see below)
+ * data[]: any data associated with the request (determined by type)
+ */
+
+struct netif_ctrl_request {
+    uint16_t id;
+    uint16_t type;
+
+#define NETIF_CTRL_TYPE_INVALID              0
+#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS   1
+#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS   2
+#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY     3
+#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
+
+    uint32_t data[2];
+};
+typedef struct netif_ctrl_request netif_ctrl_request_t;
+
+/*
+ * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to query the types of toeplitz
+ * hash supported by the backend. No data is required and to the
+ * data[] field is set to 0.
+ *
+ * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
+ *
+ * This is sent by the frontend to set the types of toeplitz hash that
+ * the backend should calculate. Note that the 'maximal' type of hash
+ * should always be chosen. For example, if the frontend sets both IPV4
+ * and IPV4_TCP hash types then the latter hash type should be calculated
+ * for any TCP packet and the former only calculated for non-TCP packets.
+ * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
+ * defined above. The data[1] field is set to 0.
+ *
+ * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
+ *       is free to choose how it steers packets to queues (which is the
+ *       default state).
+ *
+ * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY:
+ *
+ * This is sent by the frontend to set the key of toeplitz hash that
+ * the backend should calculate. The toeplitz algorithm is illustrated
+ * by the following pseudo-code:
+ *
+ * (Buffer[] and Key[] are treated as shift-registers where the MSB of
+ * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-1]
+ * is the 'right-most').
+ *
+ * Value = 0
+ * For number of bits in Buffer[]
+ *    If (left-most bit of Buffer[] is 1)
+ *        Value ^= left-most 32 bits of Key[]
+ *    Key[] << 1
+ *    Buffer[] << 1
+ *
+ * The data[0] field is set to the size of key in octets. The data[1]
+ * field is set to a grant reference of a page containing the key. The
+ * reference must remain valid until the corresponding
+ * netif_ctrl_response_t has been processed.
+ *
+ * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
+ *
+ * This is sent by the frontend to set the mapping of toeplitz hash to
+ * queue number to be applied by the backend.
+ *
+ * The data[0] field is set to the order of the mapping. The data[1] field
+ * is set to a grant reference of a page containing the mapping. The
+ * reference must remain valid until the corresponding
+ * netif_ctrl_response_t has been processed.
+ *
+ * The format of the mapping is:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[0]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[1]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[2]                   |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |                    queue[3]                   |
+ *                         .
+ *                         .
+ * |                    queue[N-1]                 |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * where each queue value is less than "multi-queue-num-queues" (see above)
+ * and N is 1 << data[0].
+ *
+ * NOTE: Before a specific mapping is set using this request, the backend
+ *       should map all toeplitz hash values to queue 0 (which is the only
+ *       queue guaranteed to exist in all cases).
+ */
+
+/*
+ * Control responses (netif_ctrl_response_t)
+ * =========================================
+ *
+ * All responses have the following format:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |    id     |   pad     |         status        |
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |         data          |
+ * +-----+-----+-----+-----+
+ *
+ * id: the corresponding request identifier
+ * pad: set to 0
+ * status: the status of request processing
+ * data: any data associated with the response (determined by type and
+ *       status)
+ */
+
+struct netif_ctrl_response {
+    uint16_t id;
+    uint16_t pad;
+    uint32_t status;
+
+#define NETIF_CTRL_STATUS_SUCCESS           0
+#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
+#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
+#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
+
+    uint32_t data;
+};
+typedef struct netif_ctrl_response netif_ctrl_response_t;
+
+/*
+ * type = <unknown>
+ *
+ * The default response for any unrecognised request has the status field
+ * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to 0.
+ *
+ * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
+ *
+ * Since the request carries no data there is no reason for processing to
+ * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and the
+ * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values (defined
+ * above) indicating which hash types are supported by the backend.
+ * If no hashing is supported then the data field should be set to 0.
+ *
+ * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
+ *
+ * If the data[0] field in the request is invalid (i.e. contains unsupported
+ * hash types) then the status field is set to
+ * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should succeed
+ * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
+ * The data field should be set to 0.
+ *
+ * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
+ *
+ * If the data[0] field in the request is an invalid key length (too big)
+ * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
+ * data[1] field is an invalid grant reference then the status field is set
+ * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request should
+ * succeed and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
+ * The data field should be set to 0.
+ *
+ * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
+ *
+ * If the data[0] field in the request is an invalid mapping order (too big)
+ * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
+ * data[1] field is an invalid grant reference then the status field is set
+ * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should
+ * succeed and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
+ * The data field should be set to 0.
+ */
+
+DEFINE_RING_TYPES(netif_ctrl, struct netif_ctrl_request, struct netif_ctrl_response);
+
+/*
  * Guest transmit
  * ==============
  *
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values
  2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
  2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
  2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
@ 2016-01-07 13:05 ` Paul Durrant
  2016-01-08 16:05   ` Ian Campbell
  2 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-07 13:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, Ian Jackson, Paul Durrant,
	Jan Beulich

To properly support NDIS RSS, the Windows frontend PV driver needs the
toeplitz hash value calculated by the backend (otherwise it would have to
duplicate the calculation).

This patch adds documentation for "feature-hash" and a definition of a
new XEN_NETIF_EXTRA_TYPE_HASH extra info segment which both frontends
and backends can use to pass hash values to the other end.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
---
 xen/include/public/io/netif.h | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/io/netif.h b/xen/include/public/io/netif.h
index 06e0b61..6c6bc9f 100644
--- a/xen/include/public/io/netif.h
+++ b/xen/include/public/io/netif.h
@@ -151,6 +151,12 @@
  */
 
 /*
+ * "feature-hash" advertises the capability to accept extra info slots of
+ * type XEN_NETIF_EXTRA_TYPE_HASH. They will not be sent by either end
+ * unless the other end advertises this feature.
+ */
+
+/*
  * Control ring
  * ============
  *
@@ -574,6 +580,18 @@ DEFINE_RING_TYPES(netif_ctrl, struct netif_ctrl_request, struct netif_ctrl_respo
  * type: Must be XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}
  * flags: XEN_NETIF_EXTRA_FLAG_*
  * addr: address to add/remove
+ *
+ * XEN_NETIF_EXTRA_TYPE_HASH:
+ *
+ *    0     1     2     3     4     5     6     7  octet
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ * |type |flags|htype| pad |LSB ---- value ---- MSB|
+ * +-----+-----+-----+-----+-----+-----+-----+-----+
+ *
+ * type: Must be XEN_NETIF_EXTRA_TYPE_HASH
+ * flags: XEN_NETIF_EXTRA_FLAG_*
+ * htype: XEN_NETIF_HASH_TYPE_*
+ * value: Hash value
  */
 
 /* Protocol checksum field is blank in the packet (hardware offload)? */
@@ -607,7 +625,8 @@ typedef struct netif_tx_request netif_tx_request_t;
 #define XEN_NETIF_EXTRA_TYPE_GSO       (1)  /* u.gso */
 #define XEN_NETIF_EXTRA_TYPE_MCAST_ADD (2)  /* u.mcast */
 #define XEN_NETIF_EXTRA_TYPE_MCAST_DEL (3)  /* u.mcast */
-#define XEN_NETIF_EXTRA_TYPE_MAX       (4)
+#define XEN_NETIF_EXTRA_TYPE_HASH      (4)  /* u.hash */
+#define XEN_NETIF_EXTRA_TYPE_MAX       (5)
 
 /* netif_extra_info_t flags. */
 #define _XEN_NETIF_EXTRA_FLAG_MORE (0)
@@ -619,6 +638,16 @@ typedef struct netif_tx_request netif_tx_request_t;
 #define XEN_NETIF_GSO_TYPE_TCPV6        (2)
 
 /*
+ * Hash types. (See NETIF_CTRL_TOEPLITZ_FLAG_* definitions above
+ * for more information).
+ */
+#define XEN_NETIF_HASH_TYPE_NONE        0
+#define XEN_NETIF_HASH_TYPE_IPV4        1
+#define XEN_NETIF_HASH_TYPE_IPV4_TCP    2
+#define XEN_NETIF_HASH_TYPE_IPV6        3
+#define XEN_NETIF_HASH_TYPE_IPV6_TCP    4
+
+/*
  * This structure needs to fit within both netif_tx_request_t and
  * netif_rx_response_t for compatibility.
  */
@@ -635,6 +664,11 @@ struct netif_extra_info {
         struct {
             uint8_t addr[6];
         } mcast;
+        struct {
+            uint8_t type;
+            uint8_t pad;
+            uint8_t value[4];
+        } hash;
         uint16_t pad[3];
     } u;
 };
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately
  2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
@ 2016-01-08 15:24   ` Ian Campbell
  2016-01-08 15:56     ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 15:24 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> Currently there is no documented wire format for guest receive-side
> packets but the location of the 'wire format' comment block suggests
> it is the same as transmit-side. This is almost true but there is a
> subtle difference in the use of the 'size' field for the first fragment.
> 
> For clarity this patch creates separate comment blocks for receive
> and transmit side packet wire formats, tries to be more clear about the
> distinction between 'fragments' and 'extras', and documents the subtlety
> concerning the size field of the first fragment.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Keir Fraser <keir@xen.org>
> Cc: Tim Deegan <tim@xen.org>
> ---
>  xen/include/public/io/netif.h | 39 ++++++++++++++++++++++++++-----------
> --
>  1 file changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/include/public/io/netif.h
> b/xen/include/public/io/netif.h
> index e103cf3..1790ea0 100644
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -151,22 +151,22 @@
>   */
>  
>  /*
> - * This is the 'wire' format for packets:
> - *  Request 1: netif_tx_request_t -- NETTXF_* (any flags)
> - * [Request 2: netif_extra_info_t] (only if request 1 has
> - *                                  NETTXF_extra_info)
> - * [Request 3: netif_extra_info_t] (only if request 2 has
> - *                                  XEN_NETIF_EXTRA_MORE)
> - *  Request 4: netif_tx_request_t -- NETTXF_more_data
> - *  Request 5: netif_tx_request_t -- NETTXF_more_data
> - *  ...
> - *  Request N: netif_tx_request_t -- 0
> - */
> -
> -/*
>   * Guest transmit
>   * ==============
>   *
> + * This is the 'wire' format for packets:
> + *  Fragment 1: netif_tx_request_t  - flags = NETTXF_*
> + *                                    size = total packet size
> + * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
> + *                                     NETTXF_extra_info)
> + * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
> + *                                     XEN_NETIF_EXTRA_MORE)
> + *  ...
> + *  Fragment N: netif_tx_request_t  - (only if fragment N-1 flags include
> + *                                     NETTXF_more_data)

For Fragment 2 is it the Flags of Fragment N-1 = 1 which are relevant, or
the flags in the optional Extra N which may be in the middle (i.e. the
immediately preceding slot)?

Am I correct in remembering that in the presence of NETTXF_more_data the
only way to know the actual size of Fragment 1 is to take away the total of
all the extras from Frag 1's size?

> + *                                    flags = 0
> + *                                    size = fragment size
> + *
>   * Ring slot size is 12 octets, however not all request/response
>   * structs use the full size.
>   *
> @@ -202,6 +202,19 @@
>   * Guest receive
>   * =============
>   *
> + * This is the 'wire' format for packets:
> + *  Fragment 1: netif_rx_request_t  - flags = NETRXF_*
> + *                                    size = fragment size
> + * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
> + *                                     NETRXF_extra_info)
> + * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
> + *                                     XEN_NETIF_EXTRA_MORE)
> + *  ...
> + *  Fragment N: netif_rx_request_t  - (only if fragment N-1 flags include
> + *                                     NETRXF_more_data)
> + *                                    flags = 0
> + *                                    size = fragment size

Same Q re which NETRXF_more_data is relevant.

In this path there is no indication of the total packet size other than
adding everything up?

Given that they differ in a subtle way would a quick but explicit "NOTE: RX
and TX differ" be a useful addition do you think?

> + *
>   * Ring slot size is 8 octets.
>   *
>   * rx request (netif_rx_request_t)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
@ 2016-01-08 15:53   ` Ian Campbell
  2016-01-08 16:19     ` Paul Durrant
  2016-01-08 16:07   ` David Vrabel
  1 sibling, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 15:53 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> 
> diff --git a/xen/include/public/io/netif.h
> b/xen/include/public/io/netif.h
> index 1790ea0..06e0b61 100644
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -151,6 +151,270 @@
>   */
>  
>  /*
> + * Control ring
> + * ============
> + *
> + * Some features, such as toeplitz hashing (detailed below), require a
> + * significant amount of out-of-band data to be passed from frontend to
> + * backend. Use of xenstore is not suitable for large quantities of data
> + * because of quota limitations and so a dedicated 'control ring' is used.
> + * The ability of the backend to use a control ring is advertised by
> + * setting:
> + *
> + * /local/domain/X/backend///feature-control-ring = "1"
> + *
> + * The frontend provides a control ring to the backend by setting:
> + *
> + * /local/domain//device/vif//ctrl-ring-ref = 
> + * /local/domain//device/vif//event-channel-ctrl = 
> + *
> + * where  is the grant reference of the shared page used to
> + * implement the control ring and  is an event channel to be used
> + * as a mailbox interrupt, before the frontend moves into the connected
> + * state.

I read this as saying that the mailbox interrupt can be used until the
frontend moves into the connected state, which I realise isn't what you
meant but having so much stuff between "by setting" and "before" makes it
easy to read wrongly.

How about "... as a mailbox interrupt. These keys must be written before
moving into the connected state." ?

> + * The control ring uses a fixed request/response message size and is
> + * balanced (i.e. one request to one response), so operationally it is much
> + * the same as a tramsmit or receive ring.

"transmit".

Is it intended to support out of order responses?

> + */
> +
> +/*
> + * Toeplitz hash types
> + * ===================
> + *
> + * For the purposes of the definitions below, 'Packet[]' is an array of
> + * octets containing an IP packet without options, 'Array[X..Y]' means a
> + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
> + * used to indicate concatenation of arrays.
> + */
> +
> +/*
> + * A hash calculated over an IP version 4 header as follows:
> + *
> + * Buffer[0..8] = Packet[12..15] + Packet[16..19]

Is there a reason to write it like this rather than:
+ * Buffer[0..8] = Packet[12..19]
?

Maybe something which is obvious if you know about Toeplitz in more than
just a passing fashion?


> +/*
> + * Control requests (netif_ctrl_request_t)
> + * =======================================
> + *
> + * All requests have the following format:
> + *
> + *    0     1     2     3     4     5     6     7  octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |    id     |   type    |         data[0]       |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |         data[1]       |
> + * +-----+-----+-----+-----+

These are packed in the ring, i.e. there are no implicit padding bytes,
right?

I know the Toeplitz stuff doesn't need it, but we should consider designing
in a scheme to handle control commands which need >8 octets of data, or
else we risk ending up with something as confusing as the data path...

> + *
> + * id: the request identifier, echoed in response.
> + * type: the type of request (see below)
> + * data[]: any data associated with the request (determined by type)
> + */
> +
> +struct netif_ctrl_request {
> +    uint16_t id;
> +    uint16_t type;
> +
> +#define NETIF_CTRL_TYPE_INVALID              0
> +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS   1
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS   2
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY     3
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
> +
> +    uint32_t data[2];
> +};
> +typedef struct netif_ctrl_request netif_ctrl_request_t;
> +
> +/*
> + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> + *
> + * This is sent by the frontend to query the types of toeplitz
> + * hash supported by the backend. No data is required and to the

"and to the" has an extra "to" in it I think?


> + * data[] field is set to 0.

"fields are" ?

> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> + *
> + * This is sent by the frontend to set the types of toeplitz hash that
> + * the backend should calculate. Note that the 'maximal' type of hash
> + * should always be chosen. For example, if the frontend sets both IPV4
> + * and IPV4_TCP hash types then the latter hash type should be calculated
> + * for any TCP packet and the former only calculated for non-TCP packets.
> + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
> + * defined above. The data[1] field is set to 0.
> + *
> + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
> + *       is free to choose how it steers packets to queues (which is the
> + *       default state).
> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY:

Can this be sent before FLAGS?

Is it permissible to send this for a hash scheme not included in flags?

> + *
> + * This is sent by the frontend to set the key of toeplitz hash that
> + * the backend should calculate. The toeplitz algorithm is illustrated
> + * by the following pseudo-code:
> + *
> + * (Buffer[] and Key[] are treated as shift-registers where the MSB of
> + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-
> 1]
> + * is the 'right-most').
> + *
> + * Value = 0
> + * For number of bits in Buffer[]
> + *    If (left-most bit of Buffer[] is 1)
> + *        Value ^= left-most 32 bits of Key[]
> + *    Key[] << 1
> + *    Buffer[] << 1
> + *
> + * The data[0] field is set to the size of key in octets. The data[1]
> + * field is set to a grant reference of a page containing the key.

I suppose it is implicit that the key starts at offset 0 in the page?

>  The
> + * reference must remain valid until the corresponding
> + * netif_ctrl_response_t has been processed.

May the ref be r/o? If so, should that be encouraged?

> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> + *
> + * This is sent by the frontend to set the mapping of toeplitz hash to
> + * queue number to be applied by the backend.
> + *
> + * The data[0] field is set to the order of the mapping. The data[1] field
> + * is set to a grant reference of a page containing the mapping. The
> + * reference must remain valid until the corresponding
> + * netif_ctrl_response_t has been processed.
> + *
> + * The format of the mapping is:
> + *
> + *    0     1     2     3     4     5     6     7  octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                    queue[0]                   |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                    queue[1]                   |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                    queue[2]                   |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |                    queue[3]                   |
> + *                         .
> + *                         .
> + * |                    queue[N-1]                 |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + *
> + * where each queue value is less than "multi-queue-num-queues" (see above)
> + * and N is 1 << data[0].
> + *
> + * NOTE: Before a specific mapping is set using this request, the backend
> + *       should map all toeplitz hash values to queue 0 (which is the only
> + *       queue guaranteed to exist in all cases).

i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-zero 
set of flags and NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do this?

If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen one
with flags = 0 then as before the b/e is free to steer as it likes?

Is it valid to send this mapping command first before the flags?


> + */
> +
> +/*
> + * Control responses (netif_ctrl_response_t)
> + * =========================================
> + *
> + * All responses have the following format:
> + *
> + *    0     1     2     3     4     5     6     7  octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |    id     |   pad     |         status        |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * |         data          |
> + * +-----+-----+-----+-----+
> + *
> + * id: the corresponding request identifier
> + * pad: set to 0
> + * status: the status of request processing
> + * data: any data associated with the response (determined by type and
> + *       status)

type needs to be remembered via the id? Why not echo it in the pad field
for convenience?


> + */
> +
> +struct netif_ctrl_response {
> +    uint16_t id;
> +    uint16_t pad;
> +    uint32_t status;
> +
> +#define NETIF_CTRL_STATUS_SUCCESS           0
> +#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
> +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
> +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3

Is the intention that these will be the union of all possible failure
modes, rather than making provision for type-specific failures?

These are all pretty generic though, and I suppose we could always figure
that out when some command has a very obscure failure case.

> +
> +    uint32_t data;
> +};
> +typedef struct netif_ctrl_response netif_ctrl_response_t;
> +
> +/*
> + * type = 
> + *
> + * The default response for any unrecognised request has the status field
> + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to 0.
> + *
> + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
> + *
> + * Since the request carries no data there is no reason for processing to
> + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and the
> + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values (defined
> + * above) indicating which hash types are supported by the backend.
> + * If no hashing is supported then the data field should be set to 0.

I wonder if it would be clearer to define the generic request and response
structures up front followed by the specific commands and their
request+response formats at the same time, rather than splitting the
specific command requests and responses into different places?

Like:

Command: NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS

Description: Blah blah

Request data format: Words words 
Valid response status codes:
  - One
  - Another
Responds data format: Words

(i.e. more like API than protocol documentation in the form of independent
looking things going back and forth)

> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> + *
> + * If the data[0] field in the request is invalid (i.e. contains unsupported
> + * hash types) then the status field is set to
> + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should 

"request"

> succeed
> + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> + * The data field should be set to 0.
> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> + *
> + * If the data[0] field in the request is an invalid key length (too big)

Can it not be too small as well? What is the behaviour if it is?

> + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> + * data[1] field is an invalid grant reference then the status field is set
> + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request should
> + * succeed and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> + * The data field should be set to 0.
> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> + *
> + * If the data[0] field in the request is an invalid mapping order (too big)

or too small?

> + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> + * data[1] field is an invalid grant reference then the status field is set
> + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should

"request"

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately
  2016-01-08 15:24   ` Ian Campbell
@ 2016-01-08 15:56     ` Paul Durrant
  2016-01-08 16:10       ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 15:56 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

> -----Original Message-----
[snip]
> >   * Guest transmit
> >   * ==============
> >   *
> > + * This is the 'wire' format for packets:
> > + *  Fragment 1: netif_tx_request_t  - flags = NETTXF_*
> > + *                                    size = total packet size
> > + * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
> > + *                                     NETTXF_extra_info)
> > + * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
> > + *                                     XEN_NETIF_EXTRA_MORE)
> > + *  ...
> > + *  Fragment N: netif_tx_request_t  - (only if fragment N-1 flags include
> > + *                                     NETTXF_more_data)
> 
> For Fragment 2 is it the Flags of Fragment N-1 = 1 which are relevant, or
> the flags in the optional Extra N which may be in the middle (i.e. the
> immediately preceding slot)?

It's Fragment N-1's flags. The flags on the Extras are not relevant. In fact the only Extra flag defined is the one that says there's another Extra :-)

> 
> Am I correct in remembering that in the presence of NETTXF_more_data the
> only way to know the actual size of Fragment 1 is to take away the total of
> all the extras from Frag 1's size?

That's right.

> 
> > + *                                    flags = 0
> > + *                                    size = fragment size
> > + *
> >   * Ring slot size is 12 octets, however not all request/response
> >   * structs use the full size.
> >   *
> > @@ -202,6 +202,19 @@
> >   * Guest receive
> >   * =============
> >   *
> > + * This is the 'wire' format for packets:
> > + *  Fragment 1: netif_rx_request_t  - flags = NETRXF_*
> > + *                                    size = fragment size
> > + * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags include
> > + *                                     NETRXF_extra_info)
> > + * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags include
> > + *                                     XEN_NETIF_EXTRA_MORE)
> > + *  ...
> > + *  Fragment N: netif_rx_request_t  - (only if fragment N-1 flags include
> > + *                                     NETRXF_more_data)
> > + *                                    flags = 0
> > + *                                    size = fragment size
> 
> Same Q re which NETRXF_more_data is relevant.
> 

Same answer :-)

> In this path there is no indication of the total packet size other than
> adding everything up?
> 

Correct. 

> Given that they differ in a subtle way would a quick but explicit "NOTE: RX
> and TX differ" be a useful addition do you think?
> 

Yes, that's probably a good plan. I'll stick an extra comment in to that effect.

  Paul

> > + *
> >   * Ring slot size is 8 octets.
> >   *
> >   * rx request (netif_rx_request_t)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values
  2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
@ 2016-01-08 16:05   ` Ian Campbell
  2016-01-08 16:26     ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 16:05 UTC (permalink / raw)
  To: Paul Durrant, xen-devel; +Cc: Keir Fraser, Ian Jackson, Jan Beulich, Tim Deegan

On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> 
>  /*
> + * Hash types. (See NETIF_CTRL_TOEPLITZ_FLAG_* definitions above
> + * for more information).
> + */
> +#define XEN_NETIF_HASH_TYPE_NONE        0
> +#define XEN_NETIF_HASH_TYPE_IPV4        1
> +#define XEN_NETIF_HASH_TYPE_IPV4_TCP    2
> +#define XEN_NETIF_HASH_TYPE_IPV6        3
> +#define XEN_NETIF_HASH_TYPE_IPV6_TCP    4

Should these be TYPE_TOEPLITZ_FOO? I suppose there are other possible TCPv4
hashes for example.

Perhaps a comment along the lines "XEN_NETIF_HASH_TYPE_TOEPLITZ_*
corresponds precisely to the bit positions of NETIF_CTRL_TOEPLITZ_FLAG_*
values", or even defining one in terms of the other?

The control side (previous patch) has a toeplitz specific mechanism, with
individual hash bits with it, whereas this is presented more as a generic
hash mechanism with specific bits corresponding to toeplitz hashes.
i.e. NETIF_CTRL_TOEPLITZ* vs XEN_NETIF_EXTRA_TYPE_HASH. Seems like a
wrinkle, but I'm not sure if its an important one.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
  2016-01-08 15:53   ` Ian Campbell
@ 2016-01-08 16:07   ` David Vrabel
  2016-01-08 16:21     ` Paul Durrant
  1 sibling, 1 reply; 19+ messages in thread
From: David Vrabel @ 2016-01-08 16:07 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Tim Deegan, Keir Fraser, Ian Campbell, Jan Beulich

On 07/01/16 13:05, Paul Durrant wrote:
> This patch documents a new shared ring between frontend and backend that
> can be used to pass bulk out-of-band data, such as that required to
> implement toeplitz hashing in the backend such that it is configurable by
> the frontend.
> 
> The patch then goes on to document the messages passed over the control
> ring that can be used to configure toeplitz hashing.
[...]
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -151,6 +151,270 @@
>   */
>  
>  /*
> + * Control ring
> + * ============
[...]
> + * The frontend provides a control ring to the backend by setting:
> + *
> + * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
> + * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>

ctrl-event-channel to match ctrl-ring-ref?

David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately
  2016-01-08 15:56     ` Paul Durrant
@ 2016-01-08 16:10       ` Ian Campbell
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 16:10 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

On Fri, 2016-01-08 at 15:56 +0000, Paul Durrant wrote:
> > -----Original Message-----
> [snip]
> > >   * Guest transmit
> > >   * ==============
> > >   *
> > > + * This is the 'wire' format for packets:
> > > + *  Fragment 1: netif_tx_request_t  - flags = NETTXF_*
> > > + *                                    size = total packet size
> > > + * [Extra 1: netif_extra_info_t]    - (only if fragment 1 flags
> > > include
> > > + *                                     NETTXF_extra_info)
> > > + * [Extra N: netif_extra_info_t]    - (only if extra N-1 flags
> > > include
> > > + *                                     XEN_NETIF_EXTRA_MORE)
> > > + *  ...
> > > + *  Fragment N: netif_tx_request_t  - (only if fragment N-1 flags
> > > include
> > > + *                                     NETTXF_more_data)
> > 
> > For Fragment 2 is it the Flags of Fragment N-1 = 1 which are relevant,
> > or
> > the flags in the optional Extra N which may be in the middle (i.e. the
> > immediately preceding slot)?
> 
> It's Fragment N-1's flags. The flags on the Extras are not relevant.

I think it would be worth saying that explicitly, even though your text is
strictly correct someone might not quite follow.

>  In fact the only Extra flag defined is the one that says there's another
> Extra :-)

As opposed to extra types of which there are several.

> 
> > 
> > Am I correct in remembering that in the presence of NETTXF_more_data the
> > only way to know the actual size of Fragment 1 is to take away the total of
> > all the extras from Frag 1's size?
> 
> That's right.

Can you add a note please?

> > Same Q re which NETRXF_more_data is relevant.
> > 
> 
> Same answer :-)

Same request ;-)

> > In this path there is no indication of the total packet size other than
> > adding everything up?
> > 
> 
> Correct. 

Again perhaps worth a note.

> > Given that they differ in a subtle way would a quick but explicit
> > "NOTE: RX
> > and TX differ" be a useful addition do you think?
> > 
> 
> Yes, that's probably a good plan. I'll stick an extra comment in to that
> effect.

Ta.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 15:53   ` Ian Campbell
@ 2016-01-08 16:19     ` Paul Durrant
  2016-01-08 16:46       ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 16:19 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 08 January 2016 15:54
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and
> toeplitz hashing
> 
> On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> >
> > diff --git a/xen/include/public/io/netif.h
> > b/xen/include/public/io/netif.h
> > index 1790ea0..06e0b61 100644
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -151,6 +151,270 @@
> >   */
> >
> >  /*
> > + * Control ring
> > + * ============
> > + *
> > + * Some features, such as toeplitz hashing (detailed below), require a
> > + * significant amount of out-of-band data to be passed from frontend to
> > + * backend. Use of xenstore is not suitable for large quantities of data
> > + * because of quota limitations and so a dedicated 'control ring' is used.
> > + * The ability of the backend to use a control ring is advertised by
> > + * setting:
> > + *
> > + * /local/domain/X/backend///feature-control-ring = "1"
> > + *
> > + * The frontend provides a control ring to the backend by setting:
> > + *
> > + * /local/domain//device/vif//ctrl-ring-ref =
> > + * /local/domain//device/vif//event-channel-ctrl =
> > + *
> > + * where  is the grant reference of the shared page used to
> > + * implement the control ring and  is an event channel to be used
> > + * as a mailbox interrupt, before the frontend moves into the connected
> > + * state.
> 
> I read this as saying that the mailbox interrupt can be used until the
> frontend moves into the connected state, which I realise isn't what you
> meant but having so much stuff between "by setting" and "before" makes it
> easy to read wrongly.
> 
> How about "... as a mailbox interrupt. These keys must be written before
> moving into the connected state." ?

Yes, that's clearer.

> 
> > + * The control ring uses a fixed request/response message size and is
> > + * balanced (i.e. one request to one response), so operationally it is much
> > + * the same as a tramsmit or receive ring.
> 
> "transmit".
> 
> Is it intended to support out of order responses?

I didn't want to preclude it in case someone wants to add a potentially long running control operation in future.

> 
> > + */
> > +
> > +/*
> > + * Toeplitz hash types
> > + * ===================
> > + *
> > + * For the purposes of the definitions below, 'Packet[]' is an array of
> > + * octets containing an IP packet without options, 'Array[X..Y]' means a
> > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
> > + * used to indicate concatenation of arrays.
> > + */
> > +
> > +/*
> > + * A hash calculated over an IP version 4 header as follows:
> > + *
> > + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
> 
> Is there a reason to write it like this rather than:
> + * Buffer[0..8] = Packet[12..19]
> ?
> 
> Maybe something which is obvious if you know about Toeplitz in more than
> just a passing fashion?

The reason is actually so it more resembles the Microsoft documentation at https://msdn.microsoft.com/en-us/library/windows/hardware/ff570725%28v=vs.85%29.aspx. Perhaps it would make more sense if I pointed out that Packet[12..15] is the source address and Packet[16..19] is the dest address (and do the same in the subsequent comments)?

> 
> 
> > +/*
> > + * Control requests (netif_ctrl_request_t)
> > + * =======================================
> > + *
> > + * All requests have the following format:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |    id     |   type    |         data[0]       |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |         data[1]       |
> > + * +-----+-----+-----+-----+
> 
> These are packed in the ring, i.e. there are no implicit padding bytes,
> right?

DEFINE_RING_TYPES unions the req and rsp structures and then defines an array but since req and rsp are the same size in this case the reqs should end up packed.

> 
> I know the Toeplitz stuff doesn't need it, but we should consider designing
> in a scheme to handle control commands which need >8 octets of data, or
> else we risk ending up with something as confusing as the data path...
> 

I'm ok with that, but you can always use a gref (as is done for the key and mapping table). If 8 seems to small, what do you feel would be the right number to choose?

> > + *
> > + * id: the request identifier, echoed in response.
> > + * type: the type of request (see below)
> > + * data[]: any data associated with the request (determined by type)
> > + */
> > +
> > +struct netif_ctrl_request {
> > +    uint16_t id;
> > +    uint16_t type;
> > +
> > +#define NETIF_CTRL_TYPE_INVALID              0
> > +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS   1
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS   2
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY     3
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
> > +
> > +    uint32_t data[2];
> > +};
> > +typedef struct netif_ctrl_request netif_ctrl_request_t;
> > +
> > +/*
> > + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to query the types of toeplitz
> > + * hash supported by the backend. No data is required and to the
> 
> "and to the" has an extra "to" in it I think?
> 
> 
> > + * data[] field is set to 0.
> 
> "fields are" ?
> 
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to set the types of toeplitz hash that
> > + * the backend should calculate. Note that the 'maximal' type of hash
> > + * should always be chosen. For example, if the frontend sets both IPV4
> > + * and IPV4_TCP hash types then the latter hash type should be calculated
> > + * for any TCP packet and the former only calculated for non-TCP packets.
> > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_*
> values
> > + * defined above. The data[1] field is set to 0.
> > + *
> > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
> > + *       is free to choose how it steers packets to queues (which is the
> > + *       default state).
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY:
> 
> Can this be sent before FLAGS?
> 
> Is it permissible to send this for a hash scheme not included in flags?
> 

Yes, it can but the idea is that nothing has any effect until a valid set of flags is specified.

> > + *
> > + * This is sent by the frontend to set the key of toeplitz hash that
> > + * the backend should calculate. The toeplitz algorithm is illustrated
> > + * by the following pseudo-code:
> > + *
> > + * (Buffer[] and Key[] are treated as shift-registers where the MSB of
> > + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-
> > 1]
> > + * is the 'right-most').
> > + *
> > + * Value = 0
> > + * For number of bits in Buffer[]
> > + *    If (left-most bit of Buffer[] is 1)
> > + *        Value ^= left-most 32 bits of Key[]
> > + *    Key[] << 1
> > + *    Buffer[] << 1
> > + *
> > + * The data[0] field is set to the size of key in octets. The data[1]
> > + * field is set to a grant reference of a page containing the key.
> 
> I suppose it is implicit that the key starts at offset 0 in the page?
> 

Yes. I'll call that out.

> >  The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> 
> May the ref be r/o? If so, should that be encouraged?
> 

Yes, it can and should be r/o.

> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> > + *
> > + * This is sent by the frontend to set the mapping of toeplitz hash to
> > + * queue number to be applied by the backend.
> > + *
> > + * The data[0] field is set to the order of the mapping. The data[1] field
> > + * is set to a grant reference of a page containing the mapping. The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> > + *
> > + * The format of the mapping is:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[0]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[1]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[2]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[3]                   |
> > + *                         .
> > + *                         .
> > + * |                    queue[N-1]                 |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + *
> > + * where each queue value is less than "multi-queue-num-queues" (see
> above)
> > + * and N is 1 << data[0].
> > + *
> > + * NOTE: Before a specific mapping is set using this request, the backend
> > + *       should map all toeplitz hash values to queue 0 (which is the only
> > + *       queue guaranteed to exist in all cases).
> 
> i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-
> zero
> set of flags and NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do
> this?
> 

Yes. The default map should be zeroed.

> If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen
> one
> with flags = 0 then as before the b/e is free to steer as it likes?
> 

Yes. No change to default behaviour.

> Is it valid to send this mapping command first before the flags?
> 

As explained above, yes.

> 
> > + */
> > +
> > +/*
> > + * Control responses (netif_ctrl_response_t)
> > + * =========================================
> > + *
> > + * All responses have the following format:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |    id     |   pad     |         status        |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |         data          |
> > + * +-----+-----+-----+-----+
> > + *
> > + * id: the corresponding request identifier
> > + * pad: set to 0
> > + * status: the status of request processing
> > + * data: any data associated with the response (determined by type and
> > + *       status)
> 
> type needs to be remembered via the id? Why not echo it in the pad field
> for convenience?
> 

I originally had that but thought it would be superfluous given that the id is echoed. I can add it back if you think that it would be a good idea. All my implementation did with it though was ASSERT that it matched the req.

> 
> > + */
> > +
> > +struct netif_ctrl_response {
> > +    uint16_t id;
> > +    uint16_t pad;
> > +    uint32_t status;
> > +
> > +#define NETIF_CTRL_STATUS_SUCCESS           0
> > +#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
> > +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
> > +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
> 
> Is the intention that these will be the union of all possible failure
> modes, rather than making provision for type-specific failures?
> 

Yes, they are just supposed to be errno-like things.

> These are all pretty generic though, and I suppose we could always figure
> that out when some command has a very obscure failure case.
> 

Yes.

> > +
> > +    uint32_t data;
> > +};
> > +typedef struct netif_ctrl_response netif_ctrl_response_t;
> > +
> > +/*
> > + * type =
> > + *
> > + * The default response for any unrecognised request has the status field
> > + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to
> 0.
> > + *
> > + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
> > + *
> > + * Since the request carries no data there is no reason for processing to
> > + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and
> the
> > + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
> (defined
> > + * above) indicating which hash types are supported by the backend.
> > + * If no hashing is supported then the data field should be set to 0.
> 
> I wonder if it would be clearer to define the generic request and response
> structures up front followed by the specific commands and their
> request+response formats at the same time, rather than splitting the
> specific command requests and responses into different places?
> 
> Like:
> 
> Command: NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS
> 
> Description: Blah blah
> 
> Request data format: Words words
> Valid response status codes:
>   - One
>   - Another
> Responds data format: Words
> 
> (i.e. more like API than protocol documentation in the form of independent
> looking things going back and forth)
> 

Yes, that may be easier to read. I'll do that.

> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > + *
> > + * If the data[0] field in the request is invalid (i.e. contains unsupported
> > + * hash types) then the status field is set to
> > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 
> > succeed
> > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > + *
> > + * If the data[0] field in the request is an invalid key length (too big)
> 
> Can it not be too small as well? What is the behaviour if it is?
> 

I guess the way the algorithm is defined, a key of less than 32 bits doesn't make much sense.

> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request
> should
> > + * succeed and hence the status field is set to
> NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > + *
> > + * If the data[0] field in the request is an invalid mapping order (too big)
> 
> or too small?

An order 0 mapping would still contain a single entry and you can't get any smaller than that :-)

> 
> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 

Thanks for all the typo/spelling corrections. I'll pick those up too.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 16:07   ` David Vrabel
@ 2016-01-08 16:21     ` Paul Durrant
  2016-01-08 16:22       ` David Vrabel
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 16:21 UTC (permalink / raw)
  To: David Vrabel, xen-devel
  Cc: Tim (Xen.org), Keir (Xen.org), Ian Campbell, Jan Beulich, Ian Jackson

> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 08 January 2016 16:07
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Keir (Xen.org); Ian Campbell; Tim (Xen.org); Ian Jackson; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v4 2/3] public/io/netif.h: document control
> ring and toeplitz hashing
> 
> On 07/01/16 13:05, Paul Durrant wrote:
> > This patch documents a new shared ring between frontend and backend
> that
> > can be used to pass bulk out-of-band data, such as that required to
> > implement toeplitz hashing in the backend such that it is configurable by
> > the frontend.
> >
> > The patch then goes on to document the messages passed over the control
> > ring that can be used to configure toeplitz hashing.
> [...]
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -151,6 +151,270 @@
> >   */
> >
> >  /*
> > + * Control ring
> > + * ============
> [...]
> > + * The frontend provides a control ring to the backend by setting:
> > + *
> > + * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
> > + * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
> 
> ctrl-event-channel to match ctrl-ring-ref?
> 

I was following the style of the tx and rx rings where it is 'blah-ring-ref' and 'event-channel-blah'. Do you think I should break precedent?

   Paul

> David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 16:21     ` Paul Durrant
@ 2016-01-08 16:22       ` David Vrabel
  0 siblings, 0 replies; 19+ messages in thread
From: David Vrabel @ 2016-01-08 16:22 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Tim (Xen.org), Keir (Xen.org), Ian Campbell, Jan Beulich, Ian Jackson

On 08/01/16 16:21, Paul Durrant wrote:
>> -----Original Message-----
>> From: David Vrabel [mailto:david.vrabel@citrix.com]
>> Sent: 08 January 2016 16:07
>> To: Paul Durrant; xen-devel@lists.xenproject.org
>> Cc: Keir (Xen.org); Ian Campbell; Tim (Xen.org); Ian Jackson; Jan Beulich
>> Subject: Re: [Xen-devel] [PATCH v4 2/3] public/io/netif.h: document control
>> ring and toeplitz hashing
>>
>> On 07/01/16 13:05, Paul Durrant wrote:
>>> This patch documents a new shared ring between frontend and backend
>> that
>>> can be used to pass bulk out-of-band data, such as that required to
>>> implement toeplitz hashing in the backend such that it is configurable by
>>> the frontend.
>>>
>>> The patch then goes on to document the messages passed over the control
>>> ring that can be used to configure toeplitz hashing.
>> [...]
>>> --- a/xen/include/public/io/netif.h
>>> +++ b/xen/include/public/io/netif.h
>>> @@ -151,6 +151,270 @@
>>>   */
>>>
>>>  /*
>>> + * Control ring
>>> + * ============
>> [...]
>>> + * The frontend provides a control ring to the backend by setting:
>>> + *
>>> + * /local/domain/<domid>/device/vif/<vif>/ctrl-ring-ref = <gref>
>>> + * /local/domain/<domid>/device/vif/<vif>/event-channel-ctrl = <port>
>>
>> ctrl-event-channel to match ctrl-ring-ref?
>>
> 
> I was following the style of the tx and rx rings where it is 'blah-ring-ref' and 'event-channel-blah'. Do you think I should break precedent?

I hadn't noticed that.  Either way is fine by me then.

David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values
  2016-01-08 16:05   ` Ian Campbell
@ 2016-01-08 16:26     ` Paul Durrant
  2016-01-08 16:48       ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 16:26 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 08 January 2016 16:06
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 3/3] public/io/netif.h: document new extra info for
> passing hash values
> 
> On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> >
> >  /*
> > + * Hash types. (See NETIF_CTRL_TOEPLITZ_FLAG_* definitions above
> > + * for more information).
> > + */
> > +#define XEN_NETIF_HASH_TYPE_NONE        0
> > +#define XEN_NETIF_HASH_TYPE_IPV4        1
> > +#define XEN_NETIF_HASH_TYPE_IPV4_TCP    2
> > +#define XEN_NETIF_HASH_TYPE_IPV6        3
> > +#define XEN_NETIF_HASH_TYPE_IPV6_TCP    4
> 
> Should these be TYPE_TOEPLITZ_FOO? I suppose there are other possible
> TCPv4
> hashes for example.

I didn't consider that possibility of having multiple hashing schemes in operation at the same time, but I guess someone may want such a thing at some point.

> 
> Perhaps a comment along the lines "XEN_NETIF_HASH_TYPE_TOEPLITZ_*
> corresponds precisely to the bit positions of NETIF_CTRL_TOEPLITZ_FLAG_*
> values", or even defining one in terms of the other?

Yes, that sounds reasonable...

> 
> The control side (previous patch) has a toeplitz specific mechanism, with
> individual hash bits with it, whereas this is presented more as a generic
> hash mechanism with specific bits corresponding to toeplitz hashes.
> i.e. NETIF_CTRL_TOEPLITZ* vs XEN_NETIF_EXTRA_TYPE_HASH. Seems like a
> wrinkle, but I'm not sure if its an important one.
> 

...if perhaps I make the extra type XEN_NETIF_EXTRA_TYPE_TOEPLITZ_HASH. That would leave things more open for someone else adding an extra type for another hash and needed a slightly different format/set of flags.

  Paul

> Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 16:19     ` Paul Durrant
@ 2016-01-08 16:46       ` Ian Campbell
  2016-01-08 17:07         ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 16:46 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

On Fri, 2016-01-08 at 16:19 +0000, Paul Durrant wrote:
> > > + * The control ring uses a fixed request/response message size and
> > > is
> > > + * balanced (i.e. one request to one response), so operationally it
> > > is much
> > > + * the same as a tramsmit or receive ring.
> > 
> > "transmit".
> > 
> > Is it intended to support out of order responses?
> 
> I didn't want to preclude it in case someone wants to add a potentially
> long running control operation in future.

Probably best to explicitly flag this possibility.


> 
> > 
> > > + */
> > > +
> > > +/*
> > > + * Toeplitz hash types
> > > + * ===================
> > > + *
> > > + * For the purposes of the definitions below, 'Packet[]' is an array
> > > of
> > > + * octets containing an IP packet without options, 'Array[X..Y]'
> > > means a
> > > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+'
> > > is
> > > + * used to indicate concatenation of arrays.
> > > + */
> > > +
> > > +/*
> > > + * A hash calculated over an IP version 4 header as follows:
> > > + *
> > > + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
> > 
> > Is there a reason to write it like this rather than:
> > + * Buffer[0..8] = Packet[12..19]
> > ?
> > 
> > Maybe something which is obvious if you know about Toeplitz in more
> > than
> > just a passing fashion?
> 
> The reason is actually so it more resembles the Microsoft documentation
> at https://msdn.microsoft.com/en-
> us/library/windows/hardware/ff570725%28v=vs.85%29.aspx.

Ah, that's not unreasonable.

>  Perhaps it would make more sense if I pointed out that Packet[12..15] is
> the source address and Packet[16..19] is the dest address (and do the
> same in the subsequent comments)?

Assuming it doesn't negatively impact the readability, sure.

> > I know the Toeplitz stuff doesn't need it, but we should consider
> > designing
> > in a scheme to handle control commands which need >8 octets of data, or
> > else we risk ending up with something as confusing as the data path...
> > 
> 
> I'm ok with that, but you can always use a gref (as is done for the key
> and mapping table). If 8 seems to small, what do you feel would be the
> right number to choose?

I'm happy with indirect via a gref as a scheme if that seems sufficient.

+ *
> > > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> > > + *
> > > + * This is sent by the frontend to set the types of toeplitz hash
> > > that
> > > + * the backend should calculate. Note that the 'maximal' type of
> > > hash
> > > + * should always be chosen. For example, if the frontend sets both
> > > IPV4
> > > + * and IPV4_TCP hash types then the latter hash type should be
> > > calculated
> > > + * for any TCP packet and the former only calculated for non-TCP
> > > packets.
> > > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_*
> > values
> > > + * defined above. The data[1] field is set to 0.
> > > + *
> > > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the
> > > backend
> > > + *       is free to choose how it steers packets to queues (which is
> > > the
> > > + *       default state).
> > > + *
> > > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY:
> > 
> > Can this be sent before FLAGS?
> > 
> > Is it permissible to send this for a hash scheme not included in flags?
> > 
> 
> Yes, it can but the idea is that nothing has any effect until a valid set
> of flags is specified.

So the b/e is required to remember keys corresponding to disabled hashes
and immediately use them when the hash is enabled, it is not allowed to
ignore this message and expect another key once the hash is enabled.

If you enable a hash without setting a key then what does the key default
to? Traffic could be flowing while this is going on, right (since things
are Connected)?

> > > +/*
> > > + * Control responses (netif_ctrl_response_t)
> > > + * =========================================
> > > + *
> > > + * All responses have the following format:
> > > + *
> > > + *    0     1     2     3     4     5     6     7  octet
> > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > + * |    id     |   pad     |         status        |
> > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > + * |         data          |
> > > + * +-----+-----+-----+-----+
> > > + *
> > > + * id: the corresponding request identifier
> > > + * pad: set to 0
> > > + * status: the status of request processing
> > > + * data: any data associated with the response (determined by type
> > > and
> > > + *       status)
> > 
> > type needs to be remembered via the id? Why not echo it in the pad
> > field
> > for convenience?
> > 
> 
> I originally had that but thought it would be superfluous given that the
> id is echoed. I can add it back if you think that it would be a good
> idea. All my implementation did with it though was ASSERT that it matched
> the req.

Was it a coincidence that you had the req in hand given the id, or is it a
fundamental property of any implementation of this protocol that you would
have it handy?
> 
> > > + *
> > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > > + *
> > > + * If the data[0] field in the request is invalid (i.e. contains
> > > unsupported
> > > + * hash types) then the status field is set to
> > > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> > should
> > 
> > "request"
> > 
> > > succeed
> > > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> > > + * The data field should be set to 0.
> > > + *
> > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > > + *
> > > + * If the data[0] field in the request is an invalid key length (too
> > > big)
> > 
> > Can it not be too small as well? What is the behaviour if it is?
> > 
> 
> I guess the way the algorithm is defined, a key of less than 32 bits
> doesn't make much sense.

What if I throw caution to the wind and set one byte anyway?

> 
> > > + * then the status field is set to
> > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > > + * data[1] field is an invalid grant reference then the status field
> > > is set
> > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request
> > should
> > > + * succeed and hence the status field is set to
> > NETIF_CTRL_STATUS_SUCCESS.
> > > + * The data field should be set to 0.
> > > + *
> > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > > + *
> > > + * If the data[0] field in the request is an invalid mapping order
> > > (too big)
> > 
> > or too small?
> 
> An order 0 mapping would still contain a single entry and you can't get
> any smaller than that :-)

I missed the order, assumed it was size, but still, what if there are 16
queues and I pass order 0, the behaviour of queues 2..16 needs to be well
defined.

Since this is a single page, what happens with >=512 queues? (and how far
away is that possibility).

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values
  2016-01-08 16:26     ` Paul Durrant
@ 2016-01-08 16:48       ` Ian Campbell
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 16:48 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

On Fri, 2016-01-08 at 16:26 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Ian Campbell [mailto:ian.campbell@citrix.com]
> > Sent: 08 January 2016 16:06
> > To: Paul Durrant; xen-devel@lists.xenproject.org
> > Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> > Subject: Re: [PATCH v4 3/3] public/io/netif.h: document new extra info
> > for
> > passing hash values
> > 
> > On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> > > 
> > >  /*
> > > + * Hash types. (See NETIF_CTRL_TOEPLITZ_FLAG_* definitions above
> > > + * for more information).
> > > + */
> > > +#define XEN_NETIF_HASH_TYPE_NONE        0
> > > +#define XEN_NETIF_HASH_TYPE_IPV4        1
> > > +#define XEN_NETIF_HASH_TYPE_IPV4_TCP    2
> > > +#define XEN_NETIF_HASH_TYPE_IPV6        3
> > > +#define XEN_NETIF_HASH_TYPE_IPV6_TCP    4
> > 
> > Should these be TYPE_TOEPLITZ_FOO? I suppose there are other possible
> > TCPv4
> > hashes for example.
> 
> I didn't consider that possibility of having multiple hashing schemes in
> operation at the same time, but I guess someone may want such a thing at
> some point.

Hrm, yes, I suppose IPV4_TCPa and IPV4_TCPb would be somewhat mutually
exclusive.
[...]
> > 
> > The control side (previous patch) has a toeplitz specific mechanism,
> > with
> > individual hash bits with it, whereas this is presented more as a
> > generic
> > hash mechanism with specific bits corresponding to toeplitz hashes.
> > i.e. NETIF_CTRL_TOEPLITZ* vs XEN_NETIF_EXTRA_TYPE_HASH. Seems like a
> > wrinkle, but I'm not sure if its an important one.
> > 
> 
> ...if perhaps I make the extra type XEN_NETIF_EXTRA_TYPE_TOEPLITZ_HASH.
> That would leave things more open for someone else adding an extra type
> for another hash and needed a slightly different format/set of flags.

I'm happy with that (and think it's probably a good idea despite the
comment above).

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 16:46       ` Ian Campbell
@ 2016-01-08 17:07         ` Paul Durrant
  2016-01-08 17:22           ` Ian Campbell
  0 siblings, 1 reply; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 17:07 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 08 January 2016 16:46
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and
> toeplitz hashing
> 
> On Fri, 2016-01-08 at 16:19 +0000, Paul Durrant wrote:
> > > > + * The control ring uses a fixed request/response message size and
> > > > is
> > > > + * balanced (i.e. one request to one response), so operationally it
> > > > is much
> > > > + * the same as a tramsmit or receive ring.
> > >
> > > "transmit".
> > >
> > > Is it intended to support out of order responses?
> >
> > I didn't want to preclude it in case someone wants to add a potentially
> > long running control operation in future.
> 
> Probably best to explicitly flag this possibility.
> 

Sure. Will do.

> 
> >
> > >
> > > > + */
> > > > +
> > > > +/*
> > > > + * Toeplitz hash types
> > > > + * ===================
> > > > + *
> > > > + * For the purposes of the definitions below, 'Packet[]' is an array
> > > > of
> > > > + * octets containing an IP packet without options, 'Array[X..Y]'
> > > > means a
> > > > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+'
> > > > is
> > > > + * used to indicate concatenation of arrays.
> > > > + */
> > > > +
> > > > +/*
> > > > + * A hash calculated over an IP version 4 header as follows:
> > > > + *
> > > > + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
> > >
> > > Is there a reason to write it like this rather than:
> > > + * Buffer[0..8] = Packet[12..19]
> > > ?
> > >
> > > Maybe something which is obvious if you know about Toeplitz in more
> > > than
> > > just a passing fashion?
> >
> > The reason is actually so it more resembles the Microsoft documentation
> > at https://msdn.microsoft.com/en-
> > us/library/windows/hardware/ff570725%28v=vs.85%29.aspx.
> 
> Ah, that's not unreasonable.
> 
> >  Perhaps it would make more sense if I pointed out that Packet[12..15] is
> > the source address and Packet[16..19] is the dest address (and do the
> > same in the subsequent comments)?
> 
> Assuming it doesn't negatively impact the readability, sure.
> 
> > > I know the Toeplitz stuff doesn't need it, but we should consider
> > > designing
> > > in a scheme to handle control commands which need >8 octets of data, or
> > > else we risk ending up with something as confusing as the data path...
> > >
> >
> > I'm ok with that, but you can always use a gref (as is done for the key
> > and mapping table). If 8 seems to small, what do you feel would be the
> > right number to choose?
> 
> I'm happy with indirect via a gref as a scheme if that seems sufficient.
> 

I think it should be.

> + *
> > > > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> > > > + *
> > > > + * This is sent by the frontend to set the types of toeplitz hash
> > > > that
> > > > + * the backend should calculate. Note that the 'maximal' type of
> > > > hash
> > > > + * should always be chosen. For example, if the frontend sets both
> > > > IPV4
> > > > + * and IPV4_TCP hash types then the latter hash type should be
> > > > calculated
> > > > + * for any TCP packet and the former only calculated for non-TCP
> > > > packets.
> > > > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_*
> > > values
> > > > + * defined above. The data[1] field is set to 0.
> > > > + *
> > > > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the
> > > > backend
> > > > + *       is free to choose how it steers packets to queues (which is
> > > > the
> > > > + *       default state).
> > > > + *
> > > > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY:
> > >
> > > Can this be sent before FLAGS?
> > >
> > > Is it permissible to send this for a hash scheme not included in flags?
> > >
> >
> > Yes, it can but the idea is that nothing has any effect until a valid set
> > of flags is specified.
> 
> So the b/e is required to remember keys corresponding to disabled hashes
> and immediately use them when the hash is enabled, it is not allowed to
> ignore this message and expect another key once the hash is enabled.
> 

That was the idea. It's actually more straightforward to implement that way. I'll add comments to clarify though.

> If you enable a hash without setting a key then what does the key default
> to? Traffic could be flowing while this is going on, right (since things
> are Connected)?

Good, point. I'll say that a default key of all zeroes (which means all hash values will be 0) should be assumed until one is set.

> 
> > > > +/*
> > > > + * Control responses (netif_ctrl_response_t)
> > > > + * =========================================
> > > > + *
> > > > + * All responses have the following format:
> > > > + *
> > > > + *    0     1     2     3     4     5     6     7  octet
> > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > + * |    id     |   pad     |         status        |
> > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > + * |         data          |
> > > > + * +-----+-----+-----+-----+
> > > > + *
> > > > + * id: the corresponding request identifier
> > > > + * pad: set to 0
> > > > + * status: the status of request processing
> > > > + * data: any data associated with the response (determined by type
> > > > and
> > > > + *       status)
> > >
> > > type needs to be remembered via the id? Why not echo it in the pad
> > > field
> > > for convenience?
> > >
> >
> > I originally had that but thought it would be superfluous given that the
> > id is echoed. I can add it back if you think that it would be a good
> > idea. All my implementation did with it though was ASSERT that it matched
> > the req.
> 
> Was it a coincidence that you had the req in hand given the id, or is it a
> fundamental property of any implementation of this protocol that you would
> have it handy?

It's pretty fundamental. You need to get back at the grefs used for the key and the mapping so any implementation is pretty certain to have the req in hand.

> >
> > > > + *
> > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > > > + *
> > > > + * If the data[0] field in the request is invalid (i.e. contains
> > > > unsupported
> > > > + * hash types) then the status field is set to
> > > > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> requset
> > > should
> > >
> > > "request"
> > >
> > > > succeed
> > > > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> > > > + * The data field should be set to 0.
> > > > + *
> > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > > > + *
> > > > + * If the data[0] field in the request is an invalid key length (too
> > > > big)
> > >
> > > Can it not be too small as well? What is the behaviour if it is?
> > >
> >
> > I guess the way the algorithm is defined, a key of less than 32 bits
> > doesn't make much sense.
> 
> What if I throw caution to the wind and set one byte anyway?
> 

At the moment, it goes into the 'left most' (in Microsoft's terminology) byte and any remaining bytes are zero filled. From the spec it's not clear if zero-filling is the right thing to do but since Windows always sets a 40 byte key it's never been an issue.

> >
> > > > + * then the status field is set to
> > > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > > > + * data[1] field is an invalid grant reference then the status field
> > > > is set
> > > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> request
> > > should
> > > > + * succeed and hence the status field is set to
> > > NETIF_CTRL_STATUS_SUCCESS.
> > > > + * The data field should be set to 0.
> > > > + *
> > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > > > + *
> > > > + * If the data[0] field in the request is an invalid mapping order
> > > > (too big)
> > >
> > > or too small?
> >
> > An order 0 mapping would still contain a single entry and you can't get
> > any smaller than that :-)
> 
> I missed the order, assumed it was size, but still, what if there are 16
> queues and I pass order 0, the behaviour of queues 2..16 needs to be well
> defined.

The hash is simply used as an index modulo mapping size (I'll make that clear somewhere) so a mapping size less than the number of queues you have is ok but clearly suboptimal.

> 
> Since this is a single page, what happens with >=512 queues? (and how far
> away is that possibility).
> 

Good question. An 8 byte queue number is clearly overkill if we can only steer to 512 of them. I could define a limit on the number of queues at 256 and use byte array instead. Realistically, it's going to be a while before we get to 256 cpus in a Windows VM.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 17:07         ` Paul Durrant
@ 2016-01-08 17:22           ` Ian Campbell
  2016-01-08 17:35             ` Paul Durrant
  0 siblings, 1 reply; 19+ messages in thread
From: Ian Campbell @ 2016-01-08 17:22 UTC (permalink / raw)
  To: Paul Durrant, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

On Fri, 2016-01-08 at 17:07 +0000, Paul Durrant wrote:
> > +/*
> > > > > + * Control responses (netif_ctrl_response_t)
> > > > > + * =========================================
> > > > > + *
> > > > > + * All responses have the following format:
> > > > > + *
> > > > > + *    0     1     2     3     4     5     6     7  octet
> > > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > > + * |    id     |   pad     |         status        |
> > > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > > + * |         data          |
> > > > > + * +-----+-----+-----+-----+
> > > > > + *
> > > > > + * id: the corresponding request identifier
> > > > > + * pad: set to 0
> > > > > + * status: the status of request processing
> > > > > + * data: any data associated with the response (determined by
> > > > > type
> > > > > and
> > > > > + *       status)
> > > > 
> > > > type needs to be remembered via the id? Why not echo it in the pad
> > > > field
> > > > for convenience?
> > > > 
> > > 
> > > I originally had that but thought it would be superfluous given that
> > > the
> > > id is echoed. I can add it back if you think that it would be a good
> > > idea. All my implementation did with it though was ASSERT that it
> > > matched
> > > the req.
> > 
> > Was it a coincidence that you had the req in hand given the id, or is
> > it a
> > fundamental property of any implementation of this protocol that you
> > would
> > have it handy?
> 
> It's pretty fundamental. You need to get back at the grefs used for the
> key and the mapping so any implementation is pretty certain to have the
> req in hand.

Not all commands use a gref though, and future ones may not.

Given we are otherwise just wasting those 16 bits we may as well stick the
type into them.

> 
> > > > > + *
> > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > > > > + *
> > > > > + * If the data[0] field in the request is invalid (i.e. contains
> > > > > unsupported
> > > > > + * hash types) then the status field is set to
> > > > > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> > requset
> > > > should
> > > > 
> > > > "request"
> > > > 
> > > > > succeed
> > > > > + * and hence the status field is set to
> > > > > NETIF_CTRL_STATUS_SUCCESS.
> > > > > + * The data field should be set to 0.
> > > > > + *
> > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > > > > + *
> > > > > + * If the data[0] field in the request is an invalid key length
> > > > > (too
> > > > > big)
> > > > 
> > > > Can it not be too small as well? What is the behaviour if it is?
> > > > 
> > > 
> > > I guess the way the algorithm is defined, a key of less than 32 bits
> > > doesn't make much sense.
> > 
> > What if I throw caution to the wind and set one byte anyway?
> > 
> 
> At the moment, it goes into the 'left most' (in Microsoft's terminology)
> byte and any remaining bytes are zero filled. From the spec it's not
> clear if zero-filling is the right thing to do but since Windows always
> sets a 40 byte key it's never been an issue.

I think we should either clearly define the behaviour, or set a minimum
size.

Does this zero extending apply to any multiple of something other than 32
bits? Like if I give 9 octets does it effectively get padded to 12 with
zeroes?

> 
> > > 
> > > > > + * then the status field is set to
> > > > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > > > > + * data[1] field is an invalid grant reference then the status
> > > > > field
> > > > > is set
> > > > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> > request
> > > > should
> > > > > + * succeed and hence the status field is set to
> > > > NETIF_CTRL_STATUS_SUCCESS.
> > > > > + * The data field should be set to 0.
> > > > > + *
> > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > > > > + *
> > > > > + * If the data[0] field in the request is an invalid mapping
> > > > > order
> > > > > (too big)
> > > > 
> > > > or too small?
> > > 
> > > An order 0 mapping would still contain a single entry and you can't
> > > get
> > > any smaller than that :-)
> > 
> > I missed the order, assumed it was size, but still, what if there are
> > 16
> > queues and I pass order 0, the behaviour of queues 2..16 needs to be
> > well
> > defined.
> 
> The hash is simply used as an index modulo mapping size (I'll make that
> clear somewhere)

thanks.

>  so a mapping size less than the number of queues you have is ok but
> clearly suboptimal.
> 
> > 
> > Since this is a single page, what happens with >=512 queues? (and how
> > far
> > away is that possibility).
> > 
> 
> Good question. An 8 byte queue number

8 bytes is the hash match isn't it and Q number is the offset into the
page, isn't it? Or have I misunderstood how something fits together?

> is clearly overkill if we can only steer to 512 of them. I could define a
> limit on the number of queues at 256 and use byte array instead.
> Realistically, it's going to be a while before we get to 256 cpus in a
> Windows VM.

Them's brave words ;-)

We should maybe consider that a f/e OS other than Windows might, for some
reason, implement Toeplitz and have a higher probability of supporting
large #s of vcpus sooner?

Split data[0] into (total) size and offset (to set blocks of values)?

>   Paul
> 
> > Ian.
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
  2016-01-08 17:22           ` Ian Campbell
@ 2016-01-08 17:35             ` Paul Durrant
  0 siblings, 0 replies; 19+ messages in thread
From: Paul Durrant @ 2016-01-08 17:35 UTC (permalink / raw)
  To: Ian Campbell, xen-devel
  Cc: Ian Jackson, Keir (Xen.org), Jan Beulich, Tim (Xen.org)

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 08 January 2016 17:22
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and
> toeplitz hashing
> 
> On Fri, 2016-01-08 at 17:07 +0000, Paul Durrant wrote:
> > > +/*
> > > > > > + * Control responses (netif_ctrl_response_t)
> > > > > > + * =========================================
> > > > > > + *
> > > > > > + * All responses have the following format:
> > > > > > + *
> > > > > > + *    0     1     2     3     4     5     6     7  octet
> > > > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > > > + * |    id     |   pad     |         status        |
> > > > > > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > > > > > + * |         data          |
> > > > > > + * +-----+-----+-----+-----+
> > > > > > + *
> > > > > > + * id: the corresponding request identifier
> > > > > > + * pad: set to 0
> > > > > > + * status: the status of request processing
> > > > > > + * data: any data associated with the response (determined by
> > > > > > type
> > > > > > and
> > > > > > + *       status)
> > > > >
> > > > > type needs to be remembered via the id? Why not echo it in the pad
> > > > > field
> > > > > for convenience?
> > > > >
> > > >
> > > > I originally had that but thought it would be superfluous given that
> > > > the
> > > > id is echoed. I can add it back if you think that it would be a good
> > > > idea. All my implementation did with it though was ASSERT that it
> > > > matched
> > > > the req.
> > >
> > > Was it a coincidence that you had the req in hand given the id, or is
> > > it a
> > > fundamental property of any implementation of this protocol that you
> > > would
> > > have it handy?
> >
> > It's pretty fundamental. You need to get back at the grefs used for the
> > key and the mapping so any implementation is pretty certain to have the
> > req in hand.
> 
> Not all commands use a gref though, and future ones may not.
> 
> Given we are otherwise just wasting those 16 bits we may as well stick the
> type into them.
> 

Ok. Will revert to doing that.

> >
> > > > > > + *
> > > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > > > > > + *
> > > > > > + * If the data[0] field in the request is invalid (i.e. contains
> > > > > > unsupported
> > > > > > + * hash types) then the status field is set to
> > > > > > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> > > requset
> > > > > should
> > > > >
> > > > > "request"
> > > > >
> > > > > > succeed
> > > > > > + * and hence the status field is set to
> > > > > > NETIF_CTRL_STATUS_SUCCESS.
> > > > > > + * The data field should be set to 0.
> > > > > > + *
> > > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > > > > > + *
> > > > > > + * If the data[0] field in the request is an invalid key length
> > > > > > (too
> > > > > > big)
> > > > >
> > > > > Can it not be too small as well? What is the behaviour if it is?
> > > > >
> > > >
> > > > I guess the way the algorithm is defined, a key of less than 32 bits
> > > > doesn't make much sense.
> > >
> > > What if I throw caution to the wind and set one byte anyway?
> > >
> >
> > At the moment, it goes into the 'left most' (in Microsoft's terminology)
> > byte and any remaining bytes are zero filled. From the spec it's not
> > clear if zero-filling is the right thing to do but since Windows always
> > sets a 40 byte key it's never been an issue.
> 
> I think we should either clearly define the behaviour, or set a minimum
> size.

Ok. I may as well set the minimum size at 40 then.

> 
> Does this zero extending apply to any multiple of something other than 32
> bits? Like if I give 9 octets does it effectively get padded to 12 with
> zeroes?
> 

No, the key is essentially treated as a shift register so it's just implemented as a byte array.

> >
> > > >
> > > > > > + * then the status field is set to
> > > > > NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > > > > > + * data[1] field is an invalid grant reference then the status
> > > > > > field
> > > > > > is set
> > > > > > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the
> > > request
> > > > > should
> > > > > > + * succeed and hence the status field is set to
> > > > > NETIF_CTRL_STATUS_SUCCESS.
> > > > > > + * The data field should be set to 0.
> > > > > > + *
> > > > > > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > > > > > + *
> > > > > > + * If the data[0] field in the request is an invalid mapping
> > > > > > order
> > > > > > (too big)
> > > > >
> > > > > or too small?
> > > >
> > > > An order 0 mapping would still contain a single entry and you can't
> > > > get
> > > > any smaller than that :-)
> > >
> > > I missed the order, assumed it was size, but still, what if there are
> > > 16
> > > queues and I pass order 0, the behaviour of queues 2..16 needs to be
> > > well
> > > defined.
> >
> > The hash is simply used as an index modulo mapping size (I'll make that
> > clear somewhere)
> 
> thanks.
> 
> >  so a mapping size less than the number of queues you have is ok but
> > clearly suboptimal.
> >
> > >
> > > Since this is a single page, what happens with >=512 queues? (and how
> > > far
> > > away is that possibility).
> > >
> >
> > Good question. An 8 byte queue number
> 
> 8 bytes is the hash match isn't it and Q number is the offset into the
> page, isn't it? Or have I misunderstood how something fits together?

No. Other way round :-) The hash is treated as an index into the table, modulo table size and the values in the table are the actual queue numbers.

> 
> > is clearly overkill if we can only steer to 512 of them. I could define a
> > limit on the number of queues at 256 and use byte array instead.
> > Realistically, it's going to be a while before we get to 256 cpus in a
> > Windows VM.
> 
> Them's brave words ;-)
> 

Getting to 32 has been hard enough. Breaking through the 64 cpu limit is going to be 'interesting' too, I suspect. (A lot of core Windows stuff uses native word size bit masks for cpu affinity so you have to start using 'processor groups' and it all gets a bit funky).

> We should maybe consider that a f/e OS other than Windows might, for
> some
> reason, implement Toeplitz and have a higher probability of supporting
> large #s of vcpus sooner?
> 
> Split data[0] into (total) size and offset (to set blocks of values)?

Ok. If I go for a 16-bit queue number, that's probably enough, and means that a 4k page can hold 2k mappings. Then I'll do as you suggest and use a size, offset pair in place of the order field.

  Paul

> 
> >   Paul
> >
> > > Ian.
> >
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-01-08 17:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
2016-01-08 15:24   ` Ian Campbell
2016-01-08 15:56     ` Paul Durrant
2016-01-08 16:10       ` Ian Campbell
2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
2016-01-08 15:53   ` Ian Campbell
2016-01-08 16:19     ` Paul Durrant
2016-01-08 16:46       ` Ian Campbell
2016-01-08 17:07         ` Paul Durrant
2016-01-08 17:22           ` Ian Campbell
2016-01-08 17:35             ` Paul Durrant
2016-01-08 16:07   ` David Vrabel
2016-01-08 16:21     ` Paul Durrant
2016-01-08 16:22       ` David Vrabel
2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
2016-01-08 16:05   ` Ian Campbell
2016-01-08 16:26     ` Paul Durrant
2016-01-08 16:48       ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.