All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Durrant <Paul.Durrant@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Ian Jackson <Ian.Jackson@citrix.com>,
	"Keir (Xen.org)" <keir@xen.org>, Jan Beulich <jbeulich@suse.com>,
	"Tim (Xen.org)" <tim@xen.org>
Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
Date: Fri, 8 Jan 2016 16:19:40 +0000	[thread overview]
Message-ID: <8738d051636f4c7f85181322699e1084@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <1452268426.26438.26.camel@citrix.com>

> -----Original Message-----
> From: Ian Campbell [mailto:ian.campbell@citrix.com]
> Sent: 08 January 2016 15:54
> To: Paul Durrant; xen-devel@lists.xenproject.org
> Cc: Ian Jackson; Jan Beulich; Keir (Xen.org); Tim (Xen.org)
> Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and
> toeplitz hashing
> 
> On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
> >
> > diff --git a/xen/include/public/io/netif.h
> > b/xen/include/public/io/netif.h
> > index 1790ea0..06e0b61 100644
> > --- a/xen/include/public/io/netif.h
> > +++ b/xen/include/public/io/netif.h
> > @@ -151,6 +151,270 @@
> >   */
> >
> >  /*
> > + * Control ring
> > + * ============
> > + *
> > + * Some features, such as toeplitz hashing (detailed below), require a
> > + * significant amount of out-of-band data to be passed from frontend to
> > + * backend. Use of xenstore is not suitable for large quantities of data
> > + * because of quota limitations and so a dedicated 'control ring' is used.
> > + * The ability of the backend to use a control ring is advertised by
> > + * setting:
> > + *
> > + * /local/domain/X/backend///feature-control-ring = "1"
> > + *
> > + * The frontend provides a control ring to the backend by setting:
> > + *
> > + * /local/domain//device/vif//ctrl-ring-ref =
> > + * /local/domain//device/vif//event-channel-ctrl =
> > + *
> > + * where  is the grant reference of the shared page used to
> > + * implement the control ring and  is an event channel to be used
> > + * as a mailbox interrupt, before the frontend moves into the connected
> > + * state.
> 
> I read this as saying that the mailbox interrupt can be used until the
> frontend moves into the connected state, which I realise isn't what you
> meant but having so much stuff between "by setting" and "before" makes it
> easy to read wrongly.
> 
> How about "... as a mailbox interrupt. These keys must be written before
> moving into the connected state." ?

Yes, that's clearer.

> 
> > + * The control ring uses a fixed request/response message size and is
> > + * balanced (i.e. one request to one response), so operationally it is much
> > + * the same as a tramsmit or receive ring.
> 
> "transmit".
> 
> Is it intended to support out of order responses?

I didn't want to preclude it in case someone wants to add a potentially long running control operation in future.

> 
> > + */
> > +
> > +/*
> > + * Toeplitz hash types
> > + * ===================
> > + *
> > + * For the purposes of the definitions below, 'Packet[]' is an array of
> > + * octets containing an IP packet without options, 'Array[X..Y]' means a
> > + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
> > + * used to indicate concatenation of arrays.
> > + */
> > +
> > +/*
> > + * A hash calculated over an IP version 4 header as follows:
> > + *
> > + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
> 
> Is there a reason to write it like this rather than:
> + * Buffer[0..8] = Packet[12..19]
> ?
> 
> Maybe something which is obvious if you know about Toeplitz in more than
> just a passing fashion?

The reason is actually so it more resembles the Microsoft documentation at https://msdn.microsoft.com/en-us/library/windows/hardware/ff570725%28v=vs.85%29.aspx. Perhaps it would make more sense if I pointed out that Packet[12..15] is the source address and Packet[16..19] is the dest address (and do the same in the subsequent comments)?

> 
> 
> > +/*
> > + * Control requests (netif_ctrl_request_t)
> > + * =======================================
> > + *
> > + * All requests have the following format:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |    id     |   type    |         data[0]       |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |         data[1]       |
> > + * +-----+-----+-----+-----+
> 
> These are packed in the ring, i.e. there are no implicit padding bytes,
> right?

DEFINE_RING_TYPES unions the req and rsp structures and then defines an array but since req and rsp are the same size in this case the reqs should end up packed.

> 
> I know the Toeplitz stuff doesn't need it, but we should consider designing
> in a scheme to handle control commands which need >8 octets of data, or
> else we risk ending up with something as confusing as the data path...
> 

I'm ok with that, but you can always use a gref (as is done for the key and mapping table). If 8 seems to small, what do you feel would be the right number to choose?

> > + *
> > + * id: the request identifier, echoed in response.
> > + * type: the type of request (see below)
> > + * data[]: any data associated with the request (determined by type)
> > + */
> > +
> > +struct netif_ctrl_request {
> > +    uint16_t id;
> > +    uint16_t type;
> > +
> > +#define NETIF_CTRL_TYPE_INVALID              0
> > +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS   1
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS   2
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY     3
> > +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
> > +
> > +    uint32_t data[2];
> > +};
> > +typedef struct netif_ctrl_request netif_ctrl_request_t;
> > +
> > +/*
> > + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to query the types of toeplitz
> > + * hash supported by the backend. No data is required and to the
> 
> "and to the" has an extra "to" in it I think?
> 
> 
> > + * data[] field is set to 0.
> 
> "fields are" ?
> 
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> > + *
> > + * This is sent by the frontend to set the types of toeplitz hash that
> > + * the backend should calculate. Note that the 'maximal' type of hash
> > + * should always be chosen. For example, if the frontend sets both IPV4
> > + * and IPV4_TCP hash types then the latter hash type should be calculated
> > + * for any TCP packet and the former only calculated for non-TCP packets.
> > + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_*
> values
> > + * defined above. The data[1] field is set to 0.
> > + *
> > + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
> > + *       is free to choose how it steers packets to queues (which is the
> > + *       default state).
> > + *
> > + * type = NETIF_CTRL_TYPE_SET_`TOEPLITZ_KEY:
> 
> Can this be sent before FLAGS?
> 
> Is it permissible to send this for a hash scheme not included in flags?
> 

Yes, it can but the idea is that nothing has any effect until a valid set of flags is specified.

> > + *
> > + * This is sent by the frontend to set the key of toeplitz hash that
> > + * the backend should calculate. The toeplitz algorithm is illustrated
> > + * by the following pseudo-code:
> > + *
> > + * (Buffer[] and Key[] are treated as shift-registers where the MSB of
> > + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-
> > 1]
> > + * is the 'right-most').
> > + *
> > + * Value = 0
> > + * For number of bits in Buffer[]
> > + *    If (left-most bit of Buffer[] is 1)
> > + *        Value ^= left-most 32 bits of Key[]
> > + *    Key[] << 1
> > + *    Buffer[] << 1
> > + *
> > + * The data[0] field is set to the size of key in octets. The data[1]
> > + * field is set to a grant reference of a page containing the key.
> 
> I suppose it is implicit that the key starts at offset 0 in the page?
> 

Yes. I'll call that out.

> >  The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> 
> May the ref be r/o? If so, should that be encouraged?
> 

Yes, it can and should be r/o.

> > + *
> > + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> > + *
> > + * This is sent by the frontend to set the mapping of toeplitz hash to
> > + * queue number to be applied by the backend.
> > + *
> > + * The data[0] field is set to the order of the mapping. The data[1] field
> > + * is set to a grant reference of a page containing the mapping. The
> > + * reference must remain valid until the corresponding
> > + * netif_ctrl_response_t has been processed.
> > + *
> > + * The format of the mapping is:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[0]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[1]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[2]                   |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |                    queue[3]                   |
> > + *                         .
> > + *                         .
> > + * |                    queue[N-1]                 |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + *
> > + * where each queue value is less than "multi-queue-num-queues" (see
> above)
> > + * and N is 1 << data[0].
> > + *
> > + * NOTE: Before a specific mapping is set using this request, the backend
> > + *       should map all toeplitz hash values to queue 0 (which is the only
> > + *       queue guaranteed to exist in all cases).
> 
> i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-
> zero
> set of flags and NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do
> this?
> 

Yes. The default map should be zeroed.

> If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen
> one
> with flags = 0 then as before the b/e is free to steer as it likes?
> 

Yes. No change to default behaviour.

> Is it valid to send this mapping command first before the flags?
> 

As explained above, yes.

> 
> > + */
> > +
> > +/*
> > + * Control responses (netif_ctrl_response_t)
> > + * =========================================
> > + *
> > + * All responses have the following format:
> > + *
> > + *    0     1     2     3     4     5     6     7  octet
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |    id     |   pad     |         status        |
> > + * +-----+-----+-----+-----+-----+-----+-----+-----+
> > + * |         data          |
> > + * +-----+-----+-----+-----+
> > + *
> > + * id: the corresponding request identifier
> > + * pad: set to 0
> > + * status: the status of request processing
> > + * data: any data associated with the response (determined by type and
> > + *       status)
> 
> type needs to be remembered via the id? Why not echo it in the pad field
> for convenience?
> 

I originally had that but thought it would be superfluous given that the id is echoed. I can add it back if you think that it would be a good idea. All my implementation did with it though was ASSERT that it matched the req.

> 
> > + */
> > +
> > +struct netif_ctrl_response {
> > +    uint16_t id;
> > +    uint16_t pad;
> > +    uint32_t status;
> > +
> > +#define NETIF_CTRL_STATUS_SUCCESS           0
> > +#define NETIF_CTRL_STATUS_NOT_SUPPORTED     1
> > +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
> > +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW   3
> 
> Is the intention that these will be the union of all possible failure
> modes, rather than making provision for type-specific failures?
> 

Yes, they are just supposed to be errno-like things.

> These are all pretty generic though, and I suppose we could always figure
> that out when some command has a very obscure failure case.
> 

Yes.

> > +
> > +    uint32_t data;
> > +};
> > +typedef struct netif_ctrl_response netif_ctrl_response_t;
> > +
> > +/*
> > + * type =
> > + *
> > + * The default response for any unrecognised request has the status field
> > + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to
> 0.
> > + *
> > + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
> > + *
> > + * Since the request carries no data there is no reason for processing to
> > + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and
> the
> > + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
> (defined
> > + * above) indicating which hash types are supported by the backend.
> > + * If no hashing is supported then the data field should be set to 0.
> 
> I wonder if it would be clearer to define the generic request and response
> structures up front followed by the specific commands and their
> request+response formats at the same time, rather than splitting the
> specific command requests and responses into different places?
> 
> Like:
> 
> Command: NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS
> 
> Description: Blah blah
> 
> Request data format: Words words
> Valid response status codes:
>   - One
>   - Another
> Responds data format: Words
> 
> (i.e. more like API than protocol documentation in the form of independent
> looking things going back and forth)
> 

Yes, that may be easier to read. I'll do that.

> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> > + *
> > + * If the data[0] field in the request is invalid (i.e. contains unsupported
> > + * hash types) then the status field is set to
> > + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 
> > succeed
> > + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> > + *
> > + * If the data[0] field in the request is an invalid key length (too big)
> 
> Can it not be too small as well? What is the behaviour if it is?
> 

I guess the way the algorithm is defined, a key of less than 32 bits doesn't make much sense.

> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request
> should
> > + * succeed and hence the status field is set to
> NETIF_CTRL_STATUS_SUCCESS.
> > + * The data field should be set to 0.
> > + *
> > + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> > + *
> > + * If the data[0] field in the request is an invalid mapping order (too big)
> 
> or too small?

An order 0 mapping would still contain a single entry and you can't get any smaller than that :-)

> 
> > + * then the status field is set to
> NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> > + * data[1] field is an invalid grant reference then the status field is set
> > + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset
> should
> 
> "request"
> 

Thanks for all the typo/spelling corrections. I'll pick those up too.

  Paul

> Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2016-01-08 16:19 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
2016-01-08 15:24   ` Ian Campbell
2016-01-08 15:56     ` Paul Durrant
2016-01-08 16:10       ` Ian Campbell
2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
2016-01-08 15:53   ` Ian Campbell
2016-01-08 16:19     ` Paul Durrant [this message]
2016-01-08 16:46       ` Ian Campbell
2016-01-08 17:07         ` Paul Durrant
2016-01-08 17:22           ` Ian Campbell
2016-01-08 17:35             ` Paul Durrant
2016-01-08 16:07   ` David Vrabel
2016-01-08 16:21     ` Paul Durrant
2016-01-08 16:22       ` David Vrabel
2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
2016-01-08 16:05   ` Ian Campbell
2016-01-08 16:26     ` Paul Durrant
2016-01-08 16:48       ` Ian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8738d051636f4c7f85181322699e1084@AMSPEX02CL03.citrite.net \
    --to=paul.durrant@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=Ian.Jackson@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.