All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: zhenwei pi <pizhenwei@bytedance.com>, parav@nvidia.com
Cc: mst@redhat.com, jasowang@redhat.com,
	virtio-comment@lists.oasis-open.org, houp@yusur.tech,
	helei.sig11@bytedance.com, xinhao.kong@duke.edu
Subject: [virtio-comment] Re: Re: Re: Re: [PATCH v2 06/11] transport-fabrics: introduce command set
Date: Thu, 8 Jun 2023 12:41:23 -0400	[thread overview]
Message-ID: <20230608164123.GB2240319@fedora> (raw)
In-Reply-To: <86978fc3-c065-5a64-2996-28b4eb2b40bd@bytedance.com>

[-- Attachment #1: Type: text/plain, Size: 12585 bytes --]

On Wed, Jun 07, 2023 at 10:58:45AM +0800, zhenwei pi wrote:
> On 6/6/23 21:34, Stefan Hajnoczi wrote:
> > On Tue, Jun 06, 2023 at 09:31:27AM +0800, zhenwei pi wrote:
> > > On 6/6/23 00:30, Stefan Hajnoczi wrote:
> > > > On Fri, Jun 02, 2023 at 01:15:00PM +0800, zhenwei pi wrote:
> > > > > 
> > > > > 
> > > > > On 6/1/23 01:10, Stefan Hajnoczi wrote:
> > > > > > On Thu, May 04, 2023 at 04:19:05PM +0800, zhenwei pi wrote:
> > > > > > > Introduce command structures for Virtio-oF.
> > > > > > > 
> > > > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> > > > > > > ---
> > > > > > >     transport-fabrics.tex | 209 ++++++++++++++++++++++++++++++++++++++++++
> > > > > > >     1 file changed, 209 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/transport-fabrics.tex b/transport-fabrics.tex
> > > > > > > index 7711321..37f57c6 100644
> > > > > > > --- a/transport-fabrics.tex
> > > > > > > +++ b/transport-fabrics.tex
> > > > > > > @@ -495,3 +495,212 @@ \subsubsection{Buffer Mapping Definition}\label{sec:Virtio Transport Options / V
> > > > > > >                         |value |  -> 8193 (value.u32)
> > > > > > >                         +------+
> > > > > > >     \end{lstlisting}
> > > > > > > +
> > > > > > > +\subsubsection{Commands Definition}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition}
> > > > > > > +This section defines command structures for Virtio Over Fabrics.
> > > > > > > +
> > > > > > > +A common structure virtio_of_value is fixed to 8 bytes and MUST be used as one
> > > > > > > +of the following format:
> > > > > > > +
> > > > > > > +\begin{itemize}
> > > > > > > +\item u8
> > > > > > > +\item le16
> > > > > > > +\item le32
> > > > > > > +\item le64
> > > > > > > +\end{itemize}
> > > > > > 
> > > > > > The way it's written does not document where the u8, u16, u32 bytes are
> > > > > > located and that the unused bytes are 0. I think I understand what you
> > > > > > mean though:
> > > > > > 
> > > > > >      le64 value = cpu_to_le64((u64)v); /* v is u8, u16, u32, or u64 */
> > > > > > 
> > > > > > Please clarify.
> > > > > > 
> > > > > 
> > > > > I want to describe an union structure of 8 bytes:
> > > > > union virtio_of_value {
> > > > >       u8;
> > > > >       u16;
> > > > >       u32;
> > > > >       u64;
> > > > > };
> > > > > 
> > > > > Depending on the opcode, use the right one.
> > > > 
> > > > I was trying to point out that the memory layout of C unions is not
> > > > portable. Your example does not define the exact in-memory layout of
> > > > union virtio_of_value. Here is the first web search result I found about
> > > > this topic:
> > > > 
> > > >     "Q: And a related question: if you dump unions in binary form to a file,
> > > >     and then reload them from the file on a different platform, or with a
> > > >     program compiled by a different compiler, are you guaranteed to get
> > > >     back what you stored? (I think not, but I'm not sure)
> > > > 
> > > >     A: You're right; you're not."
> > > > 
> > > >     https://bytes.com/topic/c/answers/220372-unions-storage-abis
> > > > 
> > > > In the cpu_to_le64() code example that I gave, the exact in-memory
> > > > layout is well-defined. There is no ambiguity.
> > > > 
> > > 
> > > OK, thanks.
> > > 
> > > > > > > +\hline
> > > > > > > +0xff00 - 0xfffd & Reserved \\
> > > > > > > +\hline
> > > > > > > +\end{tabular}
> > > > > > > +
> > > > > > > +\paragraph{Connect Command}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition / Connect Command}
> > > > > > > +The Connect Command is used to establish Virtio Over Fabrics queue. The control
> > > > > > > +queue MUST be established firstly, then the Connect command establishes an
> > > > > > > +association between the initiator and the target.
> > > > > > 
> > > > > > Is a "Virtio Over Fabrics queue" different from a virtqueue?
> > > > > > 
> > > > > > If I understand correctly, the control queue must be established by the
> > > > > > initiator first and then the Connect command is sent to begin
> > > > > > communication between the initiator and the target?
> > > > > > 
> > > > > 
> > > > > The queue mapping is missing in the '[PATCH v2 01/11] transport-fabrics:
> > > > > introduce Virtio Over Fabrics overview', like:
> > > > > A "Virtio Over Fabrics queue" is a reliable connection between initiator and
> > > > > target. There are 2 types of Virtio Over Fabrics queue:
> > > > > +\begin{itemize}
> > > > > +\item A single Control queue is required to execute control operations.
> > > > > +\item 0 or more Virtio Over Fabrics queues map the virtqueues.
> > > > > +\end{itemize}
> > > > 
> > > > That helps, thanks!
> > > > 
> > > > > 
> > > > > > > +
> > > > > > > +The Target ID of 0xffff is reserved, then:
> > > > > > 
> > > > > > Please move this after the fields have been shown and the purpose of the
> > > > > > Target ID field has been explained.
> > > > > > 
> > > > > > > +\begin{itemize}
> > > > > > > +\item The Target ID of 0xffff MUST be specified as the Target ID in a Connect
> > > > > > > +Command for the control queue.
> > > > > > > +\item The target SHOULD allocate any available Target ID to the initiator,
> > > > > > > +and return the allocated Target ID in the Completion.
> > > > > > > +\item The returned Target ID MUST be specified as the Target ID, and the Queue ID
> > > > > > > +MUST be specified in a Connect Command for the virtqueue.
> > > > > > > +\end{itemize}
> > > > > > 
> > > > > > What is the purpose of the Target ID? Is it to allow a server to provide
> > > > > > access to multiple targets over the same connection?
> > > > > > 
> > > > > 
> > > > > A target listens on a port, and provides access to 0 or more targets. An
> > > > > initiator connect the specific target by TVQN of connect command.
> > > > > An initiator could connect a single target, multiple initiators could
> > > > > connect the same target(typically, shared disk/fs).
> > > > 
> > > > Why is the target ID separate from the TVQN? If the Target ID is a
> > > > separate parameter then users will have to learn additional
> > > > syntax/command-line options to specify the TVQN + Target ID and that
> > > > syntax may vary between software.
> > > > 
> > > > > 
> > > > > > > +
> > > > > > > +The Connect Command has following structure:
> > > > > > > +
> > > > > > > +\begin{lstlisting}
> > > > > > > +struct virtio_of_command_connect {
> > > > > > > +        le16 opcode;
> > > > > > > +        le16 command_id;
> > > > > > > +        le16 target_id;
> > > > > > > +        le16 queue_id;
> > > > > > > +        le16 ndesc;
> > > > > > 
> > > > > > Where is this field documented?
> > > > > > 
> > > > > 
> > > > > OK. Will add.
> > > > > 
> > > > > > Why does the initiator send ndesc to the target? Normally a VIRTIO Transport reports the device's max descriptors and then the driver can tell the device to reduce the number of descriptors, if desired.
> > > > > > 
> > > > > 
> > > > > A target supports at lease 1 descriptor. The 'ndesc' of struct
> > > > > virtio_of_command_connect indicates the full PDU contains: struct
> > > > > virtio_of_command_connect + 1 * virtio_of_vq_desc + data.
> > > > > 
> > > > > > > +#define VIRTIO_OF_CONNECTION_TCP     1
> > > > > > > +#define VIRTIO_OF_CONNECTION_RDMA    2
> > > > > > 
> > > > > > What does RDMA mean? I thought RDMA is a general concept that several
> > > > > > fabrics implement (with different details like how addressing works).
> > > > > > 
> > > > > 
> > > > > I guest your concern is the difference of IB/RoCE/iWarp ...
> > > > > We are trying to define the payload protocol here, so I think we can ignore
> > > > > the difference of the HCA.
> > > > 
> > > > I see, maybe this could be called STREAM vs KEYED instead of TCP vs RDMA?
> > > > 
> > > 
> > > I'd like to define two PDU mapping rules(in '[PATCH v2 04/11]
> > > transport-fabrics: introduce Stream Transmission' and '[PATCH v2 05/11]
> > > transport-fabrics: introduce Keyed Transmission'): STREAM and KEYED. A
> > > transport protocols need to use one.
> > > 
> > > Then we can define protocols:
> > > #define VIRTIO_OF_CONNECTION_TCP     1		-> use STREAM
> > > #define VIRTIO_OF_CONNECTION_RDMA    2		-> use KEYED
> > > #define VIRTIO_OF_CONNECTION_TLS     3(in the future)	-> use STREAM
> > > #define VIRTIO_OF_CONNECTION_XXX
> > 
> > It's not clear to me whether TCP actually means TCP/IP or if it actually
> > means STREAM. For example, if I run Virtio Over Fabrics over AF_VSOCK,
> > would it use VIRTIO_OF_CONNECTION_TCP although there is no TCP/IP? If
> > so, then I think the name TCP is misleading and STREAM would be clearer.
> > 
> 
> What about dropping 'oftype' field from this command? When the command is
> allowed to issue, the reliable connection is already established, at this
> point, we have enough information about the connection type.
> 
> Instead, we define the multiple transports in the following section, like:
> \subsection{Transport Binding}\label{sec:Virtio Transport Options / Virtio
> Over Fabrics / Transport Binding}
> \subsubsection{TCP/IP}\label{sec:Virtio Transport Options / Virtio Over
> Fabrics / Transport Binding / TCP_IP}
> TCP/IP supports both IPv4 and IPv6, it uses \ref{sec:Virtio Transport
> Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition
> / Stream Transmission}
> ~\nameref{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission
> Protocol / Commands Definition / Stream Transmission} ...
> 
> \subsubsection{TLS-TCP/IP}\label{sec:Virtio Transport Options / Virtio Over
> Fabrics / Transport Binding / TLS-TCP_IP}
> TLS-TCP/IP supports both IPv4 and IPv6 ...
> 
> \subsubsection{RDMA}\label{sec:Virtio Transport Options / Virtio Over
> Fabrics / Transport Binding / RDMA}
> RDMA MUST use \ref{sec:Virtio Transport Options / Virtio Over Fabrics /
> Transmission Protocol / Commands Definition / Keyed Transmission}
> ~\nameref{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission
> Protocol / Commands Definition / Keyed Transmission} ...
> 
> [\subsubsection{TCP/VSOCK}\label{sec:Virtio Transport Options / Virtio Over
> Fabrics / ransport Binding / TCP_VSOCK} ...]

Sounds good. Thanks!

> 
> > > 
> > > > > 
> > > > > > > +        u8 oftype;
> > > > > > > +        u8 padding[5];
> > > > > > > +};
> > > > > > > +\end{lstlisting}
> > > > > > > +
> > > > > > > +The Connect commands MUST contains one Segment Descriptor and one structure
> > > > > > > +virtio_of_command_connect to specify Initiator VQN and Target VNQ,
> > > > > > > +virtio_of_command_connect has following structure:
> > > > > > 
> > > > > > I'm confsued. virtio_of_command_connect was defined above. The struct
> > > > > > defined below is virtio_of_connect. Does this paragraph need to be
> > > > > > updated (virtio_of_command_connect -> virtio_of_connect)?
> > > > > > 
> > > > > > Why is virtio_of_connect a separate struct and not part of
> > > > > > virtio_of_command_connect?
> > > > > > 
> > > > > 
> > > > > Because I'd like to define all the commands with a fixed length.
> > > > 
> > > > I don't understand. virtio_of_connect and virtio_of_command_connect are
> > > > both fixed-length. Why can't they be unified into 1 fixed-length struct?
> > > > 
> > > 
> > > For stream protocol, it always work fine.
> > > For keyed protocol, for example RDMA, the target side needs to use
> > > ibv_post_recv to receive a large size(sizeof virtio_of_command_connect +
> > > sizeof virtio_of_connect). If the target uses ibv_post_recv to receive
> > > sizeof(CMD) + sizeof(DESC) * 1, the initiator fails in RDMA SEND.
> > 
> > I read that "A RC connection is very similar to a TCP connection" in the
> > NVIDIA documentation
> > (https://docs.nvidia.com/networking/display/RDMAAwareProgrammingv17/Transport+Modes)
> > and expected SOCK_STREAM semantics for RDMA SEND.
> > 
> > Are you saying ibv_post_send() fails when the receiver's work request
> > sg_list size is smaller (fewer bytes) than the sender's?
> > 
> 
> Yes, it will fail.
> The receiver get a CQE with status 'IBV_WC_LOC_LEN_ERR', see
> https://www.rdmamojo.com/2013/02/15/ibv_poll_cq/

Parav: Can you confirm that this is expected?

This makes it hard to inline payloads as I was suggesting before :(.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2023-06-08 16:41 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-04  8:18 [virtio-comment] [PATCH v2 00/11] Introduce Virtio Over Fabrics zhenwei pi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 01/11] transport-fabrics: introduce Virtio Over Fabrics overview zhenwei pi
2023-05-04  8:57   ` David Hildenbrand
2023-05-04  9:46     ` zhenwei pi
2023-05-04 10:05       ` Michael S. Tsirkin
2023-05-04 10:12         ` David Hildenbrand
2023-05-04 10:50         ` Re: " zhenwei pi
2023-05-31 14:00   ` [virtio-comment] " Stefan Hajnoczi
2023-06-02  1:17     ` [virtio-comment] " zhenwei pi
2023-06-05  2:39   ` [virtio-comment] " Parav Pandit
2023-06-05  2:39   ` Parav Pandit
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 02/11] transport-fabrics: introduce Virtio Qualified Name zhenwei pi
2023-05-31 14:06   ` Stefan Hajnoczi
2023-06-02  1:50     ` zhenwei pi
2023-06-05  2:40       ` Parav Pandit
2023-06-05  7:57         ` zhenwei pi
2023-06-05 17:05         ` Stefan Hajnoczi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 03/11] transport-fabircs: introduce Segment Descriptor Definition zhenwei pi
2023-05-31 14:23   ` Stefan Hajnoczi
2023-06-02  3:08     ` zhenwei pi
2023-06-05  2:40   ` [virtio-comment] " Parav Pandit
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 04/11] transport-fabrics: introduce Stream Transmission zhenwei pi
2023-05-31 15:20   ` Stefan Hajnoczi
2023-06-02  2:26     ` zhenwei pi
2023-06-05 16:11       ` Stefan Hajnoczi
2023-06-06  3:13         ` zhenwei pi
2023-06-06 13:09           ` Stefan Hajnoczi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 05/11] transport-fabrics: introduce Keyed Transmission zhenwei pi
2023-05-31 16:20   ` [virtio-comment] " Stefan Hajnoczi
2023-06-01  9:02     ` zhenwei pi
2023-06-01 11:33       ` Stefan Hajnoczi
2023-06-01 13:09         ` zhenwei pi
2023-06-01 19:13           ` Stefan Hajnoczi
2023-06-01 21:23             ` Stefan Hajnoczi
2023-06-02  0:55               ` zhenwei pi
2023-06-05 17:21                 ` Stefan Hajnoczi
2023-06-05  2:41   ` Parav Pandit
2023-06-05  8:41     ` zhenwei pi
2023-06-05 11:45       ` Parav Pandit
2023-06-05 12:50         ` zhenwei pi
2023-06-05 13:12           ` Parav Pandit
2023-06-06  7:13             ` zhenwei pi
2023-06-06 21:52               ` Parav Pandit
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 06/11] transport-fabrics: introduce command set zhenwei pi
2023-05-31 17:10   ` [virtio-comment] " Stefan Hajnoczi
2023-06-02  5:15     ` [virtio-comment] " zhenwei pi
2023-06-05 16:30       ` Stefan Hajnoczi
2023-06-06  1:31         ` [virtio-comment] " zhenwei pi
2023-06-06 13:34           ` Stefan Hajnoczi
2023-06-07  2:58             ` [virtio-comment] " zhenwei pi
2023-06-08 16:41               ` Stefan Hajnoczi [this message]
2023-06-08 17:01                 ` [virtio-comment] " Parav Pandit
2023-06-09  1:39                   ` [virtio-comment] " zhenwei pi
2023-06-09  2:06                     ` [virtio-comment] " Parav Pandit
2023-06-09  3:55                       ` zhenwei pi
2023-06-11 20:56                         ` Parav Pandit
2023-06-06  2:02         ` [virtio-comment] " zhenwei pi
2023-06-06 13:44           ` Stefan Hajnoczi
2023-06-07  2:03             ` [virtio-comment] " zhenwei pi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 07/11] transport-fabrics: introduce opcodes zhenwei pi
2023-05-31 17:11   ` [virtio-comment] " Stefan Hajnoczi
     [not found]   ` <20230531205508.GA1509630@fedora>
2023-06-02  8:39     ` [virtio-comment] " zhenwei pi
2023-06-05 16:46       ` Stefan Hajnoczi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 08/11] transport-fabrics: introduce status of completion zhenwei pi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 09/11] transport-fabrics: add TCP&RDMA binding zhenwei pi
     [not found]   ` <20230531210255.GC1509630@fedora>
2023-06-02  9:07     ` [virtio-comment] Re: " zhenwei pi
2023-06-05 16:57       ` Stefan Hajnoczi
2023-06-06  1:41         ` [virtio-comment] " zhenwei pi
2023-06-06 13:51           ` Stefan Hajnoczi
2023-06-07  2:15             ` zhenwei pi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 10/11] transport-fabrics: add device initialization zhenwei pi
     [not found]   ` <20230531210925.GD1509630@fedora>
2023-06-02  9:11     ` zhenwei pi
2023-05-04  8:19 ` [virtio-comment] [PATCH v2 11/11] transport-fabrics: support inline data for keyed transmission zhenwei pi
2023-05-29  0:56 ` [virtio-comment] PING: [PATCH v2 00/11] Introduce Virtio Over Fabrics zhenwei pi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230608164123.GB2240319@fedora \
    --to=stefanha@redhat.com \
    --cc=helei.sig11@bytedance.com \
    --cc=houp@yusur.tech \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=parav@nvidia.com \
    --cc=pizhenwei@bytedance.com \
    --cc=virtio-comment@lists.oasis-open.org \
    --cc=xinhao.kong@duke.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.