virtio-comment.lists.oasis-open.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Heng Qi <hengqi@linux.alibaba.com>
Cc: virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org, Parav Pandit <parav@nvidia.com>,
	Jason Wang <jasowang@redhat.com>,
	Yuri Benditovich <yuri.benditovich@daynix.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Subject: Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
Date: Mon, 20 Mar 2023 15:45:13 -0400
Message-ID: <20230320154456-mutt-send-email-mst@kernel.org>
In-Reply-To: <20230316131726.GA20524@h68b04307.sqa.eu95>

On Thu, Mar 16, 2023 at 09:17:26PM +0800, Heng Qi wrote:
> On Wed, Mar 15, 2023 at 10:57:40AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> > > 
> > > 
> > > > On 2023/3/15 at 7:58 PM, Michael S. Tsirkin wrote:
> > > > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > > > 
> > > > > 
> > > > > > On 2023/3/10 at 3:36 AM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > > > > On 2023/3/8 at 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > > > > On 2023/2/28 at 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > > > 
> > > > > > > > > > For example geneve spec says:
> > > > > > > > > > 
> > > > > > > > > >        it is necessary for entropy from encapsulated packets to be
> > > > > > > > > >        exposed in the tunnel header.  The most common technique for this is
> > > > > > > > > >        to use the UDP source port
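The source-port technique quoted above can be sketched roughly as follows. This is only an illustration: zlib.crc32 stands in for whatever hash the encapsulating endpoint really uses, and the port range and the outer_source_port() helper are assumptions made for the example, not taken from the patch.

```python
import zlib

# Assumption for this sketch: ports are drawn from the dynamic range.
EPHEMERAL_BASE = 49152
EPHEMERAL_SPAN = 65536 - EPHEMERAL_BASE  # 16384 usable ports


def outer_source_port(inner_five_tuple):
    """Map an inner 5-tuple to an outer UDP source port (hypothetical helper)."""
    key = "|".join(str(field) for field in inner_five_tuple).encode()
    flow_hash = zlib.crc32(key)  # stand-in hash, not Toeplitz
    return EPHEMERAL_BASE + flow_hash % EPHEMERAL_SPAN


# (src_ip, dst_ip, ip_proto, src_port, dst_port) of two encapsulated flows
flow_a = ("10.0.0.1", "10.0.0.2", 6, 40000, 443)
flow_b = ("10.0.0.3", "10.0.0.2", 6, 40001, 443)

port_a = outer_source_port(flow_a)
port_b = outer_source_port(flow_b)
print(port_a, port_b)
```

A receiver that hashes only the outer headers then sees per-flow entropy in the source port, at the cost of the sender computing the hash.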
> > > > > > > > > > The end point of the tunnel is called the gateway (with DPDK running on top of it).
> > > > > > > > > 
> > > > > > > > > 1. When there is no inner header hash, entropy can be inserted into the udp
> > > > > > > > > src port of the outer header of the tunnel,
> > > > > > > > > and then the tunnel packet is handed over to the host. The host needs to
> > > > > > > > > take out a part of the CPUs to parse the outer headers (but not drop them)
> > > > > > > > > to calculate the inner hash for the inner payloads,
> > > > > > > > > and then use the inner
> > > > > > > > > hash to forward them to another part of the CPUs that are responsible for
> > > > > > > > > processing.
> > > > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > > > tunnel, why is your host doing this?
> > > > > 
> > > > > Let's simplify some details and take a fresh look at two different
> > > > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > > > 
> > > > > 1. In Scenario1, we can improve the processing performance of the same flow
> > > > > by implementing inner symmetric hashing.
> > > > > 
> > > > > > This is because even though client1 and client2 communicate bidirectionally
> > > > > > through the same flow, their data may pass through and be encapsulated by
> > > > > > different tunnels, resulting in the same flow being hashed to different
> > > > > > queues and processed by different CPUs.
> > > > > 
> > > > > To ensure consistency and optimized processing, we need to parse out the
> > > > > inner header and compute a symmetric hash on it using a special rss key.
> > > > > 
> > > > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > > > prevent the introduction of more concepts, but it is indeed a kind of inner
> > > > > hash.
> > > > If parts of a flow go through different tunnels won't this cause
> > > > reordering at the network level? Why is it so important to prevent it at
> > > > the nic then?  Or, since you are stressing symmetric hash, are you
> > > > talking about TX and RX side going through different tunnels?
> > > 
> > > Yes, the directions client1->client2 and client2->client1 may go through
> > > different tunnels.
> > > > With inner symmetric hashing, the same CPU processes both directions
> > > > of the same flow, which improves performance.
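To make the symmetric property concrete, here is a minimal sketch. It swaps in a different technique from the one discussed in the thread: instead of Toeplitz with a special RSS key, it sorts the two endpoints before hashing with zlib.crc32. The function names are hypothetical.

```python
import zlib


def symmetric_flow_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Hash an inner 5-tuple so that both directions give the same value."""
    # Put the two endpoints in a canonical order so a->b and b->a
    # produce the same key.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{lo[0]}|{lo[1]}|{hi[0]}|{hi[1]}|{proto}".encode()
    return zlib.crc32(key)  # stand-in hash, not Toeplitz


def pick_queue(flow_hash, num_queues):
    """Simplified queue selection, ignoring the indirection table."""
    return flow_hash % num_queues


forward = symmetric_flow_hash("10.0.0.1", "10.0.0.2", 6, 40000, 443)
reverse = symmetric_flow_hash("10.0.0.2", "10.0.0.1", 6, 443, 40000)
print(pick_queue(forward, 8), pick_queue(reverse, 8))
```

Because only inner fields enter the key, the result does not depend on which tunnel carried each direction.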
> > 
> > Well sure but ... are you just doing forwarding or inner processing too?
> 
> When there is an inner hash, there is no forwarding anymore.
> 
> > If forwarding why do you care about matching TX and RX queues? If e2e
> 
> In fact, we are just matching on the same rx queue. The network topology
> is roughly as follows. The processing host will receive the packets
> sent from client1 and client2 respectively, then make some action judgments,
> and return them to client2 and client1 respectively.
> 
> client1                   client2
>    |                         |
>    |      __________         |
>    +----->| tunnel |<--------+
>           |--------|
>              |  |
>              |  |
>              |  |
>              v  v
>        +-----------------+
>        | processing host |
>        +-----------------+
> 
> Thanks.

monitoring host would be a better term

> > processing can't you just store the incoming hash in the flow and reuse
> > on TX? This is what Linux is doing...
> > 
> > 
> > 
> > > > 
> > > > 
> > > > > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > > > > flows between multiple communication pairs encapsulated by the same tunnel
> > > > > > will all be hashed to the same queue. To address this, we need to implement
> > > > > > inner hashing to improve the performance of RSS. By parsing and calculating
> > > > > > the inner hash, different flows can be hashed to different queues.
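The GRE case can be illustrated with a toy model. It assumes a basic GRE header, so the only outer fields available for hashing are the two tunnel addresses and the protocol number. zlib.crc32 again stands in for the real hash, and all addresses and ports are made up.

```python
import zlib

GRE_PROTO = 47  # IP protocol number for GRE


def toy_hash(fields):
    """Stand-in for the RSS hash over the given header fields."""
    key = "|".join(str(field) for field in fields).encode()
    return zlib.crc32(key)


# One tunnel: every encapsulated packet carries the same outer fields.
outer = ("192.0.2.1", "192.0.2.2", GRE_PROTO)

# Eight inner flows (src_ip, dst_ip, ip_proto, src_port, dst_port).
inner_flows = [("10.0.0.1", "10.0.1.1", 6, 40000, 8000 + i) for i in range(8)]

outer_hashes = [toy_hash(outer) for _ in inner_flows]
inner_hashes = [toy_hash(flow) for flow in inner_flows]

num_queues = 4
outer_queues = {h % num_queues for h in outer_hashes}
inner_queues = {h % num_queues for h in inner_hashes}
print(sorted(outer_queues), sorted(inner_queues))
```

Hashing on the outer fields always selects a single queue here, while the inner hashes differ per flow and can be spread over several queues.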
> > > > > 
> > > > > Thanks.
> > > > > 
> > > > > 
> > > > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> > > 
> > > We use the most basic GRE header fields (not NVGRE), not even optional
> > > fields.
> > > There is also no flow id in the GRE header, should you be referring to
> > > NVGRE?
> > > 
> > > Thanks.
> > > 
> > > > so not sufficient if there are more than 512 queues. Still 512 queues
> > > > is quite a lot. Are you trying to solve for configurations with
> > > > more than 512 queues then?
> > > > 
> > > > 
> > > > > > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > > > > > bidirectional flow a->b and b->a,
> > > > > > > such flow may be out of order when processed by the gateway(DPDK):
> > > > > > > 
> > > > > > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > > > > > for some reason, resulting in different outer IP address,
> > > > > > >       then this flow may be processed by different CPUs after reaching the
> > > > > > > host if there is no inner hash. So after the host receives the
> > > > > > >       flow, first use the forwarding CPUs to parse the inner hash, and then
> > > > > > > use the hash to ensure that the flow is processed by the
> > > > > > >       same CPU.
> > > > > > > 2. In bidirectional mode, a->b flow may go to gateway 1, and b->a flow may
> > > > > > > go to gateway 2. In order to ensure that the same flow is
> > > > > > >       processed by the same CPU, we still need the forwarding CPUs to parse
> > > > > > > the real inner hash(here, the hash key needs to be replaced with a symmetric
> > > > > > > hash key).
> > > > > > > Oh interesting. What are those gateways, how come there's expectation
> > > > > > that you can change their addresses and topology
> > > > > > completely seamlessly without any reordering whatsoever?
> > > > > > Isn't network topology change kind of guaranteed to change ordering
> > > > > > sometimes?
> > > > > > 
> > > > > > 
> > > > > > > > > > 1). During this process, the CPUs on the host are divided into two parts, one
> > > > > > > > > part is used as a forwarding node to parse the outer header,
> > > > > > > > >         and the CPU utilization is low. Another part handles packets.
> > > > > > > > Some overhead is clearly involved in *sending* packets -
> > > > > > > > to calculate the hash and stick it in the port number.
> > > > > > > > This is, however, a separate problem and if you want to
> > > > > > > > solve it then my suggestion would be to teach the *transmit*
> > > > > > > > side about GRE offloads, so it can fill the source port in the card.
> > > > > > > > 
> > > > > > > > > 2). The entropy of the source udp src port is not enough, that is, the queue
> > > > > > > > > is not widely distributed.
> > > > > > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > > > > > A 5-tuple brings more entropy than a single port, doesn't it?
> > > > > > But you don't need more for RSS, the indirection table is not
> > > > > > that large.
> > > > > > 
> > > > > > > In fact, the
> > > > > > > inner hash of the physical network card used by
> > > > > > > the business team is indeed better than the udp port number of the outer
> > > > > > > header we modify now, but they did not give me the data.
> > > > > > > Admittedly, our hash value is 32 bit.
> > > > > > 
> > > > > > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > > > > > The tunneled packet is then handed over to the host.
> > > > > > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > > > > > no need to use some CPUs to forward and parse the outer header.
> > > > > > > > You really have to parse the outer header anyway,
> > > > > > > > otherwise there's no tunneling.
> > > > > > > > Unless you want to teach virtio to implement tunneling
> > > > > > > > in hardware, which is something I'd find it easier to
> > > > > > > > get behind.
> > > > > > > There is no need to parse the outer header twice, because we use shared
> > > > > > > memory.
> > > > > > shared with what? you need the outer header to identify the tunnel.
> > > > > > 
> > > > > > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > > > > > widely distributed.
> > > > > > > > It's exactly the same entropy, why would it be better? In fact you
> > > > > > > > are taking out the outer hash entropy making things worse.
> > > > > > > I don't get the point, why the entropy of the inner 5-tuple and the outer
> > > > > > > tunnel header is the same,
> > > > > > > multiple streams have the same outer header.
> > > > > > > 
> > > > > > > Thanks.
> > > > > > well our hash is 32 bit. source port is just 16 bit.
> > > > > > so yes it's more entropy but RSS can't use more than 16 bit.
> > > > > > why do you need so many? you have more than 64k CPUs to offload to?
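The point about RSS not using more than 16 bits can be sketched like this, assuming the receive queue is chosen by indexing an indirection table with the low bits of the hash and that the table length is a power of two. The table size and contents are invented for the example.

```python
def rss_queue(hash_value, indirection_table):
    """Pick a receive queue from the low bits of the hash.

    Assumes the table length is a power of two, so length - 1 is a bit mask.
    """
    mask = len(indirection_table) - 1
    return indirection_table[hash_value & mask]


num_queues = 8
table = [i % num_queues for i in range(128)]  # hypothetical 128-entry table

hash32 = 0xDEADBEEF
low16 = hash32 & 0xFFFF  # what a 16-bit source port could carry

queue_from_hash32 = rss_queue(hash32, table)
queue_from_low16 = rss_queue(low16, table)
print(queue_from_hash32, queue_from_low16)
```

With a table of at most 65536 entries the mask never reaches above bit 15, so the upper 16 bits of a 32-bit hash do not affect which queue is selected.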
> > > > > > 
> > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > > > same goes for vxlan; I did not check further.
> > > > > > > > > > 
> > > > > > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > > > > > problem?
> > > > > > > > > > 
> > > > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > > > 
> > > > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > > > before posting.
> > > > > > > > > 
> > > > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > 
> > > > 




Thread overview: 78+ messages
2023-02-18 14:37 [PATCH v9] virtio-net: support inner header hash Heng Qi
2023-02-20 15:53 ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-02-20 16:12   ` Michael S. Tsirkin
2023-02-21  4:20 ` Parav Pandit
2023-02-21  6:14   ` [virtio-comment] " Heng Qi
2023-02-21 12:47     ` Parav Pandit
2023-02-21 13:34       ` Heng Qi
2023-02-21 15:32         ` Parav Pandit
2023-02-21 16:44           ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-02-21 16:50             ` Parav Pandit
2023-02-21 17:13               ` Michael S. Tsirkin
2023-02-21 17:40                 ` [virtio-comment] " Parav Pandit
2023-02-21 17:44                   ` Michael S. Tsirkin
2023-02-21 17:54                     ` Parav Pandit
2023-02-21 17:17               ` [virtio-comment] " Heng Qi
2023-02-21 17:39                 ` Parav Pandit
2023-02-21 13:37       ` Heng Qi
2023-02-21 17:05   ` Michael S. Tsirkin
2023-02-21 19:29     ` Parav Pandit
2023-02-21 21:23       ` Michael S. Tsirkin
2023-02-21 21:36         ` Parav Pandit
2023-02-21 21:46           ` Michael S. Tsirkin
2023-02-21 22:32             ` Parav Pandit
2023-02-21 23:18               ` Michael S. Tsirkin
2023-02-22  1:41                 ` Parav Pandit
2023-02-22  2:51                 ` [virtio-dev] " Heng Qi
2023-02-22  2:34       ` [virtio-dev] " Heng Qi
2023-02-22  6:21         ` Michael S. Tsirkin
2023-02-22  7:03           ` Heng Qi
2023-02-22 11:29             ` Michael S. Tsirkin
2023-02-21 17:50 ` Michael S. Tsirkin
2023-02-22  3:22   ` Jason Wang
2023-02-22  6:46     ` Heng Qi
2023-02-22 11:30       ` Michael S. Tsirkin
2023-02-23  2:50       ` Jason Wang
2023-02-23  4:41         ` [virtio-dev] " Heng Qi
2023-02-24  2:45           ` Jason Wang
2023-02-24  4:47             ` [virtio-comment] " Heng Qi
2023-02-24  8:07             ` Michael S. Tsirkin
2023-02-23 13:03         ` Michael S. Tsirkin
2023-02-24  2:26           ` Jason Wang
2023-02-24  8:06             ` [virtio-dev] " Michael S. Tsirkin
2023-02-27  4:07               ` Jason Wang
2023-02-27  7:39                 ` Michael S. Tsirkin
2023-02-27  8:35                   ` Jason Wang
2023-02-27 12:38                     ` Heng Qi
2023-02-27 17:49                     ` Michael S. Tsirkin
2023-02-28  3:04                       ` Jason Wang
2023-02-28  8:52                         ` Michael S. Tsirkin
2023-02-28  9:56                           ` [virtio-dev] " Heng Qi
2023-02-28 11:04                         ` Michael S. Tsirkin
2023-03-01  2:36                           ` Jason Wang
2023-02-23 13:13 ` Michael S. Tsirkin
2023-02-23 14:40   ` [virtio-comment] " Parav Pandit
2023-02-24  8:13     ` Michael S. Tsirkin
2023-02-24 14:38       ` [virtio-dev] " Heng Qi
2023-02-24 17:10         ` Michael S. Tsirkin
2023-02-27  0:29       ` Parav Pandit
2023-02-24  4:42   ` Heng Qi
2023-02-24  8:04     ` Michael S. Tsirkin
2023-02-28 11:16 ` Michael S. Tsirkin
2023-03-01  2:56   ` [virtio-dev] " Heng Qi
2023-03-08 14:39     ` [virtio-comment] " Michael S. Tsirkin
2023-03-09  4:55       ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-03-09 19:36         ` Michael S. Tsirkin
2023-03-11  3:23           ` Heng Qi
2023-03-15 11:58             ` Michael S. Tsirkin
2023-03-15 12:55               ` Heng Qi
2023-03-15 14:57                 ` Michael S. Tsirkin
2023-03-16 13:17                   ` Heng Qi
2023-03-20 19:45                     ` Michael S. Tsirkin [this message]
2023-03-30 12:10                       ` Heng Qi
2023-03-20 19:48                 ` Michael S. Tsirkin
2023-03-30 12:37                   ` Heng Qi
2023-04-08 10:29                     ` Michael S. Tsirkin
2023-04-10 13:26                       ` [virtio-comment] Re: [virtio-dev] " Heng Qi
2023-03-01  3:30   ` [virtio-comment] " Heng Qi
2023-03-09 12:28   ` [virtio-comment] Re: [virtio-dev] " Heng Qi
