virtio-dev.lists.oasis-open.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Heng Qi <hengqi@linux.alibaba.com>
Cc: virtio-comment@lists.oasis-open.org,
	virtio-dev@lists.oasis-open.org, Parav Pandit <parav@nvidia.com>,
	Jason Wang <jasowang@redhat.com>,
	Yuri Benditovich <yuri.benditovich@daynix.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Subject: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH v9] virtio-net: support inner header hash
Date: Wed, 15 Mar 2023 10:57:40 -0400
Message-ID: <20230315094102-mutt-send-email-mst@kernel.org>
In-Reply-To: <4b14043c-6059-1d26-060e-7dc653c4f401@linux.alibaba.com>

On Wed, Mar 15, 2023 at 08:55:45PM +0800, Heng Qi wrote:
> 
> 
> On 2023/3/15 7:58 PM, Michael S. Tsirkin wrote:
> > On Sat, Mar 11, 2023 at 11:23:08AM +0800, Heng Qi wrote:
> > > 
> > > 
> > > On 2023/3/10 3:36 AM, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 09, 2023 at 12:55:02PM +0800, Heng Qi wrote:
> > > > > On 2023/3/8 10:39 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 01, 2023 at 10:56:31AM +0800, Heng Qi wrote:
> > > > > > > On 2023/2/28 7:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Sat, Feb 18, 2023 at 10:37:15PM +0800, Heng Qi wrote:
> > > > > > > > > If the tunnel is used to encapsulate the packets, the hash calculated
> > > > > > > > > using the outer header of the receive packets is always fixed for the
> > > > > > > > > same flow packets, i.e. they will be steered to the same receive queue.
> > > > > > > > Wait a second. How is this true? Does not everyone stick the
> > > > > > > > inner header hash in the outer source port to solve this?
> > > > > > > Yes, you are right. That's what we did before the inner header hash, but it
> > > > > > > has a performance penalty, which I'll explain below.
> > > > > > > 
> > > > > > > > For example geneve spec says:
> > > > > > > > 
> > > > > > > >        it is necessary for entropy from encapsulated packets to be
> > > > > > > >        exposed in the tunnel header.  The most common technique for this is
> > > > > > > >        to use the UDP source port
> > > > > > > > The endpoint of the tunnel is called the gateway (with DPDK running on top of it).
> > > > > > > 
> > > > > > > > 1. When there is no inner header hash, entropy can be inserted into the UDP
> > > > > > > > source port of the outer header of the tunnel, and then the tunnel packet
> > > > > > > > is handed over to the host. The host needs to dedicate some of its CPUs to
> > > > > > > > parsing the outer headers (without dropping them) and calculating the inner
> > > > > > > > hash for the inner payloads, and then use the inner hash to forward the
> > > > > > > > packets to the other CPUs, which are responsible for processing.
> > > > > > I don't get this part. Leave inner hashes to the guest inside the
> > > > > > tunnel, why is your host doing this?
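For readers following the exchange above, here is a minimal sketch of the
"entropy in the outer UDP source port" technique that the geneve quote
describes. It is an illustration only, not code from the patch or from any
gateway. The function names are hypothetical, zlib.crc32 is a stand-in for
whatever flow hash the encapsulating side really uses, and the 49152-65535
port range follows the dynamic/private range that the VXLAN RFC recommends.

```python
import zlib

# Dynamic/private port range commonly recommended for tunnel source ports.
EPHEMERAL_LOW = 49152
EPHEMERAL_HIGH = 65535


def inner_flow_hash(src_ip, dst_ip, proto, src_port, dst_port):
    """Stand-in flow hash over the inner 5-tuple (assumption: CRC32)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)  # unsigned 32-bit value in Python 3


def outer_udp_source_port(flow_hash):
    """Fold a 32-bit inner hash into the outer UDP source port range."""
    span = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1  # 16384 possible ports
    return EPHEMERAL_LOW + (flow_hash % span)


if __name__ == "__main__":
    h = inner_flow_hash("10.0.0.1", "10.0.0.2", 6, 40000, 443)
    print(outer_udp_source_port(h))
```

With this scheme the receiver's ordinary outer-header RSS already sees
per-flow entropy, which is the point of the question above. Note that the
fold keeps at most 14 bits of the 32-bit hash in this particular range.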
> > > 
> > > Let's simplify some details and take a fresh look at two different
> > > scenarios: VXLAN and GENEVE (Scenario1) and GRE (Scenario2).
> > > 
> > > 1. In Scenario1, we can improve the processing performance of the same flow
> > > by implementing inner symmetric hashing.
> > > 
> > > This is because even though client1 and client2 communicate bidirectionally
> > > through the same flow, their data may pass through and be encapsulated by
> > > different tunnels, resulting in the same flow being hashed to different
> > > queues and processed by different CPUs.
> > > 
> > > To ensure consistency and optimized processing, we need to parse out the
> > > inner header and compute a symmetric hash on it using a special rss key.
> > > 
> > > Sorry for not mentioning the inner symmetric hash before, in order to
> > > prevent the introduction of more concepts, but it is indeed a kind of inner
> > > hash.
> > If parts of a flow go through different tunnels won't this cause
> > reordering at the network level? Why is it so important to prevent it at
> > the nic then?  Or, since you are stressing symmetric hash, are you
> > talking about TX and RX side going through different tunnels?
> 
> Yes, the directions client1->client2 and client2->client1 may go through
> different tunnels.
> With inner symmetric hashing, the same CPU can process both directions
> of the same flow, which improves performance.

Well sure but ... are you just doing forwarding or inner processing too?
If forwarding why do you care about matching TX and RX queues? If e2e
processing can't you just store the incoming hash in the flow and reuse
on TX? This is what Linux is doing...
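
As an illustration of what "symmetric hash with a special RSS key" can mean
in practice, here is a sketch of a Toeplitz hash over an IPv4 4-tuple. It
assumes the well-known trick of using a key whose bit pattern repeats every
16 bits (0x6d5a repeated), which makes the result independent of swapping
source and destination. The function names are made up for this example, and
nothing here is taken from the patch under discussion.

```python
import socket
import struct

# Assumption: a 40-byte key built from a repeating 16-bit pattern.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20


def toeplitz_hash(key, data):
    """Toeplitz hash: XOR a sliding 32-bit key window per set input bit."""
    key_bits = len(key) * 8
    # The key must cover a 32-bit window at every input bit position.
    assert key_bits >= len(data) * 8 + 31
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def flow_hash(src_ip, dst_ip, src_port, dst_port, key=SYMMETRIC_KEY):
    """Hash input laid out as src IP, dst IP, src port, dst port."""
    data = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    return toeplitz_hash(key, data)


if __name__ == "__main__":
    fwd = flow_hash("10.0.0.1", "10.0.0.2", 1234, 80)
    rev = flow_hash("10.0.0.2", "10.0.0.1", 80, 1234)
    print(hex(fwd), hex(rev), fwd == rev)
```

The symmetry comes from the key period: the two IP addresses sit 32 bits
apart and the two ports 16 bits apart in the input, so swapping them selects
the same key windows.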



> > 
> > 
> > > 2. In Scenario2 with GRE, the lack of outer transport headers means that
> > > flows between multiple communication pairs encapsulated by the same tunnel
> > > will all be hashed to the same queue. To address this, we need to implement
> > > inner hashing to improve the performance of RSS. By parsing and calculating
> > > the inner hash, different flows can be hashed to different queues.
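
To make Scenario2 concrete, the following sketch builds two IPv4-in-GRE
packets that share one tunnel and extracts what an RSS hash could be computed
over. It assumes the basic 4-byte GRE header with no optional fields (as
described in this thread) and an inner IPv4 packet carrying TCP or UDP. The
helper names are hypothetical and the packets are synthetic.

```python
import socket
import struct

IPPROTO_UDP = 17
IPPROTO_GRE = 47
ETH_P_IP = 0x0800


def ipv4_header(src, dst, proto):
    """Minimal 20-byte IPv4 header; length and checksum are left as zero."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 0, 0, 0, 64, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


def gre_packet(outer, inner, sport, dport):
    """outer and inner are (src, dst) address pairs; inner payload is UDP."""
    gre = struct.pack("!HH", 0, ETH_P_IP)  # no flags, version 0
    udp = struct.pack("!HHHH", sport, dport, 8, 0)
    return (ipv4_header(*outer, IPPROTO_GRE) + gre
            + ipv4_header(*inner, IPPROTO_UDP) + udp)


def parse_ipv4(buf, off):
    """Return (src, dst, proto, offset of the next header)."""
    ihl = (buf[off] & 0x0F) * 4
    src = socket.inet_ntoa(buf[off + 12:off + 16])
    dst = socket.inet_ntoa(buf[off + 16:off + 20])
    return src, dst, buf[off + 9], off + ihl


def rss_inputs(pkt):
    """Return (outer hash input, inner 5-tuple or None)."""
    src, dst, proto, off = parse_ipv4(pkt, 0)
    outer = (src, dst)  # plain GRE has no outer ports to add entropy
    if proto != IPPROTO_GRE:
        return outer, None
    flags_ver, ptype = struct.unpack_from("!HH", pkt, off)
    if flags_ver != 0 or ptype != ETH_P_IP:
        return outer, None  # optional GRE fields are not handled here
    isrc, idst, iproto, ioff = parse_ipv4(pkt, off + 4)
    sport, dport = struct.unpack_from("!HH", pkt, ioff)
    return outer, (isrc, idst, iproto, sport, dport)


if __name__ == "__main__":
    tunnel = ("192.0.2.1", "192.0.2.2")
    a = gre_packet(tunnel, ("10.0.0.1", "10.0.0.9"), 1111, 53)
    b = gre_packet(tunnel, ("10.0.0.2", "10.0.0.9"), 2222, 53)
    print(rss_inputs(a))
    print(rss_inputs(b))
```

Both packets yield the same outer input, so any deterministic hash over the
outer header sends them to the same queue, while the inner 5-tuples differ.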
> > > 
> > > Thanks.
> > > 
> > > 
> > Well 2 is at least inexact, there's flowID there. It's just 8 bit
> 
> We use the most basic GRE header fields (not NVGRE), without even the
> optional fields.
> There is also no flow ID in the GRE header; are you perhaps referring to
> NVGRE?
> 
> Thanks.
> 
> > so not sufficient if there are more than 256 queues. Still 256 queues
> > is quite a lot. Are you trying to solve for configurations with
> > more than 256 queues then?
> > 
> > 
> > > > > Assuming that the same flow includes a unidirectional flow a->b, or a
> > > > > bidirectional flow a->b and b->a,
> > > > > such flow may be out of order when processed by the gateway(DPDK):
> > > > > 
> > > > > > 1. In unidirectional mode, if the same flow is switched to another gateway
> > > > > >    for some reason, resulting in a different outer IP address, then this
> > > > > >    flow may be processed by different CPUs after reaching the host if there
> > > > > >    is no inner hash. So after the host receives the flow, it first uses the
> > > > > >    forwarding CPUs to parse the inner hash, and then uses the hash to ensure
> > > > > >    that the flow is processed by the same CPU.
> > > > > > 2. In bidirectional mode, the a->b flow may go to gateway 1, and the b->a
> > > > > >    flow may go to gateway 2. In order to ensure that the same flow is
> > > > > >    processed by the same CPU, we still need the forwarding CPUs to parse
> > > > > >    the real inner hash (here, the hash key needs to be replaced with a
> > > > > >    symmetric hash key).
> > > > > Oh interesting. What are those gateways, how come there's an expectation
> > > > > that you can change their addresses and topology
> > > > completely seamlessly without any reordering whatsoever?
> > > > Isn't network topology change kind of guaranteed to change ordering
> > > > sometimes?
> > > > 
> > > > 
> > > > > > > > 1). During this process, the CPUs on the host are divided into two parts.
> > > > > > > >     One part is used as a forwarding node to parse the outer header, and
> > > > > > > >     its CPU utilization is low. The other part handles packets.
> > > > > > Some overhead is clearly involved in *sending* packets -
> > > > > > to calculate the hash and stick it in the port number.
> > > > > > This is, however, a separate problem and if you want to
> > > > > > solve it then my suggestion would be to teach the *transmit*
> > > > > > side about GRE offloads, so it can fill the source port in the card.
> > > > > > 
> > > > > > > > 2). The entropy of the outer UDP source port is not enough; that is,
> > > > > > > >     packets are not spread widely across the queues.
> > > > > > how isn't it enough? 16 bit is enough to cover all vqs ...
> > > > > A 5-tuple brings more entropy than a single port, doesn't it?
> > > > But you don't need more for RSS, the indirection table is not
> > > > that large.
> > > > 
> > > > > In fact, the
> > > > > inner hash of the physical network card used by
> > > > > the business team is indeed better than the udp port number of the outer
> > > > > header we modify now, but they did not give me the data.
> > > > > Admittedly, our hash value is 32 bit.
> > > > 
> > > > > > > 2. When there is an inner header hash, the gateway will directly help parse
> > > > > > > the outer header, and use the inner 5 tuples to calculate the inner hash.
> > > > > > > The tunneled packet is then handed over to the host.
> > > > > > > 1) All the CPUs of the host are used to process data packets, and there is
> > > > > > > no need to use some CPUs to forward and parse the outer header.
> > > > > > You really have to parse the outer header anyway,
> > > > > > otherwise there's no tunneling.
> > > > > > Unless you want to teach virtio to implement tunneling
> > > > > > in hardware, which is something I'd find it easier to
> > > > > > get behind.
> > > > > There is no need to parse the outer header twice, because we use shared
> > > > > memory.
> > > > shared with what? you need the outer header to identify the tunnel.
> > > > 
> > > > > > > 2) The entropy of the original quintuple is sufficient, and the queue is
> > > > > > > widely distributed.
> > > > > > It's exactly the same entropy, why would it be better? In fact you
> > > > > > are taking out the outer hash entropy making things worse.
> > > > > > I don't get the point: why would the entropy of the inner 5-tuple and
> > > > > > the outer tunnel header be the same? Multiple streams have the same
> > > > > > outer header.
> > > > > 
> > > > > Thanks.
> > > > well our hash is 32 bit. source port is just 16 bit.
> > > > so yes it's more entropy but RSS can't use more than 16 bit.
> > > > why do you need so many? you have more than 64k CPUs to offload to?
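
The point about how few hash bits RSS actually consumes can be shown with a
small sketch of a mask-based indirection table lookup. The table size (128
entries), the queue count (8) and the function name are assumptions chosen
for the example, not values from the patch.

```python
def select_queue(hash_value, indirection_table):
    """Pick a receive queue from the low bits of a 32-bit hash."""
    size = len(indirection_table)
    # A mask-based lookup needs a power-of-two table length.
    assert size > 0 and size & (size - 1) == 0
    return indirection_table[hash_value & (size - 1)]


# Assumed setup: 128 table entries spread round-robin over 8 queues.
TABLE = [i % 8 for i in range(128)]

if __name__ == "__main__":
    # These hashes differ only above bit 6, so they hit the same entry.
    print(select_queue(0x12345678, TABLE))
    print(select_queue(0x00000078, TABLE))
```

With a 128-entry table only the low 7 bits of the hash take part in queue
selection, so a 16-bit source of entropy is already wider than the lookup
can use.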
> > > > 
> > > > 
> > > > > > > Thanks.
> > > > > > > > > same goes for vxlan; did not check further.
> > > > > > > > 
> > > > > > > > so what is the problem?  and which tunnel types actually suffer from the
> > > > > > > > problem?
> > > > > > > > 
> > > > > > > This publicly archived list offers a means to provide input to the
> > > > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > > > 
> > > > > > > In order to verify user consent to the Feedback License terms and
> > > > > > > to minimize spam in the list archive, subscription is required
> > > > > > > before posting.
> > > > > > > 
> > > > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org




