BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Yuri Benditovich <yuri.benditovich@daynix.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	"Gustavo A . R . Silva" <gustavoars@kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	decui@microsoft.com, cai@lca.pw,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Marco Elver <elver@google.com>, Paolo Abeni <pabeni@redhat.com>,
	Network Development <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	bpf <bpf@vger.kernel.org>, Yan Vugenfirer <yan@daynix.com>
Subject: Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
Date: Mon, 18 Jan 2021 10:19:57 -0500
Message-ID: <CA+FuTSfsFC0DTFhHDwT7dbtWXTmGOWjc=ozt8CgH_qDDn9gejg@mail.gmail.com> (raw)
In-Reply-To: <CAOEp5Oe4TcOukJa+OGj-ynfMMrZC=_YQDpzSC9_9p+UXSH7hmg@mail.gmail.com>

> > >>>>> What it does not give is a type indication, such as
> > >>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > >>>>> In datapaths where the NIC has already computed the four-tuple hash
> > >>>>> and stored it in skb->hash --the common case for servers--, That type
> > >>>>> field is the only reason to have to compute again.
> > >>>> The problem is there's no guarantee that the packet comes from the NIC,
> > >>>> it could be a simple VM2VM or host2VM packet.
> > >>>>
> > >>>> And even if the packet is coming from the NIC that calculates the hash
> > >>>> there's no guarantee that it's the has that guest want (guest may use
> > >>>> different RSS keys).
> > >>> Ah yes, of course.
> > >>>
> > >>> I would still revisit the need to store a detailed hash_type along with
> > >>> the hash, as as far I can tell that conveys no actionable information
> > >>> to the guest.
> > >>
> > >> Yes, need to figure out its usage. According to [1], it only mention
> > >> that storing has type is a charge of driver. Maybe Yuri can answer this.
> > >>
> > > For the case of Windows VM we can't know how exactly the network stack
> > > uses provided hash data (including hash type). But: different releases
> > > of Windows
> > > enable different hash types (for example UDP hash is enabled only on
> > > Server 2016 and up).
> > >
> > > Indeed the Windows requires a little more from the network adapter/driver
> > > than Linux does.
> > >
> > > The addition of RSS support to virtio specification takes in account
> > > the widest set of
> > > requirements (i.e. Windows one), our initial impression is that this
> > > should be enough also for Linux.
> > >
> > > The NDIS specification in part of RSS is _mandatory_ and there are
> > > certification tests
> > > that check that the driver provides the hash data as expected. All the
> > > high-performance
> > > network adapters have such RSS functionality in the hardware.

Thanks for the context.

If Windows requires the driver to pass the hash-type along with the
hash data, then indeed this will be needed.

If it only requires the device to support a subset of of the possible
types, chosen at init, that would be different and it would be cheaper
for the driver to pass this config to the device one time.

> > > With pre-RSS QEMU (i.e. where the virtio-net device does not indicate
> > > the RSS support)
> > > the virtio-net driver for Windows does all the job related to RSS:
> > > - hash calculation
> > > - hash/hash_type delivery
> > > - reporting each packet on the correct CPU according to RSS settings
> > >
> > > With RSS support in QEMU all the packets always come on a proper CPU and
> > > the driver never needs to reschedule them. The driver still need to
> > > calculate the
> > > hash and report it to Windows. In this case we do the same job twice: the device
> > > (QEMU or eBPF) does calculate the hash and get proper queue/CPU to deliver
> > > the packet. But the hash is not delivered by the device, so the driver needs to
> > > recalculate it and report to the Windows.
> > >
> > > If we add HASH_REPORT support (current set of patches) and the device
> > > indicates this
> > > feature we can avoid hash recalculation in the driver assuming we
> > > receive the correct hash
> > > value and hash type. Otherwise the driver can't know which exactly
> > > hash the device has calculated.
> > >
> > > Please let me know if I did not answer the question.
> >
> >
> > I think I get you. The hash type is also a kind of classification (e.g
> > TCP or UDP). Any possibility that it can be deduced from the driver? (Or
> > it could be too expensive to do that).
> >
> The driver does it today (when the device does not offer any features)
> and of course can continue doing it.
> IMO if the device can't report the data according to the spec it
> should not indicate support for the respective feature (or fallback to
> vhost=off).
> Again, IMO if Linux does not need the exact hash_type we can use (for
> Linux) the way that Willem de Brujin suggested in his patchset:
> - just add VIRTIO_NET_HASH_REPORT_L4 to the spec
> - Linux can use MQ + hash delivery (and use VIRTIO_NET_HASH_REPORT_L4)
> - Linux can use (if makes sense) RSS with VIRTIO_NET_HASH_REPORT_L4 and eBPF
> - Windows gets what it needs + eBPF
> So, everyone has what they need at the respective cost.
>
> Regarding use of skb->cb for hash type:
> Currently, if I'm not mistaken, there are 2 bytes at the end of skb->cb:
> skb->cb is 48 bytes array
> There is skb_gso_cb (14 bytes) at offset SKB_GSO_CB_OFFSET(32)
> Is it possible to use one of these 2 bytes for hash_type?
> If yes, shall we extend the skb_gso_cb and place the 1-bytes hash_type
> in it or just emit compilation error if the skb_gso_cb grows beyond 15
> bytes?

Good catch on segmentation taking place between .ndo_select_queue and
.ndo_start_xmit.

That also means that whatever field in the skb is used, has to be
copied to all segments in skb_segment. Which happens for cb. But this
feature is completely unrelated to the skb_gso_cb type. Perhaps
another field with a real type is more clear. For instance, an
extension to the union with napi_id and sender_cpu, as neither is used
in this egress path with .ndo_select_queue?

  reply index

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-12 19:41 Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 1/7] skbuff: define field for hash report type Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 2/7] vhost: support for hash report virtio-net feature Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 3/7] tun: allow use of BPF_PROG_TYPE_SCHED_CLS program type Yuri Benditovich
2021-01-12 19:46   ` Alexei Starovoitov
2021-01-12 20:33     ` Yuri Benditovich
2021-01-12 20:40   ` Yuri Benditovich
2021-01-12 20:55     ` Yuri Benditovich
2021-01-18  9:16       ` Yuri Benditovich
2021-01-20 18:44       ` Alexei Starovoitov
2021-01-24 11:52         ` Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 4/7] tun: free bpf_program by bpf_prog_put instead of bpf_prog_destroy Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 5/7] tun: add ioctl code TUNSETHASHPOPULATION Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 6/7] tun: populate hash in virtio-net header when needed Yuri Benditovich
2021-01-12 19:41 ` [RFC PATCH 7/7] tun: report new tun feature IFF_HASH Yuri Benditovich
2021-01-12 19:49 ` [RFC PATCH 0/7] Support for virtio-net hash reporting Yuri Benditovich
2021-01-12 20:28   ` Yuri Benditovich
2021-01-12 23:47     ` Willem de Bruijn
2021-01-13  4:05       ` Jason Wang
2021-01-13 14:33         ` Willem de Bruijn
2021-01-14  3:38           ` Jason Wang
2021-01-17  7:57             ` Yuri Benditovich
2021-01-18  2:46               ` Jason Wang
2021-01-18  9:09                 ` Yuri Benditovich
2021-01-18 15:19                   ` Willem de Bruijn [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-01-05 12:24 Yuri Benditovich
2021-01-05 17:21 ` Willem de Bruijn
2021-01-12 19:36   ` Yuri Benditovich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+FuTSfsFC0DTFhHDwT7dbtWXTmGOWjc=ozt8CgH_qDDn9gejg@mail.gmail.com' \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cai@lca.pw \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=decui@microsoft.com \
    --cc=elver@google.com \
    --cc=gustavoars@kernel.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=jakub@cloudflare.com \
    --cc=jasowang@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pablo@netfilter.org \
    --cc=rdunlap@infradead.org \
    --cc=songliubraving@fb.com \
    --cc=steffen.klassert@secunet.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=yan@daynix.com \
    --cc=yhs@fb.com \
    --cc=yuri.benditovich@daynix.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git