From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Yuri Benditovich <yuri.benditovich@daynix.com>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
"Michael S . Tsirkin" <mst@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
rdunlap@infradead.org,
"Gustavo A . R . Silva" <gustavoars@kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
Steffen Klassert <steffen.klassert@secunet.com>,
Pablo Neira Ayuso <pablo@netfilter.org>,
decui@microsoft.com, cai@lca.pw,
Jakub Sitnicki <jakub@cloudflare.com>,
Marco Elver <elver@google.com>, Paolo Abeni <pabeni@redhat.com>,
Network Development <netdev@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
bpf <bpf@vger.kernel.org>, Yan Vugenfirer <yan@daynix.com>
Subject: Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
Date: Wed, 13 Jan 2021 09:33:25 -0500
Message-ID: <CA+FuTSfJJhEYr6gXmjpjjXzg6Xm5wWa-dL1SEV-Zt7RcPXGztg@mail.gmail.com>
In-Reply-To: <78bbc518-4b73-4629-68fb-2713250f8967@redhat.com>
On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/1/13 7:47 AM, Willem de Bruijn wrote:
> > On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> >> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
> >> <yuri.benditovich@daynix.com> wrote:
> >>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
> >>> <yuri.benditovich@daynix.com> wrote:
> >>>> Existing TUN module is able to use provided "steering eBPF" to
> >>>> calculate per-packet hash and derive the destination queue to
> >>>> place the packet to. The eBPF uses mapped configuration data
> >>>> containing a key for hash calculation and indirection table
> >>>> with array of queues' indices.
> >>>>
> >>>> This series of patches adds support for virtio-net hash reporting
> >>>> feature as defined in virtio specification. It extends the TUN module
> >>>> and the "steering eBPF" as follows:
> >>>>
> >>>> Extended steering eBPF calculates the hash value and hash type, keeps
> >>>> hash value in the skb->hash and returns index of destination virtqueue
> >>>> and the type of the hash. TUN module keeps returned hash type in
> >>>> (currently unused) field of the skb.
> >>>> skb->__unused renamed to 'hash_report_type'.
> >>>>
> >>>> When TUN module is called later to allocate and fill the virtio-net
> >>>> header and push it to destination virtqueue it populates the hash
> >>>> and the hash type into virtio-net header.
> >>>>
> >>>> VHOST driver is made aware of respective virtio-net feature that
> >>>> extends the virtio-net header to report the hash value and hash report
> >>>> type.
> >>> Comment from Willem de Bruijn:
> >>>
> >>> Skbuff fields are in short supply. I don't think we need to add one
> >>> just for this narrow path entirely internal to the tun device.
> >>>
> >> We understand that and try to minimize the impact by using an already
> >> existing unused field of skb.
> > Not anymore. It was repurposed as a flags field very recently.
> >
> > This use case is also very narrow in scope. And a very short path from
> > data producer to consumer. So I don't think it needs to claim scarce
> > bits in the skb.
> >
> > tun_ebpf_select_queue stores the field, tun_put_user reads it and
> > converts it to the virtio_net_hdr in the descriptor.
> >
> > tun_ebpf_select_queue is called from .ndo_select_queue. Storing the
> > field in skb->cb is fragile, as in theory some code could overwrite
> > that field between ndo_select_queue and
> > ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
> > control again. But in practice, I don't believe anything does.
> >
> > Alternatively an existing skb field that is used only on disjoint
> > datapaths, such as ingress-only, could be viable.
>
>
> A question here. We had metadata support in XDP for cooperation between
> eBPF programs. Do we have something similar in the skb?
>
> E.g in the RSS, if we want to pass some metadata information between
> eBPF program and the logic that generates the vnet header (either hard
> logic in the kernel or another eBPF program). Is there any way that can
> avoid the possible conflicts of qdiscs?
Not that I am aware of. The closest thing is cb[].
It'll have to alias a field like that, one that is known to be unused
for the given path.
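To illustrate, a minimal userspace sketch of what aliasing a private field
onto cb[] could look like. All names here (struct fake_skb, struct
tun_hash_cb, tun_cb_store_hash_type(), tun_cb_load_hash_type()) are
hypothetical and not taken from the patch series. fake_skb only stands in
for struct sk_buff and its 48-byte cb[] scratch area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct sk_buff: only the cb[] scratch area matters here. */
struct fake_skb {
	char cb[48];
	uint32_t hash;
};

/* Hypothetical private layout that a driver would alias onto cb[]. */
struct tun_hash_cb {
	uint16_t hash_report_type;
};

/* Fail the build if the private layout ever outgrows cb[]. */
static_assert(sizeof(struct tun_hash_cb) <=
	      sizeof(((struct fake_skb *)0)->cb),
	      "tun_hash_cb does not fit in cb[]");

/* Producer side: stash the hash type in the scratch area. */
static void tun_cb_store_hash_type(struct fake_skb *skb, uint16_t type)
{
	struct tun_hash_cb cb = { .hash_report_type = type };

	memcpy(skb->cb, &cb, sizeof(cb));
}

/* Consumer side: read it back; valid only if nothing wrote cb[] since. */
static uint16_t tun_cb_load_hash_type(const struct fake_skb *skb)
{
	struct tun_hash_cb cb;

	memcpy(&cb, skb->cb, sizeof(cb));
	return cb.hash_report_type;
}
```

In the flow discussed in this thread, the store would happen in
tun_ebpf_select_queue and the load in tun_put_user. Nothing in the sketch
guards against another layer overwriting cb[] in between, which is the
fragility mentioned earlier in the thread.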
One other approach that has been used within linear call stacks is to
pass the data out of band, as with the percpu variables
softnet_data.xmit.more and mirred_rec_level. But that is perhaps a bit
overwrought for this use case.
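As a rough userspace analogy of the out-of-band pattern, the sketch below
uses a C11 thread-local variable in place of a percpu variable. The names
pending_hash_type, stage_hash_type() and consume_hash_type() are made up
for illustration. The pattern assumes that producer and consumer run back
to back in the same execution context, with the packet not being queued
in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical out-of-band slot; a kernel version would be percpu. */
static _Thread_local uint16_t pending_hash_type;

/* Producer: record the value for the packet currently being processed. */
static void stage_hash_type(uint16_t type)
{
	pending_hash_type = type;
}

/* Consumer: fetch the value and clear the slot so it cannot go stale. */
static uint16_t consume_hash_type(void)
{
	uint16_t type = pending_hash_type;

	pending_hash_type = 0;
	return type;
}
```

Clearing the slot on read is a design choice of this sketch: a later
packet that skipped the producer step then sees 0 rather than a value
left over from an earlier packet.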
> >
> >>> Instead, you could just run the flow_dissector in tun_put_user if the
> >>> feature is negotiated. Indeed, the flow dissector seems more apt to me
> >>> than BPF here. Note that the flow dissector internally can be
> >>> overridden by a BPF program if the admin so chooses.
> >>>
> > >> While this set of patches relates to hash delivery in the virtio-net
> > >> packet in general, it was prepared in the context of the RSS feature
> > >> implementation as defined in the virtio spec [1].
> > >> In the case of RSS it is not enough to run the flow_dissector in
> > >> tun_put_user: in tun_ebpf_select_queue the TUN calls eBPF to calculate
> > >> the hash, hash type and queue index according to the (mapped)
> > >> parameters (key, hash types, indirection table) received from the guest.
> > TUNSETSTEERINGEBPF was added to support more diverse queue selection
> > than the default in case of multiqueue tun. Not sure what the exact
> > use cases are.
> >
> > But RSS is exactly the purpose of the flow dissector. It is used for
> > that purpose in the software variant RPS. The flow dissector
> > implements a superset of the RSS spec, and certainly computes a
> > four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
> > has already computed a 4-tuple hash.
> >
> > What it does not give is a type indication, such as
> > VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > In datapaths where the NIC has already computed the four-tuple hash
> > and stored it in skb->hash (the common case for servers), that type
> > field is the only reason to have to compute again.
>
>
> The problem is there's no guarantee that the packet comes from the NIC,
> it could be a simple VM2VM or host2VM packet.
>
> And even if the packet is coming from a NIC that calculates the hash,
> there's no guarantee that it's the hash that the guest wants (the guest
> may use different RSS keys).
Ah yes, of course.
I would still revisit the need to store a detailed hash_type along with
the hash, as, as far as I can tell, that conveys no actionable
information to the guest.