BPF Archive on lore.kernel.org
From: Jason Wang <jasowang@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Yuri Benditovich <yuri.benditovich@daynix.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	rdunlap@infradead.org,
	"Gustavo A . R . Silva" <gustavoars@kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Steffen Klassert <steffen.klassert@secunet.com>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	decui@microsoft.com, cai@lca.pw,
	Jakub Sitnicki <jakub@cloudflare.com>,
	Marco Elver <elver@google.com>, Paolo Abeni <pabeni@redhat.com>,
	Network Development <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	bpf <bpf@vger.kernel.org>, Yan Vugenfirer <yan@daynix.com>
Subject: Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
Date: Thu, 14 Jan 2021 11:38:48 +0800
Message-ID: <8ea218a8-a068-1ed9-929d-67ad30111c3c@redhat.com> (raw)
In-Reply-To: <CA+FuTSfJJhEYr6gXmjpjjXzg6Xm5wWa-dL1SEV-Zt7RcPXGztg@mail.gmail.com>


On 2021/1/13 10:33 PM, Willem de Bruijn wrote:
> On Tue, Jan 12, 2021 at 11:11 PM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/1/13 7:47 AM, Willem de Bruijn wrote:
>>> On Tue, Jan 12, 2021 at 3:29 PM Yuri Benditovich
>>> <yuri.benditovich@daynix.com> wrote:
>>>> On Tue, Jan 12, 2021 at 9:49 PM Yuri Benditovich
>>>> <yuri.benditovich@daynix.com> wrote:
>>>>> On Tue, Jan 12, 2021 at 9:41 PM Yuri Benditovich
>>>>> <yuri.benditovich@daynix.com> wrote:
>>>>>> Existing TUN module is able to use provided "steering eBPF" to
>>>>>> calculate per-packet hash and derive the destination queue to
>>>>>> place the packet to. The eBPF uses mapped configuration data
>>>>>> containing a key for hash calculation and indirection table
>>>>>> with array of queues' indices.
>>>>>>
>>>>>> This series of patches adds support for virtio-net hash reporting
>>>>>> feature as defined in virtio specification. It extends the TUN module
>>>>>> and the "steering eBPF" as follows:
>>>>>>
>>>>>> Extended steering eBPF calculates the hash value and hash type, keeps
>>>>>> hash value in the skb->hash and returns index of destination virtqueue
>>>>>> and the type of the hash. TUN module keeps returned hash type in
>>>>>> (currently unused) field of the skb.
>>>>>> skb->__unused renamed to 'hash_report_type'.
>>>>>>
>>>>>> When TUN module is called later to allocate and fill the virtio-net
>>>>>> header and push it to destination virtqueue it populates the hash
>>>>>> and the hash type into virtio-net header.
>>>>>>
>>>>>> VHOST driver is made aware of respective virtio-net feature that
>>>>>> extends the virtio-net header to report the hash value and hash report
>>>>>> type.
>>>>> Comment from Willem de Bruijn:
>>>>>
>>>>> Skbuff fields are in short supply. I don't think we need to add one
>>>>> just for this narrow path entirely internal to the tun device.
>>>>>
>>>> We understand that and try to minimize the impact by using an already
>>>> existing unused field of skb.
>>> Not anymore. It was repurposed as a flags field very recently.
>>>
>>> This use case is also very narrow in scope. And a very short path from
>>> data producer to consumer. So I don't think it needs to claim scarce
>>> bits in the skb.
>>>
>>> tun_ebpf_select_queue stores the field, tun_put_user reads it and
>>> converts it to the virtio_net_hdr in the descriptor.
>>>
>>> tun_ebpf_select_queue is called from .ndo_select_queue.  Storing the
>>> field in skb->cb is fragile, as in theory some code could overwrite
>>> that field between ndo_select_queue and
>>> ndo_start_xmit/tun_net_xmit, from which point it is fully under tun
>>> control again. But in practice, I don't believe anything does.
>>>
>>> Alternatively an existing skb field that is used only on disjoint
>>> datapaths, such as ingress-only, could be viable.
>>
>> A question here. We had metadata support in XDP for cooperation between
>> eBPF programs. Do we have something similar in the skb?
>>
>> E.g in the RSS, if we want to pass some metadata information between
>> eBPF program and the logic that generates the vnet header (either hard
>> logic in the kernel or another eBPF program). Is there any way that can
>> avoid the possible conflicts of qdiscs?
> Not that I am aware of. The closest thing is cb[].
>
> It'll have to alias a field like that, one that is known to be unused for the given path.


Right, we need to make sure cb is not used by anything else. I'm not sure
how hard that is to achieve, considering that Qemu installs the eBPF
program but doesn't deal with networking configuration.
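
As a rough illustration of the cb[] aliasing being discussed, here is a
minimal userspace sketch in C. It is not the code from the patches:
struct mock_skb, struct tun_hash_cb and the three helpers are hypothetical
names invented for the example, and only the 48-byte size of cb[] is taken
from the kernel's struct sk_buff. The producer stands in for
tun_ebpf_select_queue, the consumer for tun_put_user, and the third helper
for any other layer that touches cb[] in between.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct sk_buff; only the 48-byte cb[] scratch
 * area is modelled. */
struct mock_skb {
    char cb[48];
};

/* Hypothetical private layout overlaid on cb[] by the tun driver. */
struct tun_hash_cb {
    uint32_t hash_value;
    uint16_t hash_report;
};

/* The overlay must fit inside the scratch area. */
static_assert(sizeof(struct tun_hash_cb) <= sizeof(((struct mock_skb *)0)->cb),
              "tun_hash_cb does not fit in cb[]");

/* Producer: models what tun_ebpf_select_queue would store. */
static void tun_cb_store(struct mock_skb *skb, uint32_t hash, uint16_t report)
{
    struct tun_hash_cb v = { .hash_value = hash, .hash_report = report };

    memcpy(skb->cb, &v, sizeof(v));
}

/* Consumer: models what tun_put_user would read back. */
static struct tun_hash_cb tun_cb_load(const struct mock_skb *skb)
{
    struct tun_hash_cb v;

    memcpy(&v, skb->cb, sizeof(v));
    return v;
}

/* Models an unrelated layer (for example a qdisc) reusing cb[]. */
static void other_layer_touch_cb(struct mock_skb *skb)
{
    memset(skb->cb, 0, sizeof(skb->cb));
}
```

Nothing prevents other_layer_touch_cb() from running between the store
and the load, which is exactly the conflict that has to be ruled out.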


>
> One other approach that has been used within linear call stacks is out
> of band. Like percpu variables softnet_data.xmit.more and
> mirred_rec_level. But that is perhaps a bit overwrought for this use
> case.


Yes, and if we go that way then eBPF turns out to be a burden, since we
need to invent helpers to access those auxiliary data structures. It
would be better then to hard-code the RSS in the kernel.
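
For comparison, the out-of-band approach could look roughly like the
sketch below. It is a userspace analogue only: a C11 _Thread_local
variable stands in for a kernel percpu variable, and pending_hash_report,
select_queue() and build_header() are hypothetical names. It only works
when producer and consumer run in the same linear call stack on the same
thread, and an eBPF program would need a dedicated helper to write such a
variable.

```c
#include <stdint.h>

/* Userspace analogue of a percpu variable: one slot per thread. */
static _Thread_local uint16_t pending_hash_report;

/* Producer: picks a queue and leaves the report type out of band. */
static unsigned int select_queue(uint32_t hash, uint16_t report,
                                 unsigned int nqueues)
{
    pending_hash_report = report;
    return hash % nqueues;
}

/* Consumer: picks up the report type later in the same call stack and
 * clears the slot so a stale value is never reused. */
static uint16_t build_header(void)
{
    uint16_t report = pending_hash_report;

    pending_hash_report = 0;
    return report;
}
```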


>
>>>>> Instead, you could just run the flow_dissector in tun_put_user if the
>>>>> feature is negotiated. Indeed, the flow dissector seems more apt to me
>>>>> than BPF here. Note that the flow dissector internally can be
>>>>> overridden by a BPF program if the admin so chooses.
>>>>>
>>>> While this set of patches relates to hash delivery in the virtio-net
>>>> packet in general, it was prepared in the context of the RSS feature
>>>> implementation as defined in the virtio spec [1].
>>>> In the case of RSS it is not enough to run the flow_dissector in
>>>> tun_put_user: in tun_ebpf_select_queue the TUN calls eBPF to calculate
>>>> the hash, hash type and queue index according to the (mapped)
>>>> parameters (key, hash types, indirection table) received from the guest.
>>> TUNSETSTEERINGEBPF was added to support more diverse queue selection
>>> than the default in case of multiqueue tun. Not sure what the exact
>>> use cases are.
>>>
>>> But RSS is exactly the purpose of the flow dissector. It is used for
>>> that purpose in the software variant RPS. The flow dissector
>>> implements a superset of the RSS spec, and certainly computes a
>>> four-tuple for TCP/IPv6. In the case of RPS, it is skipped if the NIC
>>> has already computed a 4-tuple hash.
>>>
>>> What it does not give is a type indication, such as
>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
>>> In datapaths where the NIC has already computed the four-tuple hash
>>> and stored it in skb->hash (the common case for servers), that type
>>> field is the only reason to have to compute again.
>>
>> The problem is there's no guarantee that the packet comes from the NIC,
>> it could be a simple VM2VM or host2VM packet.
>>
>> And even if the packet is coming from a NIC that calculates the hash,
>> there's no guarantee that it's the hash that the guest wants (the guest
>> may use different RSS keys).
> Ah yes, of course.
>
> I would still revisit the need to store a detailed hash_type along with
> the hash, as, as far as I can tell, that conveys no actionable information
> to the guest.


Yes, we need to figure out its usage. According to [1], it only mentions
that storing the hash type is the driver's responsibility. Maybe Yuri can
answer this.
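
For reference, this is roughly where the hash and its type would land in
the extended virtio-net header. The sketch mirrors the layout of struct
virtio_net_hdr_v1_hash from the virtio specification using plain userspace
types. The vnet_* names and fill_hash() are hypothetical, and the fields
are little-endian on the wire, so storing host-order values as done here
is only correct on a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct virtio_net_hdr_v1 (12 bytes). */
struct vnet_hdr_v1 {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

/* Hash report values as defined by the virtio specification. */
enum {
    VNET_HASH_REPORT_NONE  = 0,
    VNET_HASH_REPORT_IPv4  = 1,
    VNET_HASH_REPORT_TCPv4 = 2,
    VNET_HASH_REPORT_UDPv4 = 3,
    VNET_HASH_REPORT_IPv6  = 4,
    VNET_HASH_REPORT_TCPv6 = 5,
    VNET_HASH_REPORT_UDPv6 = 6,
};

/* Mirrors struct virtio_net_hdr_v1_hash (20 bytes). */
struct vnet_hdr_v1_hash {
    struct vnet_hdr_v1 hdr;
    uint32_t hash_value;
    uint16_t hash_report;
    uint16_t padding;
};

static_assert(sizeof(struct vnet_hdr_v1) == 12, "unexpected base size");
static_assert(offsetof(struct vnet_hdr_v1_hash, hash_value) == 12,
              "hash_value must follow the base header");
static_assert(sizeof(struct vnet_hdr_v1_hash) == 20, "unexpected size");

/* Fill the hash fields; host byte order, see the note above. */
static void fill_hash(struct vnet_hdr_v1_hash *h, uint32_t hash,
                      uint16_t report)
{
    h->hash_value = hash;
    h->hash_report = report;
    h->padding = 0;
}
```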

Thanks

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/indicating-rss-receive-data


>

