From: Willem de Bruijn <email@example.com>
To: Yuri Benditovich <firstname.lastname@example.org>
Cc: Jason Wang <email@example.com>, Willem de Bruijn <firstname.lastname@example.org>, "David S. Miller" <email@example.com>, Jakub Kicinski <firstname.lastname@example.org>, "Michael S . Tsirkin" <email@example.com>, Alexei Starovoitov <firstname.lastname@example.org>, Daniel Borkmann <email@example.com>, Andrii Nakryiko <firstname.lastname@example.org>, Martin KaFai Lau <email@example.com>, Song Liu <firstname.lastname@example.org>, Yonghong Song <email@example.com>, John Fastabend <firstname.lastname@example.org>, KP Singh <email@example.com>, Randy Dunlap <firstname.lastname@example.org>, "Gustavo A . R . Silva" <email@example.com>, Herbert Xu <firstname.lastname@example.org>, Steffen Klassert <email@example.com>, Pablo Neira Ayuso <firstname.lastname@example.org>, email@example.com, firstname.lastname@example.org, Jakub Sitnicki <email@example.com>, Marco Elver <firstname.lastname@example.org>, Paolo Abeni <email@example.com>, Network Development <firstname.lastname@example.org>, linux-kernel <email@example.com>, firstname.lastname@example.org, email@example.com, bpf <firstname.lastname@example.org>, Yan Vugenfirer <email@example.com>
Subject: Re: [RFC PATCH 0/7] Support for virtio-net hash reporting
Date: Mon, 18 Jan 2021 10:19:57 -0500
Message-ID: <CA+FuTSfsFC0DTFhHDwT7dbtWXTmGOWjc=ozt8CgH_qDDn9gejg@mail.gmail.com>
In-Reply-To: <CAOEp5Oe4TcOukJa+OGj-ynfMMrZC=_YQDpzSC9_9p+UXSH7hmg@mail.gmail.com>

> > >>>>> What it does not give is a type indication, such as
> > >>>>> VIRTIO_NET_HASH_TYPE_TCPv6. I don't understand how this would be used.
> > >>>>> In datapaths where the NIC has already computed the four-tuple hash
> > >>>>> and stored it in skb->hash (the common case for servers), that type
> > >>>>> field is the only reason to have to compute again.
> > >>>> The problem is there's no guarantee that the packet comes from the NIC,
> > >>>> it could be a simple VM2VM or host2VM packet.
> > >>>>
> > >>>> And even if the packet is coming from the NIC that calculates the hash
> > >>>> there's no guarantee that it's the hash that the guest wants (the guest
> > >>>> may use different RSS keys).
> > >>> Ah yes, of course.
> > >>>
> > >>> I would still revisit the need to store a detailed hash_type along with
> > >>> the hash, as, as far as I can tell, that conveys no actionable
> > >>> information to the guest.
> > >>
> > >> Yes, need to figure out its usage. According to , it only mentions
> > >> that storing the hash type is the driver's responsibility. Maybe Yuri
> > >> can answer this.
> > >>
> > > For the case of Windows VM we can't know how exactly the network stack
> > > uses provided hash data (including hash type). But: different releases
> > > of Windows enable different hash types (for example UDP hash is enabled
> > > only on Server 2016 and up).
> > >
> > > Indeed Windows requires a little more from the network adapter/driver
> > > than Linux does.
> > >
> > > The addition of RSS support to the virtio specification takes into
> > > account the widest set of requirements (i.e. the Windows one); our
> > > initial impression is that this should be enough also for Linux.
> > >
> > > The RSS part of the NDIS specification is _mandatory_ and there are
> > > certification tests that check that the driver provides the hash data
> > > as expected. All the high-performance network adapters have such RSS
> > > functionality in the hardware.

Thanks for the context. If Windows requires the driver to pass the
hash-type along with the hash data, then indeed this will be needed.

If it only requires the device to support a subset of the possible
types, chosen at init, that would be different and it would be cheaper
for the driver to pass this config to the device one time.

> > > With pre-RSS QEMU (i.e.
> > > where the virtio-net device does not indicate the RSS support)
> > > the virtio-net driver for Windows does all the job related to RSS:
> > > - hash calculation
> > > - hash/hash_type delivery
> > > - reporting each packet on the correct CPU according to RSS settings
> > >
> > > With RSS support in QEMU all the packets always come on a proper CPU and
> > > the driver never needs to reschedule them. The driver still needs to
> > > calculate the hash and report it to Windows. In this case we do the same
> > > job twice: the device (QEMU or eBPF) calculates the hash and gets the
> > > proper queue/CPU to deliver the packet. But the hash is not delivered by
> > > the device, so the driver needs to recalculate it and report it to
> > > Windows.
> > >
> > > If we add HASH_REPORT support (current set of patches) and the device
> > > indicates this feature, we can avoid hash recalculation in the driver,
> > > assuming we receive the correct hash value and hash type. Otherwise the
> > > driver can't know exactly which hash the device has calculated.
> > >
> > > Please let me know if I did not answer the question.
> >
> >
> > I think I get you. The hash type is also a kind of classification (e.g.
> > TCP or UDP). Any possibility that it can be deduced by the driver? (Or
> > it could be too expensive to do that).
> >
> The driver does it today (when the device does not offer any features)
> and of course can continue doing it.
> IMO if the device can't report the data according to the spec it
> should not indicate support for the respective feature (or fall back to
> vhost=off).
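The RSS duties Yuri lists for the pre-RSS case (compute a hash, then deliver each packet on the CPU chosen by the RSS settings) can be sketched in C. This is a minimal sketch, assuming the Toeplitz hash used by NDIS RSS and an indirection table indexed by the low bits of the hash; `toeplitz_hash` and `rss_select_queue` are hypothetical helpers, not code from the Windows driver, QEMU, or the eBPF program in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toeplitz hash as used for RSS: for each set bit of the input, taken
 * most significant bit first, XOR in the 32-bit window of the key that
 * starts at the same bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t result = 0;
	uint32_t window;

	/* The window must be able to slide over every input bit. */
	assert(key_len >= len + 4);

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (size_t i = 0; i < len; i++) {
		uint8_t next = key[i + 4]; /* key byte shifted in during this byte */

		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				result ^= window;
			window = (window << 1) | ((next >> bit) & 1u);
		}
	}
	return result;
}

/* Steer a packet: the low bits of the hash index an indirection table
 * whose entries are queue numbers. table_len must be a power of two. */
static uint16_t rss_select_queue(uint32_t hash, const uint16_t *table,
				 size_t table_len)
{
	return table[hash & (table_len - 1)];
}
```

Hashing the 12-byte IPv4/TCP tuple (source address, destination address, source port, destination port) for 66.9.149.187:2794 to 161.142.100.80:1766 with the 40-byte key from Microsoft's RSS verification examples should give 0x51ccc178, or 0x323e8fc2 over the two addresses alone. When the device reports that value together with its type, the driver can skip exactly this computation.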
> Again, IMO if Linux does not need the exact hash_type we can use (for
> Linux) the way that Willem de Bruijn suggested in his patchset:
> - just add VIRTIO_NET_HASH_REPORT_L4 to the spec
> - Linux can use MQ + hash delivery (and use VIRTIO_NET_HASH_REPORT_L4)
> - Linux can use (if it makes sense) RSS with VIRTIO_NET_HASH_REPORT_L4 and eBPF
> - Windows gets what it needs + eBPF
> So, everyone has what they need at the respective cost.
>
> Regarding use of skb->cb for hash type:
> Currently, if I'm not mistaken, there are 2 bytes at the end of skb->cb:
> skb->cb is a 48-byte array
> There is skb_gso_cb (14 bytes) at offset SKB_GSO_CB_OFFSET (32)
> Is it possible to use one of these 2 bytes for hash_type?
> If yes, shall we extend the skb_gso_cb and place the 1-byte hash_type
> in it, or just emit a compilation error if the skb_gso_cb grows beyond 15
> bytes?

Good catch on segmentation taking place between .ndo_select_queue and
.ndo_start_xmit. That also means that whatever field in the skb is used
has to be copied to all segments in skb_segment. That is already the
case for cb.

But this feature is completely unrelated to the skb_gso_cb type. Perhaps
another field with a real type is clearer. For instance, an extension to
the union with napi_id and sender_cpu, as neither is used in this egress
path with .ndo_select_queue?
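Yuri's layout arithmetic and the segmentation point can both be checked with a small user-space model. All `model_*` names and the struct below are hypothetical stand-ins that mirror the numbers quoted in the thread (48-byte cb, GSO block at offset 32, 14 bytes of fields); they are not the kernel's definitions. In this model the fields end at byte 46 of cb, leaving the 2 bytes Yuri mentions; on common ABIs those bytes are the struct's tail padding, so a 1-byte field fits without the block outgrowing cb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model, NOT the kernel's struct sk_buff or
 * struct skb_gso_cb: sizes follow the numbers quoted in the thread. */
#define MODEL_CB_SIZE       48 /* skb->cb is 48 bytes */
#define MODEL_GSO_CB_OFFSET 32 /* SKB_GSO_CB_OFFSET */

struct model_gso_cb {
	int32_t  mac_offset;  /* stands in for the mac/data offset union */
	int32_t  encap_level;
	uint32_t csum;        /* stands in for __wsum */
	uint16_t csum_start;  /* the fields above end at byte 14 */
	uint8_t  hash_type;   /* hypothetical 1-byte field in the tail padding */
};

/* The compile-time guard Yuri suggests: break the build if the GSO
 * block (including the new field) no longer fits in cb. */
static_assert(MODEL_GSO_CB_OFFSET + sizeof(struct model_gso_cb) <= MODEL_CB_SIZE,
	      "model_gso_cb overflows the control buffer");

struct model_skb {
	uint32_t hash;                   /* like skb->hash */
	unsigned char cb[MODEL_CB_SIZE]; /* like skb->cb */
	size_t len;                      /* payload bytes */
};

/* Byte position of hash_type inside cb. */
#define MODEL_HASH_TYPE_POS \
	(MODEL_GSO_CB_OFFSET + offsetof(struct model_gso_cb, hash_type))

static void model_set_hash_type(struct model_skb *skb, uint8_t type)
{
	skb->cb[MODEL_HASH_TYPE_POS] = type;
}

static uint8_t model_get_hash_type(const struct model_skb *skb)
{
	return skb->cb[MODEL_HASH_TYPE_POS];
}

/* Split skb into segments of at most mss payload bytes. Each segment gets
 * a copy of the parent's cb (as the thread notes for skb_segment), so a
 * value stored there at .ndo_select_queue is still present at
 * .ndo_start_xmit. Returns the number of segments written. */
static size_t model_segment(const struct model_skb *skb, size_t mss,
			    struct model_skb *segs, size_t max_segs)
{
	size_t n = 0;

	for (size_t off = 0; off < skb->len && n < max_segs; off += mss, n++) {
		segs[n].hash = skb->hash;
		memcpy(segs[n].cb, skb->cb, sizeof(segs[n].cb));
		segs[n].len = (skb->len - off < mss) ? skb->len - off : mss;
	}
	return n;
}
```

The model stores the value in cb only to illustrate Yuri's question. Willem's alternative, a properly typed field such as an extension of the union with napi_id and sender_cpu, would avoid overloading the GSO block but is not modeled here.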
BPF Archive on lore.kernel.org