All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Boqun Feng <boqun.feng@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <edumazet@google.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Cong Wang <xiyou.wangcong@gmail.com>, Hao Luo <haoluo@google.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Jiri Olsa <jolsa@kernel.org>, Jiri Pirko <jiri@resnulli.us>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Ronak Doshi <doshir@vmware.com>, Song Liu <song@kernel.org>,
	Stanislav Fomichev <sdf@google.com>,
	VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
Date: Thu, 18 Jan 2024 08:35:40 +0100	[thread overview]
Message-ID: <20240118073540.GIobmYpD@linutronix.de> (raw)
In-Reply-To: <87ttnb6hme.fsf@toke.dk>

On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?

No but I probably could set it up.

> I chatted with Jesper about this, and he had an idea not too far from
> this: split up the XDP and regular stack processing in two stages, each
> with their individual batching. So whereas right now we're doing
> something like:
> 
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       run_netstack(pkt)  // this is the expensive bit
>   bh_enable()
> 
> We could instead do:
> 
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       add_to_list(pkt, to_stack_list)
>   bh_enable()
>   // sched point
>   bh_disable()
>   for pkt in to_stack_list:
>     run_netstack(pkt)
>   bh_enable()
> 
> 
> This would limit the batching that blocks everything to only the XDP
> processing itself, which should limit the maximum time spent in the
> blocking state significantly compared to what we have today. The caveat
> being that rearranging things like this is potentially a pretty major
> refactoring task that needs to touch all the drivers (even if some of
> the logic can be moved into the core code in the process). So not really
> sure if this approach is feasible, TBH.

This does not work because bh_disable() does not disable scheduling.
Scheduling may happen. bh_disable() acquires a lock which is currently
the only synchronisation point between two say network driver doing
NAPI. And this what I want to get rid of.
Regarding expensive bit as in XDP_PASS: This doesn't need locking as per
proposal, just the REDIRECT piece.

> > Daniel said netkit doesn't need this locking because it is not
> > supporting this redirect and it made me think. Would it work to make
> > the redirect structures part of the bpf_prog-structure instead of
> > per-CPU? My understanding is that eBPF's programs data structures are
> > part of it and contain locking allowing one eBPF program preempt
> > another one.
> > Having the redirect structures part of the program would obsolete
> > locking. Do I miss anything?
> 
> This won't work, unfortunately: the same XDP program can be attached to
> multiple interfaces simultaneously, and for hardware with multiple
> receive queues (which is most of the hardware that supports XDP), it can
> even run simultaneously on multiple CPUs on the same interface. This is
> the reason why this is all being kept in per-CPU variables today.

So I started hacking this and noticed yesterday and noticed that you can
run multiple bpf programs. This is how I learned that it won't work.
My plan B is now to move it into task_struct. 

> -Toke
Sebastian

  parent reply	other threads:[~2024-01-18  7:35 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-15 17:07 [PATCH net-next 00/24] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2023-12-18  8:16   ` Paolo Abeni
2024-01-11 16:19     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2023-12-18  7:48   ` Paolo Abeni
2024-01-12  9:01     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2023-12-16  4:43   ` kernel test robot
2024-01-12 10:58     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 11/24] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2023-12-16  3:39   ` kernel test robot
2023-12-18  8:33   ` Paolo Abeni
2024-01-12 11:23     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 14/24] net: Add a lock which held during the redirect process Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect Sebastian Andrzej Siewior
2023-12-16  4:12   ` kernel test robot
2023-12-20  0:25   ` Alexei Starovoitov
2024-01-04 19:29     ` Toke Høiland-Jørgensen
2024-01-12 17:41       ` Sebastian Andrzej Siewior
2024-01-17 16:37         ` Toke Høiland-Jørgensen
2024-01-18  2:04           ` Jakub Kicinski
2024-01-18  8:27             ` Sebastian Andrzej Siewior
2024-01-18 16:38               ` Jakub Kicinski
2024-01-18 16:50                 ` Sebastian Andrzej Siewior
2024-01-18 11:51             ` Toke Høiland-Jørgensen
2024-01-18 16:37               ` Jakub Kicinski
2024-01-20 14:41                 ` Toke Høiland-Jørgensen
2024-01-18  7:35           ` Sebastian Andrzej Siewior [this message]
2024-01-18 11:58             ` Toke Høiland-Jørgensen
2023-12-15 17:07 ` [PATCH net-next 16/24] net: netkit, veth, tun, virt*: " Sebastian Andrzej Siewior
2023-12-16 19:28   ` kernel test robot
2023-12-18  8:52   ` Daniel Borkmann
2024-01-12 15:37     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 17/24] net: amazon, aquanti, broadcom, cavium, engleder: " Sebastian Andrzej Siewior
2023-12-16 22:09   ` Kiyanovski, Arthur
2024-01-12 17:53     ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 18/24] net: Freescale: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 20/24] net: intel: " Sebastian Andrzej Siewior
2023-12-15 17:07   ` [Intel-wired-lan] " Sebastian Andrzej Siewior
2023-12-16  4:53   ` kernel test robot
2023-12-16  4:53     ` [Intel-wired-lan] " kernel test robot
2023-12-19  0:01     ` Nathan Chancellor
2023-12-19  0:01       ` [Intel-wired-lan] " Nathan Chancellor
2023-12-19 16:55       ` Nick Desaulniers
2023-12-19 16:55         ` [Intel-wired-lan] " Nick Desaulniers
2023-12-15 17:07 ` [PATCH net-next 21/24] net: marvell: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 22/24] net: mellanox, nfp, sfc: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process Sebastian Andrzej Siewior
2023-12-15 22:50 ` [PATCH net-next 00/24] locking: Introduce nested-BH locking Jakub Kicinski
2023-12-18 17:23   ` Sebastian Andrzej Siewior
2023-12-19  0:41     ` Jakub Kicinski
2023-12-21 20:46       ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240118073540.GIobmYpD@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=boqun.feng@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=doshir@vmware.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=martin.lau@linux.dev \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pv-drivers@vmware.com \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=toke@redhat.com \
    --cc=will@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.