From: Amery Hung <ameryhung@gmail.com>
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
	jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
	xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com
Subject: [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc
Date: Wed, 17 Jan 2024 21:56:16 +0000	[thread overview]
Message-ID: <cover.1705432850.git.amery.hung@bytedance.com> (raw)

Hi, 

I am continuing the work on an eBPF-based Qdisc, building on Cong’s
previous RFC. The following are some use cases of eBPF Qdisc:

1. Allow customizing Qdiscs in an easier way, so that people don't
   have to write a complete Qdisc kernel module just to experiment
   with some new queuing theory.

2. Solve EDT's problem. EDT calculates the "tokens" in clsact, which
   runs before enqueue, so it is impossible to adjust those "tokens"
   after packets get dropped in enqueue. With eBPF Qdisc, this is easy
   to solve with a shared map between clsact and sch_bpf.

3. Replace qevents, as now the user gains much more control over the
   skb and queues.

4. Provide a new way to reuse TC filters. Currently TC relies on filter
   chain and block to reuse the TC filters, but they are too complicated
   to understand. With eBPF helper bpf_skb_tc_classify(), we can invoke
   TC filters on _any_ Qdisc (even on a different netdev) to do the
   classification.

5. Potentially pave the way for ingress to queue packets, although
   the current implementation is still only for egress.
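
To make use case 2 a bit more concrete, here is a rough userspace model
of the accounting involved. It is plain C, not BPF code, and it is not
taken from the patchset: struct flow_state, edt_charge() and
edt_refund() are hypothetical names, and the struct merely stands in
for a map value shared between the clsact program and the qdisc
program. The clsact side charges a flow by pushing its next departure
time forward; the qdisc side undoes that charge when enqueue drops the
packet.

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Stand-in for a per-flow map value shared by clsact and the qdisc. */
struct flow_state {
	uint64_t rate_bps;   /* allowed rate in bytes per second, must be > 0 */
	uint64_t next_tx_ns; /* earliest departure time of the next packet */
};

/* Time the flow has to "pay" for sending len bytes at its rate. */
static uint64_t edt_cost_ns(const struct flow_state *f, uint32_t len)
{
	assert(f->rate_bps > 0);
	return (uint64_t)len * NSEC_PER_SEC / f->rate_bps;
}

/* clsact side: pick a departure time for the packet and charge the flow. */
static uint64_t edt_charge(struct flow_state *f, uint64_t now_ns, uint32_t len)
{
	uint64_t tstamp = f->next_tx_ns > now_ns ? f->next_tx_ns : now_ns;

	f->next_tx_ns = tstamp + edt_cost_ns(f, len);
	return tstamp;
}

/* qdisc side: the packet was dropped at enqueue, so give the time back. */
static void edt_refund(struct flow_state *f, uint32_t len)
{
	uint64_t cost = edt_cost_ns(f, len);

	assert(f->next_tx_ns >= cost);
	f->next_tx_ns -= cost;
}
```

Without the refund step, every dropped packet still delays the packets
behind it, which is the problem described in item 2. A real
implementation would also need to think about concurrent updates to the
shared state, which this sketch ignores.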

I’ve combed through previous comments and appreciate the feedback.
The major changes in this RFC are the use of a kptr to skb to maintain
the validity of the skb during its lifetime in the Qdisc, the dropping
of rbtree maps, and the inclusion of two examples.

Some questions for discussion:

1. We now pass a trusted kptr of sk_buff to the program instead of
   __sk_buff. This makes most helpers using __sk_buff incompatible
   with eBPF qdisc. An alternative is to still use __sk_buff in the
   context and use bpf_cast_to_kern_ctx() to acquire the kptr. However,
   this can only be applied to the enqueue program, since in the dequeue
   program skbs do not come from ctx but from kptrs exchanged out of
   maps (i.e., there is no __sk_buff). Any suggestions for making skb
   kptr and helper functions compatible?

2. The current patchset uses netlink. Do we also want to use bpf_link
   for attachment?

3. People have suggested struct_ops. We chose not to use struct_ops since
   users might want to create multiple bpf qdiscs with different
   implementations. The current struct_ops attachment model does not seem
   to support replacing only functions of a specific instance of a module,
   but I might be wrong.
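
As background for question 1, the reason the dequeue program has no
__sk_buff is that, between enqueue and dequeue, the skb is owned by a
map slot rather than by a program context. Below is a small userspace
model of that ownership handoff. It is plain C, not BPF code: C11
atomic_exchange() stands in for bpf_kptr_xchg(), and struct pkt,
struct slot, slot_stash() and slot_take() are hypothetical names used
only for illustration.

```c
/* Userspace model only; all names here are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct pkt {
	unsigned int len; /* stand-in for an sk_buff */
};

/* Stand-in for a map value (or allocated object) holding one kptr. */
struct slot {
	_Atomic(struct pkt *) owned;
};

/*
 * Enqueue side: move ownership of p into the slot. Whatever the slot
 * held before is returned, and the caller now owns it (it may be NULL).
 */
static struct pkt *slot_stash(struct slot *s, struct pkt *p)
{
	return atomic_exchange(&s->owned, p);
}

/*
 * Dequeue side: take ownership out of the slot and leave it empty.
 * The packet comes from the slot, not from any program context.
 */
static struct pkt *slot_take(struct slot *s)
{
	return atomic_exchange(&s->owned, (struct pkt *)NULL);
}
```

The point of the exchange is that exactly one party holds the pointer at
any time, so a packet is never reachable from both the queue and a
program at once.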

Todo:
  - Add selftest

  - Make bpf list/rbtree use list/rbnode in skb so that developers
    don't need to allocate bpf objects for storing skb kptrs.

Note:
  - This patchset requires bpf support of exchanging kptr into allocated
    objects (local kptr), which Dave Marchevsky is working on.

  - The user space programs in the samples are adapted from the example
    Peihao Yang wrote in the RFC v5 thread.

---
v7: Reference skb using kptr to sk_buff instead of __sk_buff
    Use the new bpf rbtree/link for skb queues
    Add reset and init programs
    Add a bpf fq qdisc sample
    Add a bpf netem qdisc sample

v6: switch to kptr based approach

v5: mv kernel/bpf/skb_map.c net/core/skb_map.c
    implement flow map as map-in-map
    rename bpf_skb_tc_classify() and move it to net/sched/cls_api.c
    clean up eBPF qdisc program context

v4: get rid of PIFO, use rbtree directly

v3: move priority queue from sch_bpf to skb map
    introduce skb map and its helpers
    introduce bpf_skb_classify()
    use netdevice notifier to reset skb's
    Rebase on latest bpf-next

v2: Rebase on latest net-next
    Make the code more complete (but still incomplete)

Amery Hung (5):
  net_sched: Add reset program
  net_sched: Add init program
  tools/libbpf: Add support for BPF_PROG_TYPE_QDISC
  samples/bpf: Add an example of bpf fq qdisc
  samples/bpf: Add an example of bpf netem qdisc

Cong Wang (3):
  net_sched: Introduce eBPF based Qdisc
  net_sched: Add kfuncs for working with skb
  net_sched: Introduce kfunc bpf_skb_tc_classify()

 include/linux/bpf_types.h       |   4 +
 include/uapi/linux/bpf.h        |  23 +
 include/uapi/linux/pkt_sched.h  |  24 ++
 kernel/bpf/btf.c                |   5 +
 kernel/bpf/helpers.c            |   1 +
 kernel/bpf/syscall.c            |  10 +
 net/core/filter.c               | 100 +++++
 net/sched/Kconfig               |  15 +
 net/sched/Makefile              |   1 +
 net/sched/sch_bpf.c             | 729 ++++++++++++++++++++++++++++++++
 samples/bpf/Makefile            |  14 +-
 samples/bpf/bpf_experimental.h  | 134 ++++++
 samples/bpf/tc_clsact_edt.bpf.c | 103 +++++
 samples/bpf/tc_sch_fq.bpf.c     | 666 +++++++++++++++++++++++++++++
 samples/bpf/tc_sch_fq.c         | 321 ++++++++++++++
 samples/bpf/tc_sch_netem.bpf.c  | 256 +++++++++++
 samples/bpf/tc_sch_netem.c      | 347 +++++++++++++++
 tools/include/uapi/linux/bpf.h  |  23 +
 tools/lib/bpf/libbpf.c          |   4 +
 19 files changed, 2779 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/sch_bpf.c
 create mode 100644 samples/bpf/bpf_experimental.h
 create mode 100644 samples/bpf/tc_clsact_edt.bpf.c
 create mode 100644 samples/bpf/tc_sch_fq.bpf.c
 create mode 100644 samples/bpf/tc_sch_fq.c
 create mode 100644 samples/bpf/tc_sch_netem.bpf.c
 create mode 100644 samples/bpf/tc_sch_netem.c

-- 
2.20.1

