All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/7] bpf: Propagate cn to TCP
@ 2019-03-23  8:05 brakmo
  2019-03-23  8:05 ` [PATCH bpf-next 1/7] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY brakmo
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: brakmo @ 2019-03-23  8:05 UTC (permalink / raw)
  To: netdev
  Cc: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet,
	Kernel Team

This patchset adds support for propagating congestion notifications (cn)
to TCP from cgroup inet skb egress BPF programs.

Current cgroup skb BPF programs cannot trigger TCP congestion window
reductions, even when they drop a packet. This patch-set adds support
for cgroup skb BPF programs to send congestion notifications in the
return value when the packets are TCP packets. Rather than the
current 1 for keeping the packet and 0 for dropping it, they can
now return:
    NET_XMIT_SUCCESS    (0)    - continue with packet output
    NET_XMIT_DROP       (1)    - drop packet and do cn
    NET_XMIT_CN         (2)    - continue with packet output and do cn
    -EPERM                     - drop packet

There is also support for setting the probe timer to a small value,
specified by a sysctl, when a packet is dropped when calling
queue_xmit in __tcp_transmit_skb and there are no other packets in
transit.

In addition, HBM programs are modified to collect and return more
statistics.

The use of congestion notifications improves the performance of HBM when
using Cubic. Without congestion notifications, Cubic will not decrease its
cwnd and HBM will need to drop a large percentage of the packets.
Smaller probe timers improve the performance of Cubic and DCTCP when the
rates are small enough that there are times when HBM cannot send a packet
per RTT in order to mainting the bandwidth limit.

The following results are obtained for rate limits of 1Gbps and 200Mbps,
between two servers using netperf, and only one flow. We also show how
reducing the max delayed ACK timer can improve the performance when
using Cubic. 

A following patch will add support for fq's Earliest Departure Time (EDT).

The command used was:
  ./do_hbm_test.sh -l -D --stats -N -r=<rate> [--no_cn] [dctcp] \
                   -s=<server running netserver>
  where:
     <rate>   is 1000 or 200
     --no_cn  specifies no cwr notifications
     dctcp    use of dctcp

                       Cubic                    DCTCP
Lim,Prob,DA    Mbps cwnd cred drops  Mbps cwnd cred drops
------------   ---- ---- ---- -----  ---- ---- ---- -----
  1G, 0,40       35  462 -320 67%     995    1 -212  0.05%
  1G, 0,40,cn   349    3 -229  0.15   995    1 -212  0.05
  1G, 0, 5,cn   941    2 -189  0.13   995    1 -212  0.05

200M, 0,40,cn    50    3 -152  0.34    31    3 -203  0.50
200M, 0, 5,cn    43    2 -202  0.48    33    3 -199  0.50
200M,20, 5,cn   199    2 -209  0.38   199    1 -214  0.30

Notes:
  --no_cn has no effect with DCTCP
  Lim = rate limit
  Prob = Probe timer
  DA = maximum delay ack timer
  cred = credit in packets
  drops = % packets dropped

brakmo (7):
  bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY
  bpf: cgroup inet skb programs can return 0 to 3
  bpf: Update __cgroup_bpf_run_filter_skb with cn
  bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls
  bpf: sysctl for probe_on_drop
  bpf: Add cn support to hbm_out_kern.c
  bpf: Add more stats to HBM

 include/linux/bpf.h        | 50 +++++++++++++++++++++++++++++
 include/linux/filter.h     |  3 +-
 include/net/netns/ipv4.h   |  1 +
 kernel/bpf/cgroup.c        | 25 ++++++++++++---
 kernel/bpf/syscall.c       | 12 +++++++
 kernel/bpf/verifier.c      | 16 +++++++--
 net/ipv4/ip_output.c       | 39 ++++++++++++----------
 net/ipv4/sysctl_net_ipv4.c | 10 ++++++
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      | 18 +++++++++--
 net/ipv6/ip6_output.c      | 22 +++++++------
 samples/bpf/do_hbm_test.sh | 10 ++++--
 samples/bpf/hbm.c          | 51 +++++++++++++++++++++++++++--
 samples/bpf/hbm.h          |  9 +++++-
 samples/bpf/hbm_kern.h     | 66 ++++++++++++++++++++++++++++++++++++--
 samples/bpf/hbm_out_kern.c | 48 +++++++++++++++++++--------
 16 files changed, 321 insertions(+), 60 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-03-26 18:07 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-23  8:05 [PATCH bpf-next 0/7] bpf: Propagate cn to TCP brakmo
2019-03-23  8:05 ` [PATCH bpf-next 1/7] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY brakmo
2019-03-23  8:05 ` [PATCH bpf-next 2/7] bpf: cgroup inet skb programs can return 0 to 3 brakmo
2019-03-23  8:05 ` [PATCH bpf-next 3/7] bpf: Update __cgroup_bpf_run_filter_skb with cn brakmo
2019-03-23  8:05 ` [PATCH bpf-next 4/7] bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls brakmo
2019-03-23  8:05 ` [PATCH bpf-next 5/7] bpf: sysctl for probe_on_drop brakmo
2019-03-23  8:05 ` [PATCH bpf-next 6/7] bpf: Add cn support to hbm_out_kern.c brakmo
2019-03-23  8:05 ` [PATCH bpf-next 7/7] bpf: Add more stats to HBM brakmo
2019-03-23  9:12 ` [PATCH bpf-next 0/7] bpf: Propagate cn to TCP Eric Dumazet
2019-03-23 15:41   ` Alexei Starovoitov
2019-03-24  5:36     ` Eric Dumazet
2019-03-24 16:19       ` Alexei Starovoitov
2019-03-25  8:33         ` Eric Dumazet
2019-03-25  8:48           ` Eric Dumazet
2019-03-26  4:27             ` Alexei Starovoitov
2019-03-26  8:06               ` Eric Dumazet
2019-03-26 15:07                 ` Alexei Starovoitov
2019-03-26 15:43                   ` Eric Dumazet
2019-03-26 17:01                     ` Alexei Starovoitov
2019-03-26 18:07                       ` Eric Dumazet
2019-03-26  8:13               ` Eric Dumazet
2019-03-24  5:48     ` Eric Dumazet
2019-03-24  1:14   ` Lawrence Brakmo
2019-03-24  5:58     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.