* [PATCHv4 net-next 00/10] net: support ipv4 big tcp
@ 2023-01-28 15:58 Xin Long
From: Xin Long @ 2023-01-28 15:58 UTC
  To: network dev
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, David Ahern,
	Hideaki YOSHIFUJI, Pravin B Shelar, Jamal Hadi Salim, Cong Wang,
	Jiri Pirko, Pablo Neira Ayuso, Florian Westphal,
	Marcelo Ricardo Leitner, Ilya Maximets, Aaron Conole,
	Roopa Prabhu, Nikolay Aleksandrov, Mahesh Bandewar, Paul Moore,
	Guillaume Nault

This is similar to the BIG TCP patchset added by Eric for IPv6:

  https://lwn.net/Articles/895398/

Unlike IPv6, the IPv4 tot_len field is only 16 bits long, and the IPv4
header has no exthdrs (options) that could carry a BIG TCP packet's
length. To keep it simple, as David and Paolo suggested, we set IPv4
tot_len to 0 to indicate that this might be a BIG TCP packet, and use
skb->len as the real IPv4 total length.
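
The encoding described above can be modelled in a few lines of
user-space C. This is only a sketch: struct ip4_hdr_model and
model_set_totlen() are hypothetical stand-ins, not the kernel's
struct iphdr or the helpers added in Patch 1, and the 0xFFFF limit is
simply the largest value a 16-bit field can hold.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */

/* Hypothetical stand-in for the IPv4 header; only tot_len is modelled. */
struct ip4_hdr_model {
	uint16_t tot_len;	/* network byte order, as on the wire */
};

/* Largest length that fits into the 16-bit tot_len field. */
#define MODEL_TOTLEN_MAX 0xFFFFU

/* Store len if it fits into 16 bits; otherwise store 0, meaning
 * "the real length has to be taken from the buffer length". */
static void model_set_totlen(struct ip4_hdr_model *iph, unsigned int len)
{
	iph->tot_len = len <= MODEL_TOTLEN_MAX ? htons((uint16_t)len) : 0;
}
```

A 1500-byte packet keeps its length in the header, while a 128000-byte
packet ends up with tot_len == 0.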

This works safely because all BIG TCP packets are GSO/GRO packets and
are processed on the same host where they were created. There is no
padding in GSO/GRO packets, so skb->len - network_offset is exactly the
IPv4 packet total length. Also, before the feature is implemented, all
the places that may read iph tot_len from BIG TCP packets are converted
to use some new APIs:
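
The read side of that rule might look roughly like the following
user-space model. It is a sketch under stated assumptions: struct
pkt_model and model_ip_totlen() are hypothetical names, and the
is_gso_tcp flag stands in for whatever GSO checks the real helpers in
Patch 1 perform.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>	/* ntohs(), htons() */

/* Hypothetical stand-in for the parts of an skb that matter here. */
struct pkt_model {
	unsigned int len;		/* bytes in the buffer, like skb->len */
	unsigned int network_offset;	/* offset of the IPv4 header */
	bool is_gso_tcp;		/* true for a GSO/GRO TCP packet */
	uint16_t tot_len;		/* IPv4 tot_len, network byte order */
};

/* Trust a non-zero tot_len. Fall back to the buffer length only for
 * GSO TCP packets whose tot_len is 0. */
static unsigned int model_ip_totlen(const struct pkt_model *p)
{
	unsigned int len = ntohs(p->tot_len);

	if (len || !p->is_gso_tcp)
		return len;
	return p->len - p->network_offset;
}
```

For a GSO TCP packet with len = 128014, network_offset = 14 and
tot_len = 0, the model returns 128000; an ordinary packet just returns
its tot_len.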

Patch 1 adds APIs for setting and getting iph tot_len, which Patches
2-7 use in all the places that IPv4 BIG TCP packets may reach. Patch 8
adds a GSO_TCP tp_status for af_packet users, Patch 9 adds new netlink
attributes so that IPv4 BIG TCP can be configured independently of IPv6
BIG TCP, and Patch 10 implements the feature.

Note that changes similar to those in Patches 2-6 are also needed for
IPv6 BIG TCP packets; they will be addressed in another patchset.

A similar performance test was run for IPv4 BIG TCP with a 25Gbit NIC
and 1.5K MTU:

No BIG TCP:
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
168          322          337          3776.49
143          236          277          4654.67
128          258          288          4772.83
171          229          278          4645.77
175          228          243          4678.93
149          239          279          4599.86
164          234          268          4606.94
155          276          289          4235.82
180          255          268          4418.95
168          241          249          4417.82

Enable BIG TCP:
ip link set dev ens1f0np0 gro_ipv4_max_size 128000 gso_ipv4_max_size 128000
for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
161          241          252          4821.73
174          205          217          5098.28
167          208          220          5001.43
164          228          249          4883.98
150          233          249          4914.90
180          233          244          4819.66
154          208          219          5004.92
157          209          247          4999.78
160          218          246          4842.31
174          206          217          5080.99
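
To compare the two runs, the THROUGHPUT column (the last one) can be
averaged. The snippet below is only an illustration of that arithmetic;
the array and function names are made up for this example, and the
numbers are copied from the output above.

```c
#include <stddef.h>

/* THROUGHPUT column of the "No BIG TCP" run. */
static const double tput_base[] = {
	3776.49, 4654.67, 4772.83, 4645.77, 4678.93,
	4599.86, 4606.94, 4235.82, 4418.95, 4417.82,
};

/* THROUGHPUT column of the "Enable BIG TCP" run. */
static const double tput_big[] = {
	4821.73, 5098.28, 5001.43, 4883.98, 4914.90,
	4819.66, 5004.92, 4999.78, 4842.31, 5080.99,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}
```

With these samples the mean is about 4480.8 without BIG TCP and about
4946.8 with it, i.e. roughly 10% higher.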

Thanks for the feedback from Eric and David Ahern.

v1->v2:
  - remove the fixes and the selftest for IPv6 BIG TCP, will do it in
    another patchset.
  - add GSO_TCP for tp_status in packet sockets to tell the af_packet
    users that this is a TCP GSO packet in Patch 8.
  - also check skb_is_gso() when checking if it's a GSO TCP packet in
    Patch 1.
v2->v3:
  - add gso/gro_ipv4_max_size per device and netlink attributes for them
    in Patch 9, so that we can selectively enable BIG TCP for IPv6, and
    not for IPv4, as Eric required.
  - remove the selftest, as it requires userspace iproute2 change after
    making IPv4 BIG TCP independent from IPv6 BIG TCP on configuration.
v3->v4:
  - put gso/gro_ipv4_max_size close to other related fields, so that we
    do not need an extra cache line miss, as Eric suggested.
  - also check ipv6_addr_v4mapped() when reading gso(_ipv4)_max_size in
    sk_setup_caps(), as Eric noticed.
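
The v4-mapped case in the last item can be illustrated with a small
user-space model. This is a sketch, not the code in sk_setup_caps():
struct dev_limits_model and model_gso_max_size() are hypothetical, and
which socket address the kernel actually inspects is not shown here.

```c
#include <stdint.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */
#include <netinet/in.h>		/* struct in6_addr, IN6_IS_ADDR_V4MAPPED() */

/* Hypothetical per-device limits, named after the fields in Patch 9. */
struct dev_limits_model {
	uint32_t gso_max_size;		/* limit used for IPv6 traffic */
	uint32_t gso_ipv4_max_size;	/* limit used for IPv4 traffic */
};

/* An AF_INET6 socket using a v4-mapped address sends IPv4 packets,
 * so it gets the IPv4 limit as well. addr is only read for AF_INET6. */
static uint32_t model_gso_max_size(int family, const struct in6_addr *addr,
				   const struct dev_limits_model *dev)
{
	if (family == AF_INET6 && !IN6_IS_ADDR_V4MAPPED(addr))
		return dev->gso_max_size;
	return dev->gso_ipv4_max_size;
}
```

Without the v4-mapped check, an AF_INET6 socket talking to
::ffff:192.168.100.1 would pick up the IPv6 limit even though its
packets are IPv4.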

Xin Long (10):
  net: add a couple of helpers for iph tot_len
  bridge: use skb_ip_totlen in br netfilter
  openvswitch: use skb_ip_totlen in conntrack
  net: sched: use skb_ip_totlen and iph_totlen
  netfilter: use skb_ip_totlen and iph_totlen
  cipso_ipv4: use iph_set_totlen in skbuff_setattr
  ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr
  packet: add TP_STATUS_GSO_TCP for tp_status
  net: add gso_ipv4_max_size and gro_ipv4_max_size per device
  net: add support for ipv4 big tcp

 drivers/net/ipvlan/ipvlan_core.c           |  2 +-
 include/linux/ip.h                         | 21 ++++++++++++++
 include/linux/netdevice.h                  |  6 ++++
 include/net/netfilter/nf_tables_ipv4.h     |  4 +--
 include/net/route.h                        |  3 --
 include/uapi/linux/if_link.h               |  3 ++
 include/uapi/linux/if_packet.h             |  1 +
 net/bridge/br_netfilter_hooks.c            |  2 +-
 net/bridge/netfilter/nf_conntrack_bridge.c |  4 +--
 net/core/dev.c                             |  4 +++
 net/core/dev.h                             | 18 ++++++++++++
 net/core/gro.c                             | 12 ++++----
 net/core/rtnetlink.c                       | 33 ++++++++++++++++++++++
 net/core/sock.c                            | 26 +++++++++--------
 net/ipv4/af_inet.c                         |  7 +++--
 net/ipv4/cipso_ipv4.c                      |  2 +-
 net/ipv4/ip_input.c                        |  2 +-
 net/ipv4/ip_output.c                       |  2 +-
 net/netfilter/ipvs/ip_vs_xmit.c            |  2 +-
 net/netfilter/nf_log_syslog.c              |  2 +-
 net/netfilter/xt_length.c                  |  2 +-
 net/openvswitch/conntrack.c                |  2 +-
 net/packet/af_packet.c                     |  4 +++
 net/sched/act_ct.c                         |  2 +-
 net/sched/sch_cake.c                       |  2 +-
 25 files changed, 130 insertions(+), 38 deletions(-)

-- 
2.31.1



Thread overview: 23+ messages
2023-01-28 15:58 [PATCHv4 net-next 00/10] net: support ipv4 big tcp Xin Long
2023-01-28 15:58 ` [PATCHv4 net-next 01/10] net: add a couple of helpers for iph tot_len Xin Long
2023-02-01 15:31   ` David Ahern
2023-01-28 15:58 ` [PATCHv4 net-next 02/10] bridge: use skb_ip_totlen in br netfilter Xin Long
2023-01-31 15:01   ` Nikolay Aleksandrov
2023-01-28 15:58 ` [PATCHv4 net-next 03/10] openvswitch: use skb_ip_totlen in conntrack Xin Long
2023-02-01 13:29   ` Aaron Conole
2023-01-28 15:58 ` [PATCHv4 net-next 04/10] net: sched: use skb_ip_totlen and iph_totlen Xin Long
2023-01-28 15:58 ` [PATCHv4 net-next 05/10] netfilter: " Xin Long
2023-01-28 15:58 ` [PATCHv4 net-next 06/10] cipso_ipv4: use iph_set_totlen in skbuff_setattr Xin Long
2023-01-28 15:58 ` [PATCHv4 net-next 07/10] ipvlan: use skb_ip_totlen in ipvlan_get_L3_hdr Xin Long
2023-01-28 15:58 ` [PATCHv4 net-next 08/10] packet: add TP_STATUS_GSO_TCP for tp_status Xin Long
2023-02-01 15:32   ` David Ahern
2023-01-28 15:58 ` [PATCHv4 net-next 09/10] net: add gso_ipv4_max_size and gro_ipv4_max_size per device Xin Long
2023-01-31 14:59   ` Paolo Abeni
2023-01-31 17:55     ` Xin Long
2023-02-01 15:36   ` David Ahern
2023-01-28 15:58 ` [PATCHv4 net-next 10/10] net: add support for ipv4 big tcp Xin Long
2023-02-01 15:38   ` David Ahern
2023-02-02  9:24   ` [PATCHv4 net-next 10/10] net: add support for ipv4 big tcp: manual merge Matthieu Baerts
2023-02-01  8:53 ` [PATCHv4 net-next 00/10] net: support ipv4 big tcp Eric Dumazet
2023-02-01 15:39 ` David Ahern
2023-02-02  5:10 ` patchwork-bot+netdevbpf
