* [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support @ 2016-06-22 8:25 Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 1/5] net: export udp and gre gro_complete() APIs Manish Chopra ` (5 more replies) 0 siblings, 6 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz Hi David, This series adds driver support for the processing of tunnel [specifically vxlan/geneve/gre tunnels] packets which are aggregated [GROed] by the hardware before driver passes such packets upto the stack. Patch 1 - General infrastructure change Exported UDP and GRE gro_complete APIs to be revealed by driver to complete/process hardware assisted GRO for encapsulated packets, as gro_complete() APIs are partially exposed. [sufficient for TCP, but not enough for udp/gre tunnels]. Patch 2/3/4 - These patches add support for handling tunnel packets [which are GROed by the adapter] in the driver receive data path. Patch 5 - This actually enables GRO functionality for tunnel packets on the adapter. Please consider applying this series to "net-next" Thanks, Manish Manish Chopra (5): net: export udp and gre gro_complete() APIs qede: Add support to handle VXLAN hardware GRO packets qede: Add support to handle GENEVE hardware GRO packets qede: Add support to handle GRE hardware GRO packets qed: Enable hardware GRO feature for encapsulated packets drivers/net/ethernet/qlogic/qed/qed_l2.c | 2 + drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 163 ++++++++++++++++++++++++--- include/net/gre.h | 3 + include/net/udp.h | 6 + net/ipv4/gre_offload.c | 3 +- net/ipv4/udp_offload.c | 3 +- net/ipv6/udp_offload.c | 3 +- 8 files changed, 163 insertions(+), 21 deletions(-) -- 2.7.2 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH net-next 1/5] net: export udp and gre gro_complete() APIs 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra @ 2016-06-22 8:25 ` Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 2/5] qede: Add support to handle VXLAN hardware GRO packets Manish Chopra ` (4 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz This patch exports relevant APIs needed to be used by this driver to handle hardware assisted encapsulated GRO packets. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> --- include/net/gre.h | 3 +++ include/net/udp.h | 6 ++++++ net/ipv4/gre_offload.c | 3 ++- net/ipv4/udp_offload.c | 3 ++- net/ipv6/udp_offload.c | 3 ++- 5 files changed, 15 insertions(+), 3 deletions(-) diff --git a/include/net/gre.h b/include/net/gre.h index 5dce30a..a5fe689 100644 --- a/include/net/gre.h +++ b/include/net/gre.h @@ -23,6 +23,9 @@ struct gre_protocol { int gre_add_protocol(const struct gre_protocol *proto, u8 version); int gre_del_protocol(const struct gre_protocol *proto, u8 version); +/* gre_gro_complete exported for drivers implementing hardware GRO */ +int gre_gro_complete(struct sk_buff *skb, int nhoff); + struct net_device *gretap_fb_dev_create(struct net *net, const char *name, u8 name_assign_type); int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi, diff --git a/include/net/udp.h b/include/net/udp.h index 8894d71..ac6b19b 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -174,6 +174,12 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb, struct udphdr *uh, udp_lookup_t lookup); int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup); +/* udp4_gro_complete exported for drivers implementing hardware GRO */ +int udp4_gro_complete(struct sk_buff *skb, int nhoff); + +/* udp6_gro_complete exported for drivers implementing hardware GRO */ +int udp6_gro_complete(struct sk_buff *skb, int nhoff); + static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb) { struct udphdr *uh; diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index ecd1e09..c09c7e9 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -238,7 +238,7 @@ out: return pp; } -static int gre_gro_complete(struct sk_buff *skb, int nhoff) +int gre_gro_complete(struct sk_buff *skb, int nhoff) { struct gre_base_hdr *greh = (struct gre_base_hdr *)(skb->data + nhoff); struct packet_offload *ptype; @@ -267,6 +267,7 @@ static int gre_gro_complete(struct sk_buff *skb, int nhoff) return err; } +EXPORT_SYMBOL(gre_gro_complete); static const struct net_offload gre_offload = { .callbacks = { diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 81f253b..b4899bb 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -359,7 +359,7 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff, } EXPORT_SYMBOL(udp_gro_complete); -static int udp4_gro_complete(struct sk_buff *skb, int nhoff) +int udp4_gro_complete(struct sk_buff *skb, int nhoff) { const struct iphdr *iph = ip_hdr(skb); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); @@ -374,6 +374,7 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff) return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb); } +EXPORT_SYMBOL(udp4_gro_complete); static const struct net_offload udpv4_offload = { .callbacks = { diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index ac858c4..827cd20 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -147,7 +147,7 @@ flush: return NULL; } -static int udp6_gro_complete(struct sk_buff *skb, int nhoff) +int udp6_gro_complete(struct sk_buff *skb, int nhoff) { const struct ipv6hdr *ipv6h = ipv6_hdr(skb); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); @@ -162,6 +162,7 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff) return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb); } +EXPORT_SYMBOL(udp6_gro_complete); static const struct net_offload udpv6_offload = { .callbacks = { -- 2.7.2 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH net-next 2/5] qede: Add support to handle VXLAN hardware GRO packets 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 1/5] net: export udp and gre gro_complete() APIs Manish Chopra @ 2016-06-22 8:25 ` Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 3/5] qede: Add support to handle GENEVE " Manish Chopra ` (3 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz This patch adds support to do necessary processing for hardware assisted VXLAN tunnel GRO packets before driver delivers them upto the stack. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> --- drivers/net/ethernet/qlogic/qede/qede.h | 1 + drivers/net/ethernet/qlogic/qede/qede_main.c | 79 ++++++++++++++++++++++++---- 2 files changed, 70 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 1441c8f..d8ec269 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -213,6 +213,7 @@ struct qede_agg_info { struct sk_buff *skb; int frag_id; u16 vlan_tag; + u8 tunnel_type; }; struct qede_rx_queue { diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 2972742..f94ea16 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -36,6 +36,7 @@ #include <linux/random.h> #include <net/ip6_checksum.h> #include <linux/bitops.h> +#include <net/vxlan.h> #include "qede.h" @@ -1048,6 +1049,12 @@ out: return -ENOMEM; } +static bool qede_tunn_exist(u16 flag) +{ + return !!(flag & (PARSING_AND_ERR_FLAGS_TUNNELEXIST_MASK << + PARSING_AND_ERR_FLAGS_TUNNELEXIST_SHIFT)); +} + static void qede_tpa_start(struct qede_dev *edev, struct qede_rx_queue *rxq, struct eth_fast_path_rx_tpa_start_cqe *cqe) @@ -1065,6 +1072,14 @@ static void qede_tpa_start(struct qede_dev *edev, sw_rx_data_cons = &rxq->sw_rx_ring[rxq->sw_rx_cons & NUM_RX_BDS_MAX]; sw_rx_data_prod = &rxq->sw_rx_ring[rxq->sw_rx_prod & NUM_RX_BDS_MAX]; + if (qede_tunn_exist(le16_to_cpu(cqe->pars_flags.flags))) { + u8 flags = cqe->tunnel_pars_flags.flags, shift; + + shift = ETH_TUNNEL_PARSING_FLAGS_TYPE_SHIFT; + tpa_info->tunnel_type = (flags >> shift) & + ETH_TUNNEL_PARSING_FLAGS_TYPE_MASK; + } + /* Use pre-allocated replacement buffer - we can't release the agg. * start until its over and we don't want to risk allocation failing * here, so re-allocate when aggregation will be over. @@ -1159,12 +1174,55 @@ static void qede_gro_ipv6_csum(struct sk_buff *skb) &iph->saddr, &iph->daddr, 0); tcp_gro_complete(skb); } + +static void qede_set_nh_th_offset(struct sk_buff *skb, int off) +{ + skb_set_network_header(skb, off); + + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) { + off += sizeof(struct iphdr); + skb_set_transport_header(skb, off); + } else { + off += sizeof(struct ipv6hdr); + skb_set_transport_header(skb, off); + } +} + +static void qede_handle_vxlan_tunnel_gro(struct sk_buff *skb) +{ + switch (skb->protocol) { + case htons(ETH_P_IP): + qede_set_nh_th_offset(skb, VXLAN_HEADROOM); + udp4_gro_complete(skb, sizeof(struct iphdr)); + break; + case htons(ETH_P_IPV6): + qede_set_nh_th_offset(skb, VXLAN6_HEADROOM); + udp6_gro_complete(skb, sizeof(struct ipv6hdr)); + break; + default: + WARN_ONCE(1, "Unsupported VXLAN tunnel GRO proto=0x%x\n", + skb->protocol); + } +} + +static void qede_handle_tunnel_gro(struct qede_dev *edev, + struct sk_buff *skb, u8 tunnel_type) +{ + switch (tunnel_type) { + case ETH_RX_TUNN_VXLAN: + qede_handle_vxlan_tunnel_gro(skb); + break; + default: + WARN_ONCE(1, "Unsupported tunnel GRO, tunnel type=0x%x\n", + tunnel_type); + } +} #endif static void qede_gro_receive(struct qede_dev *edev, struct qede_fastpath *fp, struct sk_buff *skb, - u16 vlan_tag) + struct qede_agg_info *tpa_info) { /* FW can send a single MTU sized packet from gro flow * due to aggregation timeout/last segment etc. which @@ -1179,6 +1237,12 @@ static void qede_gro_receive(struct qede_dev *edev, #ifdef CONFIG_INET if (skb_shinfo(skb)->gso_size) { + if (tpa_info->tunnel_type) { + qede_handle_tunnel_gro(edev, skb, + tpa_info->tunnel_type); + goto send_skb; + } + skb_set_network_header(skb, 0); switch (skb->protocol) { @@ -1198,7 +1262,7 @@ static void qede_gro_receive(struct qede_dev *edev, send_skb: skb_record_rx_queue(skb, fp->rss_id); - qede_skb_receive(edev, fp, skb, vlan_tag); + qede_skb_receive(edev, fp, skb, tpa_info->vlan_tag); } static inline void qede_tpa_cont(struct qede_dev *edev, @@ -1267,10 +1331,10 @@ static void qede_tpa_end(struct qede_dev *edev, */ NAPI_GRO_CB(skb)->count = le16_to_cpu(cqe->num_of_coalesced_segs); - qede_gro_receive(edev, fp, skb, tpa_info->vlan_tag); + qede_gro_receive(edev, fp, skb, tpa_info); tpa_info->agg_state = QEDE_AGG_STATE_NONE; - + tpa_info->tunnel_type = 0; return; err: /* The BD starting the aggregation is still mapped; Re-use it for @@ -1283,12 +1347,7 @@ err: tpa_info->agg_state = QEDE_AGG_STATE_NONE; dev_kfree_skb_any(tpa_info->skb); tpa_info->skb = NULL; -} - -static bool qede_tunn_exist(u16 flag) -{ - return !!(flag & (PARSING_AND_ERR_FLAGS_TUNNELEXIST_MASK << - PARSING_AND_ERR_FLAGS_TUNNELEXIST_SHIFT)); + tpa_info->tunnel_type = 0; } static u8 qede_check_tunn_csum(u16 flag) -- 2.7.2 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH net-next 3/5] qede: Add support to handle GENEVE hardware GRO packets 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 1/5] net: export udp and gre gro_complete() APIs Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 2/5] qede: Add support to handle VXLAN hardware GRO packets Manish Chopra @ 2016-06-22 8:25 ` Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 4/5] qede: Add support to handle GRE " Manish Chopra ` (2 subsequent siblings) 5 siblings, 0 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz This patch adds support to do necessary processing for hardware assisted GENEVE tunnel GRO packets before driver delivers them upto the stack. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> --- drivers/net/ethernet/qlogic/qede/qede_main.c | 30 ++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index f94ea16..a4b445f 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -37,6 +37,7 @@ #include <net/ip6_checksum.h> #include <linux/bitops.h> #include <net/vxlan.h> +#include <net/geneve.h> #include "qede.h" @@ -1205,6 +1206,32 @@ static void qede_handle_vxlan_tunnel_gro(struct sk_buff *skb) } } +static inline struct genevehdr * +qede_geneve_hdr(struct sk_buff *skb, int off) { + return (struct genevehdr *)(skb->data + off + sizeof(struct udphdr)); +} + +static void qede_handle_geneve_tunnel_gro(struct sk_buff *skb) +{ + struct genevehdr *gh; + + switch (skb->protocol) { + case htons(ETH_P_IP): + gh = qede_geneve_hdr(skb, sizeof(struct iphdr)); + qede_set_nh_th_offset(skb, VXLAN_HEADROOM + (gh->opt_len * 4)); + udp4_gro_complete(skb, sizeof(struct iphdr)); + break; + case htons(ETH_P_IPV6): + gh = qede_geneve_hdr(skb, sizeof(struct ipv6hdr)); + qede_set_nh_th_offset(skb, VXLAN6_HEADROOM + (gh->opt_len * 4)); + udp6_gro_complete(skb, sizeof(struct ipv6hdr)); + break; + default: + WARN_ONCE(1, "Unsupported GENEVE tunnel GRO proto=0x%x\n", + skb->protocol); + } +} + static void qede_handle_tunnel_gro(struct qede_dev *edev, struct sk_buff *skb, u8 tunnel_type) { @@ -1212,6 +1239,9 @@ static void qede_handle_tunnel_gro(struct qede_dev *edev, case ETH_RX_TUNN_VXLAN: qede_handle_vxlan_tunnel_gro(skb); break; + case ETH_RX_TUNN_GENEVE: + qede_handle_geneve_tunnel_gro(skb); + break; default: WARN_ONCE(1, "Unsupported tunnel GRO, tunnel type=0x%x\n", tunnel_type); -- 2.7.2 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH net-next 4/5] qede: Add support to handle GRE hardware GRO packets 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra ` (2 preceding siblings ...) 2016-06-22 8:25 ` [PATCH net-next 3/5] qede: Add support to handle GENEVE " Manish Chopra @ 2016-06-22 8:25 ` Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 5/5] qed: Enable hardware GRO feature for encapsulated packets Manish Chopra 2016-06-22 16:27 ` [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Alexander Duyck 5 siblings, 0 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz This patch adds support to do necessary processing for hardware assisted GRE tunnel GRO packets before driver delivers them upto the stack. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> --- drivers/net/ethernet/qlogic/qede/qede_main.c | 54 +++++++++++++++++++++++----- 1 file changed, 46 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index a4b445f..787aef0 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -38,6 +38,7 @@ #include <linux/bitops.h> #include <net/vxlan.h> #include <net/geneve.h> +#include <net/gre.h> #include "qede.h" @@ -1152,10 +1153,7 @@ cons_buf: /* We still need to handle bd_len_list to consume buffers */ static void qede_gro_ip_csum(struct sk_buff *skb) { const struct iphdr *iph = ip_hdr(skb); - struct tcphdr *th; - - skb_set_transport_header(skb, sizeof(struct iphdr)); - th = tcp_hdr(skb); + struct tcphdr *th = tcp_hdr(skb); th->check = ~tcp_v4_check(skb->len - skb_transport_offset(skb), iph->saddr, iph->daddr, 0); @@ -1166,10 +1164,7 @@ static void qede_gro_ip_csum(struct sk_buff *skb) static void qede_gro_ipv6_csum(struct sk_buff *skb) { struct ipv6hdr *iph = ipv6_hdr(skb); - struct tcphdr *th; - - skb_set_transport_header(skb, sizeof(struct ipv6hdr)); - th = tcp_hdr(skb); + struct tcphdr *th = tcp_hdr(skb); th->check = ~tcp_v6_check(skb->len - skb_transport_offset(skb), &iph->saddr, &iph->daddr, 0); @@ -1232,6 +1227,44 @@ static void qede_handle_geneve_tunnel_gro(struct sk_buff *skb) } } +static void qede_handle_gre_tunnel_gro(struct sk_buff *skb) +{ + unsigned int grehlen, gre_headroom; + struct gre_base_hdr *greh; + int nhoff = 0; + + switch (skb->protocol) { + case htons(ETH_P_IP): + nhoff = sizeof(struct iphdr); + break; + case htons(ETH_P_IPV6): + nhoff = sizeof(struct ipv6hdr); + break; + default: + WARN_ONCE(1, "Unsupported GRE tunnel GRO, proto=0x%x\n", + skb->protocol); + } + + greh = (struct gre_base_hdr *)(skb->data + nhoff); + grehlen = sizeof(*greh); + + if (greh->flags & GRE_KEY) + grehlen += GRE_HEADER_SECTION; + + if (greh->flags & GRE_CSUM) + grehlen += GRE_HEADER_SECTION; + + gre_headroom = nhoff + grehlen; + + /* L2 GRE */ + if (greh->protocol == htons(ETH_P_TEB)) + gre_headroom += ETH_HLEN; + + qede_set_nh_th_offset(skb, gre_headroom); + + gre_gro_complete(skb, nhoff); +} + static void qede_handle_tunnel_gro(struct qede_dev *edev, struct sk_buff *skb, u8 tunnel_type) { @@ -1242,6 +1275,9 @@ static void qede_handle_tunnel_gro(struct qede_dev *edev, case ETH_RX_TUNN_GENEVE: qede_handle_geneve_tunnel_gro(skb); break; + case ETH_RX_TUNN_GRE: + qede_handle_gre_tunnel_gro(skb); + break; default: WARN_ONCE(1, "Unsupported tunnel GRO, tunnel type=0x%x\n", tunnel_type); @@ -1277,9 +1313,11 @@ static void qede_gro_receive(struct qede_dev *edev, switch (skb->protocol) { case htons(ETH_P_IP): + skb_set_transport_header(skb, sizeof(struct iphdr)); qede_gro_ip_csum(skb); break; case htons(ETH_P_IPV6): + skb_set_transport_header(skb, sizeof(struct ipv6hdr)); qede_gro_ipv6_csum(skb); break; default: -- 2.7.2 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH net-next 5/5] qed: Enable hardware GRO feature for encapsulated packets 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra ` (3 preceding siblings ...) 2016-06-22 8:25 ` [PATCH net-next 4/5] qede: Add support to handle GRE " Manish Chopra @ 2016-06-22 8:25 ` Manish Chopra 2016-06-22 16:27 ` [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Alexander Duyck 5 siblings, 0 replies; 36+ messages in thread From: Manish Chopra @ 2016-06-22 8:25 UTC (permalink / raw) To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz This patch actually enables the hardware to perform GRO over encapsulated Ipv4/Ipv6 packets. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> --- drivers/net/ethernet/qlogic/qed/qed_l2.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index d121a8b..e56981d 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -92,6 +92,8 @@ int qed_sp_eth_vport_start(struct qed_hwfn *p_hwfn, p_ramrod->tpa_param.tpa_min_size_to_start = p_params->mtu / 2; p_ramrod->tpa_param.tpa_ipv4_en_flg = 1; p_ramrod->tpa_param.tpa_ipv6_en_flg = 1; + p_ramrod->tpa_param.tpa_ipv4_tunn_en_flg = 1; + p_ramrod->tpa_param.tpa_ipv6_tunn_en_flg = 1; p_ramrod->tpa_param.tpa_pkt_split_flg = 1; p_ramrod->tpa_param.tpa_gro_consistent_flg = 1; break; -- 2.7.2 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra ` (4 preceding siblings ...) 2016-06-22 8:25 ` [PATCH net-next 5/5] qed: Enable hardware GRO feature for encapsulated packets Manish Chopra @ 2016-06-22 16:27 ` Alexander Duyck 2016-06-22 17:16 ` Yuval Mintz 5 siblings, 1 reply; 36+ messages in thread From: Alexander Duyck @ 2016-06-22 16:27 UTC (permalink / raw) To: Manish Chopra Cc: David Miller, Netdev, Ariel Elior, Yuval Mintz, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 1:25 AM, Manish Chopra <manish.chopra@qlogic.com> wrote: > Hi David, > > This series adds driver support for the processing of tunnel > [specifically vxlan/geneve/gre tunnels] packets which are > aggregated [GROed] by the hardware before driver passes > such packets upto the stack. First off I am pretty sure this isn't GRO. This is LRO. The distinction is that LRO is performed in hardware and/or the driver while GRO is performed by the kernel. Think of this as the same distinction between GSO and TSO. The reason why we want to make certain to keep them separate is that LRO has a bad habit of mangling frames in ways that can be counter productive. It also is not extensible since it is implemented in hardware. The best way to think of it is that LRO is a subset of GRO. If done correctly the LRO can be as good as GRO, but if not there are usually some pieces of the frame that end up being mangled which make it difficult to undo later. As such I would recommend first going through and updating your "GRO" feature to use the correct "LRO" feature flag if you haven't already. The drivers should not really be messing with the GRO bit and should be using the LRO bit to indicate this type of feature. Also I don't know if you have been paying attention to recent discussions on the mailing list but the fact is GRO over UDP tunnels is still a subject for debate. This patch takes things in the opposite direction of where we are currently talking about going with GRO. I've added Hannes and Tom to this discussion just to make sure I have the proper understanding of all this as my protocol knowledge is somewhat lacking. Ideally we need to be able to identify that a given packet terminates on a local socket in our namespace before we could begin to perform any sort of mangling on the local packets. It is always possible that we could be looking at a frame that uses the same UDP port but is not the tunnel protocol if we are performing bridging or routing. The current GRO implementation fails in that regard and there are discussions between several of us on how to deal with that. It is likely that we would be forcing GRO for tunnels to move a bit further up the stack if bridging or routing so that we could verify that the frame is not being routed back out before we could aggregate it. Also I would be interested in knowing how your hardware handles tunnels with outer checksums. Is it just ignoring the frames in such a case, ignoring the checksum, or is it actually validating the frames and then merging the resulting checksum? Thanks. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 16:27 ` [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Alexander Duyck @ 2016-06-22 17:16 ` Yuval Mintz 2016-06-22 17:45 ` Alexander Duyck 0 siblings, 1 reply; 36+ messages in thread From: Yuval Mintz @ 2016-06-22 17:16 UTC (permalink / raw) To: Alexander Duyck, Manish Chopra Cc: David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa >> This series adds driver support for the processing of tunnel >> [specifically vxlan/geneve/gre tunnels] packets which are >> aggregated [GROed] by the hardware before driver passes >> such packets upto the stack. > First off I am pretty sure this isn't GRO. This is LRO. Nopes. This is GRO - each MSS-sized frame will arrive on its own frag, whereas the headers [both internal & external] would arrive on the linear part of the SKB. > Also I don't know if you have been paying attention to recent > discussions on the mailing list but the fact is GRO over UDP tunnels > is still a subject for debate. This patch takes things in the > opposite direction of where we are currently talking about going with > GRO. I've added Hannes and Tom to this discussion just to make sure I > have the proper understanding of all this as my protocol knowledge is > somewhat lacking. I guess we're on the exact opposite side of the discussion - I.e., we're the vendor that tries pushing offload capabilities to the device. Do notice however that we're not planning on pushing anything new feature-wise, just like we haven't done anything for regular TCP GRO - All we do is allow our HW/FW to aggregate packets instead of stack. > Ideally we need to be able to identify that a given packet terminates > on a local socket in our namespace before we could begin to perform > any sort of mangling on the local packets. It is always possible that > we could be looking at a frame that uses the same UDP port but is not > the tunnel protocol if we are performing bridging or routing. The > current GRO implementation fails in that regard and there are > discussions between several of us on how to deal with that. It is > likely that we would be forcing GRO for tunnels to move a bit further > up the stack if bridging or routing so that we could verify that the > frame is not being routed back out before we could aggregate it. I'm aware of the on-going discussion, but I'm not sure this should bother us greatly - the aggregation is done based on the inner TCP connection; I.e., it's guaranteed all the frames belong to the same TCP connection. While HW requires the UDP port for the initial classification effort, it doesn't take decision solely on that. > Also I would be interested in knowing how your hardware handles > tunnels with outer checksums. Is it just ignoring the frames in such > a case, ignoring the checksum, or is it actually validating the frames > and then merging the resulting checksum? HW is validating both inner and outer csum; if it wouldn't be able to due to some limitation, it would not try to pass the packet as a GRO aggregation but rather as regular seperated packets. But I don't believe it merges the checksum, only validate each independently [CSUM_UNNECESSARY]. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 17:16 ` Yuval Mintz @ 2016-06-22 17:45 ` Alexander Duyck 2016-06-22 18:22 ` Yuval Mintz 0 siblings, 1 reply; 36+ messages in thread From: Alexander Duyck @ 2016-06-22 17:45 UTC (permalink / raw) To: Yuval Mintz Cc: Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 10:16 AM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >>> This series adds driver support for the processing of tunnel >>> [specifically vxlan/geneve/gre tunnels] packets which are >>> aggregated [GROed] by the hardware before driver passes >>> such packets upto the stack. > >> First off I am pretty sure this isn't GRO. This is LRO. > Nopes. This is GRO - each MSS-sized frame will arrive on its > own frag, whereas the headers [both internal & external] > would arrive on the linear part of the SKB. No it is LRO, it just very closely mimics GRO. If it is in hardware it is LRO. GRO is extensible in software and bugs can be fixed in kernel, LRO is not extensible and any bugs in it are found in hardware or the driver and would have to be fixed there. It all comes down to bug isolation. If we find that disabling LRO makes the bug go away we know the hardware aggregation has an issue. Both features should not operate on the same bit. Otherwise when some bug is found in your implementation it will be blamed on GRO when the bug is actually in LRO. I realize this causes some pain when routing or bridging as LRO is disabled but that is kind of the point. We don't want the hardware to be mangling frames when we are attempting to route packets between any two given devices. >> Also I don't know if you have been paying attention to recent >> discussions on the mailing list but the fact is GRO over UDP tunnels >> is still a subject for debate. This patch takes things in the >> opposite direction of where we are currently talking about going with >> GRO. I've added Hannes and Tom to this discussion just to make sure I >> have the proper understanding of all this as my protocol knowledge is >> somewhat lacking. > I guess we're on the exact opposite side of the discussion - I.e., we're > the vendor that tries pushing offload capabilities to the device. > Do notice however that we're not planning on pushing anything new > feature-wise, just like we haven't done anything for regular TCP GRO - > All we do is allow our HW/FW to aggregate packets instead of stack. If you have been following the list then arguably you didn't fully understand what has been going on. I just went through and cleaned up all the VXLAN, GENEVE, and VXLAN-GPE mess that we have been creating and tried to get this consolidated so that we could start to have conversations about this without things being outright rejected. I feel like you guys have just prooven the argument for the other side that said as soon as we start support any of it, the vendors were going to go nuts and try to stuff everything and the kitchen sink into the NICs. The fact is I am starting to think they were right. >> Ideally we need to be able to identify that a given packet terminates >> on a local socket in our namespace before we could begin to perform >> any sort of mangling on the local packets. It is always possible that >> we could be looking at a frame that uses the same UDP port but is not >> the tunnel protocol if we are performing bridging or routing. The >> current GRO implementation fails in that regard and there are >> discussions between several of us on how to deal with that. It is >> likely that we would be forcing GRO for tunnels to move a bit further >> up the stack if bridging or routing so that we could verify that the >> frame is not being routed back out before we could aggregate it. > I'm aware of the on-going discussion, but I'm not sure this should > bother us greatly - the aggregation is done based on the > inner TCP connection; I.e., it's guaranteed all the frames belong to > the same TCP connection. While HW requires the UDP port for the > initial classification effort, it doesn't take decision solely on that. I realize that UDP port is only one piece of this. Exactly what fields are being taken into account. That information would be useful as we could then go though and determine what the probability is of us having a false match between any two packets in a given UDP flow. Also I would still argue this is LRO. If we are doing routing it should be disabled for us to later re-enable if we feel it is safe. Having a feature like this enabled by default with GRO is a really bad idea. >> Also I would be interested in knowing how your hardware handles >> tunnels with outer checksums. Is it just ignoring the frames in such >> a case, ignoring the checksum, or is it actually validating the frames >> and then merging the resulting checksum? > HW is validating both inner and outer csum; if it wouldn't be able to > due to some limitation, it would not try to pass the packet as a GRO > aggregation but rather as regular seperated packets. > But I don't believe it merges the checksum, only validate each > independently [CSUM_UNNECESSARY]. So it is mangling the packets then. If this were flagged as LRO at least if bridging or routing is enabled you won't be mangling the frames without the user having to specifically go back through and re-enable LRO. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 17:45 ` Alexander Duyck @ 2016-06-22 18:22 ` Yuval Mintz 2016-06-22 21:32 ` Alexander Duyck 2016-06-22 21:52 ` Rick Jones 0 siblings, 2 replies; 36+ messages in thread From: Yuval Mintz @ 2016-06-22 18:22 UTC (permalink / raw) To: Alexander Duyck Cc: Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa >>>> This series adds driver support for the processing of tunnel >>>> [specifically vxlan/geneve/gre tunnels] packets which are >>>> aggregated [GROed] by the hardware before driver passes >>>> such packets upto the stack. >>> First off I am pretty sure this isn't GRO. This is LRO. >> Nopes. This is GRO - each MSS-sized frame will arrive on its >> own frag, whereas the headers [both internal & external] >> would arrive on the linear part of the SKB. > No it is LRO, it just very closely mimics GRO. If it is in hardware > it is LRO. GRO is extensible in software and bugs can be fixed in > kernel, LRO is not extensible and any bugs in it are found in hardware > or the driver and would have to be fixed there. It all comes down to > bug isolation. If we find that disabling LRO makes the bug go away we > know the hardware aggregation has an issue. Both features should not > operate on the same bit. Otherwise when some bug is found in your > implementation it will be blamed on GRO when the bug is actually in > LRO. I'm not aware of a definition stating GRO *has* to be extensible in SW; AFAIK the LRO/GRO distinction revolves around multiple areas [criteria for aggregation, method of aggregation, etc.]. > I realize this causes some pain when routing or bridging as LRO is > disabled but that is kind of the point. We don't want the hardware to > be mangling frames when we are attempting to route packets between any > two given devices. Actually, while I might disagree on whether this is LRO/GRO, I don't think there's any problem for us to base this on the LRO feature - I.e., when we started working on qede we didn't bother implementing LRO as we understood it was deprecated, but this encapsulated aggregation is configurable; If it makes life easier for everyone if we make the configuration based on the LRO configuration, so that when LRO is disabled we won't have this turned on it can be easily done. >>> Also I don't know if you have been paying attention to recent >>> discussions on the mailing list but the fact is GRO over UDP tunnels >>> is still a subject for debate. This patch takes things in the >>> opposite direction of where we are currently talking about going with >>> GRO. I've added Hannes and Tom to this discussion just to make sure I >>> have the proper understanding of all this as my protocol knowledge is >>> somewhat lacking. >> I guess we're on the exact opposite side of the discussion - I.e., we're >> the vendor that tries pushing offload capabilities to the device. >> Do notice however that we're not planning on pushing anything new >> feature-wise, just like we haven't done anything for regular TCP GRO - >> All we do is allow our HW/FW to aggregate packets instead of stack. > If you have been following the list then arguably you didn't fully > understand what has been going on. I just went through and cleaned up > all the VXLAN, GENEVE, and VXLAN-GPE mess that we have been creating > and tried to get this consolidated so that we could start to have > conversations about this without things being outright rejected. I > feel like you guys have just prooven the argument for the other side > that said as soon as we start support any of it, the vendors were > going to go nuts and try to stuff everything and the kitchen sink into > the NICs. The fact is I am starting to think they were right. You hurt my feeling; I started going nuts ages ago ;-) But seriously, this isn't really anything new but rather a step forward in the direction we've already taken - bnx2x/qede are already performing the same for non-encapsulated TCP. And while I understand why you're being suspicious of such an addition, I don't entirely see how it affects any of that discussion - I already yielded that we can make this configurable, so if any routing decision would be added that result in packets NOT supposed to be aggregated, the feature can be turned off [at worst, at the expanse of it not having its benefit on other connections]. >>> Ideally we need to be able to identify that a given packet terminates >>> on a local socket in our namespace before we could begin to perform >>> any sort of mangling on the local packets. It is always possible that >>> we could be looking at a frame that uses the same UDP port but is not >>> the tunnel protocol if we are performing bridging or routing. The >>> current GRO implementation fails in that regard and there are >>> discussions between several of us on how to deal with that. It is >>> likely that we would be forcing GRO for tunnels to move a bit further >>>> up the stack if bridging or routing so that we could verify that the >>> frame is not being routed back out before we could aggregate it. >> I'm aware of the on-going discussion, but I'm not sure this should >> bother us greatly - the aggregation is done based on the >> inner TCP connection; I.e., it's guaranteed all the frames belong to >> the same TCP connection. While HW requires the UDP port for the >> initial classification effort, it doesn't take decision solely on that. > I realize that UDP port is only one piece of this. Exactly what > fields are being taken into account. That information would be useful > as we could then go though and determine what the probability is of us > having a false match between any two packets in a given UDP flow. While we can surely do that, I think that's really beside the point; I.e., I don't expect any of you to debug issues arising from a bad HW/FW implementation. But if you REALLY want, I can go and ask for exact details. > Also I would still argue this is LRO. If we are doing routing it > should be disabled for us to later re-enable if we feel it is safe. > Having a feature like this enabled by default with GRO is a really bad > idea. Which is exactly what happens in qede/bnx2x for regular TCP. [I understand this area is a much hotter potato at the moment then simple GRO when we've submitted that content for both drivers; But I won't mention it again - the sins of the fathers are not excuses for any wrong-doings today]. >>> Also I would be interested in knowing how your hardware handles >>> tunnels with outer checksums. Is it just ignoring the frames in such >>> a case, ignoring the checksum, or is it actually validating the frames >>> and then merging the resulting checksum? >> HW is validating both inner and outer csum; if it wouldn't be able to >> due to some limitation, it would not try to pass the packet as a GRO >> aggregation but rather as regular seperated packets. >> But I don't believe it merges the checksum, only validate each >> independently [CSUM_UNNECESSARY]. >So it is mangling the packets then. Obviously - you can't aggregate things without touching them. > If this were flagged as LRO at > least if bridging or routing is enabled you won't be mangling the > frames without the user having to specifically go back through and > re-enable LRO. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 18:22 ` Yuval Mintz @ 2016-06-22 21:32 ` Alexander Duyck 2016-06-22 22:32 ` Hannes Frederic Sowa 2016-06-22 23:42 ` Eric Dumazet 2016-06-22 21:52 ` Rick Jones 1 sibling, 2 replies; 36+ messages in thread From: Alexander Duyck @ 2016-06-22 21:32 UTC (permalink / raw) To: Yuval Mintz Cc: Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 11:22 AM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >>>>> This series adds driver support for the processing of tunnel >>>>> [specifically vxlan/geneve/gre tunnels] packets which are >>>>> aggregated [GROed] by the hardware before driver passes >>>>> such packets upto the stack. >>>> First off I am pretty sure this isn't GRO. This is LRO. >>> Nopes. This is GRO - each MSS-sized frame will arrive on its >>> own frag, whereas the headers [both internal & external] >>> would arrive on the linear part of the SKB. > >> No it is LRO, it just very closely mimics GRO. If it is in hardware >> it is LRO. GRO is extensible in software and bugs can be fixed in >> kernel, LRO is not extensible and any bugs in it are found in hardware >> or the driver and would have to be fixed there. It all comes down to >> bug isolation. If we find that disabling LRO makes the bug go away we >> know the hardware aggregation has an issue. Both features should not >> operate on the same bit. Otherwise when some bug is found in your >> implementation it will be blamed on GRO when the bug is actually in >> LRO. > I'm not aware of a definition stating GRO *has* to be extensible in SW; > AFAIK the LRO/GRO distinction revolves around multiple areas [criteria > for aggregation, method of aggregation, etc.]. The idea behind GRO was to make it so that we had a generic way to handle this in software. For the most part drivers doing LRO in software were doing the same thing that the GRO was doing. The only reason it was deprecated is because GRO was capable of doing more than LRO could since we add one parser and suddenly all devices saw the benefit instead of just one specific device. It is best to keep those two distinct solutions and then let the user sort out if they want to have the aggregation done by the device or the kernel. >> I realize this causes some pain when routing or bridging as LRO is >> disabled but that is kind of the point. We don't want the hardware to >> be mangling frames when we are attempting to route packets between any >> two given devices. > Actually, while I might disagree on whether this is LRO/GRO, I don't think > there's any problem for us to base this on the LRO feature - I.e., when we > started working on qede we didn't bother implementing LRO as we understood > it was deprecated, but this encapsulated aggregation is configurable; If it > makes life easier for everyone if we make the configuration based on the LRO > configuration, so that when LRO is disabled we won't have this turned on > it can be easily done. If you can move this over to an LRO feature I think it would go a long way towards cleaning up many of my complaints with this. LRO really isn't acceptable for routing or bridging scenarios whereas GRO sometimes is. If we place your code under the same limitations as the LRO feature bit then we can probably look at allowing the tunnel aggregation to be performed since we can be more certain that the tunnel will be terminating at the local endpoint. >>>> Also I don't know if you have been paying attention to recent >>>> discussions on the mailing list but the fact is GRO over UDP tunnels >>>> is still a subject for debate. This patch takes things in the >>>> opposite direction of where we are currently talking about going with >>>> GRO. I've added Hannes and Tom to this discussion just to make sure I >>>> have the proper understanding of all this as my protocol knowledge is >>>> somewhat lacking. >>> I guess we're on the exact opposite side of the discussion - I.e., we're >>> the vendor that tries pushing offload capabilities to the device. >>> Do notice however that we're not planning on pushing anything new >>> feature-wise, just like we haven't done anything for regular TCP GRO - >>> All we do is allow our HW/FW to aggregate packets instead of stack. > >> If you have been following the list then arguably you didn't fully >> understand what has been going on. I just went through and cleaned up >> all the VXLAN, GENEVE, and VXLAN-GPE mess that we have been creating >> and tried to get this consolidated so that we could start to have >> conversations about this without things being outright rejected. I >> feel like you guys have just prooven the argument for the other side >> that said as soon as we start support any of it, the vendors were >> going to go nuts and try to stuff everything and the kitchen sink into >> the NICs. The fact is I am starting to think they were right. > You hurt my feeling; I started going nuts ages ago ;-) I'm not saying anything about going nuts, I'm just saying you kind of decided to sprint out into a mine-field where I have been treading the last week or so. I'm just wanting to avoid having to see all this code getting ripped out. > But seriously, this isn't really anything new but rather a step forward in > the direction we've already taken - bnx2x/qede are already performing > the same for non-encapsulated TCP. Right. That goes on for other NICs as well. Most of them use the LRO feature flag for it tough. Odds are bnx2x and qede should probably be updated to do so as well. I didn't realize we had NICs that were re-purposing the GRO bit this way. It would be a recipe for us to run into issues from the software side as well because if we change something in the requirements for GRO there is no guarantee that we could support it in your hardware so that would be another argument for forking your feature off into LRO on the hardware. > And while I understand why you're being suspicious of such an addition, > I don't entirely see how it affects any of that discussion - I already yielded > that we can make this configurable, so if any routing decision would be > added that result in packets NOT supposed to be aggregated, the feature > can be turned off [at worst, at the expanse of it not having its benefit > on other connections]. It basically just increased the scope of all the VXLAN, GENEVE, and VXLAN-GPE offloads. Most NICs were only using this for Rx checksum, RSS, and a couple unfortunately to identify Tx flows for offload. We want to try and keep the scope of the offloads enabled via port identification to the bare minimum. Some of the community are still of the opinion that we shouldn't support them at all and should just rip them out. >>>> Ideally we need to be able to identify that a given packet terminates >>>> on a local socket in our namespace before we could begin to perform >>>> any sort of mangling on the local packets. It is always possible that >>>> we could be looking at a frame that uses the same UDP port but is not >>>> the tunnel protocol if we are performing bridging or routing. The >>>> current GRO implementation fails in that regard and there are >>>> discussions between several of us on how to deal with that. It is >>>> likely that we would be forcing GRO for tunnels to move a bit further >>>>> up the stack if bridging or routing so that we could verify that the >>>> frame is not being routed back out before we could aggregate it. > >>> I'm aware of the on-going discussion, but I'm not sure this should >>> bother us greatly - the aggregation is done based on the >>> inner TCP connection; I.e., it's guaranteed all the frames belong to >>> the same TCP connection. While HW requires the UDP port for the >>> initial classification effort, it doesn't take decision solely on that. > >> I realize that UDP port is only one piece of this. Exactly what >> fields are being taken into account. That information would be useful >> as we could then go though and determine what the probability is of us >> having a false match between any two packets in a given UDP flow. > While we can surely do that, I think that's really beside the point; > I.e., I don't expect any of you to debug issues arising from a bad HW/FW > implementation. But if you REALLY want, I can go and ask for exact > details. I don't need the exact details but it is something that you should probably look into from your side. I can tell you that I have seen a number of firmware bugs from various NICs over the last year or two. Debugging them can be a real pain when features are running in the background that you aren't aware of. That is another reason for wanting LRO and GRO separated so that I know I am dealing with hardware versus software without having to know the details of your NIC. >> Also I would still argue this is LRO. If we are doing routing it >> should be disabled for us to later re-enable if we feel it is safe. >> Having a feature like this enabled by default with GRO is a really bad >> idea. > Which is exactly what happens in qede/bnx2x for regular TCP. I'm not sure I follow you here. Are you saying this feature is disabled by default or that it is supposed to be disabled when routing or bridging? From what I can tell it looks like what you are saying is true for the bnx2x as it appears to be using the NETIF_F_LRO bit based on a quick grep, but I don't see this bit used anywhere in qede. > [I understand this area is a much hotter potato at the moment then > simple GRO when we've submitted that content for both drivers; > But I won't mention it again - the sins of the fathers are not excuses > for any wrong-doings today]. Yeah, I don't really care about if this is how it was before. Basically I just want it to be configured in such a way that somebody doesn't have to have proprietary knowledge about the driver in order to debug it if we end up finding a firmware bug. I should be able to clear the LRO bit and have hardware aggregation stop and hand everything off to the kernel for aggregation so I can determine if the issue is the kernel or the device/driver. >>>> Also I would be interested in knowing how your hardware handles >>>> tunnels with outer checksums. Is it just ignoring the frames in such >>>> a case, ignoring the checksum, or is it actually validating the frames >>>> and then merging the resulting checksum? > >>> HW is validating both inner and outer csum; if it wouldn't be able to >>> due to some limitation, it would not try to pass the packet as a GRO >>> aggregation but rather as regular seperated packets. >>> But I don't believe it merges the checksum, only validate each >>> independently [CSUM_UNNECESSARY]. > >>So it is mangling the packets then. > Obviously - you can't aggregate things without touching them. Right. Just making the point for those that might not be following the conversation since we are on a public list. Still I would have probably preferred to have the checksum repopulated after aggregation, but the fact that it isn't is not really all that big a deal. It just is painful for those what would prefer that we were using CHECKSUM_COMPLETE for all these tunnels. I hope this helps to clarify where I stand on this. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 21:32 ` Alexander Duyck @ 2016-06-22 22:32 ` Hannes Frederic Sowa 2016-06-22 23:42 ` Eric Dumazet 1 sibling, 0 replies; 36+ messages in thread From: Hannes Frederic Sowa @ 2016-06-22 22:32 UTC (permalink / raw) To: Alexander Duyck, Yuval Mintz Cc: Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On 22.06.2016 14:32, Alexander Duyck wrote: > On Wed, Jun 22, 2016 at 11:22 AM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >>>>>> This series adds driver support for the processing of tunnel >>>>>> [specifically vxlan/geneve/gre tunnels] packets which are >>>>>> aggregated [GROed] by the hardware before driver passes >>>>>> such packets upto the stack. >>>>> First off I am pretty sure this isn't GRO. This is LRO. >>>> Nopes. This is GRO - each MSS-sized frame will arrive on its >>>> own frag, whereas the headers [both internal & external] >>>> would arrive on the linear part of the SKB. >> >>> No it is LRO, it just very closely mimics GRO. If it is in hardware >>> it is LRO. GRO is extensible in software and bugs can be fixed in >>> kernel, LRO is not extensible and any bugs in it are found in hardware >>> or the driver and would have to be fixed there. It all comes down to >>> bug isolation. If we find that disabling LRO makes the bug go away we >>> know the hardware aggregation has an issue. Both features should not >>> operate on the same bit. Otherwise when some bug is found in your >>> implementation it will be blamed on GRO when the bug is actually in >>> LRO. >> I'm not aware of a definition stating GRO *has* to be extensible in SW; >> AFAIK the LRO/GRO distinction revolves around multiple areas [criteria >> for aggregation, method of aggregation, etc.]. > > The idea behind GRO was to make it so that we had a generic way to > handle this in software. For the most part drivers doing LRO in > software were doing the same thing that the GRO was doing. The only > reason it was deprecated is because GRO was capable of doing more than > LRO could since we add one parser and suddenly all devices saw the > benefit instead of just one specific device. It is best to keep those > two distinct solutions and then let the user sort out if they want to > have the aggregation done by the device or the kernel. The most important point while talking about GRO is that it is symmetrical. The GSO functions need to be able to completely undo the effect GRO did. LRO is, for example, completely disabled for all devices in the namespace as soon as forwarding is enabled because segmentation isn't available for LRO. This lead to the definition of GRO and GSO as far as I was told. Probably I would also personally take this as the definition of GRO. Things like maximum segment size, flags, extensions and everything must be aggregated and metadata be stored, so that GSO can exactly build the original fragments (or some reasonable of the originals). Again, LRO does not provide this, thus we have to disable it on all devices that participate in forwarding. AFAIK one problem we constantly saw is that the MSS was not correctly stored in the meta data and thus the segmentation didn't create same-sized packets. There are other small rules enforced nowadays. If we upgrade GSO side we need to be able to modify GRO as well, vice versa. The problem with a hardware implementation is that we might not be able to modify core kernel code anymore, because hardware got into this contract on how it implements GRO, so I agree with Alex, it would better be kept a software only implementation. >>> I realize this causes some pain when routing or bridging as LRO is >>> disabled but that is kind of the point. We don't want the hardware to >>> be mangling frames when we are attempting to route packets between any >>> two given devices. > >> Actually, while I might disagree on whether this is LRO/GRO, I don't think >> there's any problem for us to base this on the LRO feature - I.e., when we >> started working on qede we didn't bother implementing LRO as we understood >> it was deprecated, but this encapsulated aggregation is configurable; If it >> makes life easier for everyone if we make the configuration based on the LRO >> configuration, so that when LRO is disabled we won't have this turned on >> it can be easily done. > > If you can move this over to an LRO feature I think it would go a long > way towards cleaning up many of my complaints with this. LRO really > isn't acceptable for routing or bridging scenarios whereas GRO > sometimes is. If we place your code under the same limitations as the > LRO feature bit then we can probably look at allowing the tunnel > aggregation to be performed since we can be more certain that the > tunnel will be terminating at the local endpoint. > >>>>> Also I don't know if you have been paying attention to recent >>>>> discussions on the mailing list but the fact is GRO over UDP tunnels >>>>> is still a subject for debate. This patch takes things in the >>>>> opposite direction of where we are currently talking about going with >>>>> GRO. I've added Hannes and Tom to this discussion just to make sure I >>>>> have the proper understanding of all this as my protocol knowledge is >>>>> somewhat lacking. >>>> I guess we're on the exact opposite side of the discussion - I.e., we're >>>> the vendor that tries pushing offload capabilities to the device. >>>> Do notice however that we're not planning on pushing anything new >>>> feature-wise, just like we haven't done anything for regular TCP GRO - >>>> All we do is allow our HW/FW to aggregate packets instead of stack. >> >>> If you have been following the list then arguably you didn't fully >>> understand what has been going on. I just went through and cleaned up >>> all the VXLAN, GENEVE, and VXLAN-GPE mess that we have been creating >>> and tried to get this consolidated so that we could start to have >>> conversations about this without things being outright rejected. I >>> feel like you guys have just prooven the argument for the other side >>> that said as soon as we start support any of it, the vendors were >>> going to go nuts and try to stuff everything and the kitchen sink into >>> the NICs. The fact is I am starting to think they were right. > >> You hurt my feeling; I started going nuts ages ago ;-) > > I'm not saying anything about going nuts, I'm just saying you kind of > decided to sprint out into a mine-field where I have been treading the > last week or so. I'm just wanting to avoid having to see all this > code getting ripped out. > >> But seriously, this isn't really anything new but rather a step forward in >> the direction we've already taken - bnx2x/qede are already performing >> the same for non-encapsulated TCP. > > Right. That goes on for other NICs as well. Most of them use the LRO > feature flag for it tough. Odds are bnx2x and qede should probably be > updated to do so as well. I didn't realize we had NICs that were > re-purposing the GRO bit this way. It would be a recipe for us to run > into issues from the software side as well because if we change > something in the requirements for GRO there is no guarantee that we > could support it in your hardware so that would be another argument > for forking your feature off into LRO on the hardware. Yes, I am surprised to see this as well. My intuition was that this is only a in-kernel feature. >> And while I understand why you're being suspicious of such an addition, >> I don't entirely see how it affects any of that discussion - I already yielded >> that we can make this configurable, so if any routing decision would be >> added that result in packets NOT supposed to be aggregated, the feature >> can be turned off [at worst, at the expanse of it not having its benefit >> on other connections]. > > It basically just increased the scope of all the VXLAN, GENEVE, and > VXLAN-GPE offloads. Most NICs were only using this for Rx checksum, > RSS, and a couple unfortunately to identify Tx flows for offload. We > want to try and keep the scope of the offloads enabled via port > identification to the bare minimum. Some of the community are still > of the opinion that we shouldn't support them at all and should just > rip them out. > >>>>> Ideally we need to be able to identify that a given packet terminates >>>>> on a local socket in our namespace before we could begin to perform >>>>> any sort of mangling on the local packets. It is always possible that >>>>> we could be looking at a frame that uses the same UDP port but is not >>>>> the tunnel protocol if we are performing bridging or routing. The >>>>> current GRO implementation fails in that regard and there are >>>>> discussions between several of us on how to deal with that. It is >>>>> likely that we would be forcing GRO for tunnels to move a bit further >>>>>> up the stack if bridging or routing so that we could verify that the >>>>> frame is not being routed back out before we could aggregate it. >> >>>> I'm aware of the on-going discussion, but I'm not sure this should >>>> bother us greatly - the aggregation is done based on the >>>> inner TCP connection; I.e., it's guaranteed all the frames belong to >>>> the same TCP connection. While HW requires the UDP port for the >>>> initial classification effort, it doesn't take decision solely on that. Is it? If you just have a UDP port number you can also interpret some random data as TCP header. The chances are pretty small that you find seq numbers that match quite nicely but you are simply not allowed to assume that. The contract what data is inside a UDP packet is also based on the machines talking with each other. I don't think you get the destination address for the tunnel endpoint from the kernel right now, because Alex' last series doesn't support that, yet. >>> I realize that UDP port is only one piece of this. Exactly what >>> fields are being taken into account. That information would be useful >>> as we could then go though and determine what the probability is of us >>> having a false match between any two packets in a given UDP flow. > >> While we can surely do that, I think that's really beside the point; >> I.e., I don't expect any of you to debug issues arising from a bad HW/FW >> implementation. But if you REALLY want, I can go and ask for exact >> details. > > I don't need the exact details but it is something that you should > probably look into from your side. I can tell you that I have seen a > number of firmware bugs from various NICs over the last year or two. > Debugging them can be a real pain when features are running in the > background that you aren't aware of. That is another reason for > wanting LRO and GRO separated so that I know I am dealing with > hardware versus software without having to know the details of your > NIC. > >>> Also I would still argue this is LRO. If we are doing routing it >>> should be disabled for us to later re-enable if we feel it is safe. >>> Having a feature like this enabled by default with GRO is a really bad >>> idea. > >> Which is exactly what happens in qede/bnx2x for regular TCP. > > I'm not sure I follow you here. Are you saying this feature is > disabled by default or that it is supposed to be disabled when routing > or bridging? From what I can tell it looks like what you are saying > is true for the bnx2x as it appears to be using the NETIF_F_LRO bit > based on a quick grep, but I don't see this bit used anywhere in qede. If qede/bnx2x use LRO feature bit already it will automatically be disabled as soon as the device enters a bridge/ovs constellation or if forwarding, bonding is enabled. This is absolutely necessary to keep the GRO<->GSO contract in place. >> [I understand this area is a much hotter potato at the moment then >> simple GRO when we've submitted that content for both drivers; >> But I won't mention it again - the sins of the fathers are not excuses >> for any wrong-doings today]. > > Yeah, I don't really care about if this is how it was before. > Basically I just want it to be configured in such a way that somebody > doesn't have to have proprietary knowledge about the driver in order > to debug it if we end up finding a firmware bug. I should be able to > clear the LRO bit and have hardware aggregation stop and hand > everything off to the kernel for aggregation so I can determine if the > issue is the kernel or the device/driver. Also ack! >>>>> Also I would be interested in knowing how your hardware handles >>>>> tunnels with outer checksums. Is it just ignoring the frames in such >>>>> a case, ignoring the checksum, or is it actually validating the frames >>>>> and then merging the resulting checksum? >> >>>> HW is validating both inner and outer csum; if it wouldn't be able to >>>> due to some limitation, it would not try to pass the packet as a GRO >>>> aggregation but rather as regular seperated packets. >>>> But I don't believe it merges the checksum, only validate each >>>> independently [CSUM_UNNECESSARY]. >> >>> So it is mangling the packets then. >> Obviously - you can't aggregate things without touching them. We pretty much try to keep the differences in meta data, so we can restore everything to the previous situation. This sometimes doesn't work completely, like e.g. different size mss or DF bits. But it is in kernel control. > Right. Just making the point for those that might not be following > the conversation since we are on a public list. > > Still I would have probably preferred to have the checksum repopulated > after aggregation, but the fact that it isn't is not really all that > big a deal. It just is painful for those what would prefer that we > were using CHECKSUM_COMPLETE for all these tunnels. > > I hope this helps to clarify where I stand on this. Bye, Hannes ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 21:32 ` Alexander Duyck 2016-06-22 22:32 ` Hannes Frederic Sowa @ 2016-06-22 23:42 ` Eric Dumazet 1 sibling, 0 replies; 36+ messages in thread From: Eric Dumazet @ 2016-06-22 23:42 UTC (permalink / raw) To: Alexander Duyck Cc: Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, 2016-06-22 at 14:32 -0700, Alexander Duyck wrote: > The idea behind GRO was to make it so that we had a generic way to > handle this in software. For the most part drivers doing LRO in > software were doing the same thing that the GRO was doing. The only > reason it was deprecated is because GRO was capable of doing more than > LRO could since we add one parser and suddenly all devices saw the > benefit instead of just one specific device. It is best to keep those > two distinct solutions and then let the user sort out if they want to > have the aggregation done by the device or the kernel. Presumably we could add features flags to selectively enable part of LRO (really simply GRO offloading) for NIC that partially match GRO requirements. Patch 5/5 seems to enable the hardware feature(s) : + p_ramrod->tpa_param.tpa_ipv4_tunn_en_flg = 1; + p_ramrod->tpa_param.tpa_ipv6_tunn_en_flg = 1; So this NIC seems to have a way to control its LRO engine. If some horrible bug happens for GRE+IPv6+TCP, you could disable LRO coping with GRE encapsulation, instead of disabling whole LRO We have software fallback. Nice, with quite heavy cpu cost. If we can use offload without breaking rules, use it. Some NIC have terrible LRO performance (I wont give details here), but others are ok. Some NIC have terrible TSO performance for small number of segments. (gso_segs < 4) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 18:22 ` Yuval Mintz 2016-06-22 21:32 ` Alexander Duyck @ 2016-06-22 21:52 ` Rick Jones 2016-06-22 22:47 ` Eric Dumazet 2016-06-26 19:53 ` David Miller 1 sibling, 2 replies; 36+ messages in thread From: Rick Jones @ 2016-06-22 21:52 UTC (permalink / raw) To: Yuval Mintz, Alexander Duyck Cc: Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On 06/22/2016 11:22 AM, Yuval Mintz wrote: > But seriously, this isn't really anything new but rather a step forward in > the direction we've already taken - bnx2x/qede are already performing > the same for non-encapsulated TCP. Since you mention bnx2x... I would argue that the NIC firmware on those NICs driven by bnx2x is doing it badly. Not so much from a functional standpoint I suppose, but from a performance one. The NIC-firmware GRO done there has this rather unfortunate assumption about "all MSSes will be directly driven by my own physical MTU" and when it sees segments of a size other than would be suggested by the physical MTU, will coalesce only two segments together. They then do not get further coalesced in the stack. Suffice it to say this does not do well from a performance standpoint. One can disable LRO via ethtool for these NICs, but what that does is disable old-school LRO, not GRO-in-the-NIC. To get that disabled, one must also get the bnx2x module loaded with "disable-tpa=1" so the Linux stack GRO gets used instead. Had the bnx2x-driven NICs' firmware not had that rather unfortunate assumption about MSSes I probably would never have noticed. happy benchmarking, rick jones ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 21:52 ` Rick Jones @ 2016-06-22 22:47 ` Eric Dumazet 2016-06-22 22:56 ` Alexander Duyck 2016-06-22 23:10 ` Rick Jones 2016-06-26 19:53 ` David Miller 1 sibling, 2 replies; 36+ messages in thread From: Eric Dumazet @ 2016-06-22 22:47 UTC (permalink / raw) To: Rick Jones Cc: Yuval Mintz, Alexander Duyck, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, 2016-06-22 at 14:52 -0700, Rick Jones wrote: > On 06/22/2016 11:22 AM, Yuval Mintz wrote: > > But seriously, this isn't really anything new but rather a step forward in > > the direction we've already taken - bnx2x/qede are already performing > > the same for non-encapsulated TCP. > > Since you mention bnx2x... I would argue that the NIC firmware on > those NICs driven by bnx2x is doing it badly. Not so much from a > functional standpoint I suppose, but from a performance one. The > NIC-firmware GRO done there has this rather unfortunate assumption about > "all MSSes will be directly driven by my own physical MTU" and when it > sees segments of a size other than would be suggested by the physical > MTU, will coalesce only two segments together. They then do not get > further coalesced in the stack. > > Suffice it to say this does not do well from a performance standpoint. > > One can disable LRO via ethtool for these NICs, but what that does is > disable old-school LRO, not GRO-in-the-NIC. To get that disabled, one > must also get the bnx2x module loaded with "disable-tpa=1" so the Linux > stack GRO gets used instead. > > Had the bnx2x-driven NICs' firmware not had that rather unfortunate > assumption about MSSes I probably would never have noticed. I do not see this behavior on my bnx2x nics ? ip ro add 10.246.11.52 via 10.246.11.254 dev eth0 mtu 1000 lpk51:~# ./netperf -H 10.246.11.52 -l 1000 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.11.52 () port 0 AF_INET On receiver : 15:46:08.296241 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 303360, win 8192, options [nop,nop,TS val 1245217243 ecr 1245306446], length 0 15:46:08.296430 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], seq 303360:327060, ack 1, win 229, options [nop,nop,TS val 1245306446 ecr 1245217242], length 23700 15:46:08.296441 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 327060, win 8192, options [nop,nop,TS val 1245217243 ecr 1245306446], length 0 15:46:08.296644 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], seq 327060:350760, ack 1, win 229, options [nop,nop,TS val 1245306446 ecr 1245217242], length 23700 15:46:08.296655 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 350760, win 8192, options [nop,nop,TS val 1245217244 ecr 1245306446], length 0 15:46:08.296854 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], seq 350760:374460, ack 1, win 229, options [nop,nop,TS val 1245306446 ecr 1245217242], length 23700 15:46:08.296897 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 374460, win 8192, options [nop,nop,TS val 1245217244 ecr 1245306446], length 0 15:46:08.297054 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], seq 374460:398160, ack 1, win 229, options [nop,nop,TS val 1245306446 ecr 1245217242], length 23700 15:46:08.297099 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 398160, win 8192, options [nop,nop,TS val 1245217244 ecr 1245306446], length 0 15:46:08.297258 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], seq 398160:420912, ack 1, win 229, options [nop,nop,TS val 1245306446 ecr 1245217242], length 22752 15:46:08.297301 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], ack 420912, win 8192, options [nop,nop,TS val 1245217244 ecr 1245306446], length 0 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 22:47 ` Eric Dumazet @ 2016-06-22 22:56 ` Alexander Duyck 2016-06-22 23:31 ` Eric Dumazet 2016-06-22 23:52 ` Rick Jones 2016-06-22 23:10 ` Rick Jones 1 sibling, 2 replies; 36+ messages in thread From: Alexander Duyck @ 2016-06-22 22:56 UTC (permalink / raw) To: Eric Dumazet Cc: Rick Jones, Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-06-22 at 14:52 -0700, Rick Jones wrote: >> On 06/22/2016 11:22 AM, Yuval Mintz wrote: >> > But seriously, this isn't really anything new but rather a step forward in >> > the direction we've already taken - bnx2x/qede are already performing >> > the same for non-encapsulated TCP. >> >> Since you mention bnx2x... I would argue that the NIC firmware on >> those NICs driven by bnx2x is doing it badly. Not so much from a >> functional standpoint I suppose, but from a performance one. The >> NIC-firmware GRO done there has this rather unfortunate assumption about >> "all MSSes will be directly driven by my own physical MTU" and when it >> sees segments of a size other than would be suggested by the physical >> MTU, will coalesce only two segments together. They then do not get >> further coalesced in the stack. >> >> Suffice it to say this does not do well from a performance standpoint. >> >> One can disable LRO via ethtool for these NICs, but what that does is >> disable old-school LRO, not GRO-in-the-NIC. To get that disabled, one >> must also get the bnx2x module loaded with "disable-tpa=1" so the Linux >> stack GRO gets used instead. >> >> Had the bnx2x-driven NICs' firmware not had that rather unfortunate >> assumption about MSSes I probably would never have noticed. > > I do not see this behavior on my bnx2x nics ? It could be that you and Rick are running different firmware. I believe you can expose that via "ethtool -i". This is the ugly bit about all this. We are offloading GRO into the firmware of these devices with no idea how any of it works and by linking GRO to LRO on the same device you are stuck having to accept either the firmware offload or nothing at all. That is kind of the point Rick was trying to get at. The preferred setup would be to have any aggregation offload on the device be flagged as LRO and then the offload performed by the kernel be GRO. By linking the two features as has apparently happened with bnx2x and qede it limits the users options and kind of forces them into an all or nothing situation unless they have insight into proprietary driver options to disable these features. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 22:56 ` Alexander Duyck @ 2016-06-22 23:31 ` Eric Dumazet 2016-06-22 23:59 ` Tom Herbert 2016-06-23 0:11 ` Alexander Duyck 2016-06-22 23:52 ` Rick Jones 1 sibling, 2 replies; 36+ messages in thread From: Eric Dumazet @ 2016-06-22 23:31 UTC (permalink / raw) To: Alexander Duyck Cc: Rick Jones, Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, 2016-06-22 at 15:56 -0700, Alexander Duyck wrote: > It could be that you and Rick are running different firmware. I > believe you can expose that via "ethtool -i". This is the ugly bit > about all this. We are offloading GRO into the firmware of these > devices with no idea how any of it works and by linking GRO to LRO on > the same device you are stuck having to accept either the firmware > offload or nothing at all. That is kind of the point Rick was trying > to get at. Well, we offload TSO to the NIC. Or checksums as much as we can. At one point we have to trust the NIC we plug in our hosts, right ? Why RX is so different than TX exactly ? Yes, LRO is hard to implement properly compared to TSO, but not something that is _fundamentally_ broken. Claiming that hardware assist GRO is not possible is a plain mantra. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 23:31 ` Eric Dumazet @ 2016-06-22 23:59 ` Tom Herbert 2016-06-23 0:11 ` Alexander Duyck 1 sibling, 0 replies; 36+ messages in thread From: Tom Herbert @ 2016-06-22 23:59 UTC (permalink / raw) To: Eric Dumazet Cc: Alexander Duyck, Rick Jones, Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 4:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-06-22 at 15:56 -0700, Alexander Duyck wrote: > >> It could be that you and Rick are running different firmware. I >> believe you can expose that via "ethtool -i". This is the ugly bit >> about all this. We are offloading GRO into the firmware of these >> devices with no idea how any of it works and by linking GRO to LRO on >> the same device you are stuck having to accept either the firmware >> offload or nothing at all. That is kind of the point Rick was trying >> to get at. > > > Well, we offload TSO to the NIC. Or checksums as much as we can. > > At one point we have to trust the NIC we plug in our hosts, right ? > > Why RX is so different than TX exactly ? > LRO requires parsing the packet and hence DPI for things like foo/UDP. TSO is much more straightforward and does not rely on parsing (e.g. LRO would require NIC to parse EH, TSO doesn't). Also, TSO is work-conserving, whereas LRO is not. I don't believe there is any standard for how NICs are supposed to determined when to set packets to host. All of this makes LRO much less deterministic and means there's a lot of variance between NIC implementations. Plus there's also the question of whether a vendor is testing IPv6. I do think all this can be "fixed" once we have programmable NICs and we can program our own implementation of LRO, but until then I think there is inherent risk in using it. GRO rocks though! Tom > Yes, LRO is hard to implement properly compared to TSO, > but not something that is _fundamentally_ broken. > > Claiming that hardware assist GRO is not possible is a plain mantra. > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 23:31 ` Eric Dumazet 2016-06-22 23:59 ` Tom Herbert @ 2016-06-23 0:11 ` Alexander Duyck 2016-06-23 4:10 ` Yuval Mintz 1 sibling, 1 reply; 36+ messages in thread From: Alexander Duyck @ 2016-06-23 0:11 UTC (permalink / raw) To: Eric Dumazet Cc: Rick Jones, Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 4:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2016-06-22 at 15:56 -0700, Alexander Duyck wrote: > >> It could be that you and Rick are running different firmware. I >> believe you can expose that via "ethtool -i". This is the ugly bit >> about all this. We are offloading GRO into the firmware of these >> devices with no idea how any of it works and by linking GRO to LRO on >> the same device you are stuck having to accept either the firmware >> offload or nothing at all. That is kind of the point Rick was trying >> to get at. > > > Well, we offload TSO to the NIC. Or checksums as much as we can. > > At one point we have to trust the NIC we plug in our hosts, right ? > > Why RX is so different than TX exactly ? Exactly my point. The problem I have with the bnx2x and qede drivers is that the NIC driver is using the GRO flag to enable LRO on the device. It is like arguing that the NIC can do segmentation based on the GSO flag instead of the TSO flag. We should be keeping TSO/LRO as the device features while GSO/GRO are kernel features that don't directly change any settings on the device. > Yes, LRO is hard to implement properly compared to TSO, > but not something that is _fundamentally_ broken. Yes and no. More often then not there are issues that pop up as protocols change. I'm not assuming they are fundamentally broken. The bit that I consider broken is that I cannot disable the NIC LRO functionality without having to disable the stack from doing GRO. All I am asking for is LRO controls all of the NIC aggregation features and GRO be kept a software only feature instead of being tied to a hardware feature. > Claiming that hardware assist GRO is not possible is a plain mantra. I have no issue claiming hardware assist GRO is possible. My problem is saying that the GRO feature flag can be used to enable it. I would argue that any packet aggregation at the device or driver level is LRO regardless of how close it comes to GRO feature wise. GRO should only be occurring in the receive path after calling the appropriate GRO function. Otherwise it becomes really difficult to work around any possible issues that the hardware assisted GRO introduces without paying a huge penalty. We need to keep these feature flags separate. I thought that was the whole reason why we have the distinction between LRO and GRO in the first place. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 0:11 ` Alexander Duyck @ 2016-06-23 4:10 ` Yuval Mintz 2016-06-23 4:17 ` Yuval Mintz 0 siblings, 1 reply; 36+ messages in thread From: Yuval Mintz @ 2016-06-23 4:10 UTC (permalink / raw) To: Alexander Duyck, Eric Dumazet Cc: Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa >> Claiming that hardware assist GRO is not possible is a plain mantra. > I have no issue claiming hardware assist GRO is possible. My problem > is saying that the GRO feature flag can be used to enable it. I would > argue that any packet aggregation at the device or driver level is LRO > regardless of how close it comes to GRO feature wise. GRO should only > be occurring in the receive path after calling the appropriate GRO > function. Otherwise it becomes really difficult to work around any > possible issues that the hardware assisted GRO introduces without > paying a huge penalty. We need to keep these feature flags separate. > I thought that was the whole reason why we have the distinction > between LRO and GRO in the first place. Then again, if you're basically saying every HW-assisted offload on receive should be done under LRO flag, what would be the use case where a GRO-assisted offload would help? I.e., afaik LRO is superior to GRO in `brute force' - it creates better packed packets and utilizes memory better [with all the obvious cons such as inability for defragmentation]. So if you'd have the choice of having an adpater perform 'classic' LRO aggregation or something that resembles a GRO packet, what would be the gain from doing the latter? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 4:10 ` Yuval Mintz @ 2016-06-23 4:17 ` Yuval Mintz 2016-06-23 17:07 ` Alexander Duyck 0 siblings, 1 reply; 36+ messages in thread From: Yuval Mintz @ 2016-06-23 4:17 UTC (permalink / raw) To: Alexander Duyck, Eric Dumazet Cc: Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa > Then again, if you're basically saying every HW-assisted offload on > receive should be done under LRO flag, what would be the use case > where a GRO-assisted offload would help? > I.e., afaik LRO is superior to GRO in `brute force' - > it creates better packed packets and utilizes memory better > [with all the obvious cons such as inability for defragmentation]. > So if you'd have the choice of having an adpater perform 'classic' > LRO aggregation or something that resembles a GRO packet, > what would be the gain from doing the latter? Just to relate to bnx2x/qede differences in current implementation - when this GRO hw-offload was added to bnx2x, it has already supported classical LRO, and due to above statement whenever LRO was set driver aggregated incoming traffic as classic LRO. I agree that in hindsight the lack of distinction between sw/hw GRO was hurting us. qede isn't implementing LRO, so we could easily mark this feature under LRO there - but question is, given that the adapter can support LRO, if we're going to suffer from all the shotrages that arise from putting this feature under LRO, why should we bother? You can argue that we might need a new feature bit for control over such a feature; If we don't do that, is there any gain in all of this? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 4:17 ` Yuval Mintz @ 2016-06-23 17:07 ` Alexander Duyck 2016-06-23 21:06 ` Yuval Mintz 2016-06-24 13:09 ` Edward Cree 0 siblings, 2 replies; 36+ messages in thread From: Alexander Duyck @ 2016-06-23 17:07 UTC (permalink / raw) To: Yuval Mintz Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 9:17 PM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >> Then again, if you're basically saying every HW-assisted offload on >> receive should be done under LRO flag, what would be the use case >> where a GRO-assisted offload would help? > >> I.e., afaik LRO is superior to GRO in `brute force' - >> it creates better packed packets and utilizes memory better >> [with all the obvious cons such as inability for defragmentation]. >> So if you'd have the choice of having an adpater perform 'classic' >> LRO aggregation or something that resembles a GRO packet, >> what would be the gain from doing the latter? LRO and GRO shouldn't really differ in packing or anything like that. The big difference between the two is that LRO is destructive while GRO is not. Specifically in the case of GRO you should be able to take the resultant frame, feed it through GSO, and get the original stream of frames back out. So you can pack the frames however you want the only key is that you must capture all the correct offsets and set the gso_size correct for the flow. > Just to relate to bnx2x/qede differences in current implementation - > when this GRO hw-offload was added to bnx2x, it has already > supported classical LRO, and due to above statement whenever LRO > was set driver aggregated incoming traffic as classic LRO. > I agree that in hindsight the lack of distinction between sw/hw GRO > was hurting us. In the case of bnx2x it sounds like you have issues that are significantly hurting the performance versus classic software GRO. If that is the case you might want to simply flip the logic for the module parameter that Rick mentioned and just disable the hardware assisted GRO unless it is specifically requested. > qede isn't implementing LRO, so we could easily mark this feature > under LRO there - but question is, given that the adapter can support > LRO, if we're going to suffer from all the shotrages that arise from > putting this feature under LRO, why should we bother? The idea is to address feature isolation. The fact is the hardware exists outside of kernel control. If you end up linking an internal kernel feature to your device like this you are essentially stripping the option of using the kernel feature. I would prefer to see us extend LRO to support "close enough GRO" instead of have us extend GRO to also include LRO. That way when we encounter issues like the FW limitation that Rick encountered he can just go in and disable the LRO and have true GRO kick in which would be significantly better than having to poke around through documentation to find a module parameter that can force the feature off. Really the fact that you have to use a module parameter is frowned upon as well as most drivers aren't supposed to be using those in the netdev tree. > You can argue that we might need a new feature bit for control > over such a feature; If we don't do that, is there any gain in all of this? I would argue that yes there are many cases where we will be able to show gain. The fact is there is a strong likelihood of the GRO on your parts having some differences either now, or at some point in the future as the code evolves. As I mentioned there was already some talk about possibly needing to push the UDP tunnel aggregation out of GRO and perhaps handling it sometime after IP look up had verified that the destination was in fact a local address in the namespace. In addition it makes the changes to include the tunnel encapsulation much more acceptable as LRO is already naturally dropped in the routing and bridging cases if I recall correctly. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 17:07 ` Alexander Duyck @ 2016-06-23 21:06 ` Yuval Mintz 2016-06-23 23:20 ` Alexander Duyck 2016-06-24 13:09 ` Edward Cree 1 sibling, 1 reply; 36+ messages in thread From: Yuval Mintz @ 2016-06-23 21:06 UTC (permalink / raw) To: Alexander Duyck Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa >>> Then again, if you're basically saying every HW-assisted offload on >>> receive should be done under LRO flag, what would be the use case >>> where a GRO-assisted offload would help? >>> I.e., afaik LRO is superior to GRO in `brute force' - >>> it creates better packed packets and utilizes memory better >>> [with all the obvious cons such as inability for defragmentation]. >>> So if you'd have the choice of having an adpater perform 'classic' >>> LRO aggregation or something that resembles a GRO packet, >>> what would be the gain from doing the latter? > LRO and GRO shouldn't really differ in packing or anything like that. > The big difference between the two is that LRO is destructive while > GRO is not. Specifically in the case of GRO you should be able to > take the resultant frame, feed it through GSO, and get the original > stream of frames back out. So you can pack the frames however you > want the only key is that you must capture all the correct offsets and > set the gso_size correct for the flow. While the implementation might lack in things [such as issues with future implementation], following your logic it is GRO - I.e., forwarding scenarios work fine with HW assisted GRO. >> Just to relate to bnx2x/qede differences in current implementation - >> when this GRO hw-offload was added to bnx2x, it has already >> supported classical LRO, and due to above statement whenever LRO >> was set driver aggregated incoming traffic as classic LRO. >> I agree that in hindsight the lack of distinction between sw/hw GRO >> was hurting us. > In the case of bnx2x it sounds like you have issues that are > significantly hurting the performance versus classic software GRO. If > that is the case you might want to simply flip the logic for the > module parameter that Rick mentioned and just disable the hardware > assisted GRO unless it is specifically requested. A bit hard to flip; The module parameter also disables LRO support. And given that module parameters is mostly a thing of the past, I don't think we should strive fixing things through additional changes in that area. > > qede isn't implementing LRO, so we could easily mark this feature > > under LRO there - but question is, given that the adapter can support > > LRO, if we're going to suffer from all the shotrages that arise from > > putting this feature under LRO, why should we bother? > The idea is to address feature isolation. The fact is the hardware > exists outside of kernel control. If you end up linking an internal > kernel feature to your device like this you are essentially stripping > the option of using the kernel feature. > I would prefer to see us extend LRO to support "close enough GRO" > instead of have us extend GRO to also include LRO. Again - why? What's the benefit of HW doing LRO and trying to control imitate GRO, if it's still carrying all the LRO baggage [specifically, disabling it on forwarding] as opposed to simply doing classic LRO? > > You can argue that we might need a new feature bit for control > > over such a feature; If we don't do that, is there any gain in all of this? > I would argue that yes there are many cases where we will be able to > show gain. The fact is there is a strong likelihood of the GRO on > your parts having some differences either now, or at some point in the > future as the code evolves. As I mentioned there was already some > talk about possibly needing to push the UDP tunnel aggregation out of > GRO and perhaps handling it sometime after IP look up had verified > that the destination was in fact a local address in the namespace. In > addition it makes the changes to include the tunnel encapsulation much > more acceptable as LRO is already naturally dropped in the routing and > bridging cases if I recall correctly. I think it all boils down to the question of "do we actually want to have HW-assisted GRO?". If we do [and not necessarily for the UDP-tunnel scenario] then we need to have it distinct from LRO, otherwise there's very little gain. If we believe tGRO should remain SW-only, then I think the discussion is mott; We need to stop trying this, and offload only LRO - in which case we can aggregate it in whichever 'destructive' [correct] format we like, without trying to have it resemble GRO. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 21:06 ` Yuval Mintz @ 2016-06-23 23:20 ` Alexander Duyck 2016-06-24 5:20 ` Yuval Mintz 0 siblings, 1 reply; 36+ messages in thread From: Alexander Duyck @ 2016-06-23 23:20 UTC (permalink / raw) To: Yuval Mintz Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Thu, Jun 23, 2016 at 2:06 PM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >>>> Then again, if you're basically saying every HW-assisted offload on >>>> receive should be done under LRO flag, what would be the use case >>>> where a GRO-assisted offload would help? >>>> I.e., afaik LRO is superior to GRO in `brute force' - >>>> it creates better packed packets and utilizes memory better >>>> [with all the obvious cons such as inability for defragmentation]. >>>> So if you'd have the choice of having an adpater perform 'classic' >>>> LRO aggregation or something that resembles a GRO packet, >>>> what would be the gain from doing the latter? > >> LRO and GRO shouldn't really differ in packing or anything like that. >> The big difference between the two is that LRO is destructive while >> GRO is not. Specifically in the case of GRO you should be able to >> take the resultant frame, feed it through GSO, and get the original >> stream of frames back out. So you can pack the frames however you >> want the only key is that you must capture all the correct offsets and >> set the gso_size correct for the flow. > > While the implementation might lack in things [such as issues with > future implementation], following your logic it is GRO - I.e., forwarding > scenarios work fine with HW assisted GRO. > >>> Just to relate to bnx2x/qede differences in current implementation - >>> when this GRO hw-offload was added to bnx2x, it has already >>> supported classical LRO, and due to above statement whenever LRO >>> was set driver aggregated incoming traffic as classic LRO. >>> I agree that in hindsight the lack of distinction between sw/hw GRO >>> was hurting us. > >> In the case of bnx2x it sounds like you have issues that are >> significantly hurting the performance versus classic software GRO. If >> that is the case you might want to simply flip the logic for the >> module parameter that Rick mentioned and just disable the hardware >> assisted GRO unless it is specifically requested. > > A bit hard to flip; The module parameter also disables LRO support. > And given that module parameters is mostly a thing of the past, I > don't think we should strive fixing things through additional changes > in that area. > >> > qede isn't implementing LRO, so we could easily mark this feature >> > under LRO there - but question is, given that the adapter can support >> > LRO, if we're going to suffer from all the shotrages that arise from >> > putting this feature under LRO, why should we bother? > >> The idea is to address feature isolation. The fact is the hardware >> exists outside of kernel control. If you end up linking an internal >> kernel feature to your device like this you are essentially stripping >> the option of using the kernel feature. > >> I would prefer to see us extend LRO to support "close enough GRO" >> instead of have us extend GRO to also include LRO. > > Again - why? What's the benefit of HW doing LRO and trying to > control imitate GRO, if it's still carrying all the LRO baggage > [specifically, disabling it on forwarding] as opposed to simply > doing classic LRO? In most cases that is all LRO is. LRO is trying to do the best it can to emulate GRO. The reason why it was pushed off into a separate feature bit is that it never quite works out that way and most vendors end up with something that comes close but is always missing a few items. The "hardware assisted GRO" is really nothing more than a marketing term and a justification to ignore the fact that we should probably be disabling it when routing or bridging is enabled. What you are doing is LRO but using the wrong bit to test for the feature. We already know of one firmware bug you guys have which makes it clear that the bnx2x is not doing hardware assisted GRO it is doing something else since it performs much worse than GRO if the MSS is less than what it would be based on the MTU. >> > You can argue that we might need a new feature bit for control >> > over such a feature; If we don't do that, is there any gain in all of this? > >> I would argue that yes there are many cases where we will be able to >> show gain. The fact is there is a strong likelihood of the GRO on >> your parts having some differences either now, or at some point in the >> future as the code evolves. As I mentioned there was already some >> talk about possibly needing to push the UDP tunnel aggregation out of >> GRO and perhaps handling it sometime after IP look up had verified >> that the destination was in fact a local address in the namespace. In >> addition it makes the changes to include the tunnel encapsulation much >> more acceptable as LRO is already naturally dropped in the routing and >> bridging cases if I recall correctly. > > I think it all boils down to the question of "do we actually want to have > HW-assisted GRO?". If we do [and not necessarily for the UDP-tunnel > scenario] then we need to have it distinct from LRO, otherwise there's > very little gain. If we believe tGRO should remain SW-only, then > I think the discussion is mott; We need to stop trying this, and offload > only LRO - in which case we can aggregate it in whichever 'destructive' > [correct] format we like, without trying to have it resemble GRO. I believe GRO should be software only. The LRO feature is meant to represent hardware assisted aggregation. If you want to take steps to define it further or reduce the limitations so that it is not disabled when routing or bridging is enabled for a device I would be fine with that. One of the reasons why LRO is disabled by default for routing and bridging is because the feature has always been somewhat poorly defined and resulted in us getting all sort of frames, some that were route-able and some that weren't. Arguably your hardware is on the better end of this but it is still not without its issues. Perhaps this might be an opportunity to go through and redefine what LRO is so that we can make better use of it like we did with the ntuple/nfc stuff a few years ago. Then it would be easier to go through and make it programmable so we can do things like turn on and off tunnel offloads, maybe IPv4/IPv6 support, and perhaps have a flag for identifying GRO like LRO versus everything else. That way the end user has more control over the configuration and can pick and choose what they want. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 23:20 ` Alexander Duyck @ 2016-06-24 5:20 ` Yuval Mintz 2016-06-24 16:44 ` Alexander Duyck 0 siblings, 1 reply; 36+ messages in thread From: Yuval Mintz @ 2016-06-24 5:20 UTC (permalink / raw) To: Alexander Duyck Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa >We already know of one firmware bug you guys have which makes >it clear that the bnx2x is not doing hardware assisted GRO it is doing >something else since it performs much worse than GRO if the MSS is >less than what it would be based on the MTU. It's a bit nitpicky, isn't it? Claiming this flaw means it's not GRO. I.e., you obviously wouldn't have claimed it beacme GRO if it was fixed. Not saying it makes a lot of difference, though. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-24 5:20 ` Yuval Mintz @ 2016-06-24 16:44 ` Alexander Duyck 0 siblings, 0 replies; 36+ messages in thread From: Alexander Duyck @ 2016-06-24 16:44 UTC (permalink / raw) To: Yuval Mintz Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Thu, Jun 23, 2016 at 10:20 PM, Yuval Mintz <Yuval.Mintz@qlogic.com> wrote: >>We already know of one firmware bug you guys have which makes >>it clear that the bnx2x is not doing hardware assisted GRO it is doing >>something else since it performs much worse than GRO if the MSS is >>less than what it would be based on the MTU. > > It's a bit nitpicky, isn't it? Claiming this flaw means it's not GRO. > I.e., you obviously wouldn't have claimed it beacme GRO if it > was fixed. > > Not saying it makes a lot of difference, though. The fact is LRO and GRO are two separate things. Even without the bug in the firmware I would still be saying the are two very different things. Your GRO implementation only supports a subset of the features that GRO has in the hardware. The way the kernel has implemented things we should keep GRO and GSO symmetric if at all possible. There aren't currently any GSO hardware offloads so there probably shouldn't be any GRO hardware offloads. On the other hand devices do support TSO hardware offloads so it would make sense to match that up and support LRO as the hardware equivalent of GRO. Anyway that is my opinion. It may be nitpicky but I don't fee that we should be re-purposing feature bits that were meant to be software features to represent hardware ones. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 17:07 ` Alexander Duyck 2016-06-23 21:06 ` Yuval Mintz @ 2016-06-24 13:09 ` Edward Cree 2016-06-24 16:31 ` Tom Herbert 1 sibling, 1 reply; 36+ messages in thread From: Edward Cree @ 2016-06-24 13:09 UTC (permalink / raw) To: Alexander Duyck, Yuval Mintz Cc: Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa, Bert Kenward On 23/06/16 18:07, Alexander Duyck wrote: > I would prefer to see us extend LRO to support "close enough GRO" > instead of have us extend GRO to also include LRO. This reminds me of something I've been meaning to bring up (sorry for slightly OT, but it might turn out relevant after all). In sfc we have an (out-of-tree and likely staying that way) LRO that's entirely in software. The only reason it exists is for users who want the 'permissive' merging behaviour of LRO, i.e. they don't need the guarantees of reversibility and by merging more stuff they can get slightly higher performance. I wonder if it would be a good idea for the GRO implementation to have some knobs to allow setting it to behave in this way. That would imply a scheme to define various GRO/SSR semantics, which then would also be a convenient interface for a driver to report the semantics of its hardware LRO if it has any. And it would make crystal clear that the difference between GRO and LRO is kernel vs hardware, rather than reversible vs not. -Ed ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-24 13:09 ` Edward Cree @ 2016-06-24 16:31 ` Tom Herbert 2016-06-24 17:21 ` Edward Cree 0 siblings, 1 reply; 36+ messages in thread From: Tom Herbert @ 2016-06-24 16:31 UTC (permalink / raw) To: Edward Cree Cc: Alexander Duyck, Yuval Mintz, Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Hannes Frederic Sowa, Bert Kenward On Fri, Jun 24, 2016 at 6:09 AM, Edward Cree <ecree@solarflare.com> wrote: > On 23/06/16 18:07, Alexander Duyck wrote: >> I would prefer to see us extend LRO to support "close enough GRO" >> instead of have us extend GRO to also include LRO. > This reminds me of something I've been meaning to bring up (sorry for > slightly OT, but it might turn out relevant after all). > In sfc we have an (out-of-tree and likely staying that way) LRO that's > entirely in software. The only reason it exists is for users who want > the 'permissive' merging behaviour of LRO, i.e. they don't need the > guarantees of reversibility and by merging more stuff they can get > slightly higher performance. > I wonder if it would be a good idea for the GRO implementation to have > some knobs to allow setting it to behave in this way. > That would imply a scheme to define various GRO/SSR semantics, which > then would also be a convenient interface for a driver to report the > semantics of its hardware LRO if it has any. > And it would make crystal clear that the difference between GRO and > LRO is kernel vs hardware, rather than reversible vs not. > Ed, Because you took this OT... ;-) LRO/GRO is the one of the five common offloads that has no generic analogue and requires protocol specific logic. For instance each IP-over-foo encapsulation needs kernel code for GRO, device/driver code for LRO. I think the answer here is to make both GRO and LRO to be user programmable via BPF. That is, instead of needing to add code or buy a new device to support every new protocol, we really just want to write a program for it that runs in any environment. In the case of LRO this becomes especially important since it resolves the "black box" nature of LRO in devices, so design goals like ensuring LRO is "close enough to GRO" become something we (the stack) have some control over. We've already moved GRO for to be a attribute of a UDP sockets, it is not much of a stretch now to allow applications to define their own GRO for their protocols (I'm thinking protocols like QUIC or TOU could use this). For programmable LRO I think the solution is to use XDP. For instance protocol specific parsing would done by the BPF program to identify the flows, and the infrastructure would provide the backend handling. The advantage of XDP for this is that it is not platform specific, so programmable LRO could be implemented in the driver (probably leveraging existing LRO solution like sfc), or it could be implemented implemented in HW using the exact same program (with HW support for BPF/XDP). Since such a program allows arbitrary parsing and flow lookup, we can match on specific n-tuples as needed to resolve the UDP-encapsulation identification problem. Tom > -Ed ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-24 16:31 ` Tom Herbert @ 2016-06-24 17:21 ` Edward Cree 2016-06-26 6:09 ` Yuval Mintz 0 siblings, 1 reply; 36+ messages in thread From: Edward Cree @ 2016-06-24 17:21 UTC (permalink / raw) To: Tom Herbert Cc: Alexander Duyck, Yuval Mintz, Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Hannes Frederic Sowa, Bert Kenward On 24/06/16 17:31, Tom Herbert wrote: > Ed, > Because you took this OT... ;-) > > LRO/GRO is the one of the five common offloads that has no generic > analogue and requires protocol specific logic. For instance each > IP-over-foo encapsulation needs kernel code for GRO, device/driver > code for LRO. I think the answer here is to make both GRO and LRO to > be user programmable via BPF. I agree that the only way to make LRO generic is to go for hardware BPF. However, I think that's likely to cause a _lot_ of headaches to implement and my hope is that we can instead get acceptable receive performance from GRO, RSS, and maybe things like the skb bundling I posted a while back. For instance, if your 'source port hack' were to mix in the TNI as well as the inner flow fields it already uses, I think that could improve hash spreading and thus GRO would perform better. Fundamentally I believe that robust, responsive hardware LRO is not workable as the hardware would have to decide to hold onto packets in the hope of merge candidates arriving soon after. Whereas in the software layer (GRO, bundling...), the packets are already coming in bursts thanks to the way napi polling behaves. But I'd love to be proved wrong :) The 'hybrid' approach of using bpf in hw to identify flows for sw to gro does seem plausible, maybe having bpf to compute the rxhash is the answer? -Ed (disclaimer: definitely not speaking for my employer here, these are my personal views only.) ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-24 17:21 ` Edward Cree @ 2016-06-26 6:09 ` Yuval Mintz 0 siblings, 0 replies; 36+ messages in thread From: Yuval Mintz @ 2016-06-26 6:09 UTC (permalink / raw) To: Edward Cree, Tom Herbert Cc: Alexander Duyck, Eric Dumazet, Rick Jones, Manish Chopra, David Miller, netdev, Ariel Elior, Hannes Frederic Sowa, Bert Kenward > Fundamentally I believe that robust, responsive hardware LRO is not workable as > the hardware would have to decide to hold onto packets in the hope of merge > candidates arriving soon after. Whereas in the software layer (GRO, > bundling...), the packets are already coming in bursts thanks to the way napi > polling behaves. Sounds like an additional coalescing configuration to me [assuming HW supports it]. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 22:56 ` Alexander Duyck 2016-06-22 23:31 ` Eric Dumazet @ 2016-06-22 23:52 ` Rick Jones 2016-06-23 0:18 ` Alexander Duyck 1 sibling, 1 reply; 36+ messages in thread From: Rick Jones @ 2016-06-22 23:52 UTC (permalink / raw) To: Alexander Duyck, Eric Dumazet Cc: Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On 06/22/2016 03:56 PM, Alexander Duyck wrote: > On Wed, Jun 22, 2016 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> On Wed, 2016-06-22 at 14:52 -0700, Rick Jones wrote: >>> Had the bnx2x-driven NICs' firmware not had that rather unfortunate >>> assumption about MSSes I probably would never have noticed. > It could be that you and Rick are running different firmware. I > believe you can expose that via "ethtool -i". This is the ugly bit > about all this. We are offloading GRO into the firmware of these > devices with no idea how any of it works and by linking GRO to LRO on > the same device you are stuck having to accept either the firmware > offload or nothing at all. That is kind of the point Rick was trying > to get at. I think you are typing a bit too far ahead into my keyboard with that last sentence. And I may not have been sufficiently complete in what I wrote. If the bnx2x-driven NICs' firmware had been coalescing more than two segments together, not only would I probably not have noticed, I probably would not have been upset to learn it was NIC-firmware GRO rather than stack. My complaint is the specific bug of coalescing only two segments when their size is unexpected, and the difficulty present in disabling the bnx2x-driven NICs' firmware GRO. I don't have a problem necessarily with the existence of NIC-firmware GRO in general. I just want to be able to enable/disable it easily. rick jones Of course, what I really want are much, Much, MUCH larger MTUs. It isn't for nothing that I used to refer to TSO as "Poor man's Jumbo Frames" :) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 23:52 ` Rick Jones @ 2016-06-23 0:18 ` Alexander Duyck 0 siblings, 0 replies; 36+ messages in thread From: Alexander Duyck @ 2016-06-23 0:18 UTC (permalink / raw) To: Rick Jones Cc: Eric Dumazet, Yuval Mintz, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On Wed, Jun 22, 2016 at 4:52 PM, Rick Jones <rick.jones2@hpe.com> wrote: > On 06/22/2016 03:56 PM, Alexander Duyck wrote: >> >> On Wed, Jun 22, 2016 at 3:47 PM, Eric Dumazet <eric.dumazet@gmail.com> >> wrote: >>> >>> On Wed, 2016-06-22 at 14:52 -0700, Rick Jones wrote: >>>> >>>> Had the bnx2x-driven NICs' firmware not had that rather unfortunate >>>> assumption about MSSes I probably would never have noticed. > > > >> It could be that you and Rick are running different firmware. I >> believe you can expose that via "ethtool -i". This is the ugly bit >> about all this. We are offloading GRO into the firmware of these >> devices with no idea how any of it works and by linking GRO to LRO on >> the same device you are stuck having to accept either the firmware >> offload or nothing at all. That is kind of the point Rick was trying >> to get at. > > > I think you are typing a bit too far ahead into my keyboard with that last > sentence. And I may not have been sufficiently complete in what I wrote. > If the bnx2x-driven NICs' firmware had been coalescing more than two > segments together, not only would I probably not have noticed, I probably > would not have been upset to learn it was NIC-firmware GRO rather than > stack. > > My complaint is the specific bug of coalescing only two segments when their > size is unexpected, and the difficulty present in disabling the bnx2x-driven > NICs' firmware GRO. I don't have a problem necessarily with the existence > of NIC-firmware GRO in general. I just want to be able to enable/disable it > easily. Right. Which is why I thought we were supposed to be using the LRO flag when a NIC is performing the GRO instead of it being done by the kernel. I have no problem with us doing hardware GRO. The only issue I have is with us identifying it as GRO. In my opinion the way I would have preferred to have seen the bnx2x handle all this is to make it that the NETIF_F_LRO flag controlled if the NIC performed aggregation or not and the module parameter determine if the LRO conforms to the GRO type layout for the frame. As it is now I can easily see this causing issues if someone updates how GRO/GSO handles frames without realizing they now need to make sure the bnx2x and qede drivers need to generate the same frame layout as well. - Alex ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 22:47 ` Eric Dumazet 2016-06-22 22:56 ` Alexander Duyck @ 2016-06-22 23:10 ` Rick Jones 2016-06-23 0:48 ` Rick Jones 1 sibling, 1 reply; 36+ messages in thread From: Rick Jones @ 2016-06-22 23:10 UTC (permalink / raw) To: Eric Dumazet Cc: Yuval Mintz, Alexander Duyck, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On 06/22/2016 03:47 PM, Eric Dumazet wrote: > On Wed, 2016-06-22 at 14:52 -0700, Rick Jones wrote: >> On 06/22/2016 11:22 AM, Yuval Mintz wrote: >>> But seriously, this isn't really anything new but rather a step forward in >>> the direction we've already taken - bnx2x/qede are already performing >>> the same for non-encapsulated TCP. >> >> Since you mention bnx2x... I would argue that the NIC firmware on >> those NICs driven by bnx2x is doing it badly. Not so much from a >> functional standpoint I suppose, but from a performance one. The >> NIC-firmware GRO done there has this rather unfortunate assumption about >> "all MSSes will be directly driven by my own physical MTU" and when it >> sees segments of a size other than would be suggested by the physical >> MTU, will coalesce only two segments together. They then do not get >> further coalesced in the stack. >> >> Suffice it to say this does not do well from a performance standpoint. >> >> One can disable LRO via ethtool for these NICs, but what that does is >> disable old-school LRO, not GRO-in-the-NIC. To get that disabled, one >> must also get the bnx2x module loaded with "disable-tpa=1" so the Linux >> stack GRO gets used instead. >> >> Had the bnx2x-driven NICs' firmware not had that rather unfortunate >> assumption about MSSes I probably would never have noticed. > > I do not see this behavior on my bnx2x nics ? > > ip ro add 10.246.11.52 via 10.246.11.254 dev eth0 mtu 1000 > lpk51:~# ./netperf -H 10.246.11.52 -l 1000 > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to > 10.246.11.52 () port 0 AF_INET I first saw this with VMs which themselves had 1400 byte MTUs on their vNICs, speaking though bnx2x-driven NICs with a 1500 byte MTU, but I did later reproduce it by tweaking the MTU of my sending side NIC to something like 1400 bytes and running a "bare iron" netperf. I believe you may be able to achieve the same thing by having netperf set a smaller MSS via the test-specific -G option. My systems are presently in the midst of an install but I should be able to demonstrate it in the morning (US Pacific time, modulo the shuttle service of a car repair place) > On receiver : Paranoid question, but is LRO disabled on the receiver? I don't know that LRO exhibits the behaviour, just GRO-in-the-NIC. rick > > 15:46:08.296241 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 303360, win 8192, options [nop,nop,TS val 1245217243 ecr > 1245306446], length 0 > 15:46:08.296430 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], > seq 303360:327060, ack 1, win 229, options [nop,nop,TS val 1245306446 > ecr 1245217242], length 23700 > 15:46:08.296441 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 327060, win 8192, options [nop,nop,TS val 1245217243 ecr > 1245306446], length 0 > 15:46:08.296644 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], > seq 327060:350760, ack 1, win 229, options [nop,nop,TS val 1245306446 > ecr 1245217242], length 23700 > 15:46:08.296655 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 350760, win 8192, options [nop,nop,TS val 1245217244 ecr > 1245306446], length 0 > 15:46:08.296854 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], > seq 350760:374460, ack 1, win 229, options [nop,nop,TS val 1245306446 > ecr 1245217242], length 23700 > 15:46:08.296897 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 374460, win 8192, options [nop,nop,TS val 1245217244 ecr > 1245306446], length 0 > 15:46:08.297054 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], > seq 374460:398160, ack 1, win 229, options [nop,nop,TS val 1245306446 > ecr 1245217242], length 23700 > 15:46:08.297099 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 398160, win 8192, options [nop,nop,TS val 1245217244 ecr > 1245306446], length 0 > 15:46:08.297258 IP 10.246.11.51.34131 > 10.246.11.52.46907: Flags [.], > seq 398160:420912, ack 1, win 229, options [nop,nop,TS val 1245306446 > ecr 1245217242], length 22752 > 15:46:08.297301 IP 10.246.11.52.46907 > 10.246.11.51.34131: Flags [.], > ack 420912, win 8192, options [nop,nop,TS val 1245217244 ecr > 1245306446], length 0 > ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 23:10 ` Rick Jones @ 2016-06-23 0:48 ` Rick Jones 2016-06-23 9:03 ` Yuval Mintz 0 siblings, 1 reply; 36+ messages in thread From: Rick Jones @ 2016-06-23 0:48 UTC (permalink / raw) To: Eric Dumazet Cc: Yuval Mintz, Alexander Duyck, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa On 06/22/2016 04:10 PM, Rick Jones wrote: > My systems are presently in the midst of an install but I should be able > to demonstrate it in the morning (US Pacific time, modulo the shuttle > service of a car repair place) The installs finished sooner than I thought. So, receiver: root@np-cp1-comp0001-mgmt:/home/stack# uname -a Linux np-cp1-comp0001-mgmt 4.4.11-2-amd64-hpelinux #hpelinux1 SMP Mon May 23 15:39:22 UTC 2016 x86_64 GNU/Linux root@np-cp1-comp0001-mgmt:/home/stack# ethtool -i hed2 driver: bnx2x version: 1.712.30-0 firmware-version: bc 7.10.10 bus-info: 0000:05:00.0 supports-statistics: yes supports-test: yes supports-eeprom-access: yes supports-register-dump: yes supports-priv-flags: yes the hed2 interface is a port of an HPE 630M NIC, based on the BCM57840: 05:00.0 Ethernet controller: Broadcom Corporation BCM57840 NetXtreme II 10/20-Gigabit Ethernet (rev 11) Subsystem: Hewlett-Packard Company HP FlexFabric 20Gb 2-port 630M Adapter (The pci.ids entry being from before that 10 GbE IP was purchased from Broadcom by QLogic...) Verify that LRO is disabled (IIRC it is enabled by default): root@np-cp1-comp0001-mgmt:/home/stack# ethtool -k hed2 | grep large large-receive-offload: off Verify that disable_tpa is not set: root@np-cp1-comp0001-mgmt:/home/stack# cat /sys/module/bnx2x/parameters/disable_tpa 0 So this means we will see NIC-firmware GRO. Start a tcpdump on the receiver: root@np-cp1-comp0001-mgmt:/home/stack# tcpdump -s 96 -c 2000000 -i hed2 -w foo.pcap port 12867 tcpdump: listening on hed2, link-type EN10MB (Ethernet), capture size 96 bytes Start a netperf test targeting that system, specifying a smaller MSS: stack@np-cp1-comp0002-mgmt:~$ ./netperf -H np-cp1-comp0001-guest -- -G 1400 -P 12867 -O throughput,transport_mss MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-comp0001-guest () port 12867 AF_INET : demo Throughput Transport MSS bytes 3372.82 1388 Come back to the receiver and post-process the tcpdump capture to get the average segment size for the data segments: 2000000 packets captured 2000916 packets received by filter 0 packets dropped by kernel root@np-cp1-comp0001-mgmt:/home/stack# tcpdump -n -r foo.pcap | fgrep -v "length 0" | awk '{sum += $NF}END{print "Average:",sum/NR}' reading from file foo.pcap, link-type EN10MB (Ethernet) Average: 2741.93 and finally a snippet of the capture: 00:37:47.333414 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [S], seq 1236484791, win 28000, options [mss 1400,sackOK,TS val 1491134 ecr 0,nop,wscale 7], length 0 00:37:47.333488 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [S.], seq 134167501, ack 1236484792, win 28960, options [mss 1460,sackOK,TS val 1499053 ecr 1491134,nop,wscale 7], length 0 00:37:47.333731 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 0 00:37:47.333788 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 1:2777, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333815 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [.], ack 2777, win 270, options [nop,nop,TS val 1499053 ecr 1491134], length 0 00:37:47.333822 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 2777:5553, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333837 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [.], ack 5553, win 313, options [nop,nop,TS val 1499053 ecr 1491134], length 0 00:37:47.333842 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 5553:8329, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333856 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 8329:11105, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333869 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [.], ack 8329, win 357, options [nop,nop,TS val 1499053 ecr 1491134], length 0 00:37:47.333879 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 11105:13881, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333891 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [.], ack 11105, win 400, options [nop,nop,TS val 1499053 ecr 1491134], length 0 00:37:47.333911 IP 192.168.2.7.12867 > 192.168.2.8.12867: Flags [.], ack 13881, win 444, options [nop,nop,TS val 1499053 ecr 1491134], length 0 00:37:47.333964 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 13881:16657, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333982 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 16657:19433, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333989 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 19433:22209, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.333994 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 22209:24985, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.334011 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 24985:27761, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.334018 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 27761:30537, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.334025 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 30537:33313, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 00:37:47.334031 IP 192.168.2.8.12867 > 192.168.2.7.12867: Flags [.], seq 33313:36089, ack 1, win 219, options [nop,nop,TS val 1491134 ecr 1499053], length 2776 2776 being twice 1388. happy benchmarking, rick jones root@np-cp1-comp0001-mgmt:/home/stack# tcpdump -n -r foo.pcap | fgrep -v -e "length 0" | fgrep length | awk '{print $NF}' | sort | uniq -c | sort -nr reading from file foo.pcap, link-type EN10MB (Ethernet) 1400584 2776 5930 1388 17 2544 15 1620 13 2560 13 2000 12 2604 12 2504 12 2456 12 1708 ... ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-23 0:48 ` Rick Jones @ 2016-06-23 9:03 ` Yuval Mintz 0 siblings, 0 replies; 36+ messages in thread From: Yuval Mintz @ 2016-06-23 9:03 UTC (permalink / raw) To: Rick Jones, Eric Dumazet Cc: Alexander Duyck, Manish Chopra, David Miller, netdev, Ariel Elior, Tom Herbert, Hannes Frederic Sowa > > My systems are presently in the midst of an install but I should be > > able to demonstrate it in the morning (US Pacific time, modulo the > > shuttle service of a car repair place) > > stack@np-cp1-comp0002-mgmt:~$ ./netperf -H np-cp1-comp0001-guest -- -G > 1400 -P 12867 -O throughput,transport_mss MIGRATED TCP STREAM TEST from > 0.0.0.0 (0.0.0.0) port 12867 AF_INET to np-cp1-comp0001-guest () port 12867 > AF_INET : demo Throughput Transport > MSS > bytes > > 3372.82 1388 ... > root@np-cp1-comp0001-mgmt:/home/stack# tcpdump -n -r foo.pcap | fgrep -v > "length 0" | awk '{sum += $NF}END{print "Average:",sum/NR}' > reading from file foo.pcap, link-type EN10MB (Ethernet) > Average: 2741.93 Yes, it's a known FW limitation - if MSS is smaller than configured MTU, aggregation will be closed on 2nd packet; It might get changed in some future FW version. But I agree it reinforces the need of having some kind of user-knob for controlling this offloaded feature. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support 2016-06-22 21:52 ` Rick Jones 2016-06-22 22:47 ` Eric Dumazet @ 2016-06-26 19:53 ` David Miller 1 sibling, 0 replies; 36+ messages in thread From: David Miller @ 2016-06-26 19:53 UTC (permalink / raw) To: rick.jones2 Cc: Yuval.Mintz, alexander.duyck, manish.chopra, netdev, Ariel.Elior, tom, hannes From: Rick Jones <rick.jones2@hpe.com> Date: Wed, 22 Jun 2016 14:52:34 -0700 > To get that disabled, one must also get the bnx2x module loaded with > "disable-tpa=1" so the Linux stack GRO gets used instead. And people wonder why I'm difficult about module parameters in drivers.... ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2016-06-26 19:53 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-06-22 8:25 [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 1/5] net: export udp and gre gro_complete() APIs Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 2/5] qede: Add support to handle VXLAN hardware GRO packets Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 3/5] qede: Add support to handle GENEVE " Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 4/5] qede: Add support to handle GRE " Manish Chopra 2016-06-22 8:25 ` [PATCH net-next 5/5] qed: Enable hardware GRO feature for encapsulated packets Manish Chopra 2016-06-22 16:27 ` [PATCH net-next 0/5] qed/qede: Tunnel hardware GRO support Alexander Duyck 2016-06-22 17:16 ` Yuval Mintz 2016-06-22 17:45 ` Alexander Duyck 2016-06-22 18:22 ` Yuval Mintz 2016-06-22 21:32 ` Alexander Duyck 2016-06-22 22:32 ` Hannes Frederic Sowa 2016-06-22 23:42 ` Eric Dumazet 2016-06-22 21:52 ` Rick Jones 2016-06-22 22:47 ` Eric Dumazet 2016-06-22 22:56 ` Alexander Duyck 2016-06-22 23:31 ` Eric Dumazet 2016-06-22 23:59 ` Tom Herbert 2016-06-23 0:11 ` Alexander Duyck 2016-06-23 4:10 ` Yuval Mintz 2016-06-23 4:17 ` Yuval Mintz 2016-06-23 17:07 ` Alexander Duyck 2016-06-23 21:06 ` Yuval Mintz 2016-06-23 23:20 ` Alexander Duyck 2016-06-24 5:20 ` Yuval Mintz 2016-06-24 16:44 ` Alexander Duyck 2016-06-24 13:09 ` Edward Cree 2016-06-24 16:31 ` Tom Herbert 2016-06-24 17:21 ` Edward Cree 2016-06-26 6:09 ` Yuval Mintz 2016-06-22 23:52 ` Rick Jones 2016-06-23 0:18 ` Alexander Duyck 2016-06-22 23:10 ` Rick Jones 2016-06-23 0:48 ` Rick Jones 2016-06-23 9:03 ` Yuval Mintz 2016-06-26 19:53 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.