This patch series contains the implementation of the RFC that was posted on this mailing list previously: https://www.spinics.net/lists/netdev/msg541709.html It fixes having incorrect skb->transport_header values in cases when dissect fails. Having correct values set by the kernel fixes mlx5 operation and allows to remove some unnecessary code flows in mlx5. Maxim Mikityanskiy (7): net: Don't set transport offset to invalid value net: Introduce parse_protocol header_ops callback net/ethernet: Add parse_protocol header_ops support net/packet: Ask driver for protocol if not provided by user net/packet: Remove redundant skb->protocol set net/mlx5e: Remove the wrong assumption about transport offset net/mlx5e: Trust kernel regarding transport offset .../net/ethernet/mellanox/mlx5/core/en_tx.c | 15 ++--------- drivers/net/tap.c | 4 +-- drivers/net/tun.c | 4 +-- drivers/net/xen-netback/netback.c | 19 ++++++++------ include/linux/etherdevice.h | 1 + include/linux/netdevice.h | 10 +++++++ include/linux/skbuff.h | 14 +++++----- net/ethernet/eth.c | 13 ++++++++++ net/packet/af_packet.c | 26 +++++++++---------- 9 files changed, 60 insertions(+), 46 deletions(-) -- 2.19.1
If the socket was created with socket(AF_PACKET, SOCK_RAW, 0), skb->protocol will be unset, __skb_flow_dissect() will fail, and skb_probe_transport_header() will fall back to the offset_hint, making the resulting skb_transport_offset incorrect. If, however, there is no transport header in the packet, transport_header shouldn't be set to an arbitrary value. Fix it by leaving the transport offset unset if it couldn't be found, to be explicit rather than to fill it with some wrong value. It changes the behavior, but if some code relied on the old behavior, it would be broken anyway, as the old one is incorrect. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- drivers/net/tap.c | 4 ++-- drivers/net/tun.c | 4 ++-- drivers/net/xen-netback/netback.c | 19 +++++++++++-------- include/linux/skbuff.h | 14 +++++++------- net/packet/af_packet.c | 6 +++--- 5 files changed, 25 insertions(+), 22 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 443b2694130c..a35b44b13a34 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -712,7 +712,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, goto err_kfree; } - skb_probe_transport_header(skb, ETH_HLEN); + skb_try_probe_transport_header(skb); /* Move network header to the right position for VLAN tagged packets */ if ((skb->protocol == htons(ETH_P_8021Q) || @@ -1177,7 +1177,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp) goto err_kfree; } - skb_probe_transport_header(skb, ETH_HLEN); + skb_try_probe_transport_header(skb); /* Move network header to the right position for VLAN tagged packets */ if ((skb->protocol == htons(ETH_P_8021Q) || diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a4fdad475594..f73a156379e6 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1927,7 +1927,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, } skb_reset_network_header(skb); - skb_probe_transport_header(skb, 0); + skb_try_probe_transport_header(skb); if (skb_xdp) { struct bpf_prog *xdp_prog; @@ -2480,7 +2480,7 @@ static int tun_xdp_one(struct tun_struct *tun, skb->protocol = eth_type_trans(skb, tun->dev); skb_reset_network_header(skb); - skb_probe_transport_header(skb, 0); + skb_try_probe_transport_header(skb); if (skb_xdp) { err = do_xdp_generic(xdp_prog, skb); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 80aae3a32c2a..b49b6e56ca47 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) struct xen_netif_tx_request *txp; u16 pending_idx; unsigned data_len; + bool th_set; pending_idx = XENVIF_TX_CB(skb)->pending_idx; txp = &queue->pending_tx_info[pending_idx].req; @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) continue; } - skb_probe_transport_header(skb, 0); + th_set = skb_try_probe_transport_header(skb); /* If the packet is GSO then we will have just set up the * transport header offset in checksum_setup so it's now * straightforward to calculate gso_segs. */ if (skb_is_gso(skb)) { - int mss = skb_shinfo(skb)->gso_size; - int hdrlen = skb_transport_header(skb) - - skb_mac_header(skb) + - tcp_hdrlen(skb); - - skb_shinfo(skb)->gso_segs = - DIV_ROUND_UP(skb->len - hdrlen, mss); + if (likely(th_set)) { /* GSO implies having L4 header */ + int mss = skb_shinfo(skb)->gso_size; + int hdrlen = skb_transport_header(skb) - + skb_mac_header(skb) + + tcp_hdrlen(skb); + + skb_shinfo(skb)->gso_segs = + DIV_ROUND_UP(skb->len - hdrlen, mss); + } } queue->stats.rx_bytes += skb->len; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2a57a365c711..b3aa2be1afb3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2424,18 +2424,18 @@ static inline void skb_pop_mac_header(struct sk_buff *skb) skb->mac_header = skb->network_header; } -static inline void skb_probe_transport_header(struct sk_buff *skb, - const int offset_hint) +static inline bool skb_try_probe_transport_header(struct sk_buff *skb) { struct flow_keys_basic keys; if (skb_transport_header_was_set(skb)) - return; + return true; - if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0)) - skb_set_transport_header(skb, keys.control.thoff); - else - skb_set_transport_header(skb, offset_hint); + if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0)) + return false; + + skb_set_transport_header(skb, keys.control.thoff); + return true; } static inline void skb_mac_header_rebuild(struct sk_buff *skb) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index eedacdebcd4c..8fc76e68777a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1970,7 +1970,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, if (unlikely(extra_len == 4)) skb->no_fcs = 1; - skb_probe_transport_header(skb, 0); + skb_try_probe_transport_header(skb); dev_queue_xmit(skb); rcu_read_unlock(); @@ -2519,7 +2519,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, len = ((to_write > len_max) ? len_max : to_write); } - skb_probe_transport_header(skb, 0); + skb_try_probe_transport_header(skb); return tp_len; } @@ -2924,7 +2924,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) virtio_net_hdr_set_proto(skb, &vnet_hdr); } - skb_probe_transport_header(skb, reserve); + skb_try_probe_transport_header(skb); if (unlikely(extra_len == 4)) skb->no_fcs = 1; -- 2.19.1
Introduce a new optional header_ops callback called parse_protocol and a wrapper function dev_parse_header_protocol, similar to dev_parse_header. The new callback's purpose is to extract the protocol number from the L2 header, the format of which is known to the driver, but not to the upper layers of the stack. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- include/linux/netdevice.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1377d085ef99..44b12430b0a7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -274,6 +274,7 @@ struct header_ops { const struct net_device *dev, const unsigned char *haddr); bool (*validate)(const char *ll_header, unsigned int len); + __be16 (*parse_protocol)(const struct sk_buff *skb); }; /* These flag bits are private to the generic network queueing @@ -2928,6 +2929,15 @@ static inline int dev_parse_header(const struct sk_buff *skb, return dev->header_ops->parse(skb, haddr); } +static inline __be16 dev_parse_header_protocol(const struct sk_buff *skb) +{ + const struct net_device *dev = skb->dev; + + if (!dev->header_ops || !dev->header_ops->parse_protocol) + return 0; + return dev->header_ops->parse_protocol(skb); +} + /* ll_header must have at least hard_header_len allocated */ static inline bool dev_validate_header(const struct net_device *dev, char *ll_header, int len) -- 2.19.1
The previous commit introduced parse_protocol callback which should extract the protocol number from the L2 header. Make all Ethernet devices support it. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- include/linux/etherdevice.h | 1 + net/ethernet/eth.c | 13 +++++++++++++ 2 files changed, 14 insertions(+) diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index 2c0af7b00715..e2f3b21cd72a 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh, struct hh_cache *hh, __be16 type); void eth_header_cache_update(struct hh_cache *hh, const struct net_device *dev, const unsigned char *haddr); +__be16 eth_header_parse_protocol(const struct sk_buff *skb); int eth_prepare_mac_addr_change(struct net_device *dev, void *p); void eth_commit_mac_addr_change(struct net_device *dev, void *p); int eth_mac_addr(struct net_device *dev, void *p); diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 4c520110b04f..f7a3d7a171c7 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -264,6 +264,18 @@ void eth_header_cache_update(struct hh_cache *hh, } EXPORT_SYMBOL(eth_header_cache_update); +/** + * eth_header_parser_protocol - extract protocol from L2 header + * @skb: packet to extract protocol from + */ +__be16 eth_header_parse_protocol(const struct sk_buff *skb) +{ + const struct ethhdr *eth = eth_hdr(skb); + + return eth->h_proto; +} +EXPORT_SYMBOL(eth_header_parse_protocol); + /** * eth_prepare_mac_addr_change - prepare for mac change * @dev: network device @@ -346,6 +358,7 @@ const struct header_ops eth_header_ops ____cacheline_aligned = { .parse = eth_header_parse, .cache = eth_header_cache, .cache_update = eth_header_cache_update, + .parse_protocol = eth_header_parse_protocol, }; /** -- 2.19.1
If a socket was created with socket(AF_PACKET, SOCK_RAW, 0), the protocol number is unavailable. Try to ask the driver to extract it from the L2 header in order for skb_try_probe_transport_header to succeed. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- net/packet/af_packet.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 8fc76e68777a..d1d89749a17a 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1850,6 +1850,15 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, return 0; } +static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) +{ + if (!skb->protocol && sock->type == SOCK_RAW) { + skb_reset_mac_header(skb); + skb->protocol = dev_parse_header_protocol(skb); + } + + skb_try_probe_transport_header(skb); +} /* * Output a raw packet to a device layer. This bypasses all the other @@ -1970,7 +1979,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, if (unlikely(extra_len == 4)) skb->no_fcs = 1; - skb_try_probe_transport_header(skb); + packet_parse_headers(skb, sock); dev_queue_xmit(skb); rcu_read_unlock(); @@ -2519,7 +2528,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, len = ((to_write > len_max) ? len_max : to_write); } - skb_try_probe_transport_header(skb); + packet_parse_headers(skb, sock); return tp_len; } @@ -2924,7 +2933,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) virtio_net_hdr_set_proto(skb, &vnet_hdr); } - skb_try_probe_transport_header(skb); + packet_parse_headers(skb, sock); if (unlikely(extra_len == 4)) skb->no_fcs = 1; -- 2.19.1
c72219b75f introduced tpacket_set_protocol that parses the Ethernet L2 header and sets skb->protocol if it's unset. It is no longer needed since the introduction of packet_parse_headers. In case of SOCK_RAW and unset skb->protocol, packet_parse_headers asks the driver to tell the protocol number, and it's implemented for all Ethernet devices. As the old function supported only Ethernet, no functionality is lost. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- net/packet/af_packet.c | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index d1d89749a17a..e41bddf3d8bb 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2413,15 +2413,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb) sock_wfree(skb); } -static void tpacket_set_protocol(const struct net_device *dev, - struct sk_buff *skb) -{ - if (dev->type == ARPHRD_ETHER) { - skb_reset_mac_header(skb); - skb->protocol = eth_hdr(skb)->h_proto; - } -} - static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len) { if ((vnet_hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) && @@ -2492,8 +2483,6 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, return err; if (!dev_validate_header(dev, skb->data, hdrlen)) return -EINVAL; - if (!skb->protocol) - tpacket_set_protocol(dev, skb); data += hdrlen; to_write -= hdrlen; -- 2.19.1
skb_transport_offset() == 0 is not a special value. The only special value is when skb->transport_header is ~0U, and it's checked by skb_transport_header_was_set(). Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 598ad7e4d5c9..563fdb1dd15e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -172,15 +172,8 @@ static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode, hlen += VLAN_HLEN; break; case MLX5_INLINE_MODE_IP: - /* When transport header is set to zero, it means no transport - * header. When transport header is set to 0xff's, it means - * transport header wasn't set. - */ - if (skb_transport_offset(skb)) { - hlen = mlx5e_skb_l3_header_offset(skb); - break; - } - /* fall through */ + hlen = mlx5e_skb_l3_header_offset(skb); + break; case MLX5_INLINE_MODE_L2: default: hlen = mlx5e_skb_l2_header_offset(skb); -- 2.19.1
After AF_PACKET is fixed to calculate the transport header offset correctly, trust the value set by the kernel. If the offset wasn't set, it means there is no transport header in the packet. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 563fdb1dd15e..119bbbbe4ad0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -148,12 +148,8 @@ static inline int mlx5e_skb_l2_header_offset(struct sk_buff *skb) static inline int mlx5e_skb_l3_header_offset(struct sk_buff *skb) { - struct flow_keys keys; - if (skb_transport_header_was_set(skb)) return skb_transport_offset(skb); - else if (skb_flow_dissect_flow_keys(skb, &keys, 0)) - return keys.control.thoff; else return mlx5e_skb_l2_header_offset(skb); } -- 2.19.1
I just realized that the window is still closed. I'll resubmit it when it opens. Sorry for the extra noise.
> -----Original Message-----
> From: Maxim Mikityanskiy
> Sent: 14 January, 2019 15:19
> To: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>
> Cc: netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq
> Toukan <tariqt@mellanox.com>; Maxim Mikityanskiy <maximmi@mellanox.com>
> Subject: [PATCH 0/7] AF_PACKET transport_offset fix
>
> This patch series contains the implementation of the RFC that was posted
> on this mailing list previously:
> https://www.spinics.net/lists/netdev/msg541709.html
>
> It fixes having incorrect skb->transport_header values in cases when
> dissect fails. Having correct values set by the kernel fixes mlx5
> operation and allows to remove some unnecessary code flows in mlx5.
>
> Maxim Mikityanskiy (7):
> net: Don't set transport offset to invalid value
> net: Introduce parse_protocol header_ops callback
> net/ethernet: Add parse_protocol header_ops support
> net/packet: Ask driver for protocol if not provided by user
> net/packet: Remove redundant skb->protocol set
> net/mlx5e: Remove the wrong assumption about transport offset
> net/mlx5e: Trust kernel regarding transport offset
>
> .../net/ethernet/mellanox/mlx5/core/en_tx.c | 15 ++---------
> drivers/net/tap.c | 4 +--
> drivers/net/tun.c | 4 +--
> drivers/net/xen-netback/netback.c | 19 ++++++++------
> include/linux/etherdevice.h | 1 +
> include/linux/netdevice.h | 10 +++++++
> include/linux/skbuff.h | 14 +++++-----
> net/ethernet/eth.c | 13 ++++++++++
> net/packet/af_packet.c | 26 +++++++++----------
> 9 files changed, 60 insertions(+), 46 deletions(-)
>
> --
> 2.19.1
On Mon, Jan 14, 2019 at 8:20 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > If the socket was created with socket(AF_PACKET, SOCK_RAW, 0), > skb->protocol will be unset, __skb_flow_dissect() will fail, and > skb_probe_transport_header() will fall back to the offset_hint, making > the resulting skb_transport_offset incorrect. > > If, however, there is no transport header in the packet, > transport_header shouldn't be set to an arbitrary value. > > Fix it by leaving the transport offset unset if it couldn't be found, to > be explicit rather than to fill it with some wrong value. It changes the > behavior, but if some code relied on the old behavior, it would be > broken anyway, as the old one is incorrect. > > > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> > --- > drivers/net/tap.c | 4 ++-- > drivers/net/tun.c | 4 ++-- > drivers/net/xen-netback/netback.c | 19 +++++++++++-------- > include/linux/skbuff.h | 14 +++++++------- > net/packet/af_packet.c | 6 +++--- > 5 files changed, 25 insertions(+), 22 deletions(-) This is a lot of code change. This would do. @@ -2434,8 +2434,6 @@ static inline void skb_probe_transport_header(struct sk_buff *skb, if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0)) skb_set_transport_header(skb, keys.control.thoff); - else - skb_set_transport_header(skb, offset_hint); } Though leaving an unused argument is a bit ugly. For net-next, indeed better to clean up (please mark your patchset with net or net-next, btw) > > diff --git a/drivers/net/tap.c b/drivers/net/tap.c > index 443b2694130c..a35b44b13a34 100644 > --- a/drivers/net/tap.c > +++ b/drivers/net/tap.c > @@ -712,7 +712,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, > goto err_kfree; > } > > - skb_probe_transport_header(skb, ETH_HLEN); > + skb_try_probe_transport_header(skb); > > /* Move network header to the right position for VLAN tagged packets */ > if ((skb->protocol == htons(ETH_P_8021Q) || > @@ -1177,7 +1177,7 @@ static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp) > goto err_kfree; > } > > - skb_probe_transport_header(skb, ETH_HLEN); > + skb_try_probe_transport_header(skb); > > /* Move network header to the right position for VLAN tagged packets */ > if ((skb->protocol == htons(ETH_P_8021Q) || > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index a4fdad475594..f73a156379e6 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -1927,7 +1927,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, > } > > skb_reset_network_header(skb); > - skb_probe_transport_header(skb, 0); > + skb_try_probe_transport_header(skb); > > if (skb_xdp) { > struct bpf_prog *xdp_prog; > @@ -2480,7 +2480,7 @@ static int tun_xdp_one(struct tun_struct *tun, > > skb->protocol = eth_type_trans(skb, tun->dev); > skb_reset_network_header(skb); > - skb_probe_transport_header(skb, 0); > + skb_try_probe_transport_header(skb); > > if (skb_xdp) { > err = do_xdp_generic(xdp_prog, skb); > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > index 80aae3a32c2a..b49b6e56ca47 100644 > --- a/drivers/net/xen-netback/netback.c > +++ b/drivers/net/xen-netback/netback.c > @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) > struct xen_netif_tx_request *txp; > u16 pending_idx; > unsigned data_len; > + bool th_set; > > pending_idx = XENVIF_TX_CB(skb)->pending_idx; > txp = &queue->pending_tx_info[pending_idx].req; > @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct xenvif_queue *queue) > continue; > } > > - skb_probe_transport_header(skb, 0); > + th_set = skb_try_probe_transport_header(skb); Can use skb_transport_header_was_set(). Then at least there is no need to change the function's return value. Normally, a GSO packet will also have skb_transport_header set even from packet sockets, due to skb_partial_csum_set called from virtio_net_hdr_to_skb. But I don't think we can be sure that TSO packets without checksum offload are dropped before reaching this code.
On Mon, Jan 14, 2019 at 8:20 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > If a socket was created with socket(AF_PACKET, SOCK_RAW, 0), the > protocol number is unavailable. Try to ask the driver to extract it from > the L2 header in order for skb_try_probe_transport_header to succeed. > > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> > --- > net/packet/af_packet.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c > index 8fc76e68777a..d1d89749a17a 100644 > --- a/net/packet/af_packet.c > +++ b/net/packet/af_packet.c > @@ -1850,6 +1850,15 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev, > return 0; > } > > +static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) > +{ > + if (!skb->protocol && sock->type == SOCK_RAW) { > + skb_reset_mac_header(skb); > + skb->protocol = dev_parse_header_protocol(skb); > + } > + > + skb_try_probe_transport_header(skb); > +} In relation to the discussion at af_packet: fix raw sockets over 6in4 tunnel http://patchwork.ozlabs.org/patch/1023623/ if adding a new header_ops callback to parse link layer headers, please have it return both protocol and link layer header length. also, this information may be needed earlier in (t)packet_snd than the transport header, so I would suggest not combining the two in a new packet_parse_headers wrapper function.
On Mon, Jan 14, 2019 at 11:52 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Mon, Jan 14, 2019 at 8:20 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
> >
> > If a socket was created with socket(AF_PACKET, SOCK_RAW, 0), the
> > protocol number is unavailable. Try to ask the driver to extract it from
> > the L2 header in order for skb_try_probe_transport_header to succeed.
> >
> > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> > ---
> > net/packet/af_packet.c | 15 ++++++++++++---
> > 1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> > index 8fc76e68777a..d1d89749a17a 100644
> > --- a/net/packet/af_packet.c
> > +++ b/net/packet/af_packet.c
> > @@ -1850,6 +1850,15 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct net_device *dev,
> > return 0;
> > }
> >
> > +static void packet_parse_headers(struct sk_buff *skb, struct socket *sock)
> > +{
> > + if (!skb->protocol && sock->type == SOCK_RAW) {
> > + skb_reset_mac_header(skb);
> > + skb->protocol = dev_parse_header_protocol(skb);
> > + }
> > +
> > + skb_try_probe_transport_header(skb);
> > +}
>
>
> In relation to the discussion at
>
> af_packet: fix raw sockets over 6in4 tunnel
> http://patchwork.ozlabs.org/patch/1023623/
>
> if adding a new header_ops callback to parse link layer headers,
> please have it return both protocol and link layer header length.
This could just be an extension of existing header_ops->validate.
> > > +static void packet_parse_headers(struct sk_buff *skb, struct socket > *sock) > > > +{ > > > + if (!skb->protocol && sock->type == SOCK_RAW) { > > > + skb_reset_mac_header(skb); > > > + skb->protocol = dev_parse_header_protocol(skb); > > > + } > > > + > > > + skb_try_probe_transport_header(skb); > > > +} > > > > > > In relation to the discussion at > > > > af_packet: fix raw sockets over 6in4 tunnel > > http://patchwork.ozlabs.org/patch/1023623/ > > > > if adding a new header_ops callback to parse link layer headers, > > please have it return both protocol and link layer header length. Sorry, I miss the point here, can you elaborate more? If all you need is to have some header_ops callback that returns the L2 header length, there is one already, it's called parse. Or do you have a specific reason why you want my callback to also return the header length? > This could just be an extension of existing header_ops->validate. If you suggest extending an existing function, parse looks more suitable, but I decided not to touch the existing ones for two reasons: 1. I don't want to break the existing code that uses the parse function and will need to be modified to pass an extra parameter. 2. I don't want to spend machine time on copying the destination MAC when I only need the protocol, and vice versa. I'm looking forward to hearing your thoughts about it.
> This is a lot of code change. This would do. > > @@ -2434,8 +2434,6 @@ static inline void > skb_probe_transport_header(struct sk_buff *skb, > > if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0)) > skb_set_transport_header(skb, keys.control.thoff); > - else > - skb_set_transport_header(skb, offset_hint); > } > > Though leaving an unused argument is a bit ugly. For net-next, indeed > better to clean up (please mark your patchset with net or net-next, > btw) It's for net-next (I'll resend with the correct mark), so I'll stick with the current implementation. > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > netback/netback.c > > index 80aae3a32c2a..b49b6e56ca47 100644 > > --- a/drivers/net/xen-netback/netback.c > > +++ b/drivers/net/xen-netback/netback.c > > @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue > *queue) > > struct xen_netif_tx_request *txp; > > u16 pending_idx; > > unsigned data_len; > > + bool th_set; > > > > pending_idx = XENVIF_TX_CB(skb)->pending_idx; > > txp = &queue->pending_tx_info[pending_idx].req; > > @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct xenvif_queue > *queue) > > continue; > > } > > > > - skb_probe_transport_header(skb, 0); > > + th_set = skb_try_probe_transport_header(skb); > > Can use skb_transport_header_was_set(). Then at least there is no need > to change the function's return value. I suppose this comment relates to the previous one, and if we do it for net-next, it's fine to make change I made, isn't it? Thanks for reviewing.
On Thu, Jan 17, 2019 at 4:10 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > > This is a lot of code change. This would do. > > > > @@ -2434,8 +2434,6 @@ static inline void > > skb_probe_transport_header(struct sk_buff *skb, > > > > if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0)) > > skb_set_transport_header(skb, keys.control.thoff); > > - else > > - skb_set_transport_header(skb, offset_hint); > > } > > > > Though leaving an unused argument is a bit ugly. For net-next, indeed > > better to clean up (please mark your patchset with net or net-next, > > btw) > > It's for net-next (I'll resend with the correct mark), so I'll stick > with the current implementation. Absolutely, sounds good. > > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > > netback/netback.c > > > index 80aae3a32c2a..b49b6e56ca47 100644 > > > --- a/drivers/net/xen-netback/netback.c > > > +++ b/drivers/net/xen-netback/netback.c > > > @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue > > *queue) > > > struct xen_netif_tx_request *txp; > > > u16 pending_idx; > > > unsigned data_len; > > > + bool th_set; > > > > > > pending_idx = XENVIF_TX_CB(skb)->pending_idx; > > > txp = &queue->pending_tx_info[pending_idx].req; > > > @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct xenvif_queue > > *queue) > > > continue; > > > } > > > > > > - skb_probe_transport_header(skb, 0); > > > + th_set = skb_try_probe_transport_header(skb); > > > > Can use skb_transport_header_was_set(). Then at least there is no need > > to change the function's return value. > > I suppose this comment relates to the previous one, and if we do it for > net-next, it's fine to make change I made, isn't it? If this is the only reason for the boolean return value, using skb_transport_header_was_set() is more standard (I immediately know what's happening when I read it), slightly less code change and avoids introducing a situation where the majority of callers ignore a return value. I think it's preferable. But these merits are certainly debatable, so either is fine.
On Thu, Jan 17, 2019 at 4:10 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > > > > +static void packet_parse_headers(struct sk_buff *skb, struct socket > > *sock) > > > > +{ > > > > + if (!skb->protocol && sock->type == SOCK_RAW) { > > > > + skb_reset_mac_header(skb); > > > > + skb->protocol = dev_parse_header_protocol(skb); > > > > + } > > > > + > > > > + skb_try_probe_transport_header(skb); > > > > +} > > > > > > > > > In relation to the discussion at > > > > > > af_packet: fix raw sockets over 6in4 tunnel > > > http://patchwork.ozlabs.org/patch/1023623/ > > > > > > if adding a new header_ops callback to parse link layer headers, > > > please have it return both protocol and link layer header length. > > Sorry, I miss the point here, can you elaborate more? If all you need is > to have some header_ops callback that returns the L2 header length, > there is one already, it's called parse. Or do you have a specific > reason why you want my callback to also return the header length? The main reason is to avoid multiple indirect function calls, both essentially doing the same: parsing the ll header. But admittedly the instances where dev->header_ops->validate are called are rare. > > This could just be an extension of existing header_ops->validate. > > If you suggest extending an existing function, parse looks more > suitable, but I decided not to touch the existing ones for two reasons: > > 1. I don't want to break the existing code that uses the parse function > and will need to be modified to pass an extra parameter. > > 2. I don't want to spend machine time on copying the destination MAC > when I only need the protocol, and vice versa. > > I'm looking forward to hearing your thoughts about it. header_ops.parse is also a good candidate. As a matter of fact, parse and validate could (eventually) probably be combined. The only direct caller to header_ops.parse appears to be dev_parse_header, so modifying its interface should be fairly straightforward. Allowing a NULL haddr could avoid the address copy cost if not needed. This does require modifying all implementations. But from a quick scan, there appear to be only 8. And only 1 for validate. So changing the implementation is quite acceptable. Another issue, though, would be what to return as protocol if a header does not encode that. Given these non-trivial changes, if you prefer to just add the dedicated new callback, that's fine. We can see independently whether deduplication makes sense. With three ll header parse functions, I think that it will be. But even if so, it is better to do so as a stand-alone noop patch than combining refactoring and new features, anyway. Long story short, sounds good. Thanks.
> -----Original Message-----
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Sent: 17 January, 2019 17:16
> To: Maxim Mikityanskiy <maximmi@mellanox.com>
> Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>;
> netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq Toukan
> <tariqt@mellanox.com>
> Subject: Re: [PATCH 1/7] net: Don't set transport offset to invalid value
>
> On Thu, Jan 17, 2019 at 4:10 AM Maxim Mikityanskiy <maximmi@mellanox.com>
> wrote:
> >
> > > This is a lot of code change. This would do.
> > >
> > > @@ -2434,8 +2434,6 @@ static inline void
> > > skb_probe_transport_header(struct sk_buff *skb,
> > >
> > > if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0,
> 0))
> > > skb_set_transport_header(skb, keys.control.thoff);
> > > - else
> > > - skb_set_transport_header(skb, offset_hint);
> > > }
> > >
> > > Though leaving an unused argument is a bit ugly. For net-next, indeed
> > > better to clean up (please mark your patchset with net or net-next,
> > > btw)
> >
> > It's for net-next (I'll resend with the correct mark), so I'll stick
> > with the current implementation.
>
> Absolutely, sounds good.
>
> > > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-
> > > netback/netback.c
> > > > index 80aae3a32c2a..b49b6e56ca47 100644
> > > > --- a/drivers/net/xen-netback/netback.c
> > > > +++ b/drivers/net/xen-netback/netback.c
> > > > @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue
> > > *queue)
> > > > struct xen_netif_tx_request *txp;
> > > > u16 pending_idx;
> > > > unsigned data_len;
> > > > + bool th_set;
> > > >
> > > > pending_idx = XENVIF_TX_CB(skb)->pending_idx;
> > > > txp = &queue->pending_tx_info[pending_idx].req;
> > > > @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct
> xenvif_queue
> > > *queue)
> > > > continue;
> > > > }
> > > >
> > > > - skb_probe_transport_header(skb, 0);
> > > > + th_set = skb_try_probe_transport_header(skb);
> > >
> > > Can use skb_transport_header_was_set(). Then at least there is no need
> > > to change the function's return value.
> >
> > I suppose this comment relates to the previous one, and if we do it for
> > net-next, it's fine to make change I made, isn't it?
>
> If this is the only reason for the boolean return value, using
> skb_transport_header_was_set() is more standard (I immediately know
> what's happening when I read it), slightly less code change and avoids
> introducing a situation where the majority of callers ignore a return
> value. I think it's preferable. But these merits are certainly
> debatable, so either is fine.
From my side, I wanted to avoid calling skb_transport_header_was_set
twice, so I made skb_try_probe_transport_header return whether it
succeeded or not. I think "try" in the function name indicates this idea
pretty clearly. This result status is pretty useful, it just happened
that it's not needed in many places, but the general idea is that we
report this status, so if you say that my version is also good for you,
I'll leave it as is. It was just a rationale for my decision.
From my perspective, after the discussion we had with Willem, the
current version of this series can be merged to net-next.
Willem, do you approve it?
Thanks for reviewing!
> -----Original Message-----
> From: Maxim Mikityanskiy
> Sent: 14 January, 2019 15:19
> To: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>
> Cc: netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq
> Toukan <tariqt@mellanox.com>; Maxim Mikityanskiy <maximmi@mellanox.com>
> Subject: [PATCH 0/7] AF_PACKET transport_offset fix
>
> This patch series contains the implementation of the RFC that was posted
> on this mailing list previously:
> https://www.spinics.net/lists/netdev/msg541709.html
>
> It fixes having incorrect skb->transport_header values in cases when
> dissect fails. Having correct values set by the kernel fixes mlx5
> operation and allows to remove some unnecessary code flows in mlx5.
>
> Maxim Mikityanskiy (7):
> net: Don't set transport offset to invalid value
> net: Introduce parse_protocol header_ops callback
> net/ethernet: Add parse_protocol header_ops support
> net/packet: Ask driver for protocol if not provided by user
> net/packet: Remove redundant skb->protocol set
> net/mlx5e: Remove the wrong assumption about transport offset
> net/mlx5e: Trust kernel regarding transport offset
>
> .../net/ethernet/mellanox/mlx5/core/en_tx.c | 15 ++---------
> drivers/net/tap.c | 4 +--
> drivers/net/tun.c | 4 +--
> drivers/net/xen-netback/netback.c | 19 ++++++++------
> include/linux/etherdevice.h | 1 +
> include/linux/netdevice.h | 10 +++++++
> include/linux/skbuff.h | 14 +++++-----
> net/ethernet/eth.c | 13 ++++++++++
> net/packet/af_packet.c | 26 +++++++++----------
> 9 files changed, 60 insertions(+), 46 deletions(-)
>
> --
> 2.19.1
On Wed, Jan 23, 2019 at 5:17 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > > -----Original Message----- > > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > Sent: 17 January, 2019 17:16 > > To: Maxim Mikityanskiy <maximmi@mellanox.com> > > Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed > > <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang > > <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>; > > netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq Toukan > > <tariqt@mellanox.com> > > Subject: Re: [PATCH 1/7] net: Don't set transport offset to invalid value > > > > On Thu, Jan 17, 2019 at 4:10 AM Maxim Mikityanskiy <maximmi@mellanox.com> > > wrote: > > > > > > > This is a lot of code change. This would do. > > > > > > > > @@ -2434,8 +2434,6 @@ static inline void > > > > skb_probe_transport_header(struct sk_buff *skb, > > > > > > > > if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, > > 0)) > > > > skb_set_transport_header(skb, keys.control.thoff); > > > > - else > > > > - skb_set_transport_header(skb, offset_hint); > > > > } > > > > > > > > Though leaving an unused argument is a bit ugly. For net-next, indeed > > > > better to clean up (please mark your patchset with net or net-next, > > > > btw) > > > > > > It's for net-next (I'll resend with the correct mark), so I'll stick > > > with the current implementation. > > > > Absolutely, sounds good. > > > > > > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > > > > netback/netback.c > > > > > index 80aae3a32c2a..b49b6e56ca47 100644 > > > > > --- a/drivers/net/xen-netback/netback.c > > > > > +++ b/drivers/net/xen-netback/netback.c > > > > > @@ -1105,6 +1105,7 @@ static int xenvif_tx_submit(struct xenvif_queue > > > > *queue) > > > > > struct xen_netif_tx_request *txp; > > > > > u16 pending_idx; > > > > > unsigned data_len; > > > > > + bool th_set; > > > > > > > > > > pending_idx = XENVIF_TX_CB(skb)->pending_idx; > > > > > txp = &queue->pending_tx_info[pending_idx].req; > > > > > @@ -1169,20 +1170,22 @@ static int xenvif_tx_submit(struct > > xenvif_queue > > > > *queue) > > > > > continue; > > > > > } > > > > > > > > > > - skb_probe_transport_header(skb, 0); > > > > > + th_set = skb_try_probe_transport_header(skb); > > > > > > > > Can use skb_transport_header_was_set(). Then at least there is no need > > > > to change the function's return value. > > > > > > I suppose this comment relates to the previous one, and if we do it for > > > net-next, it's fine to make change I made, isn't it? > > > > If this is the only reason for the boolean return value, using > > skb_transport_header_was_set() is more standard (I immediately know > > what's happening when I read it), slightly less code change and avoids > > introducing a situation where the majority of callers ignore a return > > value. I think it's preferable. But these merits are certainly > > debatable, so either is fine. > > From my side, I wanted to avoid calling skb_transport_header_was_set > twice, so I made skb_try_probe_transport_header return whether it > succeeded or not. I think "try" in the function name indicates this idea > pretty clearly. This result status is pretty useful, it just happened > that it's not needed in many places, Which is an indication that it's perhaps not needed. > but the general idea is that we > report this status, so if you say that my version is also good for you, > I'll leave it as is. It was just a rationale for my decision. It's fine. But please avoid the code churn in xenvif_tx_submit with to extra indentation. This is equivalent: - if (skb_is_gso(skb)) { + if (skb_is_gso(skb) && th_set) { More fundamentally, the code has the assumption that th_set always holds if skb_is_gso(skb). Why add the check? This is another example that the return value is not really needed.
On Wed, Jan 23, 2019 at 5:19 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > From my perspective, after the discussion we had with Willem, the > current version of this series can be merged to net-next. net-next has moved forward. The patchset as is no longer applies cleanly. > Willem, do you approve it? > > Thanks for reviewing! > > > -----Original Message----- > > From: Maxim Mikityanskiy > > Sent: 14 January, 2019 15:19 > > To: David S. Miller <davem@davemloft.net>; Saeed Mahameed > > <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang > > <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com> > > Cc: netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq > > Toukan <tariqt@mellanox.com>; Maxim Mikityanskiy <maximmi@mellanox.com> > > Subject: [PATCH 0/7] AF_PACKET transport_offset fix > > > > This patch series contains the implementation of the RFC that was posted > > on this mailing list previously: > > https://www.spinics.net/lists/netdev/msg541709.html > > > > It fixes having incorrect skb->transport_header values in cases when > > dissect fails. Having correct values set by the kernel fixes mlx5 > > operation and allows to remove some unnecessary code flows in mlx5. > > > > Maxim Mikityanskiy (7): > > net: Don't set transport offset to invalid value > > net: Introduce parse_protocol header_ops callback > > net/ethernet: Add parse_protocol header_ops support > > net/packet: Ask driver for protocol if not provided by user > > net/packet: Remove redundant skb->protocol set > > net/mlx5e: Remove the wrong assumption about transport offset > > net/mlx5e: Trust kernel regarding transport offset > > > > .../net/ethernet/mellanox/mlx5/core/en_tx.c | 15 ++--------- > > drivers/net/tap.c | 4 +-- > > drivers/net/tun.c | 4 +-- > > drivers/net/xen-netback/netback.c | 19 ++++++++------ > > include/linux/etherdevice.h | 1 + > > include/linux/netdevice.h | 10 +++++++ > > include/linux/skbuff.h | 14 +++++----- > > net/ethernet/eth.c | 13 ++++++++++ > > net/packet/af_packet.c | 26 +++++++++---------- > > 9 files changed, 60 insertions(+), 46 deletions(-) > > > > -- > > 2.19.1 >
On Mon, Jan 14, 2019 at 8:21 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> The previous commit introduced parse_protocol callback which should
> extract the protocol number from the L2 header. Make all Ethernet
> devices support it.
>
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> ---
> include/linux/etherdevice.h | 1 +
> net/ethernet/eth.c | 13 +++++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> index 2c0af7b00715..e2f3b21cd72a 100644
> --- a/include/linux/etherdevice.h
> +++ b/include/linux/etherdevice.h
> @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh, struct hh_cache *hh,
> __be16 type);
> void eth_header_cache_update(struct hh_cache *hh, const struct net_device *dev,
> const unsigned char *haddr);
> +__be16 eth_header_parse_protocol(const struct sk_buff *skb);
Does not need to be exposed in the header file or exported.
> > but the general idea is that we > > report this status, so if you say that my version is also good for you, > > I'll leave it as is. It was just a rationale for my decision. > > It's fine. But please avoid the code churn in xenvif_tx_submit > with to extra indentation. This is equivalent: > > - if (skb_is_gso(skb)) { > + if (skb_is_gso(skb) && th_set) { > > More fundamentally, the code has the assumption that th_set > always holds if skb_is_gso(skb). Why add the check? This is > another example that the return value is not really needed. What about if (skb_is_gso(skb)) { BUG_ON(!skb_transport_header_was_set(skb)); ? I think it's cleaner than skipping the action here if dissect failed and propagating a potential bug further. > > > If this is the only reason for the boolean return value, using > > > skb_transport_header_was_set() is more standard (I immediately know > > > what's happening when I read it), slightly less code change and avoids > > > introducing a situation where the majority of callers ignore a return > > > value. I think it's preferable. But these merits are certainly > > > debatable, so either is fine. > > > > From my side, I wanted to avoid calling skb_transport_header_was_set > > twice, so I made skb_try_probe_transport_header return whether it > > succeeded or not. I think "try" in the function name indicates this idea > > pretty clearly. This result status is pretty useful, it just happened > > that it's not needed in many places, > > Which is an indication that it's perhaps not needed. Well, from the point of view of the function, it looks reasonable to notify the caller whether the call was successful or not... You know, many functions return error codes. However, this case is rather special, because it turned out that we don't actually need that error status immediately, but it can be requested much later, and we already have skb_transport_header_was_set for that. So, considering these points, the return value can be removed, as all use cases are "probe now, check later", not "probe now and handle errors immediately".
> -----Original Message-----
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Sent: 23 January, 2019 16:15
> To: Maxim Mikityanskiy <maximmi@mellanox.com>
> Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>;
> netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq Toukan
> <tariqt@mellanox.com>
> Subject: Re: [PATCH 3/7] net/ethernet: Add parse_protocol header_ops support
>
> On Mon, Jan 14, 2019 at 8:21 AM Maxim Mikityanskiy <maximmi@mellanox.com>
> wrote:
> >
> > The previous commit introduced parse_protocol callback which should
> > extract the protocol number from the L2 header. Make all Ethernet
> > devices support it.
> >
> > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> > ---
> > include/linux/etherdevice.h | 1 +
> > net/ethernet/eth.c | 13 +++++++++++++
> > 2 files changed, 14 insertions(+)
> >
> > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> > index 2c0af7b00715..e2f3b21cd72a 100644
> > --- a/include/linux/etherdevice.h
> > +++ b/include/linux/etherdevice.h
> > @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh,
> struct hh_cache *hh,
> > __be16 type);
> > void eth_header_cache_update(struct hh_cache *hh, const struct net_device
> *dev,
> > const unsigned char *haddr);
> > +__be16 eth_header_parse_protocol(const struct sk_buff *skb);
>
> Does not need to be exposed in the header file or exported.
Are you sure? All the other Ethernet header_ops callbacks are exported
and declared in the header. I'm not sure about the reason why it is done
in such a way, but my guess is that it will be useful if some driver
decides to replace one callback in header_ops but to use the default
ones for the rest of callbacks. If the default callbacks were not
accessible externally, it wouldn't be possible. So, I'm following the
existing pattern here, and there are reasons for that. What do you think
about it?
On Thu, Jan 24, 2019 at 4:48 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > > > but the general idea is that we > > > report this status, so if you say that my version is also good for you, > > > I'll leave it as is. It was just a rationale for my decision. > > > > It's fine. But please avoid the code churn in xenvif_tx_submit > > with to extra indentation. This is equivalent: > > > > - if (skb_is_gso(skb)) { > > + if (skb_is_gso(skb) && th_set) { > > > > More fundamentally, the code has the assumption that th_set > > always holds if skb_is_gso(skb). Why add the check? This is > > another example that the return value is not really needed. > > What about > > if (skb_is_gso(skb)) { > BUG_ON(!skb_transport_header_was_set(skb)); > > ? Good idea to validate. Please on error use WARN_ON_ONCE and free the packet as with other errors in this function. BUG_ONs are problematic when a path to them is found. The other option is to stay as close to the current logic as possible in this driver and set the mac header at offset 0 if probe fails. Either through the return value or skb_transport_header_was_set. > I think it's cleaner than skipping the action here if dissect failed and > propagating a potential bug further. > > > > > If this is the only reason for the boolean return value, using > > > > skb_transport_header_was_set() is more standard (I immediately know > > > > what's happening when I read it), slightly less code change and avoids > > > > introducing a situation where the majority of callers ignore a return > > > > value. I think it's preferable. But these merits are certainly > > > > debatable, so either is fine. > > > > > > From my side, I wanted to avoid calling skb_transport_header_was_set > > > twice, so I made skb_try_probe_transport_header return whether it > > > succeeded or not. I think "try" in the function name indicates this idea > > > pretty clearly. This result status is pretty useful, it just happened > > > that it's not needed in many places, > > > > Which is an indication that it's perhaps not needed. > > Well, from the point of view of the function, it looks reasonable to > notify the caller whether the call was successful or not... You know, > many functions return error codes. > > However, this case is rather special, because it turned out that we > don't actually need that error status immediately, but it can be > requested much later, and we already have skb_transport_header_was_set > for that. > > So, considering these points, the return value can be removed, as all > use cases are "probe now, check later", not "probe now and handle errors > immediately".
On Thu, Jan 24, 2019 at 4:48 AM Maxim Mikityanskiy <maximmi@mellanox.com> wrote: > > > -----Original Message----- > > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com> > > Sent: 23 January, 2019 16:15 > > To: Maxim Mikityanskiy <maximmi@mellanox.com> > > Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed > > <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang > > <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>; > > netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq Toukan > > <tariqt@mellanox.com> > > Subject: Re: [PATCH 3/7] net/ethernet: Add parse_protocol header_ops support > > > > On Mon, Jan 14, 2019 at 8:21 AM Maxim Mikityanskiy <maximmi@mellanox.com> > > wrote: > > > > > > The previous commit introduced parse_protocol callback which should > > > extract the protocol number from the L2 header. Make all Ethernet > > > devices support it. > > > > > > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> > > > --- > > > include/linux/etherdevice.h | 1 + > > > net/ethernet/eth.c | 13 +++++++++++++ > > > 2 files changed, 14 insertions(+) > > > > > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h > > > index 2c0af7b00715..e2f3b21cd72a 100644 > > > --- a/include/linux/etherdevice.h > > > +++ b/include/linux/etherdevice.h > > > @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh, > > struct hh_cache *hh, > > > __be16 type); > > > void eth_header_cache_update(struct hh_cache *hh, const struct net_device > > *dev, > > > const unsigned char *haddr); > > > +__be16 eth_header_parse_protocol(const struct sk_buff *skb); > > > > Does not need to be exposed in the header file or exported. > > Are you sure? All the other Ethernet header_ops callbacks are exported > and declared in the header. I'm not sure about the reason why it is done > in such a way, but my guess is that it will be useful if some driver > decides to replace one callback in header_ops but to use the default > ones for the rest of callbacks. I don't exactly follow this. But I think that many are exported because Ethernet is so common that of these are also called directly instead of through header_ops. Looking at other header_ops implementations, or other such callback structs, shows many examples where the members are static local functions. > If the default callbacks were not > accessible externally, it wouldn't be possible. So, I'm following the > existing pattern here, and there are reasons for that. What do you think > about it?
> -----Original Message-----
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Sent: 24 January, 2019 16:22
> To: Maxim Mikityanskiy <maximmi@mellanox.com>
> Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>;
> netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq Toukan
> <tariqt@mellanox.com>
> Subject: Re: [PATCH 3/7] net/ethernet: Add parse_protocol header_ops support
>
> On Thu, Jan 24, 2019 at 4:48 AM Maxim Mikityanskiy <maximmi@mellanox.com>
> wrote:
> >
> > > -----Original Message-----
> > > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > > Sent: 23 January, 2019 16:15
> > > To: Maxim Mikityanskiy <maximmi@mellanox.com>
> > > Cc: David S. Miller <davem@davemloft.net>; Saeed Mahameed
> > > <saeedm@mellanox.com>; Willem de Bruijn <willemb@google.com>; Jason Wang
> > > <jasowang@redhat.com>; Eric Dumazet <edumazet@google.com>;
> > > netdev@vger.kernel.org; Eran Ben Elisha <eranbe@mellanox.com>; Tariq
> Toukan
> > > <tariqt@mellanox.com>
> > > Subject: Re: [PATCH 3/7] net/ethernet: Add parse_protocol header_ops
> support
> > >
> > > On Mon, Jan 14, 2019 at 8:21 AM Maxim Mikityanskiy
> <maximmi@mellanox.com>
> > > wrote:
> > > >
> > > > The previous commit introduced parse_protocol callback which should
> > > > extract the protocol number from the L2 header. Make all Ethernet
> > > > devices support it.
> > > >
> > > > Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> > > > ---
> > > > include/linux/etherdevice.h | 1 +
> > > > net/ethernet/eth.c | 13 +++++++++++++
> > > > 2 files changed, 14 insertions(+)
> > > >
> > > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> > > > index 2c0af7b00715..e2f3b21cd72a 100644
> > > > --- a/include/linux/etherdevice.h
> > > > +++ b/include/linux/etherdevice.h
> > > > @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh,
> > > struct hh_cache *hh,
> > > > __be16 type);
> > > > void eth_header_cache_update(struct hh_cache *hh, const struct
> net_device
> > > *dev,
> > > > const unsigned char *haddr);
> > > > +__be16 eth_header_parse_protocol(const struct sk_buff *skb);
> > >
> > > Does not need to be exposed in the header file or exported.
> >
> > Are you sure? All the other Ethernet header_ops callbacks are exported
> > and declared in the header. I'm not sure about the reason why it is done
> > in such a way, but my guess is that it will be useful if some driver
> > decides to replace one callback in header_ops but to use the default
> > ones for the rest of callbacks.
>
> I don't exactly follow this. But I think that many are exported
> because Ethernet is so common that of these are also called directly
> instead of through header_ops. Looking at other header_ops
> implementations, or other such callback structs, shows many examples
> where the members are static local functions.
Yes, they are called directly indeed, but not all of them. E.g.,
eth_header_parse is never called directly. On the other hand, look at
drivers/net/macvlan.c:
static const struct header_ops macvlan_hard_header_ops = {
.create = macvlan_hard_header,
.parse = eth_header_parse,
.cache = eth_header_cache,
.cache_update = eth_header_cache_update,
};
This is exactly what I am talking about. In order to support it,
eth_header_parse_protocol needs to be exported. BTW, we should consider
adding it to macvlan_hard_header_ops, ipvlan_header_ops and all other
such structures.
> > > > > diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
> > > > > index 2c0af7b00715..e2f3b21cd72a 100644
> > > > > --- a/include/linux/etherdevice.h
> > > > > +++ b/include/linux/etherdevice.h
> > > > > @@ -44,6 +44,7 @@ int eth_header_cache(const struct neighbour *neigh,
> > > > struct hh_cache *hh,
> > > > > __be16 type);
> > > > > void eth_header_cache_update(struct hh_cache *hh, const struct
> > net_device
> > > > *dev,
> > > > > const unsigned char *haddr);
> > > > > +__be16 eth_header_parse_protocol(const struct sk_buff *skb);
> > > >
> > > > Does not need to be exposed in the header file or exported.
> > >
> > > Are you sure? All the other Ethernet header_ops callbacks are exported
> > > and declared in the header. I'm not sure about the reason why it is done
> > > in such a way, but my guess is that it will be useful if some driver
> > > decides to replace one callback in header_ops but to use the default
> > > ones for the rest of callbacks.
> >
> > I don't exactly follow this. But I think that many are exported
> > because Ethernet is so common that of these are also called directly
> > instead of through header_ops. Looking at other header_ops
> > implementations, or other such callback structs, shows many examples
> > where the members are static local functions.
>
> Yes, they are called directly indeed, but not all of them. E.g.,
> eth_header_parse is never called directly. On the other hand, look at
> drivers/net/macvlan.c:
>
> static const struct header_ops macvlan_hard_header_ops = {
> .create = macvlan_hard_header,
> .parse = eth_header_parse,
> .cache = eth_header_cache,
> .cache_update = eth_header_cache_update,
> };
>
> This is exactly what I am talking about. In order to support it,
> eth_header_parse_protocol needs to be exported. BTW, we should consider
> adding it to macvlan_hard_header_ops, ipvlan_header_ops and all other
> such structures.
Very good point. Okay, export it is then.