* [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb
@ 2022-02-21 5:34 Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason() Dongli Zhang
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 5:34 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
Commit c504e5c2f964 ("net: skb: introduce kfree_skb_reason()") introduced
kfree_skb_reason() to help track why an skb was dropped.
tun and tap are commonly used as the virtio-net/vhost-net backend. This series
uses kfree_skb_reason() to trace dropped skbs in those two drivers.
Changed since v1:
- Renamed many of the reasons. They are now as generic as possible so that
they can be reused by core networking and other drivers.
Changed since v2:
- declare drop_reason as type "enum skb_drop_reason"
- handle the drop in skb_list_walk_safe() case for tap driver, and
kfree_skb_list_reason() is introduced
The following reasons are introduced.
- SKB_DROP_REASON_SKB_CSUM
Used whenever there is a checksum error with the sk_buff.
- SKB_DROP_REASON_SKB_COPY_DATA
The kernel may copy (or zero-copy) data to or from an sk_buff, e.g., via
zerocopy_sg_from_iter(), skb_copy_datagram_from_iter() and
skb_orphan_frags_rx(). This reason covers copy-related errors.
- SKB_DROP_REASON_SKB_GSO_SEG
Any error reported during GSO processing of the sk_buff. GSO processing is
frequent enough that a dedicated reason is introduced for it.
- SKB_DROP_REASON_SKB_PULL
- SKB_DROP_REASON_SKB_TRIM
Pulling or trimming sk_buff data is frequent, so a failure in either operation
gets its own reason.
- SKB_DROP_REASON_DEV_HDR
A driver may report this reason when there is an error in the metadata on the
DMA ring buffer.
- SKB_DROP_REASON_DEV_READY
The device is not ready, not online, or not initialized to receive data.
- SKB_DROP_REASON_DEV_FILTER
David Ahern suggested SKB_DROP_REASON_TAP_FILTER. I changed from 'TAP' to 'DEV'
to make it more generic.
- SKB_DROP_REASON_FULL_RING
Suggested by Eric Dumazet.
- SKB_DROP_REASON_BPF_FILTER
Dropped by an eBPF filter.
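
As an illustration of how a reason value can be tied to the short name that
shows up after "reason:" in the trace output, here is a minimal userspace
sketch. It is not the kernel code: the names sketch_drop_reason,
SKETCH_DROP_REASONS and sketch_reason_name() are hypothetical, and only a
subset of the reasons is listed. The idea is simply that one list generates
both the enum and the string table, so the two cannot drift apart.

```c
#include <assert.h>
#include <string.h>

/* Userspace sketch only: a hypothetical subset of reasons, not the kernel's
 * enum skb_drop_reason. One list feeds both the enum and the name table.
 */
#define SKETCH_DROP_REASONS(X)	\
	X(NOT_SPECIFIED)	\
	X(SKB_CSUM)		\
	X(SKB_COPY_DATA)	\
	X(SKB_GSO_SEG)		\
	X(DEV_HDR)		\
	X(FULL_RING)

enum sketch_drop_reason {
#define X(name) SKETCH_DROP_REASON_##name,
	SKETCH_DROP_REASONS(X)
#undef X
	SKETCH_DROP_REASON_MAX,
};

static const char *const sketch_reason_names[] = {
#define X(name) #name,
	SKETCH_DROP_REASONS(X)
#undef X
};

/* Return the short name that a trace line would print for this reason. */
static const char *sketch_reason_name(enum sketch_drop_reason reason)
{
	assert(reason < SKETCH_DROP_REASON_MAX);
	return sketch_reason_names[reason];
}
```

Adding a reason to the list in one place is then enough to keep the enum and
the printed name in sync.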
This is the output for TUN device.
# cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0 [018] ..s1. 1478.130490: kfree_skb: skbaddr=00000000c4f21b8d protocol=0 location=00000000aff342c7 reason: NOT_SPECIFIED
vhost-9003-9020 [012] b..1. 1478.196264: kfree_skb: skbaddr=00000000b174fb9b protocol=2054 location=000000001cf38db0 reason: FULL_RING
arping-9639 [018] b..1. 1479.082993: kfree_skb: skbaddr=00000000c4f21b8d protocol=2054 location=000000001cf38db0 reason: FULL_RING
<idle>-0 [012] b.s3. 1479.110472: kfree_skb: skbaddr=00000000e0c3681f protocol=4 location=000000001cf38db0 reason: FULL_RING
arping-9639 [018] b..1. 1480.083086: kfree_skb: skbaddr=00000000c4f21b8d protocol=2054 location=000000001cf38db0 reason: FULL_RING
This is the output for TAP device.
# cat /sys/kernel/debug/tracing/trace_pipe
<idle>-0 [014] ..s1. 1096.418621: kfree_skb: skbaddr=00000000f8f41946 protocol=0 location=00000000aff342c7 reason: NOT_SPECIFIED
arping-7006 [001] ..s1. 1096.843961: kfree_skb: skbaddr=000000002ec803a8 protocol=2054 location=000000009a57b32f reason: FULL_RING
arping-7006 [001] ..s1. 1097.844035: kfree_skb: skbaddr=000000002ec803a8 protocol=2054 location=000000009a57b32f reason: FULL_RING
arping-7006 [001] ..s1. 1098.844102: kfree_skb: skbaddr=00000000295eb0da protocol=2054 location=000000009a57b32f reason: FULL_RING
arping-7006 [001] ..s1. 1099.844160: kfree_skb: skbaddr=00000000295eb0da protocol=2054 location=000000009a57b32f reason: FULL_RING
arping-7006 [001] ..s1. 1100.844214: kfree_skb: skbaddr=00000000295eb0da protocol=2054 location=000000009a57b32f reason: FULL_RING
arping-7006 [001] ..s1. 1101.844230: kfree_skb: skbaddr=00000000295eb0da protocol=2054 location=000000009a57b32f reason: FULL_RING
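
For post-processing output like the above, the reason can be pulled out of
each line with plain string handling. The following is a small userspace
sketch, assuming the line layout shown in these samples (a "reason: " key
followed by a single token); trace_parse_reason() is a hypothetical helper,
not part of any existing tool.

```c
#include <assert.h>
#include <string.h>

/* Copy the token that follows "reason: " into out.
 * Returns 0 on success, -1 if the key is missing, the token is empty,
 * or the token does not fit into out (including the terminating NUL).
 */
static int trace_parse_reason(const char *line, char *out, size_t out_len)
{
	static const char key[] = "reason: ";
	const char *p = strstr(line, key);
	size_t n;

	if (!p || out_len == 0)
		return -1;

	p += sizeof(key) - 1;		/* skip past the key itself */
	n = strcspn(p, " \t\r\n");	/* token ends at whitespace or EOL */
	if (n == 0 || n >= out_len)
		return -1;

	memcpy(out, p, n);
	out[n] = '\0';
	return 0;
}
```

Feeding each line of trace_pipe through such a helper and counting the
results per reason gives a quick per-reason drop tally.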
drivers/net/tap.c | 35 +++++++++++++++++++++++++----------
drivers/net/tun.c | 38 ++++++++++++++++++++++++++++++--------
include/linux/skbuff.h | 18 ++++++++++++++++++
include/trace/events/skb.h | 10 ++++++++++
net/core/skbuff.c | 11 +++++++++--
5 files changed, 92 insertions(+), 20 deletions(-)
Please let me know if you have any suggestions on the definitions of the reasons.
Thank you very much!
Dongli Zhang
* [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason()
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
@ 2022-02-21 5:34 ` Dongli Zhang
2022-02-22 3:20 ` David Ahern
2022-02-21 5:34 ` [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason() Dongli Zhang
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 5:34 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
Introduce kfree_skb_list_reason() to drop a list of sk_buffs with a specific
reason.
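
The shape of the change can be modelled in userspace as follows. This is a
sketch under stated assumptions, not the kernel implementation: struct
list_skb, the list_free_*() helpers and the per-reason counter are all
hypothetical stand-ins. It shows the two points that matter: the next pointer
is read before the current node is freed, and the old list helper becomes a
thin wrapper that passes the "not specified" reason.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for sk_buff and the drop reasons. */
struct list_skb {
	struct list_skb *next;
};

enum list_drop_reason {
	LIST_DROP_NOT_SPECIFIED,
	LIST_DROP_FULL_RING,
	LIST_DROP_MAX,
};

/* Per-reason counter standing in for the kfree_skb tracepoint. */
static unsigned int list_drop_count[LIST_DROP_MAX];

static void list_free_skb_reason(struct list_skb *skb,
				 enum list_drop_reason reason)
{
	if (!skb)
		return;
	list_drop_count[reason]++;
	free(skb);
}

/* Free every node in the list, attributing each one to the same reason. */
static void list_free_skb_list_reason(struct list_skb *segs,
				      enum list_drop_reason reason)
{
	while (segs) {
		struct list_skb *next = segs->next; /* read before freeing */

		list_free_skb_reason(segs, reason);
		segs = next;
	}
}

/* The pre-existing helper keeps its signature and delegates. */
static void list_free_skb_list(struct list_skb *segs)
{
	list_free_skb_list_reason(segs, LIST_DROP_NOT_SPECIFIED);
}

/* Build a list of n zeroed nodes for demonstration purposes. */
static struct list_skb *list_alloc(unsigned int n)
{
	struct list_skb *head = NULL;

	while (n--) {
		struct list_skb *skb = calloc(1, sizeof(*skb));

		assert(skb);
		skb->next = head;
		head = skb;
	}
	return head;
}
```

Existing callers of the plain list helper are unaffected, while new callers
can attribute the whole list to one reason.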
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
include/linux/skbuff.h | 2 ++
net/core/skbuff.c | 11 +++++++++--
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a3e90ef..87ebe2f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1176,6 +1176,8 @@ static inline void kfree_skb(struct sk_buff *skb)
}
void skb_release_head_state(struct sk_buff *skb);
+void kfree_skb_list_reason(struct sk_buff *segs,
+ enum skb_drop_reason reason);
void kfree_skb_list(struct sk_buff *segs);
void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt);
void skb_tx_error(struct sk_buff *skb);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9d0388be..dfdd71e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -777,15 +777,22 @@ void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
}
EXPORT_SYMBOL(kfree_skb_reason);
-void kfree_skb_list(struct sk_buff *segs)
+void kfree_skb_list_reason(struct sk_buff *segs,
+ enum skb_drop_reason reason)
{
while (segs) {
struct sk_buff *next = segs->next;
- kfree_skb(segs);
+ kfree_skb_reason(segs, reason);
segs = next;
}
}
+EXPORT_SYMBOL(kfree_skb_list_reason);
+
+void kfree_skb_list(struct sk_buff *segs)
+{
+ kfree_skb_list_reason(segs, SKB_DROP_REASON_NOT_SPECIFIED);
+}
EXPORT_SYMBOL(kfree_skb_list);
/* Dump skb information and contents.
--
1.8.3.1
* [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason()
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason() Dongli Zhang
@ 2022-02-21 5:34 ` Dongli Zhang
2022-02-22 3:24 ` David Ahern
2022-02-21 5:34 ` [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement" Dongli Zhang
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 5:34 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
TAP can be used as a vhost-net backend. E.g., tap_handle_frame() is the
interface that forwards the skb from TAP to vhost-net/virtio-net.
However, there are many "goto drop" paths in the TAP driver. Therefore, a drop
reason is set before each "goto drop" and passed to kfree_skb_reason(), so
that userspace ftrace/eBPF tooling can track the reason for packet loss.
The following reasons are introduced:
- SKB_DROP_REASON_SKB_CSUM
- SKB_DROP_REASON_SKB_COPY_DATA
- SKB_DROP_REASON_SKB_GSO_SEG
- SKB_DROP_REASON_DEV_HDR
- SKB_DROP_REASON_FULL_RING
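
The "set a reason, then goto drop" pattern can be sketched in userspace as
follows. All names here (rx_frame, rx_ring, rx_handle_frame() and the
RX_DROP_* values) are hypothetical and only mirror the structure of the
change; the real driver has more paths and different checks.

```c
#include <assert.h>
#include <stdbool.h>

enum rx_drop_reason {
	RX_DROP_NONE,
	RX_DROP_SKB_GSO_SEG,
	RX_DROP_SKB_CSUM,
	RX_DROP_FULL_RING,
};

/* Hypothetical frame: flags say which processing step would fail. */
struct rx_frame {
	bool gso_error;
	bool csum_error;
};

/* Hypothetical fixed-size ring; only the fill level is modelled. */
struct rx_ring {
	unsigned int used;
	unsigned int size;
};

/* Last reason seen at the drop label, standing in for the tracepoint. */
static enum rx_drop_reason rx_last_drop = RX_DROP_NONE;

/* Returns 0 on success, non-zero when the ring is full. */
static int rx_ring_produce(struct rx_ring *ring)
{
	if (ring->used >= ring->size)
		return -1;
	ring->used++;
	return 0;
}

/* Returns true if the frame was queued, false if it was dropped. */
static bool rx_handle_frame(struct rx_ring *ring, const struct rx_frame *frame)
{
	enum rx_drop_reason drop_reason;

	if (frame->gso_error) {
		drop_reason = RX_DROP_SKB_GSO_SEG;
		goto drop;
	}
	if (frame->csum_error) {
		drop_reason = RX_DROP_SKB_CSUM;
		goto drop;
	}
	if (rx_ring_produce(ring)) {
		drop_reason = RX_DROP_FULL_RING;
		goto drop;
	}
	return true;

drop:
	/* Single exit point: every path reaching it has set drop_reason. */
	rx_last_drop = drop_reason;
	return false;
}
```

Keeping a single drop label means the accounting and the free stay in one
place, while each failing branch only adds one assignment.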
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- revise the reason name
Changed since v2:
- declare drop_reason as type "enum skb_drop_reason"
- handle the drop in skb_list_walk_safe() case
drivers/net/tap.c | 35 +++++++++++++++++++++++++----------
include/linux/skbuff.h | 9 +++++++++
include/trace/events/skb.h | 5 +++++
3 files changed, 39 insertions(+), 10 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8e3a28b..b48f519 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -322,6 +322,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
struct tap_dev *tap;
struct tap_queue *q;
netdev_features_t features = TAP_FEATURES;
+ enum skb_drop_reason drop_reason;
tap = tap_dev_get_rcu(dev);
if (!tap)
@@ -343,12 +344,16 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
struct sk_buff *segs = __skb_gso_segment(skb, features, false);
struct sk_buff *next;
- if (IS_ERR(segs))
+ if (IS_ERR(segs)) {
+ drop_reason = SKB_DROP_REASON_SKB_GSO_SEG;
goto drop;
+ }
if (!segs) {
- if (ptr_ring_produce(&q->ring, skb))
+ if (ptr_ring_produce(&q->ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }
goto wake_up;
}
@@ -356,8 +361,9 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
skb_list_walk_safe(segs, skb, next) {
skb_mark_not_on_list(skb);
if (ptr_ring_produce(&q->ring, skb)) {
- kfree_skb(skb);
- kfree_skb_list(next);
+ drop_reason = SKB_DROP_REASON_FULL_RING;
+ kfree_skb_reason(skb, drop_reason);
+ kfree_skb_list_reason(next, drop_reason);
break;
}
}
@@ -369,10 +375,14 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
*/
if (skb->ip_summed == CHECKSUM_PARTIAL &&
!(features & NETIF_F_CSUM_MASK) &&
- skb_checksum_help(skb))
+ skb_checksum_help(skb)) {
+ drop_reason = SKB_DROP_REASON_SKB_CSUM;
goto drop;
- if (ptr_ring_produce(&q->ring, skb))
+ }
+ if (ptr_ring_produce(&q->ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }
}
wake_up:
@@ -383,7 +393,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
/* Count errors/drops only here, thus don't care about args. */
if (tap->count_rx_dropped)
tap->count_rx_dropped(tap);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
return RX_HANDLER_CONSUMED;
}
EXPORT_SYMBOL_GPL(tap_handle_frame);
@@ -632,6 +642,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
int depth;
bool zerocopy = false;
size_t linear;
+ enum skb_drop_reason drop_reason;
if (q->flags & IFF_VNET_HDR) {
vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
@@ -696,8 +707,10 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
else
err = skb_copy_datagram_from_iter(skb, 0, from, len);
- if (err)
+ if (err) {
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
goto err_kfree;
+ }
skb_set_network_header(skb, ETH_HLEN);
skb_reset_mac_header(skb);
@@ -706,8 +719,10 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
if (vnet_hdr_len) {
err = virtio_net_hdr_to_skb(skb, &vnet_hdr,
tap_is_little_endian(q));
- if (err)
+ if (err) {
+ drop_reason = SKB_DROP_REASON_DEV_HDR;
goto err_kfree;
+ }
}
skb_probe_transport_header(skb);
@@ -738,7 +753,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
return total_len;
err_kfree:
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
err:
rcu_read_lock();
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 87ebe2f..52550c7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -380,6 +380,15 @@ enum skb_drop_reason {
* the ofo queue, corresponding to
* LINUX_MIB_TCPOFOMERGE
*/
+ SKB_DROP_REASON_SKB_CSUM, /* sk_buff checksum error */
+ SKB_DROP_REASON_SKB_COPY_DATA, /* failed to copy data from or to
+ * sk_buff
+ */
+ SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */
+ SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
+ * device driver specific header
+ */
+ SKB_DROP_REASON_FULL_RING, /* ring buffer is full */
SKB_DROP_REASON_MAX,
};
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 2ab7193..5b5f135 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -37,6 +37,11 @@
EM(SKB_DROP_REASON_TCP_OLD_DATA, TCP_OLD_DATA) \
EM(SKB_DROP_REASON_TCP_OVERWINDOW, TCP_OVERWINDOW) \
EM(SKB_DROP_REASON_TCP_OFOMERGE, TCP_OFOMERGE) \
+ EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM) \
+ EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA) \
+ EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG) \
+ EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR) \
+ EM(SKB_DROP_REASON_FULL_RING, FULL_RING) \
EMe(SKB_DROP_REASON_MAX, MAX)
#undef EM
--
1.8.3.1
* [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement"
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason() Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason() Dongli Zhang
@ 2022-02-21 5:34 ` Dongli Zhang
2022-02-22 3:28 ` David Ahern
2022-02-21 5:34 ` [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason() Dongli Zhang
2022-02-21 22:53 ` [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
4 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 5:34 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
No functional change.
Split the if statement into separate conditions so that a later patch can use
kfree_skb_reason() to trace a distinct reason for each.
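
A userspace sketch of why the split matters, using hypothetical names
(xmit_classify(), xmit_trim() and the XMIT_* values are illustrative only):
with "a || b" both failures share one branch, whereas two separate checks let
each branch report its own outcome. The short-circuit behaviour is preserved,
since the trim step is still skipped when the filter returns a length of 0.

```c
#include <assert.h>

enum xmit_result {
	XMIT_OK,
	XMIT_DROP_BPF_FILTER,
	XMIT_DROP_SKB_TRIM,
};

/* Counts how often the (hypothetical) trim step actually runs. */
static int xmit_trim_calls;

/* Stand-in for a trim operation; returns non-zero on failure. */
static int xmit_trim(int fail)
{
	xmit_trim_calls++;
	return fail ? -1 : 0;
}

/* filter_len: length returned by a hypothetical filter, 0 means "drop".
 * trim_fails: whether the subsequent trim step should fail.
 */
static enum xmit_result xmit_classify(int filter_len, int trim_fails)
{
	if (filter_len == 0)
		return XMIT_DROP_BPF_FILTER;

	if (xmit_trim(trim_fails))
		return XMIT_DROP_SKB_TRIM;

	return XMIT_OK;
}
```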
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
drivers/net/tun.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed8544..aa27268 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1086,7 +1086,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
goto drop;
len = run_ebpf_filter(tun, skb, len);
- if (len == 0 || pskb_trim(skb, len))
+ if (len == 0)
+ goto drop;
+
+ if (pskb_trim(skb, len))
goto drop;
if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
--
1.8.3.1
* [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
` (2 preceding siblings ...)
2022-02-21 5:34 ` [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement" Dongli Zhang
@ 2022-02-21 5:34 ` Dongli Zhang
2022-02-22 3:28 ` David Ahern
2022-02-21 22:53 ` [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
4 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 5:34 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
TUN can be used as a vhost-net backend. E.g., tun_net_xmit() is the
interface that forwards the skb from TUN to vhost-net/virtio-net.
However, there are many "goto drop" paths in the TUN driver. Therefore, a drop
reason is set before each "goto drop" and passed to kfree_skb_reason(), so
that userspace ftrace/eBPF tooling can track the reason for packet loss.
The following reasons are introduced:
- SKB_DROP_REASON_SKB_PULL
- SKB_DROP_REASON_SKB_TRIM
- SKB_DROP_REASON_DEV_READY
- SKB_DROP_REASON_DEV_FILTER
- SKB_DROP_REASON_BPF_FILTER
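
Because the transmit path checks its conditions in a fixed order, the
reported reason is the first check that fails. The sketch below models that
with an ordered table; the order follows the checks in this patch's
tun_net_xmit() hunk, but the TX_FAIL_* flags, struct tx_check and
tx_first_drop() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum tx_drop_reason {
	TX_DROP_NONE,
	TX_DROP_DEV_READY,
	TX_DROP_DEV_FILTER,
	TX_DROP_SOCKET_FILTER,
	TX_DROP_BPF_FILTER,
	TX_DROP_SKB_TRIM,
	TX_DROP_SKB_COPY_DATA,
	TX_DROP_FULL_RING,
};

/* Hypothetical flags: which checks a given packet would fail. */
#define TX_FAIL_NOT_ATTACHED	(1u << 0)
#define TX_FAIL_DEV_FILTER	(1u << 1)
#define TX_FAIL_SOCKET_FILTER	(1u << 2)
#define TX_FAIL_BPF_FILTER	(1u << 3)
#define TX_FAIL_TRIM		(1u << 4)
#define TX_FAIL_ORPHAN_FRAGS	(1u << 5)
#define TX_FAIL_RING_FULL	(1u << 6)

struct tx_check {
	unsigned int fail_bit;
	enum tx_drop_reason reason;
};

/* Checks in the order the transmit path evaluates them. */
static const struct tx_check tx_checks[] = {
	{ TX_FAIL_NOT_ATTACHED,	 TX_DROP_DEV_READY },
	{ TX_FAIL_DEV_FILTER,	 TX_DROP_DEV_FILTER },
	{ TX_FAIL_SOCKET_FILTER, TX_DROP_SOCKET_FILTER },
	{ TX_FAIL_BPF_FILTER,	 TX_DROP_BPF_FILTER },
	{ TX_FAIL_TRIM,		 TX_DROP_SKB_TRIM },
	{ TX_FAIL_ORPHAN_FRAGS,	 TX_DROP_SKB_COPY_DATA },
	{ TX_FAIL_RING_FULL,	 TX_DROP_FULL_RING },
};

/* Return the reason of the first failing check, or TX_DROP_NONE. */
static enum tx_drop_reason tx_first_drop(unsigned int failing)
{
	size_t i;

	for (i = 0; i < sizeof(tx_checks) / sizeof(tx_checks[0]); i++) {
		if (failing & tx_checks[i].fail_bit)
			return tx_checks[i].reason;
	}
	return TX_DROP_NONE;
}
```

A packet that would fail several checks is therefore only ever attributed to
the earliest one, which is worth keeping in mind when reading the trace.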
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- revise the reason name
Changed since v2:
- declare drop_reason as type "enum skb_drop_reason"
drivers/net/tun.c | 37 ++++++++++++++++++++++++++++---------
include/linux/skbuff.h | 7 +++++++
include/trace/events/skb.h | 5 +++++
3 files changed, 40 insertions(+), 9 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index aa27268..bf7d8cd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *queue;
struct tun_file *tfile;
int len = skb->len;
+ enum skb_drop_reason drop_reason;
rcu_read_lock();
tfile = rcu_dereference(tun->tfiles[txq]);
/* Drop packet if interface is not attached */
- if (!tfile)
+ if (!tfile) {
+ drop_reason = SKB_DROP_REASON_DEV_READY;
goto drop;
+ }
if (!rcu_dereference(tun->steering_prog))
tun_automq_xmit(tun, skb);
@@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
/* Drop if the filter does not like it.
* This is a noop if the filter is disabled.
* Filter can be enabled only for the TAP devices. */
- if (!check_filter(&tun->txflt, skb))
+ if (!check_filter(&tun->txflt, skb)) {
+ drop_reason = SKB_DROP_REASON_DEV_FILTER;
goto drop;
+ }
if (tfile->socket.sk->sk_filter &&
- sk_filter(tfile->socket.sk, skb))
+ sk_filter(tfile->socket.sk, skb)) {
+ drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
goto drop;
+ }
len = run_ebpf_filter(tun, skb, len);
- if (len == 0)
+ if (len == 0) {
+ drop_reason = SKB_DROP_REASON_BPF_FILTER;
goto drop;
+ }
- if (pskb_trim(skb, len))
+ if (pskb_trim(skb, len)) {
+ drop_reason = SKB_DROP_REASON_SKB_TRIM;
goto drop;
+ }
- if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
+ if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC))) {
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
goto drop;
+ }
skb_tx_timestamp(skb);
@@ -1104,8 +1117,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
- if (ptr_ring_produce(&tfile->tx_ring, skb))
+ if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
+ }
/* NETIF_F_LLTX requires to do our own update of trans_start */
queue = netdev_get_tx_queue(dev, txq);
@@ -1122,7 +1137,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
drop:
atomic_long_inc(&dev->tx_dropped);
skb_tx_error(skb);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
rcu_read_unlock();
return NET_XMIT_DROP;
}
@@ -1720,6 +1735,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
u32 rxhash = 0;
int skb_xdp = 1;
bool frags = tun_napi_frags_enabled(tfile);
+ enum skb_drop_reason drop_reason;
if (!(tun->flags & IFF_NO_PI)) {
if (len < sizeof(pi))
@@ -1823,9 +1839,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
if (err) {
err = -EFAULT;
+ drop_reason = SKB_DROP_REASON_SKB_COPY_DATA;
drop:
atomic_long_inc(&tun->dev->rx_dropped);
- kfree_skb(skb);
+ kfree_skb_reason(skb, drop_reason);
if (frags) {
tfile->napi.skb = NULL;
mutex_unlock(&tfile->napi_mutex);
@@ -1872,6 +1889,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
case IFF_TAP:
if (frags && !pskb_may_pull(skb, ETH_HLEN)) {
err = -ENOMEM;
+ drop_reason = SKB_DROP_REASON_SKB_PULL;
goto drop;
}
skb->protocol = eth_type_trans(skb, tun->dev);
@@ -1925,6 +1943,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
if (unlikely(!(tun->dev->flags & IFF_UP))) {
err = -EIO;
rcu_read_unlock();
+ drop_reason = SKB_DROP_REASON_DEV_READY;
goto drop;
}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 52550c7..5850590 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -385,10 +385,17 @@ enum skb_drop_reason {
* sk_buff
*/
SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */
+ SKB_DROP_REASON_SKB_PULL, /* failed to pull sk_buff data */
+ SKB_DROP_REASON_SKB_TRIM, /* failed to trim sk_buff data */
SKB_DROP_REASON_DEV_HDR, /* there is something wrong with
* device driver specific header
*/
+ SKB_DROP_REASON_DEV_READY, /* device is not ready */
+ SKB_DROP_REASON_DEV_FILTER, /* dropped by device driver
+ * specific filter
+ */
SKB_DROP_REASON_FULL_RING, /* ring buffer is full */
+ SKB_DROP_REASON_BPF_FILTER, /* dropped by ebpf filter */
SKB_DROP_REASON_MAX,
};
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 5b5f135..0db0962 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -40,8 +40,13 @@
EM(SKB_DROP_REASON_SKB_CSUM, SKB_CSUM) \
EM(SKB_DROP_REASON_SKB_COPY_DATA, SKB_COPY_DATA) \
EM(SKB_DROP_REASON_SKB_GSO_SEG, SKB_GSO_SEG) \
+ EM(SKB_DROP_REASON_SKB_PULL, SKB_PULL) \
+ EM(SKB_DROP_REASON_SKB_TRIM, SKB_TRIM) \
EM(SKB_DROP_REASON_DEV_HDR, DEV_HDR) \
+ EM(SKB_DROP_REASON_DEV_READY, DEV_READY) \
+ EM(SKB_DROP_REASON_DEV_FILTER, DEV_FILTER) \
EM(SKB_DROP_REASON_FULL_RING, FULL_RING) \
+ EM(SKB_DROP_REASON_BPF_FILTER, BPF_FILTER) \
EMe(SKB_DROP_REASON_MAX, MAX)
#undef EM
--
1.8.3.1
* Re: [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
` (3 preceding siblings ...)
2022-02-21 5:34 ` [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason() Dongli Zhang
@ 2022-02-21 22:53 ` Dongli Zhang
4 siblings, 0 replies; 18+ messages in thread
From: Dongli Zhang @ 2022-02-21 22:53 UTC (permalink / raw)
To: netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, dsahern, edumazet
The subject should be [PATCH net-next v3 0/4], not [PATCH net-next v3 0/3].
Sorry for the mistake.
Dongli Zhang
* Re: [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason()
2022-02-21 5:34 ` [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason() Dongli Zhang
@ 2022-02-22 3:20 ` David Ahern
0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2022-02-22 3:20 UTC (permalink / raw)
To: Dongli Zhang, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/20/22 10:34 PM, Dongli Zhang wrote:
> This is to introduce kfree_skb_list_reason() to drop a list of sk_buff with
> a specific reason.
>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> include/linux/skbuff.h | 2 ++
> net/core/skbuff.c | 11 +++++++++--
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* Re: [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason()
2022-02-21 5:34 ` [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason() Dongli Zhang
@ 2022-02-22 3:24 ` David Ahern
2022-02-22 4:31 ` Dongli Zhang
0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-02-22 3:24 UTC (permalink / raw)
To: Dongli Zhang, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/20/22 10:34 PM, Dongli Zhang wrote:
> The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
> the interface to forward the skb from TAP to vhost-net/virtio-net.
>
> However, there are many "goto drop" in the TAP driver. Therefore, the
> kfree_skb_reason() is involved at each "goto drop" to help userspace
> ftrace/ebpf to track the reason for the loss of packets.
>
> The below reasons are introduced:
>
> - SKB_DROP_REASON_SKB_CSUM
> - SKB_DROP_REASON_SKB_COPY_DATA
> - SKB_DROP_REASON_SKB_GSO_SEG
> - SKB_DROP_REASON_DEV_HDR
> - SKB_DROP_REASON_FULL_RING
>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - revise the reason name
> Changed since v2:
> - declare drop_reason as type "enum skb_drop_reason"
> - handle the drop in skb_list_walk_safe() case
>
> drivers/net/tap.c | 35 +++++++++++++++++++++++++----------
> include/linux/skbuff.h | 9 +++++++++
> include/trace/events/skb.h | 5 +++++
> 3 files changed, 39 insertions(+), 10 deletions(-)
>
couple of places where the new reason should be in reverse xmas order;
logic wise:
Reviewed-by: David Ahern <dsahern@kernel.org>
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-21 5:34 ` [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason() Dongli Zhang
@ 2022-02-22 3:28 ` David Ahern
2022-02-22 4:45 ` Dongli Zhang
0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-02-22 3:28 UTC (permalink / raw)
To: Dongli Zhang, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/20/22 10:34 PM, Dongli Zhang wrote:
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index aa27268..bf7d8cd 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *queue;
> struct tun_file *tfile;
> int len = skb->len;
> + enum skb_drop_reason drop_reason;
this function is already honoring reverse xmas tree style, so this needs
to be moved up.
>
> rcu_read_lock();
> tfile = rcu_dereference(tun->tfiles[txq]);
>
> /* Drop packet if interface is not attached */
> - if (!tfile)
> + if (!tfile) {
> + drop_reason = SKB_DROP_REASON_DEV_READY;
> goto drop;
> + }
>
> if (!rcu_dereference(tun->steering_prog))
> tun_automq_xmit(tun, skb);
> @@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> /* Drop if the filter does not like it.
> * This is a noop if the filter is disabled.
> * Filter can be enabled only for the TAP devices. */
> - if (!check_filter(&tun->txflt, skb))
> + if (!check_filter(&tun->txflt, skb)) {
> + drop_reason = SKB_DROP_REASON_DEV_FILTER;
> goto drop;
> + }
>
> if (tfile->socket.sk->sk_filter &&
> - sk_filter(tfile->socket.sk, skb))
> + sk_filter(tfile->socket.sk, skb)) {
> + drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
> goto drop;
> + }
>
> len = run_ebpf_filter(tun, skb, len);
> - if (len == 0)
> + if (len == 0) {
> + drop_reason = SKB_DROP_REASON_BPF_FILTER;
how does this bpf filter differ from SKB_DROP_REASON_SOCKET_FILTER? I
think the reason code needs to be a little clearer on the distinction.
* Re: [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement"
2022-02-21 5:34 ` [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement" Dongli Zhang
@ 2022-02-22 3:28 ` David Ahern
0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2022-02-22 3:28 UTC (permalink / raw)
To: Dongli Zhang, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/20/22 10:34 PM, Dongli Zhang wrote:
> No functional change.
>
> Just to split the if statement into different conditions to use
> kfree_skb_reason() to trace the reason later.
>
> Cc: Joao Martins <joao.m.martins@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> drivers/net/tun.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
* Re: [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason()
2022-02-22 3:24 ` David Ahern
@ 2022-02-22 4:31 ` Dongli Zhang
2022-02-26 8:52 ` Dongli Zhang
0 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-22 4:31 UTC (permalink / raw)
To: David Ahern, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
Hi David,
On 2/21/22 7:24 PM, David Ahern wrote:
> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>> The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
>> the interface to forward the skb from TAP to vhost-net/virtio-net.
>>
>> However, there are many "goto drop" in the TAP driver. Therefore, the
>> kfree_skb_reason() is involved at each "goto drop" to help userspace
>> ftrace/ebpf to track the reason for the loss of packets.
>>
>> The below reasons are introduced:
>>
>> - SKB_DROP_REASON_SKB_CSUM
>> - SKB_DROP_REASON_SKB_COPY_DATA
>> - SKB_DROP_REASON_SKB_GSO_SEG
>> - SKB_DROP_REASON_DEV_HDR
>> - SKB_DROP_REASON_FULL_RING
>>
>> Cc: Joao Martins <joao.m.martins@oracle.com>
>> Cc: Joe Jin <joe.jin@oracle.com>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>> Changed since v1:
>> - revise the reason name
>> Changed since v2:
>> - declare drop_reason as type "enum skb_drop_reason"
>> - handle the drop in skb_list_walk_safe() case
>>
>> drivers/net/tap.c | 35 +++++++++++++++++++++++++----------
>> include/linux/skbuff.h | 9 +++++++++
>> include/trace/events/skb.h | 5 +++++
>> 3 files changed, 39 insertions(+), 10 deletions(-)
>>
>
> couple of places where the new reason should be in reverse xmas order;
> logic wise:
>
> Reviewed-by: David Ahern <dsahern@kernel.org>
>
I will re-order the reasons in the same patch and re-send with your Reviewed-by
in the next version.
Thank you very much!
Dongli Zhang
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-22 3:28 ` David Ahern
@ 2022-02-22 4:45 ` Dongli Zhang
2022-02-22 14:39 ` David Ahern
0 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-22 4:45 UTC (permalink / raw)
To: David Ahern, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
Hi David,
On 2/21/22 7:28 PM, David Ahern wrote:
> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index aa27268..bf7d8cd 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>> struct netdev_queue *queue;
>> struct tun_file *tfile;
>> int len = skb->len;
>> + enum skb_drop_reason drop_reason;
>
> this function is already honoring reverse xmas tree style, so this needs
> to be moved up.
I will move this up, so that it is declared before "int txq = skb->queue_mapping;".
>
>>
>> rcu_read_lock();
>> tfile = rcu_dereference(tun->tfiles[txq]);
>>
>> /* Drop packet if interface is not attached */
>> - if (!tfile)
>> + if (!tfile) {
>> + drop_reason = SKB_DROP_REASON_DEV_READY;
>> goto drop;
>> + }
>>
>> if (!rcu_dereference(tun->steering_prog))
>> tun_automq_xmit(tun, skb);
>> @@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>> /* Drop if the filter does not like it.
>> * This is a noop if the filter is disabled.
>> * Filter can be enabled only for the TAP devices. */
>> - if (!check_filter(&tun->txflt, skb))
>> + if (!check_filter(&tun->txflt, skb)) {
>> + drop_reason = SKB_DROP_REASON_DEV_FILTER;
>> goto drop;
>> + }
>>
>> if (tfile->socket.sk->sk_filter &&
>> - sk_filter(tfile->socket.sk, skb))
>> + sk_filter(tfile->socket.sk, skb)) {
>> + drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
>> goto drop;
>> + }
>>
>> len = run_ebpf_filter(tun, skb, len);
>> - if (len == 0)
>> + if (len == 0) {
>> + drop_reason = SKB_DROP_REASON_BPF_FILTER;
>
> how does this bpf filter differ from SKB_DROP_REASON_SOCKET_FILTER? I
> think the reason code needs to be a little clearer on the distinction.
>
There is a difference between BPF_FILTER (here) and SOCKET_FILTER ...
... the real issue is that there is NO difference between BPF_FILTER (here) and
DEV_FILTER (introduced by the patch).
run_ebpf_filter() runs the bpf filter attached to the TUN device (not to the
socket). This is similar to DEV_FILTER, which covers a device-specific filter.
Initially, I wanted to use DEV_FILTER at both locations. That is a problem,
because two places would then report the same reason=DEV_FILTER, and I would not
be able to tell where the skb was dropped.
I was thinking about introducing SKB_DROP_REASON_DEV_BPF. My experience with
device-specific bpf is limited, but TUN is the only device I know of that has a
device-specific ebpf filter (added by commit aff3d70a07ff ("tun: allow to
attach ebpf socket filter")), so SKB_DROP_REASON_DEV_BPF is not generic enough
to be re-used by other drivers.
Would you mind sharing your suggestion on whether I should (1) re-use
SKB_DROP_REASON_DEV_FILTER or (2) introduce a new SKB_DROP_REASON_DEV_BPF for an
sk_buff dropped by ebpf attached to the device (not the socket)?
To answer your question: SOCKET_FILTER is for a filter attached to the socket,
while BPF_FILTER was meant for the ebpf filter attached to the device
(tun->filter_prog).
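As a rough illustration of the three filter stages described above, here is a user-space sketch (not the tun.c code). It shows why each stage benefits from its own reason: the first failing check decides what the tracer sees. Everything prefixed with mock_/MOCK_ is hypothetical and exists only for this example. The real checks are check_filter(), sk_filter() and run_ebpf_filter() as quoted in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space mock; names and values are illustrative, not the kernel's. */
enum mock_drop_reason {
	MOCK_NOT_DROPPED = 0,
	MOCK_DROP_DEV_READY,      /* no tfile attached to the queue */
	MOCK_DROP_DEV_FILTER,     /* rejected by check_filter() (tun->txflt) */
	MOCK_DROP_SOCKET_FILTER,  /* rejected by sk_filter() on the socket */
	MOCK_DROP_BPF_FILTER,     /* run_ebpf_filter() returned length 0 */
};

/* Hypothetical flattened view of the decisions made for one packet. */
struct mock_pkt {
	bool queue_attached;        /* stands in for "tfile != NULL" */
	bool txflt_accepts;         /* stands in for check_filter() */
	bool socket_filter_accepts; /* stands in for the sk_filter() test */
	unsigned int ebpf_len;      /* stands in for run_ebpf_filter() result */
};

/* Return the reason of the first check that fails, in xmit order. */
enum mock_drop_reason mock_tun_xmit_reason(const struct mock_pkt *pkt)
{
	assert(pkt != NULL);

	if (!pkt->queue_attached)
		return MOCK_DROP_DEV_READY;
	if (!pkt->txflt_accepts)
		return MOCK_DROP_DEV_FILTER;
	if (!pkt->socket_filter_accepts)
		return MOCK_DROP_SOCKET_FILTER;
	if (pkt->ebpf_len == 0)
		return MOCK_DROP_BPF_FILTER;
	return MOCK_NOT_DROPPED;
}
```

If the device filter and the device ebpf filter shared one reason, the second and fourth branches would be indistinguishable in a trace, which is the ambiguity raised in this message.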
Thank you very much!
Dongli Zhang
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-22 4:45 ` Dongli Zhang
@ 2022-02-22 14:39 ` David Ahern
2022-02-22 18:20 ` Dongli Zhang
0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-02-22 14:39 UTC (permalink / raw)
To: Dongli Zhang, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/21/22 9:45 PM, Dongli Zhang wrote:
> Hi David,
>
> On 2/21/22 7:28 PM, David Ahern wrote:
>> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index aa27268..bf7d8cd 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>> struct netdev_queue *queue;
>>> struct tun_file *tfile;
>>> int len = skb->len;
>>> + enum skb_drop_reason drop_reason;
>>
>> this function is already honoring reverse xmas tree style, so this needs
>> to be moved up.
>
> I will move this up to before "int txq = skb->queue_mapping;".
>
>>
>>>
>>> rcu_read_lock();
>>> tfile = rcu_dereference(tun->tfiles[txq]);
>>>
>>> /* Drop packet if interface is not attached */
>>> - if (!tfile)
>>> + if (!tfile) {
>>> + drop_reason = SKB_DROP_REASON_DEV_READY;
>>> goto drop;
>>> + }
>>>
>>> if (!rcu_dereference(tun->steering_prog))
>>> tun_automq_xmit(tun, skb);
>>> @@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>> /* Drop if the filter does not like it.
>>> * This is a noop if the filter is disabled.
>>> * Filter can be enabled only for the TAP devices. */
>>> - if (!check_filter(&tun->txflt, skb))
>>> + if (!check_filter(&tun->txflt, skb)) {
>>> + drop_reason = SKB_DROP_REASON_DEV_FILTER;
>>> goto drop;
>>> + }
>>>
>>> if (tfile->socket.sk->sk_filter &&
>>> - sk_filter(tfile->socket.sk, skb))
>>> + sk_filter(tfile->socket.sk, skb)) {
>>> + drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
>>> goto drop;
>>> + }
>>>
>>> len = run_ebpf_filter(tun, skb, len);
>>> - if (len == 0)
>>> + if (len == 0) {
>>> + drop_reason = SKB_DROP_REASON_BPF_FILTER;
>>
>> how does this bpf filter differ from SKB_DROP_REASON_SOCKET_FILTER? I
>> think the reason code needs to be a little clearer on the distinction.
>>
>
>
> While there is a diff between BPF_FILTER (here) and SOCKET_FILTER ...
>
> ... indeed the issue is: there is NO diff between BPF_FILTER (here) and
> DEV_FILTER (introduced by the patch).
>
>
> The run_ebpf_filter() is to run the bpf filter attached to the TUN device (not
> socket). This is similar to DEV_FILTER, which is to run a device specific filter.
>
> Initially, I would use DEV_FILTER at both locations. This makes trouble to me as
> there would be two places with same reason=DEV_FILTER. I will not be able to
> tell where the skb is dropped.
>
>
> I was thinking about to introduce a SKB_DROP_REASON_DEV_BPF. While I have
> limited experience in device specific bpf, the TUN is the only device I know
> that has a device specific ebpf filter (by commit aff3d70a07ff ("tun: allow to
> attach ebpf socket filter")). The SKB_DROP_REASON_DEV_BPF is not generic enough
> to be re-used by other drivers.
>
>
> Would you mind sharing your suggestion if I would re-use (1)
> SKB_DROP_REASON_DEV_FILTER or (2) introduce a new SKB_DROP_REASON_DEV_BPF, which
> is for sk_buff dropped by ebpf attached to device (not socket).
>
>
> To answer your question, the SOCKET_FILTER is for filter attached to socket, the
> BPF_FILTER was supposed for ebpf filter attached to device (tun->filter_prog).
>
>
tun/tap does have some unique filtering options. The other sets focused
on the core networking stack are adding a drop reason of
SKB_DROP_REASON_BPF_CGROUP_EGRESS for cgroup based egress filters.
For tun unique filters, how about using a shortened version of the ioctl
name used to set the filter?
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-22 14:39 ` David Ahern
@ 2022-02-22 18:20 ` Dongli Zhang
2022-02-25 5:57 ` Menglong Dong
0 siblings, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2022-02-22 18:20 UTC (permalink / raw)
To: David Ahern, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
Hi David,
On 2/22/22 6:39 AM, David Ahern wrote:
> On 2/21/22 9:45 PM, Dongli Zhang wrote:
>> Hi David,
>>
>> On 2/21/22 7:28 PM, David Ahern wrote:
>>> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index aa27268..bf7d8cd 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>> struct netdev_queue *queue;
>>>> struct tun_file *tfile;
>>>> int len = skb->len;
>>>> + enum skb_drop_reason drop_reason;
>>>
>>> this function is already honoring reverse xmas tree style, so this needs
>>> to be moved up.
>>
>> I will move this up to before "int txq = skb->queue_mapping;".
>>
>>>
>>>>
>>>> rcu_read_lock();
>>>> tfile = rcu_dereference(tun->tfiles[txq]);
>>>>
>>>> /* Drop packet if interface is not attached */
>>>> - if (!tfile)
>>>> + if (!tfile) {
>>>> + drop_reason = SKB_DROP_REASON_DEV_READY;
>>>> goto drop;
>>>> + }
>>>>
>>>> if (!rcu_dereference(tun->steering_prog))
>>>> tun_automq_xmit(tun, skb);
>>>> @@ -1078,22 +1081,32 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>> /* Drop if the filter does not like it.
>>>> * This is a noop if the filter is disabled.
>>>> * Filter can be enabled only for the TAP devices. */
>>>> - if (!check_filter(&tun->txflt, skb))
>>>> + if (!check_filter(&tun->txflt, skb)) {
>>>> + drop_reason = SKB_DROP_REASON_DEV_FILTER;
>>>> goto drop;
>>>> + }
>>>>
>>>> if (tfile->socket.sk->sk_filter &&
>>>> - sk_filter(tfile->socket.sk, skb))
>>>> + sk_filter(tfile->socket.sk, skb)) {
>>>> + drop_reason = SKB_DROP_REASON_SOCKET_FILTER;
>>>> goto drop;
>>>> + }
>>>>
>>>> len = run_ebpf_filter(tun, skb, len);
>>>> - if (len == 0)
>>>> + if (len == 0) {
>>>> + drop_reason = SKB_DROP_REASON_BPF_FILTER;
>>>
>>> how does this bpf filter differ from SKB_DROP_REASON_SOCKET_FILTER? I
>>> think the reason code needs to be a little clearer on the distinction.
>>>
>>
>>
>> While there is a diff between BPF_FILTER (here) and SOCKET_FILTER ...
>>
>> ... indeed the issue is: there is NO diff between BPF_FILTER (here) and
>> DEV_FILTER (introduced by the patch).
>>
>>
>> The run_ebpf_filter() is to run the bpf filter attached to the TUN device (not
>> socket). This is similar to DEV_FILTER, which is to run a device specific filter.
>>
>> Initially, I would use DEV_FILTER at both locations. This makes trouble to me as
>> there would be two places with same reason=DEV_FILTER. I will not be able to
>> tell where the skb is dropped.
>>
>>
>> I was thinking about to introduce a SKB_DROP_REASON_DEV_BPF. While I have
>> limited experience in device specific bpf, the TUN is the only device I know
>> that has a device specific ebpf filter (by commit aff3d70a07ff ("tun: allow to
>> attach ebpf socket filter")). The SKB_DROP_REASON_DEV_BPF is not generic enough
>> to be re-used by other drivers.
>>
>>
>> Would you mind sharing your suggestion if I would re-use (1)
>> SKB_DROP_REASON_DEV_FILTER or (2) introduce a new SKB_DROP_REASON_DEV_BPF, which
>> is for sk_buff dropped by ebpf attached to device (not socket).
>>
>>
>> To answer your question, the SOCKET_FILTER is for filter attached to socket, the
>> BPF_FILTER was supposed for ebpf filter attached to device (tun->filter_prog).
>>
>>
>
> tun/tap does have some unique filtering options. The other sets focused
> on the core networking stack is adding a drop reason of
> SKB_DROP_REASON_BPF_CGROUP_EGRESS for cgroup based egress filters.
Thank you for the explanation!
>
> For tun unique filters, how about using a shortened version of the ioctl
> name used to set the filter.
>
Although TUN is widely used in virtualization environments, it is only one of
many drivers. I prefer not to introduce a reason that can be used only by one
specific driver.
To make it more generic and more re-usable (e.g., someone may add an ebpf
filter to the TAP driver as well), how about we create the reasons below?
SKB_DROP_REASON_DEV_FILTER,     /* dropped by filter attached to
                                 * or directly implemented by a
                                 * specific driver
                                 */
SKB_DROP_REASON_BPF_DEV,        /* dropped by bpf directly
                                 * attached to a specific device,
                                 * e.g., via TUNSETFILTEREBPF
                                 */
We already use SKB_DROP_REASON_DEV_FILTER in this patchset. We will use
SKB_DROP_REASON_BPF_DEV for the ebpf filter attached to TUN.
Thank you very much!
Dongli Zhang
* [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-22 18:20 ` Dongli Zhang
@ 2022-02-25 5:57 ` Menglong Dong
2022-02-25 15:48 ` David Ahern
0 siblings, 1 reply; 18+ messages in thread
From: Menglong Dong @ 2022-02-25 5:57 UTC (permalink / raw)
To: dongli.zhang
Cc: andrii, ast, bpf, daniel, davem, dsahern, edumazet, imagedong,
joao.m.martins, joe.jin, kuba, linux-kernel, mingo, netdev,
rostedt
>Hi David,
>
>On 2/22/22 6:39 AM, David Ahern wrote:
>> On 2/21/22 9:45 PM, Dongli Zhang wrote:
>>> Hi David,
>>>
>>> On 2/21/22 7:28 PM, David Ahern wrote:
>>>> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index aa27268..bf7d8cd 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -1062,13 +1062,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>> struct netdev_queue *queue;
>>>>> struct tun_file *tfile;
>>>>> int len = skb->len;
>>>>> + enum skb_drop_reason drop_reason;
>>>>
>>>> this function is already honoring reverse xmas tree style, so this needs
>>>> to be moved up.
>>>
>>> I will move this up to before "int txq = skb->queue_mapping;".
>>>
>>>>
[...]
>>>>
>>>
>>>
>>> While there is a diff between BPF_FILTER (here) and SOCKET_FILTER ...
>>>
>>> ... indeed the issue is: there is NO diff between BPF_FILTER (here) and
>>> DEV_FILTER (introduced by the patch).
>>>
>>>
>>> The run_ebpf_filter() is to run the bpf filter attached to the TUN device (not
>>> socket). This is similar to DEV_FILTER, which is to run a device specific filter.
>>>
>>> Initially, I would use DEV_FILTER at both locations. This makes trouble to me as
>>> there would be two places with same reason=DEV_FILTER. I will not be able to
>>> tell where the skb is dropped.
>>>
>>>
>>> I was thinking about to introduce a SKB_DROP_REASON_DEV_BPF. While I have
>>> limited experience in device specific bpf, the TUN is the only device I know
>>> that has a device specific ebpf filter (by commit aff3d70a07ff ("tun: allow to
>>> attach ebpf socket filter")). The SKB_DROP_REASON_DEV_BPF is not generic enough
>>> to be re-used by other drivers.
>>>
>>>
>>> Would you mind sharing your suggestion if I would re-use (1)
>>> SKB_DROP_REASON_DEV_FILTER or (2) introduce a new SKB_DROP_REASON_DEV_BPF, which
>>> is for sk_buff dropped by ebpf attached to device (not socket).
>>>
>>>
>>> To answer your question, the SOCKET_FILTER is for filter attached to socket, the
>>> BPF_FILTER was supposed for ebpf filter attached to device (tun->filter_prog).
>>>
>>>
>>
>> tun/tap does have some unique filtering options. The other sets focused
>> on the core networking stack is adding a drop reason of
>> SKB_DROP_REASON_BPF_CGROUP_EGRESS for cgroup based egress filters.
>
>Thank you for the explanation!
>
>>
>> For tun unique filters, how about using a shortened version of the ioctl
>> name used to set the filter.
>>
>
>Although TUN is widely used in virtualization environment, it is only one of
>many drivers. I prefer to not introduce a reason that can be used only by a
>specific driver.
>
>In order to make it more generic and more re-usable (e.g., perhaps people may
>add ebpf filter to TAP driver as well), how about we create below reasons.
>
>SKB_DROP_REASON_DEV_FILTER, /* dropped by filter attached to
> * or directly implemented by a
> * specific driver
> */
>SKB_DROP_REASON_BPF_DEV, /* dropped by bpf directly
> * attached to a specific device,
> * e.g., via TUNSETFILTEREBPF
> */
Aren't DEV_FILTER and BPF_DEV too generic? eBPF attached to a netdev can
be of many kinds, such as XDP, TC, etc.
I think using TAP_TXFILTER instead of DEV_FILTER may be better, and
TAP_FILTER instead of BPF_DEV. Making them similar to the names in
__tun_chr_ioctl() may be easier for users to understand.
>
>We already use SKB_DROP_REASON_DEV_FILTER in this patchset. We will use
>SKB_DROP_REASON_BPF_DEV for the ebpf filter attached to TUN.
>
>Thank you very much!
>
>Dongli Zhang
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-25 5:57 ` Menglong Dong
@ 2022-02-25 15:48 ` David Ahern
2022-02-25 16:49 ` Dongli Zhang
0 siblings, 1 reply; 18+ messages in thread
From: David Ahern @ 2022-02-25 15:48 UTC (permalink / raw)
To: Menglong Dong, dongli.zhang
Cc: andrii, ast, bpf, daniel, davem, edumazet, imagedong,
joao.m.martins, joe.jin, kuba, linux-kernel, mingo, netdev,
rostedt
On 2/24/22 10:57 PM, Menglong Dong wrote:
>>>
>>> For tun unique filters, how about using a shortened version of the ioctl
>>> name used to set the filter.
>>>
>>
>> Although TUN is widely used in virtualization environment, it is only one of
>> many drivers. I prefer to not introduce a reason that can be used only by a
>> specific driver.
>>
>> In order to make it more generic and more re-usable (e.g., perhaps people may
>> add ebpf filter to TAP driver as well), how about we create below reasons.
>>
>> SKB_DROP_REASON_DEV_FILTER, /* dropped by filter attached to
>> * or directly implemented by a
>> * specific driver
>> */
>> SKB_DROP_REASON_BPF_DEV, /* dropped by bpf directly
>> * attached to a specific device,
>> * e.g., via TUNSETFILTEREBPF
>> */
>
> Aren't DEV_FILTER and BPF_DEV too generic? eBPF atached to netdev can
> be many kinds, such as XDP, TC, etc.
yes.
>
> I think that use TAP_TXFILTER instaed of DEV_FILTER maybe better?
> and TAP_FILTER->BPF_DEV. Make them similar to the name in
> __tun_chr_ioctl() may be easier for user to understand.
>
In this case, given the unique attach points and API, having "tap" in the
name seems more appropriate.
* Re: [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason()
2022-02-25 15:48 ` David Ahern
@ 2022-02-25 16:49 ` Dongli Zhang
0 siblings, 0 replies; 18+ messages in thread
From: Dongli Zhang @ 2022-02-25 16:49 UTC (permalink / raw)
To: David Ahern, Menglong Dong
Cc: andrii, ast, bpf, daniel, davem, edumazet, imagedong,
joao.m.martins, joe.jin, kuba, linux-kernel, mingo, netdev,
rostedt
Hi David and Menglong,
On 2/25/22 7:48 AM, David Ahern wrote:
> On 2/24/22 10:57 PM, Menglong Dong wrote:
>>>>
>>>> For tun unique filters, how about using a shortened version of the ioctl
>>>> name used to set the filter.
>>>>
>>>
>>> Although TUN is widely used in virtualization environment, it is only one of
>>> many drivers. I prefer to not introduce a reason that can be used only by a
>>> specific driver.
>>>
>>> In order to make it more generic and more re-usable (e.g., perhaps people may
>>> add ebpf filter to TAP driver as well), how about we create below reasons.
>>>
>>> SKB_DROP_REASON_DEV_FILTER, /* dropped by filter attached to
>>> * or directly implemented by a
>>> * specific driver
>>> */
>>> SKB_DROP_REASON_BPF_DEV, /* dropped by bpf directly
>>> * attached to a specific device,
>>> * e.g., via TUNSETFILTEREBPF
>>> */
>>
>> Aren't DEV_FILTER and BPF_DEV too generic? eBPF atached to netdev can
>> be many kinds, such as XDP, TC, etc.
>
> yes.
>
>>
>> I think that use TAP_TXFILTER instaed of DEV_FILTER maybe better?
>> and TAP_FILTER->BPF_DEV. Make them similar to the name in
>> __tun_chr_ioctl() may be easier for user to understand.
>>
>
> in this case given the unique attach points and API tap in the name
> seems more appropriate
>
Thank you very much for the suggestions.
I will add the reasons below in the next version.
SKB_DROP_REASON_TAP_TXFILTER,   /* dropped by tx filter implemented at
                                 * tun/tap, e.g., check_filter()
                                 */
SKB_DROP_REASON_TAP_FILTER,     /* dropped by (ebpf) filter directly
                                 * attached to tun/tap, e.g., via
                                 * TUNSETFILTEREBPF
                                 */
Dongli Zhang
* Re: [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason()
2022-02-22 4:31 ` Dongli Zhang
@ 2022-02-26 8:52 ` Dongli Zhang
0 siblings, 0 replies; 18+ messages in thread
From: Dongli Zhang @ 2022-02-26 8:52 UTC (permalink / raw)
To: David Ahern, netdev, bpf
Cc: linux-kernel, davem, kuba, rostedt, mingo, ast, daniel, andrii,
imagedong, joao.m.martins, joe.jin, edumazet
On 2/21/22 8:31 PM, Dongli Zhang wrote:
> Hi David,
>
> On 2/21/22 7:24 PM, David Ahern wrote:
>> On 2/20/22 10:34 PM, Dongli Zhang wrote:
>>> The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is
>>> the interface to forward the skb from TAP to vhost-net/virtio-net.
>>>
>>> However, there are many "goto drop" in the TAP driver. Therefore, the
>>> kfree_skb_reason() is involved at each "goto drop" to help userspace
>>> ftrace/ebpf to track the reason for the loss of packets.
>>>
>>> The below reasons are introduced:
>>>
>>> - SKB_DROP_REASON_SKB_CSUM
>>> - SKB_DROP_REASON_SKB_COPY_DATA
>>> - SKB_DROP_REASON_SKB_GSO_SEG
>>> - SKB_DROP_REASON_DEV_HDR
>>> - SKB_DROP_REASON_FULL_RING
>>>
>>> Cc: Joao Martins <joao.m.martins@oracle.com>
>>> Cc: Joe Jin <joe.jin@oracle.com>
>>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>>> ---
>>> Changed since v1:
>>> - revise the reason name
>>> Changed since v2:
>>> - declare drop_reason as type "enum skb_drop_reason"
>>> - handle the drop in skb_list_walk_safe() case
>>>
>>> drivers/net/tap.c | 35 +++++++++++++++++++++++++----------
>>> include/linux/skbuff.h | 9 +++++++++
>>> include/trace/events/skb.h | 5 +++++
>>> 3 files changed, 39 insertions(+), 10 deletions(-)
>>>
>>
>> couple of places where the new reason should be in reverse xmas order;
>> logic wise:
>>
>> Reviewed-by: David Ahern <dsahern@kernel.org>
>>
>
> I will re-order the reasons in the same patch and re-send with your Reviewed-by
> in the next version.
>
I have sent out v4, and I finally decided not to re-order the reasons for this
patch, as that may cause trouble for backports.
I will not follow the reverse xmas tree order here, since the existing variables
are not declared in reverse xmas tree order.
Thank you very much!
Dongli Zhang
Thread overview: 18+ messages
2022-02-21 5:34 [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 1/4] skbuff: introduce kfree_skb_list_reason() Dongli Zhang
2022-02-22 3:20 ` David Ahern
2022-02-21 5:34 ` [PATCH net-next v3 2/4] net: tap: track dropped skb via kfree_skb_reason() Dongli Zhang
2022-02-22 3:24 ` David Ahern
2022-02-22 4:31 ` Dongli Zhang
2022-02-26 8:52 ` Dongli Zhang
2022-02-21 5:34 ` [PATCH net-next v3 3/4] net: tun: split run_ebpf_filter() and pskb_trim() into different "if statement" Dongli Zhang
2022-02-22 3:28 ` David Ahern
2022-02-21 5:34 ` [PATCH net-next v3 4/4] net: tun: track dropped skb via kfree_skb_reason() Dongli Zhang
2022-02-22 3:28 ` David Ahern
2022-02-22 4:45 ` Dongli Zhang
2022-02-22 14:39 ` David Ahern
2022-02-22 18:20 ` Dongli Zhang
2022-02-25 5:57 ` Menglong Dong
2022-02-25 15:48 ` David Ahern
2022-02-25 16:49 ` Dongli Zhang
2022-02-21 22:53 ` [PATCH net-next v3 0/3] tun/tap: use kfree_skb_reason() to trace dropped skb Dongli Zhang