* [PATCH net-next] net: skb: fix kfree_skb event output error in perf @ 2022-08-25 5:37 menglong8.dong 2022-08-25 15:31 ` Eric Dumazet 0 siblings, 1 reply; 7+ messages in thread From: menglong8.dong @ 2022-08-25 5:37 UTC (permalink / raw) To: edumazet Cc: rostedt, mingo, davem, imagedong, dsahern, flyingpeng, dongli.zhang, linux-kernel From: Menglong Dong <imagedong@tencent.com> As Eric reported, the 'reason' field is not presented when trace the kfree_skb event by perf: $ perf record -e skb:kfree_skb -a sleep 10 $ perf script ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason: The cause seems to be passing kernel address directly to TP_printk(), which is not right. Therefore, fix this by adding a '__string' field to the TP_STRUCT of kfree_skb, which is 'reason_str', and passing it to TP_printk(). (Not sure if we should still keep the 'reason' field in TP_STRUCT__entry) Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Menglong Dong <imagedong@tencent.com> --- include/trace/events/skb.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 45264e4bb254..7235554141c3 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, __field(void *, location) __field(unsigned short, protocol) __field(enum skb_drop_reason, reason) + __string(reason_str, drop_reasons[reason]) ), TP_fast_assign( @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, __entry->location = location; __entry->protocol = ntohs(skb->protocol); __entry->reason = reason; + __assign_str(reason_str, drop_reasons[reason]); ), TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, - drop_reasons[__entry->reason]) + __get_str(reason_str)) ); TRACE_EVENT(consume_skb, -- 2.37.2 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf 2022-08-25 5:37 [PATCH net-next] net: skb: fix kfree_skb event output error in perf menglong8.dong @ 2022-08-25 15:31 ` Eric Dumazet 2022-08-26 4:47 ` Menglong Dong 0 siblings, 1 reply; 7+ messages in thread From: Eric Dumazet @ 2022-08-25 15:31 UTC (permalink / raw) To: Menglong Dong Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Hao Peng, Dongli Zhang, LKML On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@gmail.com> wrote: > > From: Menglong Dong <imagedong@tencent.com> > > As Eric reported, the 'reason' field is not presented when trace the > kfree_skb event by perf: > > $ perf record -e skb:kfree_skb -a sleep 10 > $ perf script > ip_defrag 14605 [021] 221.614303: skb:kfree_skb: > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 > reason: > > The cause seems to be passing kernel address directly to TP_printk(), > which is not right. Why ? It seems this adds an expensive copy of a string that should reside in rodata section of vmlinux, thus can not disappear... (Also the ring buffer entry will have a dynamic size ?) trace_safe_str() is using is_kernel_rodata() which should return true for drop_reasons[X] ? $ grep drop_reasons net/core/dropreason_str.c const char * const drop_reasons[] = { ... > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of > kfree_skb, which is 'reason_str', and passing it to TP_printk(). > > (Not sure if we should still keep the 'reason' field in > TP_STRUCT__entry) Maybe for event/trace filtering purposes ? > > Reported-by: Eric Dumazet <edumazet@google.com> > Signed-off-by: Menglong Dong <imagedong@tencent.com> > --- > include/trace/events/skb.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > index 45264e4bb254..7235554141c3 100644 > --- a/include/trace/events/skb.h > +++ b/include/trace/events/skb.h > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, > __field(void *, location) > __field(unsigned short, protocol) > __field(enum skb_drop_reason, reason) > + __string(reason_str, drop_reasons[reason]) > ), > > TP_fast_assign( > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, > __entry->location = location; > __entry->protocol = ntohs(skb->protocol); > __entry->reason = reason; > + __assign_str(reason_str, drop_reasons[reason]); > ), > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > __entry->skbaddr, __entry->protocol, __entry->location, > - drop_reasons[__entry->reason]) > + __get_str(reason_str)) > ); > > TRACE_EVENT(consume_skb, > -- > 2.37.2 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf 2022-08-25 15:31 ` Eric Dumazet @ 2022-08-26 4:47 ` Menglong Dong [not found] ` <CANn89iKVpRMDPgiQ1O6=H1M05iXX3o7rGARa1Z6HptWnAgi_Sg@mail.gmail.com> 0 siblings, 1 reply; 7+ messages in thread From: Menglong Dong @ 2022-08-26 4:47 UTC (permalink / raw) To: Eric Dumazet Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Hao Peng, Dongli Zhang, LKML On Thu, Aug 25, 2022 at 11:32 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@gmail.com> wrote: > > > > From: Menglong Dong <imagedong@tencent.com> > > > > As Eric reported, the 'reason' field is not presented when trace the > > kfree_skb event by perf: > > > > $ perf record -e skb:kfree_skb -a sleep 10 > > $ perf script > > ip_defrag 14605 [021] 221.614303: skb:kfree_skb: > > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 > > reason: > > > > The cause seems to be passing kernel address directly to TP_printk(), > > which is not right. > > Why ? > I think it is because of how perf passes data to the user space. From what 'perf_output_sample()' do, we can know that perf passes the data of entry (with other data) to the user, and the user generates the output string from the format string (which can be obtained from /sys/kernel/debug/tracing/event/skb/kfree_skb/format) and the entry data. Therefore, perf can't get the string of drop reasons from the entry, only the enum. > It seems this adds an expensive copy of a string that should reside in > rodata section of vmlinux, thus can not disappear... > (Also the ring buffer entry will have a dynamic size ?) > It indeed will add additional cost, but it seems unavoidable. In the old version, __print_symbolic() is used, which will loop all the drop reason from a array and find corresponding string: TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", __entry->skbaddr, __entry->protocol, __entry->location, __print_symbolic(__entry->reason, TRACE_SKB_DROP_REASON)) And I think the cost of coping strings may be less than this loop? as the drop reasons are getting more and more. > trace_safe_str() is using is_kernel_rodata() which should return true > for drop_reasons[X] ? > > $ grep drop_reasons net/core/dropreason_str.c > const char * const drop_reasons[] = { > ... > > > > > > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of > > kfree_skb, which is 'reason_str', and passing it to TP_printk(). > > > > (Not sure if we should still keep the 'reason' field in > > TP_STRUCT__entry) > > Maybe for event/trace filtering purposes ? > > > > > Reported-by: Eric Dumazet <edumazet@google.com> > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > > --- > > include/trace/events/skb.h | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > index 45264e4bb254..7235554141c3 100644 > > --- a/include/trace/events/skb.h > > +++ b/include/trace/events/skb.h > > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, > > __field(void *, location) > > __field(unsigned short, protocol) > > __field(enum skb_drop_reason, reason) > > + __string(reason_str, drop_reasons[reason]) > > ), > > > > TP_fast_assign( > > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, > > __entry->location = location; > > __entry->protocol = ntohs(skb->protocol); > > __entry->reason = reason; > > + __assign_str(reason_str, drop_reasons[reason]); > > ), > > > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > > __entry->skbaddr, __entry->protocol, __entry->location, > > - drop_reasons[__entry->reason]) > > + __get_str(reason_str)) > > ); > > > > TRACE_EVENT(consume_skb, > > -- > > 2.37.2 > > ^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <CANn89iKVpRMDPgiQ1O6=H1M05iXX3o7rGARa1Z6HptWnAgi_Sg@mail.gmail.com>]
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf [not found] ` <CANn89iKVpRMDPgiQ1O6=H1M05iXX3o7rGARa1Z6HptWnAgi_Sg@mail.gmail.com> @ 2022-08-26 15:44 ` Menglong Dong 2022-08-26 15:53 ` Eric Dumazet 0 siblings, 1 reply; 7+ messages in thread From: Menglong Dong @ 2022-08-26 15:44 UTC (permalink / raw) To: Eric Dumazet Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Hao Peng, Dongli Zhang, LKML On Fri, Aug 26, 2022 at 11:07 PM Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Aug 25, 2022 at 9:47 PM Menglong Dong <menglong8.dong@gmail.com> wrote: >> >> On Thu, Aug 25, 2022 at 11:32 PM Eric Dumazet <edumazet@google.com> wrote: >> > >> > On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@gmail.com> wrote: >> > > >> > > From: Menglong Dong <imagedong@tencent.com> >> > > >> > > As Eric reported, the 'reason' field is not presented when trace the >> > > kfree_skb event by perf: >> > > >> > > $ perf record -e skb:kfree_skb -a sleep 10 >> > > $ perf script >> > > ip_defrag 14605 [021] 221.614303: skb:kfree_skb: >> > > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 >> > > reason: >> > > >> > > The cause seems to be passing kernel address directly to TP_printk(), >> > > which is not right. >> > >> > Why ? >> > >> >> I think it is because of how perf passes data to the user space. From >> what 'perf_output_sample()' do, we can know that perf passes the data >> of entry (with other data) to the user, and the user generates the output >> string from the format string (which can be obtained from >> /sys/kernel/debug/tracing/event/skb/kfree_skb/format) and the entry data. >> >> Therefore, perf can't get the string of drop reasons from the entry, only >> the enum. >> >> > It seems this adds an expensive copy of a string that should reside in >> > rodata section of vmlinux, thus can not disappear... >> > (Also the ring buffer entry will have a dynamic size ?) >> > >> >> It indeed will add additional cost, but it seems unavoidable. In the old >> version, __print_symbolic() is used, which will loop all the drop reason >> from a array and find corresponding string: >> >> TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", >> __entry->skbaddr, __entry->protocol, __entry->location, >> __print_symbolic(__entry->reason, >> TRACE_SKB_DROP_REASON)) >> >> And I think the cost of coping strings may be less than this loop? as the >> drop reasons are getting more and more. > > > We are back to original feedback about all this stuff. > > Please measure the tax on a workload dropping 5,000,000 packets per second > when/if a "perf -e skb:kfree_skb" is attempted by a clueless admin :) > Okay, I'll do such a test. > If just using an integer instead of a string has a measurable impact, we probably should stick to an integer. > > kfree_skb tracing is really for experts, they probably can have a tool to understand what a particular integer value means. > > Then we can also make sure to only add new values to the end of the enum, to have stable integer values among different kernel versions. > In fact, this is exactly what I wanted to do. Users can get little information from the output of perf or ftrace for the kfree_skb event without a tools, such as dropwatch. I keep adding new values to the end of the enum. And I tried to make the enum as uapi, as user space tools need the enum to understand what the integer values mean. Hmm......that commit was rejected :) I'll do the test to see the impact between integer, string copy and __print_symbolic. Then we can decide the solutions. Thanks! Menglong Dong >> >> > trace_safe_str() is using is_kernel_rodata() which should return true >> > for drop_reasons[X] ? >> > >> > $ grep drop_reasons net/core/dropreason_str.c >> > const char * const drop_reasons[] = { >> > ... >> > >> > >> > >> > > >> > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of >> > > kfree_skb, which is 'reason_str', and passing it to TP_printk(). >> > > >> > > (Not sure if we should still keep the 'reason' field in >> > > TP_STRUCT__entry) >> > >> > Maybe for event/trace filtering purposes ? >> > >> > > >> > > Reported-by: Eric Dumazet <edumazet@google.com> >> > > Signed-off-by: Menglong Dong <imagedong@tencent.com> >> > > --- >> > > include/trace/events/skb.h | 4 +++- >> > > 1 file changed, 3 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h >> > > index 45264e4bb254..7235554141c3 100644 >> > > --- a/include/trace/events/skb.h >> > > +++ b/include/trace/events/skb.h >> > > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, >> > > __field(void *, location) >> > > __field(unsigned short, protocol) >> > > __field(enum skb_drop_reason, reason) >> > > + __string(reason_str, drop_reasons[reason]) >> > > ), >> > > >> > > TP_fast_assign( >> > > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, >> > > __entry->location = location; >> > > __entry->protocol = ntohs(skb->protocol); >> > > __entry->reason = reason; >> > > + __assign_str(reason_str, drop_reasons[reason]); >> > > ), >> > > >> > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", >> > > __entry->skbaddr, __entry->protocol, __entry->location, >> > > - drop_reasons[__entry->reason]) >> > > + __get_str(reason_str)) >> > > ); >> > > >> > > TRACE_EVENT(consume_skb, >> > > -- >> > > 2.37.2 >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf 2022-08-26 15:44 ` Menglong Dong @ 2022-08-26 15:53 ` Eric Dumazet 2022-08-28 5:08 ` Menglong Dong 2022-08-28 5:21 ` Menglong Dong 0 siblings, 2 replies; 7+ messages in thread From: Eric Dumazet @ 2022-08-26 15:53 UTC (permalink / raw) To: Menglong Dong Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Hao Peng, Dongli Zhang, LKML On Fri, Aug 26, 2022 at 8:44 AM Menglong Dong <menglong8.dong@gmail.com> wrote: > > On Fri, Aug 26, 2022 at 11:07 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Thu, Aug 25, 2022 at 9:47 PM Menglong Dong <menglong8.dong@gmail.com> wrote: > >> > >> On Thu, Aug 25, 2022 at 11:32 PM Eric Dumazet <edumazet@google.com> wrote: > >> > > >> > On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@gmail.com> wrote: > >> > > > >> > > From: Menglong Dong <imagedong@tencent.com> > >> > > > >> > > As Eric reported, the 'reason' field is not presented when trace the > >> > > kfree_skb event by perf: > >> > > > >> > > $ perf record -e skb:kfree_skb -a sleep 10 > >> > > $ perf script > >> > > ip_defrag 14605 [021] 221.614303: skb:kfree_skb: > >> > > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 > >> > > reason: > >> > > > >> > > The cause seems to be passing kernel address directly to TP_printk(), > >> > > which is not right. > >> > > >> > Why ? > >> > > >> > >> I think it is because of how perf passes data to the user space. From > >> what 'perf_output_sample()' do, we can know that perf passes the data > >> of entry (with other data) to the user, and the user generates the output > >> string from the format string (which can be obtained from > >> /sys/kernel/debug/tracing/event/skb/kfree_skb/format) and the entry data. > >> > >> Therefore, perf can't get the string of drop reasons from the entry, only > >> the enum. > >> > >> > It seems this adds an expensive copy of a string that should reside in > >> > rodata section of vmlinux, thus can not disappear... > >> > (Also the ring buffer entry will have a dynamic size ?) > >> > > >> > >> It indeed will add additional cost, but it seems unavoidable. In the old > >> version, __print_symbolic() is used, which will loop all the drop reason > >> from a array and find corresponding string: > >> > >> TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > >> __entry->skbaddr, __entry->protocol, __entry->location, > >> __print_symbolic(__entry->reason, > >> TRACE_SKB_DROP_REASON)) > >> > >> And I think the cost of coping strings may be less than this loop? as the > >> drop reasons are getting more and more. > > > > > > We are back to original feedback about all this stuff. > > > > Please measure the tax on a workload dropping 5,000,000 packets per second > > when/if a "perf -e skb:kfree_skb" is attempted by a clueless admin :) > > > > Okay, I'll do such a test. > > > If just using an integer instead of a string has a measurable impact, we probably should stick to an integer. > > > > kfree_skb tracing is really for experts, they probably can have a tool to understand what a particular integer value means. > > > > Then we can also make sure to only add new values to the end of the enum, to have stable integer values among different kernel versions. > > > > In fact, this is exactly what I wanted to do. Users can get little information > from the output of perf or ftrace for the kfree_skb event without a > tools, such as dropwatch. > > I keep adding new values to the end of the enum. And I tried to > make the enum as uapi, as user space tools need the enum to > understand what the integer values mean. Hmm......that commit > was rejected :) I think that your initial proposal was to stuff __FILE__ or __LINE__ which was a no go, because they would require anyone having fresh kernel source to make any mapping. Also, we added SKB_NOT_DROPPED_YET in first position in the list. UAPI would have made this kind of change not possible. I am not suggesting to make enum skb_drop_reason UAPI, I want this to be clear :) > > I'll do the test to see the impact between integer, string copy and > __print_symbolic. Then we can decide the solutions. > > Thanks! > Menglong Dong > > >> > >> > trace_safe_str() is using is_kernel_rodata() which should return true > >> > for drop_reasons[X] ? > >> > > >> > $ grep drop_reasons net/core/dropreason_str.c > >> > const char * const drop_reasons[] = { > >> > ... > >> > > >> > > >> > > >> > > > >> > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of > >> > > kfree_skb, which is 'reason_str', and passing it to TP_printk(). > >> > > > >> > > (Not sure if we should still keep the 'reason' field in > >> > > TP_STRUCT__entry) > >> > > >> > Maybe for event/trace filtering purposes ? > >> > > >> > > > >> > > Reported-by: Eric Dumazet <edumazet@google.com> > >> > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > >> > > --- > >> > > include/trace/events/skb.h | 4 +++- > >> > > 1 file changed, 3 insertions(+), 1 deletion(-) > >> > > > >> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > >> > > index 45264e4bb254..7235554141c3 100644 > >> > > --- a/include/trace/events/skb.h > >> > > +++ b/include/trace/events/skb.h > >> > > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, > >> > > __field(void *, location) > >> > > __field(unsigned short, protocol) > >> > > __field(enum skb_drop_reason, reason) > >> > > + __string(reason_str, drop_reasons[reason]) > >> > > ), > >> > > > >> > > TP_fast_assign( > >> > > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, > >> > > __entry->location = location; > >> > > __entry->protocol = ntohs(skb->protocol); > >> > > __entry->reason = reason; > >> > > + __assign_str(reason_str, drop_reasons[reason]); > >> > > ), > >> > > > >> > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > >> > > __entry->skbaddr, __entry->protocol, __entry->location, > >> > > - drop_reasons[__entry->reason]) > >> > > + __get_str(reason_str)) > >> > > ); > >> > > > >> > > TRACE_EVENT(consume_skb, > >> > > -- > >> > > 2.37.2 > >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf 2022-08-26 15:53 ` Eric Dumazet @ 2022-08-28 5:08 ` Menglong Dong 2022-08-28 5:21 ` Menglong Dong 1 sibling, 0 replies; 7+ messages in thread From: Menglong Dong @ 2022-08-28 5:08 UTC (permalink / raw) To: Eric Dumazet, Jakub Kicinski Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Dongli Zhang, LKML On Fri, Aug 26, 2022 at 11:54 PM Eric Dumazet <edumazet@google.com> wrote: > > On Fri, Aug 26, 2022 at 8:44 AM Menglong Dong <menglong8.dong@gmail.com> wrote: [...] > > I think that your initial proposal was to stuff __FILE__ or __LINE__ > which was a no go, because they would require anyone having fresh > kernel source to make any mapping. > > Also, we added SKB_NOT_DROPPED_YET in first position in the list. > > UAPI would have made this kind of change not possible. > > I am not suggesting to make enum skb_drop_reason UAPI, I want this to > be clear :) > I think I finally figured out what is happening. (After a whole day of research). The old version uses TRACE_DEFINE_ENUM() to export the enum to user space through the format: cat /tracing/events/skb/kfree_skb/format name: kfree_skb ID: 1528 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1; signed:0; field:int common_pid; offset:4; size:4; signed:1; field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" }, { 3, "PKT_TOO_SMALL" }, { 4, "TCP_CSUM" }, { 5, "SOCKET_FILTER" }, { 6, "UDP_CSUM" }, { 7, "NETFILTER_DROP" }, { 8, "OTHERHOST" }, { 9, "IP_CSUM" }, { 10, "IP_INHDR" }, { 11, "IP_RPFILTER" }, { 12, "UNICAST_IN_L2_MULTICAST" }, { 13, "XFRM_POLICY" }, { 14, "IP_NOPROTO" }, { 15, "SOCKET_RCVBUFF" }, { 16, "PROTO_MEM" }, { 17, "TCP_MD5NOTFOUND" }, { 18, "TCP_MD5UNEXPECTED" }, { 19, "TCP_MD5FAILURE" }, { 20, "SOCKET_BACKLOG" }, { 21, "TCP_FLAGS" }, { 22, "TCP_ZEROWINDOW" }, { 23, "TCP_OLD_DATA" }, { 24, "TCP_OVERWINDOW" }, { 25, "TCP_OFOMERGE" }, { 26, "TCP_RFC7323_PAWS" }, { 27, "TCP_INVALID_SEQUENCE" }, { 28, "TCP_RESET" }, { 29, "TCP_INVALID_SYN" }, { 30, "TCP_CLOSE" }, { 31, "TCP_FASTOPEN" }, { 32, "TCP_OLD_ACK" }, { 33, "TCP_TOO_OLD_ACK" }, { 34, "TCP_ACK_UNSENT_DATA" }, { 35, "TCP_OFO_QUEUE_PRUNE" }, { 36, "TCP_OFO_DROP" }, { 37, "IP_OUTNOROUTES" }, { 38, "BPF_CGROUP_EGRESS" }, { 39, "IPV6DISABLED" }, { 40, "NEIGH_CREATEFAIL" }, { 41, "NEIGH_FAILED" }, { 42, "NEIGH_QUEUEFULL" }, { 43, "NEIGH_DEAD" }, { 44, "TC_EGRESS" }, { 45, "QDISC_DROP" }, { 46, "CPU_BACKLOG" }, { 47, "XDP" }, { 48, "TC_INGRESS" }, { 49, "UNHANDLED_PROTO" }, { 50, "SKB_CSUM" }, { 51, "SKB_GSO_SEG" }, { 52, "SKB_UCOPY_FAULT" }, { 53, "DEV_HDR" }, { 54, "DEV_READY" }, { 55, "FULL_RING" }, { 56, "NOMEM" }, { 57, "HDR_TRUNC" }, { 58, "TAP_FILTER" }, { 59, "TAP_TXFILTER" }, { 60, "ICMP_CSUM" }, { 61, "INVALID_PROTO" }, { 62, "IP_INADDRERRORS" }, { 63, "IP_INNOROUTES" }, { 64, "PKT_TOO_BIG" }, { 65, "MAX" }) perf will convert the enum to string by the information in format. What the TRACE_DEFINE_ENUM() do can be seen in the update_event_printk(), which will replace the enum name in the format with its real value. And it seems that this is the right way to export drop reason string to users. And without this TRACE_DEFINE_ENUM(), the output will be like this: print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { SKB_DROP_REASON_NOT_SPECIFIED, "NOT_SPECIFIED" } ................... ------------------------------------------------------------------------------------------------ And the string copy I committed before seems to add a lot cost. Following is the time spend that I call trace_kfree_skb() for 50,000,000 times (in ns) in a kernel module with __string() field (kfree_skb event is enabled, of course): 5129727715 5133374989 5133208716 5131911803 5122289360 5113165536 5113767180 5133562923 5112673360 5127777968 And following is the time with 'drop_reason[reason]': 4004919507 4008643292 4007113108 4004181997 4011144465 4022534656 4022078663 4002597895 4072512068 3990457695 Following is the time that I use __print_symbolic() instead: 4050504157 4049853381 4064370880 4014673377 4063128507 4058000000 4070997417 4017538272 4003874038 4009775761 Seems __print_symbolic() is more efficient than I thought. ------------------------------------------------------------------------------------------ Now I'm not sure we can still keep this 'auto-generate', as we need to define TRACE_DEFINE_ENUM. The script in 'net/core/Makefile' should be like this to generate a header, which is used to define the array 'const char * const drop_reasons[]' and TRACE_DEFINE_ENUM(): clean-files := $(srctree)/include/net/dropreason_str.h quiet_cmd_dropreason_str = GEN $@ cmd_dropreason_str = awk -F ',' 'BEGIN{ print "\#include <net/dropreason.h>\n"; \ print "\#define __DEFINE_DROP_REASON(FN, FNe) \\"; last = "" }\ /^enum skb_drop/ { dr=1; }\ /^\};/ { dr=0; }\ /^\tSKB_DROP_REASON_/ {\ if (dr) {\ if (length(last) != 0)\ printf "\tFN(%s) \\\n", last;\ last = $$1;\ sub(/\tSKB_DROP_REASON_/, "", last);\ }\ }\ END{ printf "\tFNe(%s)\n", last; }' $< > $@ $(srctree)/include/net/dropreason_str.h: $(srctree)/include/net/dropreason.h $(call cmd,dropreason_str) $(obj-y)/*: $(srctree)/include/net/dropreason_str.h Then, we can define TRACE_DEFINE_ENUM in include/trace/events/skb.h: #undef EM #define EM(name) TRACE_DEFINE_ENUM(SKB_DROP_REASON_##name); __DEFINE_DROP_REASON(EM, EM) Thanks! Menglong Dong > > > > I'll do the test to see the impact between integer, string copy and > > __print_symbolic. Then we can decide the solutions. > > > > Thanks! > > Menglong Dong > > > > >> > > >> > trace_safe_str() is using is_kernel_rodata() which should return true > > >> > for drop_reasons[X] ? > > >> > > > >> > $ grep drop_reasons net/core/dropreason_str.c > > >> > const char * const drop_reasons[] = { > > >> > ... > > >> > > > >> > > > >> > > > >> > > > > >> > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of > > >> > > kfree_skb, which is 'reason_str', and passing it to TP_printk(). > > >> > > > > >> > > (Not sure if we should still keep the 'reason' field in > > >> > > TP_STRUCT__entry) > > >> > > > >> > Maybe for event/trace filtering purposes ? > > >> > > > >> > > > > >> > > Reported-by: Eric Dumazet <edumazet@google.com> > > >> > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > > >> > > --- > > >> > > include/trace/events/skb.h | 4 +++- > > >> > > 1 file changed, 3 insertions(+), 1 deletion(-) > > >> > > > > >> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > >> > > index 45264e4bb254..7235554141c3 100644 > > >> > > --- a/include/trace/events/skb.h > > >> > > +++ b/include/trace/events/skb.h > > >> > > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, > > >> > > __field(void *, location) > > >> > > __field(unsigned short, protocol) > > >> > > __field(enum skb_drop_reason, reason) > > >> > > + __string(reason_str, drop_reasons[reason]) > > >> > > ), > > >> > > > > >> > > TP_fast_assign( > > >> > > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, > > >> > > __entry->location = location; > > >> > > __entry->protocol = ntohs(skb->protocol); > > >> > > __entry->reason = reason; > > >> > > + __assign_str(reason_str, drop_reasons[reason]); > > >> > > ), > > >> > > > > >> > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > > >> > > __entry->skbaddr, __entry->protocol, __entry->location, > > >> > > - drop_reasons[__entry->reason]) > > >> > > + __get_str(reason_str)) > > >> > > ); > > >> > > > > >> > > TRACE_EVENT(consume_skb, > > >> > > -- > > >> > > 2.37.2 > > >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next] net: skb: fix kfree_skb event output error in perf 2022-08-26 15:53 ` Eric Dumazet 2022-08-28 5:08 ` Menglong Dong @ 2022-08-28 5:21 ` Menglong Dong 1 sibling, 0 replies; 7+ messages in thread From: Menglong Dong @ 2022-08-28 5:21 UTC (permalink / raw) To: Eric Dumazet Cc: Steven Rostedt, Ingo Molnar, David Miller, Menglong Dong, David Ahern, Hao Peng, Dongli Zhang, LKML On Fri, Aug 26, 2022 at 11:54 PM Eric Dumazet <edumazet@google.com> wrote: > > On Fri, Aug 26, 2022 at 8:44 AM Menglong Dong <menglong8.dong@gmail.com> wrote: > > > > On Fri, Aug 26, 2022 at 11:07 PM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > > > > > On Thu, Aug 25, 2022 at 9:47 PM Menglong Dong <menglong8.dong@gmail.com> wrote: > > >> > > >> On Thu, Aug 25, 2022 at 11:32 PM Eric Dumazet <edumazet@google.com> wrote: > > >> > > > >> > On Wed, Aug 24, 2022 at 10:37 PM <menglong8.dong@gmail.com> wrote: > > >> > > > > >> > > From: Menglong Dong <imagedong@tencent.com> > > >> > > > > >> > > As Eric reported, the 'reason' field is not presented when trace the > > >> > > kfree_skb event by perf: > > >> > > > > >> > > $ perf record -e skb:kfree_skb -a sleep 10 > > >> > > $ perf script > > >> > > ip_defrag 14605 [021] 221.614303: skb:kfree_skb: > > >> > > skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 > > >> > > reason: > > >> > > > > >> > > The cause seems to be passing kernel address directly to TP_printk(), > > >> > > which is not right. > > >> > > > >> > Why ? > > >> > > > >> > > >> I think it is because of how perf passes data to the user space. From > > >> what 'perf_output_sample()' do, we can know that perf passes the data > > >> of entry (with other data) to the user, and the user generates the output > > >> string from the format string (which can be obtained from > > >> /sys/kernel/debug/tracing/event/skb/kfree_skb/format) and the entry data. > > >> > > >> Therefore, perf can't get the string of drop reasons from the entry, only > > >> the enum. > > >> > > >> > It seems this adds an expensive copy of a string that should reside in > > >> > rodata section of vmlinux, thus can not disappear... > > >> > (Also the ring buffer entry will have a dynamic size ?) > > >> > > > >> > > >> It indeed will add additional cost, but it seems unavoidable. In the old > > >> version, __print_symbolic() is used, which will loop all the drop reason > > >> from a array and find corresponding string: > > >> > > >> TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > > >> __entry->skbaddr, __entry->protocol, __entry->location, > > >> __print_symbolic(__entry->reason, > > >> TRACE_SKB_DROP_REASON)) > > >> > > >> And I think the cost of coping strings may be less than this loop? as the > > >> drop reasons are getting more and more. > > > > > > > > > We are back to original feedback about all this stuff. > > > > > > Please measure the tax on a workload dropping 5,000,000 packets per second > > > when/if a "perf -e skb:kfree_skb" is attempted by a clueless admin :) > > > > > > > Okay, I'll do such a test. > > > > > If just using an integer instead of a string has a measurable impact, we probably should stick to an integer. > > > > > > kfree_skb tracing is really for experts, they probably can have a tool to understand what a particular integer value means. > > > > > > Then we can also make sure to only add new values to the end of the enum, to have stable integer values among different kernel versions. > > > > > > > In fact, this is exactly what I wanted to do. Users can get little information > > from the output of perf or ftrace for the kfree_skb event without a > > tools, such as dropwatch. > > > > I keep adding new values to the end of the enum. And I tried to > > make the enum as uapi, as user space tools need the enum to > > understand what the integer values mean. Hmm......that commit > > was rejected :) > > I think that your initial proposal was to stuff __FILE__ or __LINE__ > which was a no go, because they would require anyone having fresh > kernel source to make any mapping. > Almost forget to explain.......That's another buddy, who wanted to remove the drop reason and replace with __FILE__ and __LINE__, not me! My initial proposal was to create a tracepoint for snmp :) > Also, we added SKB_NOT_DROPPED_YET in first position in the list. > > UAPI would have made this kind of change not possible. > > I am not suggesting to make enum skb_drop_reason UAPI, I want this to > be clear :) > > > > > I'll do the test to see the impact between integer, string copy and > > __print_symbolic. Then we can decide the solutions. > > > > Thanks! > > Menglong Dong > > > > >> > > >> > trace_safe_str() is using is_kernel_rodata() which should return true > > >> > for drop_reasons[X] ? > > >> > > > >> > $ grep drop_reasons net/core/dropreason_str.c > > >> > const char * const drop_reasons[] = { > > >> > ... > > >> > > > >> > > > >> > > > >> > > > > >> > > Therefore, fix this by adding a '__string' field to the TP_STRUCT of > > >> > > kfree_skb, which is 'reason_str', and passing it to TP_printk(). > > >> > > > > >> > > (Not sure if we should still keep the 'reason' field in > > >> > > TP_STRUCT__entry) > > >> > > > >> > Maybe for event/trace filtering purposes ? > > >> > > > >> > > > > >> > > Reported-by: Eric Dumazet <edumazet@google.com> > > >> > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > > >> > > --- > > >> > > include/trace/events/skb.h | 4 +++- > > >> > > 1 file changed, 3 insertions(+), 1 deletion(-) > > >> > > > > >> > > diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h > > >> > > index 45264e4bb254..7235554141c3 100644 > > >> > > --- a/include/trace/events/skb.h > > >> > > +++ b/include/trace/events/skb.h > > >> > > @@ -24,6 +24,7 @@ TRACE_EVENT(kfree_skb, > > >> > > __field(void *, location) > > >> > > __field(unsigned short, protocol) > > >> > > __field(enum skb_drop_reason, reason) > > >> > > + __string(reason_str, drop_reasons[reason]) > > >> > > ), > > >> > > > > >> > > TP_fast_assign( > > >> > > @@ -31,11 +32,12 @@ TRACE_EVENT(kfree_skb, > > >> > > __entry->location = location; > > >> > > __entry->protocol = ntohs(skb->protocol); > > >> > > __entry->reason = reason; > > >> > > + __assign_str(reason_str, drop_reasons[reason]); > > >> > > ), > > >> > > > > >> > > TP_printk("skbaddr=%p protocol=%u location=%p reason: %s", > > >> > > __entry->skbaddr, __entry->protocol, __entry->location, > > >> > > - drop_reasons[__entry->reason]) > > >> > > + __get_str(reason_str)) > > >> > > ); > > >> > > > > >> > > TRACE_EVENT(consume_skb, > > >> > > -- > > >> > > 2.37.2 > > >> > > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2022-08-28 5:22 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-25 5:37 [PATCH net-next] net: skb: fix kfree_skb event output error in perf menglong8.dong 2022-08-25 15:31 ` Eric Dumazet 2022-08-26 4:47 ` Menglong Dong [not found] ` <CANn89iKVpRMDPgiQ1O6=H1M05iXX3o7rGARa1Z6HptWnAgi_Sg@mail.gmail.com> 2022-08-26 15:44 ` Menglong Dong 2022-08-26 15:53 ` Eric Dumazet 2022-08-28 5:08 ` Menglong Dong 2022-08-28 5:21 ` Menglong Dong
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).