All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
@ 2023-10-07  5:06 Yajun Deng
  2023-10-07  5:29 ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-07  5:06 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni
  Cc: netdev, linux-kernel, Yajun Deng, Alexander Lobakin

Although there is a kfree_skb_reason() helper function that can be used to
find the reason why this skb is dropped, but most callers didn't increase
one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.

For the users, people are more concerned about why the dropped in ip
is increasing.

Introduce netdev_core_stats_inc() for trace the caller of
dev_core_stats_*_inc(). Also, add __code to netdev_core_stats_alloc(), as
it's called unlinkly.

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
v7: use WRITE_ONCE and READ_ONCE instead of '++'
v6: merge netdev_core_stats and netdev_core_stats_inc together
v5: Access the per cpu pointer before reach the relevant offset.
v4: Introduce netdev_core_stats_inc() instead of export dev_core_stats_*_inc()
v3: __cold should be added to the netdev_core_stats_alloc().
v2: use __cold instead of inline in dev_core_stats().
v1: https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/
---
 include/linux/netdevice.h | 21 ++++-----------------
 net/core/dev.c            | 20 ++++++++++++++++++--
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e070a4540fba..11d704bfec9b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
 	return false;
 }
 
-struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev);
-
-static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev)
-{
-	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
-	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
-
-	if (likely(p))
-		return p;
-
-	return netdev_core_stats_alloc(dev);
-}
+void netdev_core_stats_inc(struct net_device *dev, u32 offset);
 
 #define DEV_CORE_STATS_INC(FIELD)						\
 static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev)		\
 {										\
-	struct net_device_core_stats __percpu *p;				\
-										\
-	p = dev_core_stats(dev);						\
-	if (p)									\
-		this_cpu_inc(p->FIELD);						\
+	netdev_core_stats_inc(dev,						\
+			offsetof(struct net_device_core_stats, FIELD));		\
 }
 DEV_CORE_STATS_INC(rx_dropped)
 DEV_CORE_STATS_INC(tx_dropped)
 DEV_CORE_STATS_INC(rx_nohandler)
 DEV_CORE_STATS_INC(rx_otherhost_dropped)
+#undef DEV_CORE_STATS_INC
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
 					       struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 606a366cc209..a38ea90771f5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
 
-struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev)
+static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc(
+		struct net_device *dev)
 {
 	struct net_device_core_stats __percpu *p;
 
@@ -10510,7 +10511,22 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device
 	/* This READ_ONCE() pairs with the cmpxchg() above */
 	return READ_ONCE(dev->core_stats);
 }
-EXPORT_SYMBOL(netdev_core_stats_alloc);
+
+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+	unsigned long *field;
+
+	if (unlikely(!p))
+		p = netdev_core_stats_alloc(dev);
+
+	if (p) {
+		field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+		WRITE_ONCE(*field, READ_ONCE(*field) + 1);
+	}
+}
+EXPORT_SYMBOL_GPL(netdev_core_stats_inc);
 
 /**
  *	dev_get_stats	- get network device statistics
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-07  5:06 [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc() Yajun Deng
@ 2023-10-07  5:29 ` Eric Dumazet
  2023-10-07  6:34   ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-07  5:29 UTC (permalink / raw)
  To: Yajun Deng; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin

On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
> Although there is a kfree_skb_reason() helper function that can be used to
> find the reason why this skb is dropped, but most callers didn't increase
> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>
...

> +
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> +       unsigned long *field;
> +
> +       if (unlikely(!p))
> +               p = netdev_core_stats_alloc(dev);
> +
> +       if (p) {
> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);

This is broken...

As I explained earlier, dev_core_stats_xxxx(dev) can be called from
many different contexts:

1) process contexts, where preemption and migration are allowed.
2) interrupt contexts.

Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.

I _think_ I already gave you how to deal with this ?

Please try instead:

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+       unsigned long __percpu *field;
+
+       if (unlikely(!p)) {
+               p = netdev_core_stats_alloc(dev);
+               if (!p)
+                       return;
+       }
+       field = (__force unsigned long __percpu *)((__force void *)p + offset);
+       this_cpu_inc(*field);
+}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-07  5:29 ` Eric Dumazet
@ 2023-10-07  6:34   ` Yajun Deng
  2023-10-08  6:45     ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-07  6:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin


On 2023/10/7 13:29, Eric Dumazet wrote:
> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>> Although there is a kfree_skb_reason() helper function that can be used to
>> find the reason why this skb is dropped, but most callers didn't increase
>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>
> ...
>
>> +
>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>> +{
>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>> +       unsigned long *field;
>> +
>> +       if (unlikely(!p))
>> +               p = netdev_core_stats_alloc(dev);
>> +
>> +       if (p) {
>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> This is broken...
>
> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> many different contexts:
>
> 1) process contexts, where preemption and migration are allowed.
> 2) interrupt contexts.
>
> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>
> I _think_ I already gave you how to deal with this ?


Yes, I replied in v6.

https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/

> Please try instead:
>
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> +       unsigned long __percpu *field;
> +
> +       if (unlikely(!p)) {
> +               p = netdev_core_stats_alloc(dev);
> +               if (!p)
> +                       return;
> +       }
> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
> +       this_cpu_inc(*field);
> +}


This wouldn't trace anything even the rx_dropped is in increasing. It 
needs to add an extra operation, such as:

pr_info, ++, trace_xxx... . I don't know what's going on.

If this is adopted, I need to send two patches, one is  introduce 
netdev_core_stats_inc, another is add an tracepoint , like:


+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+       unsigned long __percpu *field;
+
+       if (unlikely(!p)) {
+               p = netdev_core_stats_alloc(dev);
+               if (!p)
+                       return;
+       }
+	trace_netdev_core_stats_inc(dev, offset);
+       field = (__force unsigned long __percpu *)((__force void *)p + offset);
+       this_cpu_inc(*field);
+}


--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h

+TRACE_EVENT(netdev_core_stats_inc,
+
+       TP_PROTO(struct net_device *dev,
+                u32 offset),
+
+       TP_ARGS(dev, offset),
+
+       TP_STRUCT__entry(
+               __string(       name,           dev->name )
+               __string(       driver, netdev_drivername(dev))
+               __field(        u32,            offset          )
+       ),
+
+       TP_fast_assign(
+               __assign_str(name, dev->name);
+               __assign_str(driver, netdev_drivername(dev));
+               __entry->offset = offset;
+       ),
+
+       TP_printk("dev=%s driver=%s offset=%u",
+               __get_str(name), __get_str(driver), __entry->offset)
+);


We can trace netdev_core_stats_inc by tracepoint or kprobe.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-07  6:34   ` Yajun Deng
@ 2023-10-08  6:45     ` Eric Dumazet
  2023-10-08  6:59       ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-08  6:45 UTC (permalink / raw)
  To: Yajun Deng; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin

On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/7 13:29, Eric Dumazet wrote:
> > On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >> Although there is a kfree_skb_reason() helper function that can be used to
> >> find the reason why this skb is dropped, but most callers didn't increase
> >> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
> >>
> > ...
> >
> >> +
> >> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >> +{
> >> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >> +       unsigned long *field;
> >> +
> >> +       if (unlikely(!p))
> >> +               p = netdev_core_stats_alloc(dev);
> >> +
> >> +       if (p) {
> >> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> >> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> > This is broken...
> >
> > As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> > many different contexts:
> >
> > 1) process contexts, where preemption and migration are allowed.
> > 2) interrupt contexts.
> >
> > Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
> >
> > I _think_ I already gave you how to deal with this ?
>
>
> Yes, I replied in v6.
>
> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
>
> > Please try instead:
> >
> > +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> > +{
> > +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> > +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> > +       unsigned long __percpu *field;
> > +
> > +       if (unlikely(!p)) {
> > +               p = netdev_core_stats_alloc(dev);
> > +               if (!p)
> > +                       return;
> > +       }
> > +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
> > +       this_cpu_inc(*field);
> > +}
>
>
> This wouldn't trace anything even the rx_dropped is in increasing. It
> needs to add an extra operation, such as:

I honestly do not know what you are talking about.

Have you even tried to change your patch to use

field = (__force unsigned long __percpu *)((__force void *)p + offset);
this_cpu_inc(*field);

Instead of the clearly buggy code you had instead :

    field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
     WRITE_ONCE(*field, READ_ONCE(*field) + 1);

If your v7 submission was ok for tracing what you wanted,
I fail to see why a v8 with 3 lines changed would not work.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  6:45     ` Eric Dumazet
@ 2023-10-08  6:59       ` Yajun Deng
  2023-10-08  7:18         ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-08  6:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin


On 2023/10/8 14:45, Eric Dumazet wrote:
> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/7 13:29, Eric Dumazet wrote:
>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>> find the reason why this skb is dropped, but most callers didn't increase
>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>
>>> ...
>>>
>>>> +
>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>> +{
>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>> +       unsigned long *field;
>>>> +
>>>> +       if (unlikely(!p))
>>>> +               p = netdev_core_stats_alloc(dev);
>>>> +
>>>> +       if (p) {
>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>> This is broken...
>>>
>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
>>> many different contexts:
>>>
>>> 1) process contexts, where preemption and migration are allowed.
>>> 2) interrupt contexts.
>>>
>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>>>
>>> I _think_ I already gave you how to deal with this ?
>>
>> Yes, I replied in v6.
>>
>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
>>
>>> Please try instead:
>>>
>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>> +{
>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>> +       unsigned long __percpu *field;
>>> +
>>> +       if (unlikely(!p)) {
>>> +               p = netdev_core_stats_alloc(dev);
>>> +               if (!p)
>>> +                       return;
>>> +       }
>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>> +       this_cpu_inc(*field);
>>> +}
>>
>> This wouldn't trace anything even the rx_dropped is in increasing. It
>> needs to add an extra operation, such as:
> I honestly do not know what you are talking about.
>
> Have you even tried to change your patch to use
>
> field = (__force unsigned long __percpu *)((__force void *)p + offset);
> this_cpu_inc(*field);


Yes, I tested this code. But the following couldn't show anything even 
if the rx_dropped is increasing.

'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'

It needs to add anything else. The above command will show correctly.

>
> Instead of the clearly buggy code you had instead :
>
>      field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>       WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>
> If your v7 submission was ok for tracing what you wanted,
> I fail to see why a v8 with 3 lines changed would not work.


Me too.

If I add a pr_info in your code, the kprobe will be ok.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  6:59       ` Yajun Deng
@ 2023-10-08  7:18         ` Eric Dumazet
  2023-10-08  8:44           ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-08  7:18 UTC (permalink / raw)
  To: Yajun Deng; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin

On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/8 14:45, Eric Dumazet wrote:
> > On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>
> >> On 2023/10/7 13:29, Eric Dumazet wrote:
> >>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>> Although there is a kfree_skb_reason() helper function that can be used to
> >>>> find the reason why this skb is dropped, but most callers didn't increase
> >>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
> >>>>
> >>> ...
> >>>
> >>>> +
> >>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>>> +{
> >>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>>> +       unsigned long *field;
> >>>> +
> >>>> +       if (unlikely(!p))
> >>>> +               p = netdev_core_stats_alloc(dev);
> >>>> +
> >>>> +       if (p) {
> >>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> >>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> >>> This is broken...
> >>>
> >>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> >>> many different contexts:
> >>>
> >>> 1) process contexts, where preemption and migration are allowed.
> >>> 2) interrupt contexts.
> >>>
> >>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
> >>>
> >>> I _think_ I already gave you how to deal with this ?
> >>
> >> Yes, I replied in v6.
> >>
> >> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
> >>
> >>> Please try instead:
> >>>
> >>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>> +{
> >>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>> +       unsigned long __percpu *field;
> >>> +
> >>> +       if (unlikely(!p)) {
> >>> +               p = netdev_core_stats_alloc(dev);
> >>> +               if (!p)
> >>> +                       return;
> >>> +       }
> >>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
> >>> +       this_cpu_inc(*field);
> >>> +}
> >>
> >> This wouldn't trace anything even the rx_dropped is in increasing. It
> >> needs to add an extra operation, such as:
> > I honestly do not know what you are talking about.
> >
> > Have you even tried to change your patch to use
> >
> > field = (__force unsigned long __percpu *)((__force void *)p + offset);
> > this_cpu_inc(*field);
>
>
> Yes, I tested this code. But the following couldn't show anything even
> if the rx_dropped is increasing.
>
> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'

Well, I am not sure about this, "bpftrace" worked for me.

Make sure your toolchain generates something that looks like what I got:

000000000000ef20 <netdev_core_stats_inc>:
    ef20: f3 0f 1e fa          endbr64
    ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
ef25: R_X86_64_PLT32 __fentry__-0x4
    ef29: 55                    push   %rbp
    ef2a: 48 89 e5              mov    %rsp,%rbp
    ef2d: 53                    push   %rbx
    ef2e: 89 f3                mov    %esi,%ebx
    ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
    ef37: 48 85 c0              test   %rax,%rax
    ef3a: 74 0b                je     ef47 <netdev_core_stats_inc+0x27>
    ef3c: 89 d9                mov    %ebx,%ecx
    ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
    ef43: 5b                    pop    %rbx
    ef44: 5d                    pop    %rbp
    ef45: c3                    ret
    ef46: cc                    int3
    ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
    ef4c: 48 85 c0              test   %rax,%rax
    ef4f: 75 eb                jne    ef3c <netdev_core_stats_inc+0x1c>
    ef51: eb f0                jmp    ef43 <netdev_core_stats_inc+0x23>
    ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
    ef5a: 84 00 00 00 00 00

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  7:18         ` Eric Dumazet
@ 2023-10-08  8:44           ` Yajun Deng
  2023-10-08  8:53             ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-08  8:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin


On 2023/10/8 15:18, Eric Dumazet wrote:
> On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/8 14:45, Eric Dumazet wrote:
>>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/10/7 13:29, Eric Dumazet wrote:
>>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>>>> find the reason why this skb is dropped, but most callers didn't increase
>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>>>
>>>>> ...
>>>>>
>>>>>> +
>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>> +{
>>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>> +       unsigned long *field;
>>>>>> +
>>>>>> +       if (unlikely(!p))
>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>> +
>>>>>> +       if (p) {
>>>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>>>> This is broken...
>>>>>
>>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
>>>>> many different contexts:
>>>>>
>>>>> 1) process contexts, where preemption and migration are allowed.
>>>>> 2) interrupt contexts.
>>>>>
>>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>>>>>
>>>>> I _think_ I already gave you how to deal with this ?
>>>> Yes, I replied in v6.
>>>>
>>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
>>>>
>>>>> Please try instead:
>>>>>
>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>> +{
>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>> +       unsigned long __percpu *field;
>>>>> +
>>>>> +       if (unlikely(!p)) {
>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>> +               if (!p)
>>>>> +                       return;
>>>>> +       }
>>>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>>>> +       this_cpu_inc(*field);
>>>>> +}
>>>> This wouldn't trace anything even the rx_dropped is in increasing. It
>>>> needs to add an extra operation, such as:
>>> I honestly do not know what you are talking about.
>>>
>>> Have you even tried to change your patch to use
>>>
>>> field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>> this_cpu_inc(*field);
>>
>> Yes, I tested this code. But the following couldn't show anything even
>> if the rx_dropped is increasing.
>>
>> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
> Well, I am not sure about this, "bpftrace" worked for me.
>
> Make sure your toolchain generates something that looks like what I got:
>
> 000000000000ef20 <netdev_core_stats_inc>:
>      ef20: f3 0f 1e fa          endbr64
>      ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
> ef25: R_X86_64_PLT32 __fentry__-0x4
>      ef29: 55                    push   %rbp
>      ef2a: 48 89 e5              mov    %rsp,%rbp
>      ef2d: 53                    push   %rbx
>      ef2e: 89 f3                mov    %esi,%ebx
>      ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
>      ef37: 48 85 c0              test   %rax,%rax
>      ef3a: 74 0b                je     ef47 <netdev_core_stats_inc+0x27>
>      ef3c: 89 d9                mov    %ebx,%ecx
>      ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
>      ef43: 5b                    pop    %rbx
>      ef44: 5d                    pop    %rbp
>      ef45: c3                    ret
>      ef46: cc                    int3
>      ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
> ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
>      ef4c: 48 85 c0              test   %rax,%rax
>      ef4f: 75 eb                jne    ef3c <netdev_core_stats_inc+0x1c>
>      ef51: eb f0                jmp    ef43 <netdev_core_stats_inc+0x23>
>      ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
>      ef5a: 84 00 00 00 00 00


I'll share some I can see it.

1.

objdump -D vmlinux

ffffffff81b2f170 <netdev_core_stats_inc>:
ffffffff81b2f170:    e8 8b ea 55 ff           callq ffffffff8108dc00 
<__fentry__>
ffffffff81b2f175:    55                       push   %rbp
ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov 0x1e8(%rdi),%rax
ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
ffffffff81b2f187:    74 0d                    je ffffffff81b2f196 
<netdev_core_stats_inc+0x26>
ffffffff81b2f189:    89 f6                    mov    %esi,%esi
ffffffff81b2f18b:    65 48 ff 04 30           incq %gs:(%rax,%rsi,1)
ffffffff81b2f190:    c9                       leaveq
ffffffff81b2f191:    e9 aa 31 6d 00           jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f196:    89 75 fc                 mov %esi,-0x4(%rbp)
ffffffff81b2f199:    e8 82 ff ff ff           callq ffffffff81b2f120 
<netdev_core_stats_alloc>
ffffffff81b2f19e:    8b 75 fc                 mov -0x4(%rbp),%esi
ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
ffffffff81b2f1a4:    75 e3                    jne ffffffff81b2f189 
<netdev_core_stats_inc+0x19>
ffffffff81b2f1a6:    c9                       leaveq
ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)


2.

sudo cat /proc/kallsyms | grep netdev_core_stats_inc

ffffffff9c72f120 T netdev_core_stats_inc
ffffffff9ca2676c t netdev_core_stats_inc.cold
ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc


3.

➜  ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
         inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
         ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
         RX packets 29024  bytes 3118278 (3.1 MB)
         RX errors 0  dropped 794  overruns 0  frame 0
         TX packets 16961  bytes 2662290 (2.6 MB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 29  memory 0x39fff4000000-39fff47fffff

➜  ~ ifconfig enp34s0f0
enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
         inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
         ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
         RX packets 29272  bytes 3148997 (3.1 MB)
         RX errors 0  dropped 798  overruns 0  frame 0
         TX packets 17098  bytes 2683547 (2.6 MB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 29  memory 0x39fff4000000-39fff47fffff


The rx_dropped is increasing.


4.

sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc

TIME     PID     TID     COMM            FUNC

(Empty, I didn't see anything.)


5.

sudo trace-cmd record -p function -l netdev_core_stats_inc

sudo trace-cmd report

(Empty, I didn't see anything.)


If I add a 'pr_info("\n");'   like:

+      pr_info("\n");
         field = (__force unsigned long __percpu *)((__force void *)p + 
offset);
         this_cpu_inc(*field);


Everything is OK. The 'pr_info("\n");' can be changed to anything else, 
but not

without it.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  8:44           ` Yajun Deng
@ 2023-10-08  8:53             ` Eric Dumazet
  2023-10-08  9:12               ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-08  8:53 UTC (permalink / raw)
  To: Yajun Deng; +Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin

On Sun, Oct 8, 2023 at 10:44 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/8 15:18, Eric Dumazet wrote:
> > On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>
> >> On 2023/10/8 14:45, Eric Dumazet wrote:
> >>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>> On 2023/10/7 13:29, Eric Dumazet wrote:
> >>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>>>> Although there is a kfree_skb_reason() helper function that can be used to
> >>>>>> find the reason why this skb is dropped, but most callers didn't increase
> >>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
> >>>>>>
> >>>>> ...
> >>>>>
> >>>>>> +
> >>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>>>>> +{
> >>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>>>>> +       unsigned long *field;
> >>>>>> +
> >>>>>> +       if (unlikely(!p))
> >>>>>> +               p = netdev_core_stats_alloc(dev);
> >>>>>> +
> >>>>>> +       if (p) {
> >>>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> >>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> >>>>> This is broken...
> >>>>>
> >>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
> >>>>> many different contexts:
> >>>>>
> >>>>> 1) process contexts, where preemption and migration are allowed.
> >>>>> 2) interrupt contexts.
> >>>>>
> >>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
> >>>>>
> >>>>> I _think_ I already gave you how to deal with this ?
> >>>> Yes, I replied in v6.
> >>>>
> >>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
> >>>>
> >>>>> Please try instead:
> >>>>>
> >>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> >>>>> +{
> >>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> >>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> >>>>> +       unsigned long __percpu *field;
> >>>>> +
> >>>>> +       if (unlikely(!p)) {
> >>>>> +               p = netdev_core_stats_alloc(dev);
> >>>>> +               if (!p)
> >>>>> +                       return;
> >>>>> +       }
> >>>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
> >>>>> +       this_cpu_inc(*field);
> >>>>> +}
> >>>> This wouldn't trace anything even the rx_dropped is in increasing. It
> >>>> needs to add an extra operation, such as:
> >>> I honestly do not know what you are talking about.
> >>>
> >>> Have you even tried to change your patch to use
> >>>
> >>> field = (__force unsigned long __percpu *)((__force void *)p + offset);
> >>> this_cpu_inc(*field);
> >>
> >> Yes, I tested this code. But the following couldn't show anything even
> >> if the rx_dropped is increasing.
> >>
> >> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
> > Well, I am not sure about this, "bpftrace" worked for me.
> >
> > Make sure your toolchain generates something that looks like what I got:
> >
> > 000000000000ef20 <netdev_core_stats_inc>:
> >      ef20: f3 0f 1e fa          endbr64
> >      ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
> > ef25: R_X86_64_PLT32 __fentry__-0x4
> >      ef29: 55                    push   %rbp
> >      ef2a: 48 89 e5              mov    %rsp,%rbp
> >      ef2d: 53                    push   %rbx
> >      ef2e: 89 f3                mov    %esi,%ebx
> >      ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
> >      ef37: 48 85 c0              test   %rax,%rax
> >      ef3a: 74 0b                je     ef47 <netdev_core_stats_inc+0x27>
> >      ef3c: 89 d9                mov    %ebx,%ecx
> >      ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
> >      ef43: 5b                    pop    %rbx
> >      ef44: 5d                    pop    %rbp
> >      ef45: c3                    ret
> >      ef46: cc                    int3
> >      ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
> > ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
> >      ef4c: 48 85 c0              test   %rax,%rax
> >      ef4f: 75 eb                jne    ef3c <netdev_core_stats_inc+0x1c>
> >      ef51: eb f0                jmp    ef43 <netdev_core_stats_inc+0x23>
> >      ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
> >      ef5a: 84 00 00 00 00 00
>
>
> I'll share some I can see it.
>
> 1.
>
> objdump -D vmlinux
>
> ffffffff81b2f170 <netdev_core_stats_inc>:
> ffffffff81b2f170:    e8 8b ea 55 ff           callq ffffffff8108dc00
> <__fentry__>
> ffffffff81b2f175:    55                       push   %rbp
> ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
> ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
> ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov 0x1e8(%rdi),%rax
> ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
> ffffffff81b2f187:    74 0d                    je ffffffff81b2f196
> <netdev_core_stats_inc+0x26>
> ffffffff81b2f189:    89 f6                    mov    %esi,%esi
> ffffffff81b2f18b:    65 48 ff 04 30           incq %gs:(%rax,%rsi,1)
> ffffffff81b2f190:    c9                       leaveq
> ffffffff81b2f191:    e9 aa 31 6d 00           jmpq ffffffff82202340
> <__x86_return_thunk>
> ffffffff81b2f196:    89 75 fc                 mov %esi,-0x4(%rbp)
> ffffffff81b2f199:    e8 82 ff ff ff           callq ffffffff81b2f120
> <netdev_core_stats_alloc>
> ffffffff81b2f19e:    8b 75 fc                 mov -0x4(%rbp),%esi
> ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
> ffffffff81b2f1a4:    75 e3                    jne ffffffff81b2f189
> <netdev_core_stats_inc+0x19>
> ffffffff81b2f1a6:    c9                       leaveq
> ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq ffffffff82202340
> <__x86_return_thunk>
> ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)
>
>
> 2.
>
> sudo cat /proc/kallsyms | grep netdev_core_stats_inc
>
> ffffffff9c72f120 T netdev_core_stats_inc
> ffffffff9ca2676c t netdev_core_stats_inc.cold
> ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc
>
>
> 3.
>
> ➜  ~ ifconfig enp34s0f0
> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>          inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>          inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>          ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>          RX packets 29024  bytes 3118278 (3.1 MB)
>          RX errors 0  dropped 794  overruns 0  frame 0
>          TX packets 16961  bytes 2662290 (2.6 MB)
>          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>          device interrupt 29  memory 0x39fff4000000-39fff47fffff
>
> ➜  ~ ifconfig enp34s0f0
> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>          inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>          inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>          ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>          RX packets 29272  bytes 3148997 (3.1 MB)
>          RX errors 0  dropped 798  overruns 0  frame 0
>          TX packets 17098  bytes 2683547 (2.6 MB)
>          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>          device interrupt 29  memory 0x39fff4000000-39fff47fffff
>
>
> The rx_dropped is increasing.
>
>
> 4.
>
> sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc
>
> TIME     PID     TID     COMM            FUNC
>
> (Empty, I didn't see anything.)
>
>
> 5.
>
> sudo trace-cmd record -p function -l netdev_core_stats_inc
>
> sudo trace-cmd report
>
> (Empty, I didn't see anything.)
>
>
> If I add a 'pr_info("\n");'   like:
>
> +      pr_info("\n");
>          field = (__force unsigned long __percpu *)((__force void *)p +
> offset);
>          this_cpu_inc(*field);
>
>
> Everything is OK. The 'pr_info("\n");' can be changed to anything else,
> but not
>
> without it.

This seems to be a bug that has nothing to do with the patch.

Try getting help from Steven maybe.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  8:53             ` Eric Dumazet
@ 2023-10-08  9:12               ` Yajun Deng
  2023-10-09  3:07                 ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-08  9:12 UTC (permalink / raw)
  To: Eric Dumazet, rostedt
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin


On 2023/10/8 16:53, Eric Dumazet wrote:
> On Sun, Oct 8, 2023 at 10:44 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/8 15:18, Eric Dumazet wrote:
>>> On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/10/8 14:45, Eric Dumazet wrote:
>>>>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>> On 2023/10/7 13:29, Eric Dumazet wrote:
>>>>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>>>>>> find the reason why this skb is dropped, but most callers didn't increase
>>>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> +
>>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>>> +{
>>>>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>>>> +       unsigned long *field;
>>>>>>>> +
>>>>>>>> +       if (unlikely(!p))
>>>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>>>> +
>>>>>>>> +       if (p) {
>>>>>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>>>>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>>>>>> This is broken...
>>>>>>>
>>>>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called from
>>>>>>> many different contexts:
>>>>>>>
>>>>>>> 1) process contexts, where preemption and migration are allowed.
>>>>>>> 2) interrupt contexts.
>>>>>>>
>>>>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>>>>>>>
>>>>>>> I _think_ I already gave you how to deal with this ?
>>>>>> Yes, I replied in v6.
>>>>>>
>>>>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/
>>>>>>
>>>>>>> Please try instead:
>>>>>>>
>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>> +{
>>>>>>> +       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>>> +       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>>> +       unsigned long __percpu *field;
>>>>>>> +
>>>>>>> +       if (unlikely(!p)) {
>>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>>> +               if (!p)
>>>>>>> +                       return;
>>>>>>> +       }
>>>>>>> +       field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>>>>>> +       this_cpu_inc(*field);
>>>>>>> +}
>>>>>> This wouldn't trace anything even the rx_dropped is in increasing. It
>>>>>> needs to add an extra operation, such as:
>>>>> I honestly do not know what you are talking about.
>>>>>
>>>>> Have you even tried to change your patch to use
>>>>>
>>>>> field = (__force unsigned long __percpu *)((__force void *)p + offset);
>>>>> this_cpu_inc(*field);
>>>> Yes, I tested this code. But the following couldn't show anything even
>>>> if the rx_dropped is increasing.
>>>>
>>>> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
>>> Well, I am not sure about this, "bpftrace" worked for me.
>>>
>>> Make sure your toolchain generates something that looks like what I got:
>>>
>>> 000000000000ef20 <netdev_core_stats_inc>:
>>>       ef20: f3 0f 1e fa          endbr64
>>>       ef24: e8 00 00 00 00        call   ef29 <netdev_core_stats_inc+0x9>
>>> ef25: R_X86_64_PLT32 __fentry__-0x4
>>>       ef29: 55                    push   %rbp
>>>       ef2a: 48 89 e5              mov    %rsp,%rbp
>>>       ef2d: 53                    push   %rbx
>>>       ef2e: 89 f3                mov    %esi,%ebx
>>>       ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
>>>       ef37: 48 85 c0              test   %rax,%rax
>>>       ef3a: 74 0b                je     ef47 <netdev_core_stats_inc+0x27>
>>>       ef3c: 89 d9                mov    %ebx,%ecx
>>>       ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
>>>       ef43: 5b                    pop    %rbx
>>>       ef44: 5d                    pop    %rbp
>>>       ef45: c3                    ret
>>>       ef46: cc                    int3
>>>       ef47: e8 00 00 00 00        call   ef4c <netdev_core_stats_inc+0x2c>
>>> ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
>>>       ef4c: 48 85 c0              test   %rax,%rax
>>>       ef4f: 75 eb                jne    ef3c <netdev_core_stats_inc+0x1c>
>>>       ef51: eb f0                jmp    ef43 <netdev_core_stats_inc+0x23>
>>>       ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
>>>       ef5a: 84 00 00 00 00 00
>>
>> I'll share some I can see it.
>>
>> 1.
>>
>> objdump -D vmlinux
>>
>> ffffffff81b2f170 <netdev_core_stats_inc>:
>> ffffffff81b2f170:    e8 8b ea 55 ff           callq ffffffff8108dc00
>> <__fentry__>
>> ffffffff81b2f175:    55                       push   %rbp
>> ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
>> ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
>> ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov 0x1e8(%rdi),%rax
>> ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
>> ffffffff81b2f187:    74 0d                    je ffffffff81b2f196
>> <netdev_core_stats_inc+0x26>
>> ffffffff81b2f189:    89 f6                    mov    %esi,%esi
>> ffffffff81b2f18b:    65 48 ff 04 30           incq %gs:(%rax,%rsi,1)
>> ffffffff81b2f190:    c9                       leaveq
>> ffffffff81b2f191:    e9 aa 31 6d 00           jmpq ffffffff82202340
>> <__x86_return_thunk>
>> ffffffff81b2f196:    89 75 fc                 mov %esi,-0x4(%rbp)
>> ffffffff81b2f199:    e8 82 ff ff ff           callq ffffffff81b2f120
>> <netdev_core_stats_alloc>
>> ffffffff81b2f19e:    8b 75 fc                 mov -0x4(%rbp),%esi
>> ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
>> ffffffff81b2f1a4:    75 e3                    jne ffffffff81b2f189
>> <netdev_core_stats_inc+0x19>
>> ffffffff81b2f1a6:    c9                       leaveq
>> ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq ffffffff82202340
>> <__x86_return_thunk>
>> ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)
>>
>>
>> 2.
>>
>> sudo cat /proc/kallsyms | grep netdev_core_stats_inc
>>
>> ffffffff9c72f120 T netdev_core_stats_inc
>> ffffffff9ca2676c t netdev_core_stats_inc.cold
>> ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc
>>
>>
>> 3.
>>
>> ➜  ~ ifconfig enp34s0f0
>> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>           inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>>           inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>>           ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>>           RX packets 29024  bytes 3118278 (3.1 MB)
>>           RX errors 0  dropped 794  overruns 0  frame 0
>>           TX packets 16961  bytes 2662290 (2.6 MB)
>>           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>           device interrupt 29  memory 0x39fff4000000-39fff47fffff
>>
>> ➜  ~ ifconfig enp34s0f0
>> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>           inet 10.10.30.88  netmask 255.255.255.0  broadcast 10.10.30.255
>>           inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64  scopeid 0x20<link>
>>           ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>>           RX packets 29272  bytes 3148997 (3.1 MB)
>>           RX errors 0  dropped 798  overruns 0  frame 0
>>           TX packets 17098  bytes 2683547 (2.6 MB)
>>           TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>           device interrupt 29  memory 0x39fff4000000-39fff47fffff
>>
>>
>> The rx_dropped is increasing.
>>
>>
>> 4.
>>
>> sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc
>>
>> TIME     PID     TID     COMM            FUNC
>>
>> (Empty, I didn't see anything.)
>>
>>
>> 5.
>>
>> sudo trace-cmd record -p function -l netdev_core_stats_inc
>>
>> sudo trace-cmd report
>>
>> (Empty, I didn't see anything.)
>>
>>
>> If I add a 'pr_info("\n");'   like:
>>
>> +      pr_info("\n");
>>           field = (__force unsigned long __percpu *)((__force void *)p +
>> offset);
>>           this_cpu_inc(*field);
>>
>>
>> Everything is OK. The 'pr_info("\n");' can be changed to anything else,
>> but not
>>
>> without it.
> This seems to be a bug that has nothing to do with the patch.
>
> Try getting help from Steven maybe.


Hi Steven,

Need your help.

1. The following code wouldn't trace anything by the command 'sudo 
python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+       unsigned long __percpu *field;
+
+       if (unlikely(!p)) {
+               p = netdev_core_stats_alloc(dev);
+               if (!p)
+                       return;
+       }
+       field = (__force unsigned long __percpu *)((__force void *)p + offset);
+       this_cpu_inc(*field);
+}
+EXPORT_SYMBOL_GPL(netdev_core_stats_inc);

2. If I add a 'pr_info("\n");', it would be fine. The 'pr_info("\n");' can be changed to others.

+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+       /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+       struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+       unsigned long __percpu *field;
+
+       if (unlikely(!p)) {
+               p = netdev_core_stats_alloc(dev);
+               if (!p)
+                       return;
+       }
+	pr_info("\n");
+       field = (__force unsigned long __percpu *)((__force void *)p + offset);
+       this_cpu_inc(*field);
+}
+EXPORT_SYMBOL_GPL(netdev_core_stats_inc);

I don't know why we need to add something in netdev_core_stats_inc, the trace will be fine.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-08  9:12               ` Yajun Deng
@ 2023-10-09  3:07                 ` Yajun Deng
  2023-10-09  7:53                   ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-09  3:07 UTC (permalink / raw)
  To: Eric Dumazet, rostedt, mhiramat, dennis, tj, cl, mark.rutland
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/8 17:12, Yajun Deng wrote:
>
> On 2023/10/8 16:53, Eric Dumazet wrote:
>> On Sun, Oct 8, 2023 at 10:44 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>
>>> On 2023/10/8 15:18, Eric Dumazet wrote:
>>>> On Sun, Oct 8, 2023 at 9:00 AM Yajun Deng <yajun.deng@linux.dev> 
>>>> wrote:
>>>>> On 2023/10/8 14:45, Eric Dumazet wrote:
>>>>>> On Sat, Oct 7, 2023 at 8:34 AM Yajun Deng <yajun.deng@linux.dev> 
>>>>>> wrote:
>>>>>>> On 2023/10/7 13:29, Eric Dumazet wrote:
>>>>>>>> On Sat, Oct 7, 2023 at 7:06 AM Yajun Deng 
>>>>>>>> <yajun.deng@linux.dev> wrote:
>>>>>>>>> Although there is a kfree_skb_reason() helper function that 
>>>>>>>>> can be used to
>>>>>>>>> find the reason why this skb is dropped, but most callers 
>>>>>>>>> didn't increase
>>>>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and 
>>>>>>>>> rx_otherhost_dropped.
>>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>>>> +{
>>>>>>>>> +       /* This READ_ONCE() pairs with the write in 
>>>>>>>>> netdev_core_stats_alloc() */
>>>>>>>>> +       struct net_device_core_stats __percpu *p = 
>>>>>>>>> READ_ONCE(dev->core_stats);
>>>>>>>>> +       unsigned long *field;
>>>>>>>>> +
>>>>>>>>> +       if (unlikely(!p))
>>>>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>>>>> +
>>>>>>>>> +       if (p) {
>>>>>>>>> +               field = (unsigned long *)((void 
>>>>>>>>> *)this_cpu_ptr(p) + offset);
>>>>>>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>>>>>>> This is broken...
>>>>>>>>
>>>>>>>> As I explained earlier, dev_core_stats_xxxx(dev) can be called 
>>>>>>>> from
>>>>>>>> many different contexts:
>>>>>>>>
>>>>>>>> 1) process contexts, where preemption and migration are allowed.
>>>>>>>> 2) interrupt contexts.
>>>>>>>>
>>>>>>>> Adding WRITE_ONCE()/READ_ONCE() is not solving potential races.
>>>>>>>>
>>>>>>>> I _think_ I already gave you how to deal with this ?
>>>>>>> Yes, I replied in v6.
>>>>>>>
>>>>>>> https://lore.kernel.org/all/e25b5f3c-bd97-56f0-de86-b93a3172870d@linux.dev/ 
>>>>>>>
>>>>>>>
>>>>>>>> Please try instead:
>>>>>>>>
>>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>>> +{
>>>>>>>> +       /* This READ_ONCE() pairs with the write in 
>>>>>>>> netdev_core_stats_alloc() */
>>>>>>>> +       struct net_device_core_stats __percpu *p = 
>>>>>>>> READ_ONCE(dev->core_stats);
>>>>>>>> +       unsigned long __percpu *field;
>>>>>>>> +
>>>>>>>> +       if (unlikely(!p)) {
>>>>>>>> +               p = netdev_core_stats_alloc(dev);
>>>>>>>> +               if (!p)
>>>>>>>> +                       return;
>>>>>>>> +       }
>>>>>>>> +       field = (__force unsigned long __percpu *)((__force 
>>>>>>>> void *)p + offset);
>>>>>>>> +       this_cpu_inc(*field);
>>>>>>>> +}
>>>>>>> This wouldn't trace anything even the rx_dropped is in 
>>>>>>> increasing. It
>>>>>>> needs to add an extra operation, such as:
>>>>>> I honestly do not know what you are talking about.
>>>>>>
>>>>>> Have you even tried to change your patch to use
>>>>>>
>>>>>> field = (__force unsigned long __percpu *)((__force void *)p + 
>>>>>> offset);
>>>>>> this_cpu_inc(*field);
>>>>> Yes, I tested this code. But the following couldn't show anything 
>>>>> even
>>>>> if the rx_dropped is increasing.
>>>>>
>>>>> 'sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
>>>> Well, I am not sure about this, "bpftrace" worked for me.
>>>>
>>>> Make sure your toolchain generates something that looks like what I 
>>>> got:
>>>>
>>>> 000000000000ef20 <netdev_core_stats_inc>:
>>>>       ef20: f3 0f 1e fa          endbr64
>>>>       ef24: e8 00 00 00 00        call   ef29 
>>>> <netdev_core_stats_inc+0x9>
>>>> ef25: R_X86_64_PLT32 __fentry__-0x4
>>>>       ef29: 55                    push   %rbp
>>>>       ef2a: 48 89 e5              mov    %rsp,%rbp
>>>>       ef2d: 53                    push   %rbx
>>>>       ef2e: 89 f3                mov    %esi,%ebx
>>>>       ef30: 48 8b 87 f0 01 00 00 mov    0x1f0(%rdi),%rax
>>>>       ef37: 48 85 c0              test   %rax,%rax
>>>>       ef3a: 74 0b                je     ef47 
>>>> <netdev_core_stats_inc+0x27>
>>>>       ef3c: 89 d9                mov    %ebx,%ecx
>>>>       ef3e: 65 48 ff 04 08        incq   %gs:(%rax,%rcx,1)
>>>>       ef43: 5b                    pop    %rbx
>>>>       ef44: 5d                    pop    %rbp
>>>>       ef45: c3                    ret
>>>>       ef46: cc                    int3
>>>>       ef47: e8 00 00 00 00        call   ef4c 
>>>> <netdev_core_stats_inc+0x2c>
>>>> ef48: R_X86_64_PLT32 .text.unlikely.+0x13c
>>>>       ef4c: 48 85 c0              test   %rax,%rax
>>>>       ef4f: 75 eb                jne    ef3c 
>>>> <netdev_core_stats_inc+0x1c>
>>>>       ef51: eb f0                jmp    ef43 
>>>> <netdev_core_stats_inc+0x23>
>>>>       ef53: 66 66 66 66 2e 0f 1f data16 data16 data16 cs nopw 
>>>> 0x0(%rax,%rax,1)
>>>>       ef5a: 84 00 00 00 00 00
>>>
>>> I'll share some I can see it.
>>>
>>> 1.
>>>
>>> objdump -D vmlinux
>>>
>>> ffffffff81b2f170 <netdev_core_stats_inc>:
>>> ffffffff81b2f170:    e8 8b ea 55 ff           callq ffffffff8108dc00
>>> <__fentry__>
>>> ffffffff81b2f175:    55                       push   %rbp
>>> ffffffff81b2f176:    48 89 e5                 mov    %rsp,%rbp
>>> ffffffff81b2f179:    48 83 ec 08              sub    $0x8,%rsp
>>> ffffffff81b2f17d:    48 8b 87 e8 01 00 00     mov 0x1e8(%rdi),%rax
>>> ffffffff81b2f184:    48 85 c0                 test   %rax,%rax
>>> ffffffff81b2f187:    74 0d                    je ffffffff81b2f196
>>> <netdev_core_stats_inc+0x26>
>>> ffffffff81b2f189:    89 f6                    mov    %esi,%esi
>>> ffffffff81b2f18b:    65 48 ff 04 30           incq %gs:(%rax,%rsi,1)
>>> ffffffff81b2f190:    c9                       leaveq
>>> ffffffff81b2f191:    e9 aa 31 6d 00           jmpq ffffffff82202340
>>> <__x86_return_thunk>
>>> ffffffff81b2f196:    89 75 fc                 mov %esi,-0x4(%rbp)
>>> ffffffff81b2f199:    e8 82 ff ff ff           callq ffffffff81b2f120
>>> <netdev_core_stats_alloc>
>>> ffffffff81b2f19e:    8b 75 fc                 mov -0x4(%rbp),%esi
>>> ffffffff81b2f1a1:    48 85 c0                 test   %rax,%rax
>>> ffffffff81b2f1a4:    75 e3                    jne ffffffff81b2f189
>>> <netdev_core_stats_inc+0x19>
>>> ffffffff81b2f1a6:    c9                       leaveq
>>> ffffffff81b2f1a7:    e9 94 31 6d 00           jmpq ffffffff82202340
>>> <__x86_return_thunk>
>>> ffffffff81b2f1ac:    0f 1f 40 00              nopl   0x0(%rax)
>>>
>>>
>>> 2.
>>>
>>> sudo cat /proc/kallsyms | grep netdev_core_stats_inc
>>>
>>> ffffffff9c72f120 T netdev_core_stats_inc
>>> ffffffff9ca2676c t netdev_core_stats_inc.cold
>>> ffffffff9d5235e0 r __ksymtab_netdev_core_stats_inc
>>>
>>>
>>> 3.
>>>
>>> ➜  ~ ifconfig enp34s0f0
>>> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>>>           inet 10.10.30.88  netmask 255.255.255.0  broadcast 
>>> 10.10.30.255
>>>           inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64 scopeid 
>>> 0x20<link>
>>>           ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>>>           RX packets 29024  bytes 3118278 (3.1 MB)
>>>           RX errors 0  dropped 794  overruns 0  frame 0
>>>           TX packets 16961  bytes 2662290 (2.6 MB)
>>>           TX errors 0  dropped 0 overruns 0  carrier 0 collisions 0
>>>           device interrupt 29  memory 0x39fff4000000-39fff47fffff
>>>
>>> ➜  ~ ifconfig enp34s0f0
>>> enp34s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
>>>           inet 10.10.30.88  netmask 255.255.255.0  broadcast 
>>> 10.10.30.255
>>>           inet6 fe80::6037:806c:14b6:f1ca  prefixlen 64 scopeid 
>>> 0x20<link>
>>>           ether 04:d4:c4:5c:81:42  txqueuelen 1000  (Ethernet)
>>>           RX packets 29272  bytes 3148997 (3.1 MB)
>>>           RX errors 0  dropped 798  overruns 0  frame 0
>>>           TX packets 17098  bytes 2683547 (2.6 MB)
>>>           TX errors 0  dropped 0 overruns 0  carrier 0 collisions 0
>>>           device interrupt 29  memory 0x39fff4000000-39fff47fffff
>>>
>>>
>>> The rx_dropped is increasing.
>>>
>>>
>>> 4.
>>>
>>> sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc
>>>
>>> TIME     PID     TID     COMM            FUNC
>>>
>>> (Empty, I didn't see anything.)
>>>
>>>
>>> 5.
>>>
>>> sudo trace-cmd record -p function -l netdev_core_stats_inc
>>>
>>> sudo trace-cmd report
>>>
>>> (Empty, I didn't see anything.)
>>>
>>>
>>> If I add a 'pr_info("\n");'   like:
>>>
>>> +      pr_info("\n");
>>>           field = (__force unsigned long __percpu *)((__force void 
>>> *)p +
>>> offset);
>>>           this_cpu_inc(*field);
>>>
>>>
>>> Everything is OK. The 'pr_info("\n");' can be changed to anything else,
>>> but not
>>>
>>> without it.
>> This seems to be a bug that has nothing to do with the patch.
>>
>> Try getting help from Steven maybe.
>
>
> Hi Steven,
>
> Need your help.
>
> 1. The following code wouldn't trace anything by the command 'sudo 
> python3 /usr/share/bcc/tools/trace netdev_core_stats_inc'
>
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> +       /* This READ_ONCE() pairs with the write in 
> netdev_core_stats_alloc() */
> +       struct net_device_core_stats __percpu *p = 
> READ_ONCE(dev->core_stats);
> +       unsigned long __percpu *field;
> +
> +       if (unlikely(!p)) {
> +               p = netdev_core_stats_alloc(dev);
> +               if (!p)
> +                       return;
> +       }
> +       field = (__force unsigned long __percpu *)((__force void *)p + 
> offset);
> +       this_cpu_inc(*field);
> +}
> +EXPORT_SYMBOL_GPL(netdev_core_stats_inc);
>
> 2. If I add a 'pr_info("\n");', it would be fine. The 'pr_info("\n");' 
> can be changed to others.
>
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> +       /* This READ_ONCE() pairs with the write in 
> netdev_core_stats_alloc() */
> +       struct net_device_core_stats __percpu *p = 
> READ_ONCE(dev->core_stats);
> +       unsigned long __percpu *field;
> +
> +       if (unlikely(!p)) {
> +               p = netdev_core_stats_alloc(dev);
> +               if (!p)
> +                       return;
> +       }
> +    pr_info("\n");
> +       field = (__force unsigned long __percpu *)((__force void *)p + 
> offset);
> +       this_cpu_inc(*field);
> +}
> +EXPORT_SYMBOL_GPL(netdev_core_stats_inc);
>
> I don't know why we need to add something in netdev_core_stats_inc, 
> the trace will be fine.
>

I think I found something different in the assembly code.

this_cpu_read:
ffffffff81b2f120 <netdev_core_stats_inc>:
ffffffff81b2f120:       e8 db ea 55 ff          callq ffffffff8108dc00 
<__fentry__>
ffffffff81b2f125:       55                      push   %rbp
ffffffff81b2f126:       48 89 e5                mov    %rsp,%rbp
ffffffff81b2f129:       48 83 ec 08             sub    $0x8,%rsp
ffffffff81b2f12d:       48 8b 87 e8 01 00 00    mov 0x1e8(%rdi),%rax
ffffffff81b2f134:       48 85 c0                test   %rax,%rax
ffffffff81b2f137:       0f 84 2f 76 2f 00       je ffffffff81e2676c 
<netdev_core_stats_inc.cold>
ffffffff81b2f13d:       89 f6                   mov    %esi,%esi
ffffffff81b2f13f:       65 48 8b 04 30          mov %gs:(%rax,%rsi,1),%rax
ffffffff81b2f144:       c9                      leaveq
ffffffff81b2f145:       e9 f6 31 6d 00          jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f14a:       66 0f 1f 44 00 00       nopw 0x0(%rax,%rax,1)


this_cpu_write:
ffffffff81b2f120 <netdev_core_stats_inc>:
ffffffff81b2f120:       e8 db ea 55 ff          callq ffffffff8108dc00 
<__fentry__>
ffffffff81b2f125:       55                      push   %rbp
ffffffff81b2f126:       48 89 e5                mov    %rsp,%rbp
ffffffff81b2f129:       48 83 ec 08             sub    $0x8,%rsp
ffffffff81b2f12d:       48 8b 87 e8 01 00 00    mov 0x1e8(%rdi),%rax
ffffffff81b2f134:       48 85 c0                test   %rax,%rax
ffffffff81b2f137:       0f 84 2f 76 2f 00       je ffffffff81e2676c 
<netdev_core_stats_inc.cold>
ffffffff81b2f13d:       89 f6                   mov    %esi,%esi
ffffffff81b2f13f:       65 48 c7 04 30 01 00    movq $0x1,%gs:(%rax,%rsi,1)
ffffffff81b2f146:       00 00
ffffffff81b2f148:       c9                      leaveq
ffffffff81b2f149:       e9 f2 31 6d 00          jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f14e:       66 90                   xchg   %ax,%ax


this_cpu_read + this_cpu_write:
ffffffff81b2f0e0 <netdev_core_stats_inc>:
ffffffff81b2f0e0:       e8 1b eb 55 ff          callq ffffffff8108dc00 
<__fentry__>
ffffffff81b2f0e5:       55                      push   %rbp
ffffffff81b2f0e6:       48 89 e5                mov    %rsp,%rbp
ffffffff81b2f0e9:       41 55                   push   %r13
ffffffff81b2f0eb:       41 54                   push   %r12
ffffffff81b2f0ed:       41 89 f4                mov    %esi,%r12d
ffffffff81b2f0f0:       53                      push   %rbx
ffffffff81b2f0f1:       48 8b 9f e8 01 00 00    mov 0x1e8(%rdi),%rbx
ffffffff81b2f0f8:       48 85 db                test   %rbx,%rbx
ffffffff81b2f0fb:       0f 84 1f 76 2f 00       je ffffffff81e26720 
<netdev_core_stats_inc.cold>
ffffffff81b2f101:       44 89 e6                mov    %r12d,%esi
ffffffff81b2f104:       48 01 de                add    %rbx,%rsi
ffffffff81b2f107:       65 48 8b 06             mov %gs:(%rsi),%rax
ffffffff81b2f10b:       65 48 c7 06 01 00 00    movq $0x1,%gs:(%rsi)
ffffffff81b2f112:       00
ffffffff81b2f113:       5b                      pop    %rbx
ffffffff81b2f114:       41 5c                   pop    %r12
ffffffff81b2f116:       41 5d                   pop    %r13
ffffffff81b2f118:       5d                      pop    %rbp
ffffffff81b2f119:       e9 22 32 6d 00          jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f11e:       66 90                   xchg   %ax,%ax


this_cpu_inc:
ffffffff81b2f120 <netdev_core_stats_inc>:
ffffffff81b2f120:       e8 db ea 55 ff          callq ffffffff8108dc00 
<__fentry__>
ffffffff81b2f125:       55                      push   %rbp
ffffffff81b2f126:       48 89 e5                mov    %rsp,%rbp
ffffffff81b2f129:       48 83 ec 08             sub    $0x8,%rsp
ffffffff81b2f12d:       48 8b 87 e8 01 00 00    mov 0x1e8(%rdi),%rax
ffffffff81b2f134:       48 85 c0                test   %rax,%rax
ffffffff81b2f137:       0f 84 2f 76 2f 00       je ffffffff81e2676c 
<netdev_core_stats_inc.cold>
ffffffff81b2f13d:       89 f6                   mov    %esi,%esi
ffffffff81b2f13f:       65 48 ff 04 30          incq %gs:(%rax,%rsi,1)
ffffffff81b2f144:       c9                      leaveq
ffffffff81b2f145:       e9 f6 31 6d 00          jmpq ffffffff82202340 
<__x86_return_thunk>
ffffffff81b2f14a:       66 0f 1f 44 00 00       nopw 0x0(%rax,%rax,1)


pr_info + this_cpu_inc:
ffffffff81e26720 <netdev_core_stats_inc>:
ffffffff81e26720:       e8 db 74 26 ff          callq ffffffff8108dc00 
<__fentry__>
ffffffff81e26725:       55                      push   %rbp
ffffffff81e26726:       48 89 e5                mov    %rsp,%rbp
ffffffff81e26729:       41 55                   push   %r13
ffffffff81e2672b:       41 89 f5                mov    %esi,%r13d
ffffffff81e2672e:       41 54                   push   %r12
ffffffff81e26730:       53                      push   %rbx
ffffffff81e26731:       48 8b 9f e8 01 00 00    mov 0x1e8(%rdi),%rbx
ffffffff81e26738:       48 85 db                test   %rbx,%rbx
ffffffff81e2673b:       75 43                   jne ffffffff81e26780 
<netdev_core_stats_inc+0x60>
ffffffff81e2673d:       49 89 fc                mov    %rdi,%r12
ffffffff81e26740:       ba 20 2a 00 00          mov $0x2a20,%edx
ffffffff81e26745:       bf 20 00 00 00          mov    $0x20,%edi
ffffffff81e2674a:       be 20 00 00 00          mov    $0x20,%esi
ffffffff81e2674f:       e8 0c 36 4f ff          callq ffffffff81319d60 
<__alloc_percpu_gfp>
ffffffff81e26754:       48 89 c7                mov    %rax,%rdi
ffffffff81e26757:       48 85 c0                test   %rax,%rax
ffffffff81e2675a:       74 17                   je ffffffff81e26773 
<netdev_core_stats_inc+0x53>
ffffffff81e2675c:       48 89 d8                mov    %rbx,%rax
ffffffff81e2675f:       f0 49 0f b1 bc 24 e8    lock cmpxchg 
%rdi,0x1e8(%r12)
ffffffff81e26766:       01 00 00
ffffffff81e26769:       48 85 c0                test   %rax,%rax
ffffffff81e2676c:       74 05                   je ffffffff81e26773 
<netdev_core_stats_inc+0x53>
ffffffff81e2676e:       e8 1d 25 4f ff          callq ffffffff81318c90 
<free_percpu>
ffffffff81e26773:       49 8b 9c 24 e8 01 00    mov 0x1e8(%r12),%rbx
ffffffff81e2677a:       00
ffffffff81e2677b:       48 85 db                test   %rbx,%rbx
ffffffff81e2677e:       74 11                   je ffffffff81e26791 
<netdev_core_stats_inc+0x71>
ffffffff81e26780:       48 c7 c7 80 cd 77 82    mov $0xffffffff8277cd80,%rdi
ffffffff81e26787:       e8 6e 94 f6 ff          callq ffffffff81d8fbfa 
<_printk>
ffffffff81e2678c:       65 4a ff 04 2b          incq %gs:(%rbx,%r13,1)
ffffffff81e26791:       5b                      pop    %rbx
ffffffff81e26792:       41 5c                   pop    %r12
ffffffff81e26794:       41 5d                   pop    %r13
ffffffff81e26796:       5d                      pop    %rbp
ffffffff81e26797:       e9 a4 bb 3d 00          jmpq ffffffff82202340 
<__x86_return_thunk>


'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make 
the trace work well.

They all have 'pop' instructions in them. This may be the key to making 
the trace work well.

Hi all,

I need your help on percpu and ftrace.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  3:07                 ` Yajun Deng
@ 2023-10-09  7:53                   ` Eric Dumazet
  2023-10-09  8:13                     ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-09  7:53 UTC (permalink / raw)
  To: Yajun Deng
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm

On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:

> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> the trace work well.
>
> They all have 'pop' instructions in them. This may be the key to making
> the trace work well.
>
> Hi all,
>
> I need your help on percpu and ftrace.
>

I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  7:53                   ` Eric Dumazet
@ 2023-10-09  8:13                     ` Yajun Deng
  2023-10-09  8:20                       ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-09  8:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/9 15:53, Eric Dumazet wrote:
> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
>> the trace work well.
>>
>> They all have 'pop' instructions in them. This may be the key to making
>> the trace work well.
>>
>> Hi all,
>>
>> I need your help on percpu and ftrace.
>>
> I do not think you made sure netdev_core_stats_inc() was never inlined.
>
> Adding more code in it is simply changing how the compiler decides to
> inline or not.


Yes, you are right. It needs to add the 'noinline' prefix. The 
disassembly code will have 'pop'

instruction.

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  8:13                     ` Yajun Deng
@ 2023-10-09  8:20                       ` Eric Dumazet
  2023-10-09  8:36                         ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-09  8:20 UTC (permalink / raw)
  To: Yajun Deng
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm

On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/9 15:53, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >
> >> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> >> the trace work well.
> >>
> >> They all have 'pop' instructions in them. This may be the key to making
> >> the trace work well.
> >>
> >> Hi all,
> >>
> >> I need your help on percpu and ftrace.
> >>
> > I do not think you made sure netdev_core_stats_inc() was never inlined.
> >
> > Adding more code in it is simply changing how the compiler decides to
> > inline or not.
>
>
> Yes, you are right. It needs to add the 'noinline' prefix. The
> disassembly code will have 'pop'
>
> instruction.
>

The function was fine, you do not need anything like push or pop.

The only needed stuff was the call __fentry__.

The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  8:20                       ` Eric Dumazet
@ 2023-10-09  8:36                         ` Yajun Deng
  2023-10-09  9:30                           ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-09  8:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/9 16:20, Eric Dumazet wrote:
> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/9 15:53, Eric Dumazet wrote:
>>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>
>>>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
>>>> the trace work well.
>>>>
>>>> They all have 'pop' instructions in them. This may be the key to making
>>>> the trace work well.
>>>>
>>>> Hi all,
>>>>
>>>> I need your help on percpu and ftrace.
>>>>
>>> I do not think you made sure netdev_core_stats_inc() was never inlined.
>>>
>>> Adding more code in it is simply changing how the compiler decides to
>>> inline or not.
>>
>> Yes, you are right. It needs to add the 'noinline' prefix. The
>> disassembly code will have 'pop'
>>
>> instruction.
>>
> The function was fine, you do not need anything like push or pop.
>
> The only needed stuff was the call __fentry__.
>
> The fact that the function was inlined for some invocations was the
> issue, because the trace point
> is only planted in the out of line function.


But somehow the following code isn't inline? They didn't need to add the 
'noinline' prefix.

+		field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+		WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  8:36                         ` Yajun Deng
@ 2023-10-09  9:30                           ` Eric Dumazet
  2023-10-09  9:43                             ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-09  9:30 UTC (permalink / raw)
  To: Yajun Deng
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm

On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/9 16:20, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>
> >> On 2023/10/9 15:53, Eric Dumazet wrote:
> >>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>
> >>>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> >>>> the trace work well.
> >>>>
> >>>> They all have 'pop' instructions in them. This may be the key to making
> >>>> the trace work well.
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I need your help on percpu and ftrace.
> >>>>
> >>> I do not think you made sure netdev_core_stats_inc() was never inlined.
> >>>
> >>> Adding more code in it is simply changing how the compiler decides to
> >>> inline or not.
> >>
> >> Yes, you are right. It needs to add the 'noinline' prefix. The
> >> disassembly code will have 'pop'
> >>
> >> instruction.
> >>
> > The function was fine, you do not need anything like push or pop.
> >
> > The only needed stuff was the call __fentry__.
> >
> > The fact that the function was inlined for some invocations was the
> > issue, because the trace point
> > is only planted in the out of line function.
>
>
> But somehow the following code isn't inline? They didn't need to add the
> 'noinline' prefix.
>
> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>
> Or
> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>

I think you are very confused.

You only want to trace netdev_core_stats_inc() entry point, not
arbitrary pieces of it.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  9:30                           ` Eric Dumazet
@ 2023-10-09  9:43                             ` Yajun Deng
  2023-10-09 10:16                               ` Eric Dumazet
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-09  9:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/9 17:30, Eric Dumazet wrote:
> On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/9 16:20, Eric Dumazet wrote:
>>> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/10/9 15:53, Eric Dumazet wrote:
>>>>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>
>>>>>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
>>>>>> the trace work well.
>>>>>>
>>>>>> They all have 'pop' instructions in them. This may be the key to making
>>>>>> the trace work well.
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I need your help on percpu and ftrace.
>>>>>>
>>>>> I do not think you made sure netdev_core_stats_inc() was never inlined.
>>>>>
>>>>> Adding more code in it is simply changing how the compiler decides to
>>>>> inline or not.
>>>> Yes, you are right. It needs to add the 'noinline' prefix. The
>>>> disassembly code will have 'pop'
>>>>
>>>> instruction.
>>>>
>>> The function was fine, you do not need anything like push or pop.
>>>
>>> The only needed stuff was the call __fentry__.
>>>
>>> The fact that the function was inlined for some invocations was the
>>> issue, because the trace point
>>> is only planted in the out of line function.
>>
>> But somehow the following code isn't inline? They didn't need to add the
>> 'noinline' prefix.
>>
>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>
>> Or
>> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>>
> I think you are very confused.
>
> You only want to trace netdev_core_stats_inc() entry point, not
> arbitrary pieces of it.


Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace

+                                       field = (__force unsigned long 
__percpu *)((__force void *)p + offset);
+                                       this_cpu_inc(*field);

with

+               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+               WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;

The netdev_core_stats_inc() entry point will work fine even if it doesn't
have 'noinline' prefix.

I don't know why this code needs to add 'noinline' prefix.
+               field = (__force unsigned long __percpu *)((__force void *)p + offset);
+               this_cpu_inc(*field);


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09  9:43                             ` Yajun Deng
@ 2023-10-09 10:16                               ` Eric Dumazet
  2023-10-09 10:58                                 ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2023-10-09 10:16 UTC (permalink / raw)
  To: Yajun Deng
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm

On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>
>
> On 2023/10/9 17:30, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>
> >> On 2023/10/9 16:20, Eric Dumazet wrote:
> >>> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>> On 2023/10/9 15:53, Eric Dumazet wrote:
> >>>>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
> >>>>>
> >>>>>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> >>>>>> the trace work well.
> >>>>>>
> >>>>>> They all have 'pop' instructions in them. This may be the key to making
> >>>>>> the trace work well.
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I need your help on percpu and ftrace.
> >>>>>>
> >>>>> I do not think you made sure netdev_core_stats_inc() was never inlined.
> >>>>>
> >>>>> Adding more code in it is simply changing how the compiler decides to
> >>>>> inline or not.
> >>>> Yes, you are right. It needs to add the 'noinline' prefix. The
> >>>> disassembly code will have 'pop'
> >>>>
> >>>> instruction.
> >>>>
> >>> The function was fine, you do not need anything like push or pop.
> >>>
> >>> The only needed stuff was the call __fentry__.
> >>>
> >>> The fact that the function was inlined for some invocations was the
> >>> issue, because the trace point
> >>> is only planted in the out of line function.
> >>
> >> But somehow the following code isn't inline? They didn't need to add the
> >> 'noinline' prefix.
> >>
> >> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> >> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> >>
> >> Or
> >> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
> >>
> > I think you are very confused.
> >
> > You only want to trace netdev_core_stats_inc() entry point, not
> > arbitrary pieces of it.
>
>
> Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace
>
> +                                       field = (__force unsigned long
> __percpu *)((__force void *)p + offset);
> +                                       this_cpu_inc(*field);
>
> with
>
> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>
> Or
> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>
> The netdev_core_stats_inc() entry point will work fine even if it doesn't
> have 'noinline' prefix.
>
> I don't know why this code needs to add 'noinline' prefix.
> +               field = (__force unsigned long __percpu *)((__force void *)p + offset);
> +               this_cpu_inc(*field);
>

C compiler decides to inline or not, depending on various factors.

The most efficient (and small) code is generated by this_cpu_inc()
version, allowing the compiler to inline it.

If you copy/paste this_cpu_inc()  twenty times, then the compiler
would  not inline the function anymore.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09 10:16                               ` Eric Dumazet
@ 2023-10-09 10:58                                 ` Yajun Deng
  2023-10-09 14:28                                   ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Yajun Deng @ 2023-10-09 10:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: rostedt, mhiramat, dennis, tj, cl, mark.rutland, davem, kuba,
	pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/9 18:16, Eric Dumazet wrote:
> On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/10/9 17:30, Eric Dumazet wrote:
>>> On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/10/9 16:20, Eric Dumazet wrote:
>>>>> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>> On 2023/10/9 15:53, Eric Dumazet wrote:
>>>>>>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>>>
>>>>>>>> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
>>>>>>>> the trace work well.
>>>>>>>>
>>>>>>>> They all have 'pop' instructions in them. This may be the key to making
>>>>>>>> the trace work well.
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I need your help on percpu and ftrace.
>>>>>>>>
>>>>>>> I do not think you made sure netdev_core_stats_inc() was never inlined.
>>>>>>>
>>>>>>> Adding more code in it is simply changing how the compiler decides to
>>>>>>> inline or not.
>>>>>> Yes, you are right. It needs to add the 'noinline' prefix. The
>>>>>> disassembly code will have 'pop'
>>>>>>
>>>>>> instruction.
>>>>>>
>>>>> The function was fine, you do not need anything like push or pop.
>>>>>
>>>>> The only needed stuff was the call __fentry__.
>>>>>
>>>>> The fact that the function was inlined for some invocations was the
>>>>> issue, because the trace point
>>>>> is only planted in the out of line function.
>>>> But somehow the following code isn't inline? They didn't need to add the
>>>> 'noinline' prefix.
>>>>
>>>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>>>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>>>
>>>> Or
>>>> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>>>>
>>> I think you are very confused.
>>>
>>> You only want to trace netdev_core_stats_inc() entry point, not
>>> arbitrary pieces of it.
>>
>> Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace
>>
>> +                                       field = (__force unsigned long
>> __percpu *)((__force void *)p + offset);
>> +                                       this_cpu_inc(*field);
>>
>> with
>>
>> +               field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
>> +               WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>>
>> Or
>> +               (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>>
>> The netdev_core_stats_inc() entry point will work fine even if it doesn't
>> have 'noinline' prefix.
>>
>> I don't know why this code needs to add 'noinline' prefix.
>> +               field = (__force unsigned long __percpu *)((__force void *)p + offset);
>> +               this_cpu_inc(*field);
>>
> C compiler decides to inline or not, depending on various factors.
>
> The most efficient (and small) code is generated by this_cpu_inc()
> version, allowing the compiler to inline it.
>
> If you copy/paste this_cpu_inc()  twenty times, then the compiler
> would  not inline the function anymore.


Got it. Thank you.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09 10:58                                 ` Yajun Deng
@ 2023-10-09 14:28                                   ` Steven Rostedt
  2023-10-10  3:46                                     ` Yajun Deng
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2023-10-09 14:28 UTC (permalink / raw)
  To: Yajun Deng
  Cc: Eric Dumazet, mhiramat, dennis, tj, cl, mark.rutland, davem,
	kuba, pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm

On Mon, 9 Oct 2023 18:58:27 +0800
Yajun Deng <yajun.deng@linux.dev> wrote:

> > C compiler decides to inline or not, depending on various factors.
> >
> > The most efficient (and small) code is generated by this_cpu_inc()
> > version, allowing the compiler to inline it.
> >
> > If you copy/paste this_cpu_inc()  twenty times, then the compiler
> > would  not inline the function anymore.  

Yes, if you want something to be visible by ftrace, it must not be inlined
(as inlined functions are not function calls by definition). And as Eric
stated, the compiler is perfectly allowed to inline something if it
believes it will be more efficient. i.e. There may be code around the function
call that could be more efficient if it wasn't change to parameters. If you
want to make sure a function stays out of line, you must explicitly tell
the compiler you want the function not to ever be inlined (hence the
"noinline" attribute).

> 
> 
> Got it. Thank you.

Great.

-- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()
  2023-10-09 14:28                                   ` Steven Rostedt
@ 2023-10-10  3:46                                     ` Yajun Deng
  0 siblings, 0 replies; 20+ messages in thread
From: Yajun Deng @ 2023-10-10  3:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eric Dumazet, mhiramat, dennis, tj, cl, mark.rutland, davem,
	kuba, pabeni, netdev, linux-kernel, Alexander Lobakin,
	linux-trace-kernel, linux-mm


On 2023/10/9 22:28, Steven Rostedt wrote:
> On Mon, 9 Oct 2023 18:58:27 +0800
> Yajun Deng <yajun.deng@linux.dev> wrote:
>
>>> C compiler decides to inline or not, depending on various factors.
>>>
>>> The most efficient (and small) code is generated by this_cpu_inc()
>>> version, allowing the compiler to inline it.
>>>
>>> If you copy/paste this_cpu_inc()  twenty times, then the compiler
>>> would  not inline the function anymore.
> Yes, if you want something to be visible by ftrace, it must not be inlined
> (as inlined functions are not function calls by definition). And as Eric
> stated, the compiler is perfectly allowed to inline something if it
> believes it will be more efficient. i.e. There may be code around the function
> call that could be more efficient if it wasn't change to parameters. If you
> want to make sure a function stays out of line, you must explicitly tell
> the compiler you want the function not to ever be inlined (hence the
> "noinline" attribute).


Thanks for the details.

>>
>> Got it. Thank you.
> Great.
>
> -- Steve

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-10-10  3:46 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-07  5:06 [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc() Yajun Deng
2023-10-07  5:29 ` Eric Dumazet
2023-10-07  6:34   ` Yajun Deng
2023-10-08  6:45     ` Eric Dumazet
2023-10-08  6:59       ` Yajun Deng
2023-10-08  7:18         ` Eric Dumazet
2023-10-08  8:44           ` Yajun Deng
2023-10-08  8:53             ` Eric Dumazet
2023-10-08  9:12               ` Yajun Deng
2023-10-09  3:07                 ` Yajun Deng
2023-10-09  7:53                   ` Eric Dumazet
2023-10-09  8:13                     ` Yajun Deng
2023-10-09  8:20                       ` Eric Dumazet
2023-10-09  8:36                         ` Yajun Deng
2023-10-09  9:30                           ` Eric Dumazet
2023-10-09  9:43                             ` Yajun Deng
2023-10-09 10:16                               ` Eric Dumazet
2023-10-09 10:58                                 ` Yajun Deng
2023-10-09 14:28                                   ` Steven Rostedt
2023-10-10  3:46                                     ` Yajun Deng

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.