* [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
@ 2022-04-13 14:06 Wu Zongyong
2022-04-13 19:30 ` Andrii Nakryiko
0 siblings, 1 reply; 6+ messages in thread
From: Wu Zongyong @ 2022-04-13 14:06 UTC (permalink / raw)
To: bpf
Hi,
I tried to count each time the tracepoint qdisc/qdisc_dequeue is hit, then read
the count value from userspace with bpf_map_lookup_elem().
With bpftrace, I can see this tracepoint is hit about 700 times, but the count
I read from the BPF map is below 20, which is odd. When I add a bpf_printk()
to the program, the count is normal, about 700.
The BPF program looks like this:
struct qdisc_dequeue_ctx {
    __u64 __pad;
    __u64 qdisc;
    __u64 txq;
    int packets;
    __u64 skbaddr;
    int ifindex;
    __u32 handle;
    __u32 parent;
    unsigned long txq_state;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, int);
    __type(value, __u32);
    __uint(max_entries, 1);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} count_map SEC(".maps");

SEC("tracepoint/qdisc/qdisc_dequeue")
int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
{
    int key = 0;
    __u32 *value;
    __u32 init = 0;

    value = bpf_map_lookup_elem(&count_map, &key);
    if (value) {
        *value += 1;
    } else {
        bpf_printk("value reset");
        bpf_map_update_elem(&count_map, &key, &init, 0);
    }
    return 0;
}
Any suggestions are appreciated!
Thanks,
Wu Zongyong
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
2022-04-13 14:06 [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue Wu Zongyong
@ 2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14 1:50 ` Yafang Shao
2022-04-14 2:07 ` Wu Zongyong
0 siblings, 2 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2022-04-13 19:30 UTC (permalink / raw)
To: Wu Zongyong; +Cc: bpf
On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
<wuzongyong@linux.alibaba.com> wrote:
>
> Hi,
>
> I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> the count value from userspace by bpf_map_lookup_elem().
> With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> to the program, and the count I get is normal which is about 700.
>
> The bpf program codes like that:
>
> struct qdisc_dequeue_ctx {
>     __u64 __pad;
>     __u64 qdisc;
>     __u64 txq;
>     int packets;
>     __u64 skbaddr;
>     int ifindex;
>     __u32 handle;
>     __u32 parent;
>     unsigned long txq_state;
> };
>
> struct {
>     __uint(type, BPF_MAP_TYPE_HASH);
>     __type(key, int);
>     __type(value, __u32);
>     __uint(max_entries, 1);
>     __uint(pinning, LIBBPF_PIN_BY_NAME);
> } count_map SEC(".maps");
>
> SEC("tracepoint/qdisc/qdisc_dequeue")
> int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> {
>     int key = 0;
>     __u32 *value;
>     __u32 init = 0;
>
>     value = bpf_map_lookup_elem(&count_map, &key);
>     if (value) {
>         *value += 1;
>     } else {
>         bpf_printk("value reset");
>         bpf_map_update_elem(&count_map, &key, &init, 0);
>     }
>     return 0;
> }
>
> Any suggestion is appreciated!
>
First, why do you use a HASH map for a single-key map? An ARRAY would make
more sense for keys that are small integers. But I assume your real-world
use case needs bigger and more random keys, right?

Second, you have two race conditions. One, you overwrite the value in
the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
initialization to avoid overwriting something that another CPU already
set. The other is that your *value += 1 is non-atomic, so you are losing
updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
increment.
Something like this:
value = bpf_map_lookup_elem(&count_map, &key);
if (!value) {
    /* BPF_NOEXIST won't override a value that another CPU already set */
    bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
    value = bpf_map_lookup_elem(&count_map, &key);
}
if (!value)
    return 0;

__sync_fetch_and_add(value, 1);
> Thanks,
> Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
2022-04-13 19:30 ` Andrii Nakryiko
@ 2022-04-14 1:50 ` Yafang Shao
2022-04-14 2:07 ` Wu Zongyong
1 sibling, 0 replies; 6+ messages in thread
From: Yafang Shao @ 2022-04-14 1:50 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: Wu Zongyong, bpf
On Thu, Apr 14, 2022 at 8:25 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> <wuzongyong@linux.alibaba.com> wrote:
> >
> > Hi,
> >
> > I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> > the count value from userspace by bpf_map_lookup_elem().
> > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> > to the program, and the count I get is normal which is about 700.
> >
> > The bpf program codes like that:
> >
> > struct qdisc_dequeue_ctx {
> >     __u64 __pad;
> >     __u64 qdisc;
> >     __u64 txq;
> >     int packets;
> >     __u64 skbaddr;
> >     int ifindex;
> >     __u32 handle;
> >     __u32 parent;
> >     unsigned long txq_state;
> > };
> >
> > struct {
> >     __uint(type, BPF_MAP_TYPE_HASH);
> >     __type(key, int);
> >     __type(value, __u32);
> >     __uint(max_entries, 1);
> >     __uint(pinning, LIBBPF_PIN_BY_NAME);
> > } count_map SEC(".maps");
> >
> > SEC("tracepoint/qdisc/qdisc_dequeue")
> > int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> > {
> >     int key = 0;
> >     __u32 *value;
> >     __u32 init = 0;
> >
> >     value = bpf_map_lookup_elem(&count_map, &key);
> >     if (value) {
> >         *value += 1;
> >     } else {
> >         bpf_printk("value reset");
> >         bpf_map_update_elem(&count_map, &key, &init, 0);
> >     }
> >     return 0;
> > }
> >
> > Any suggestion is appreciated!
> >
>
> First, why do you use HASH map for single-key map? ARRAY would make
> more sense for keys that are small integers. But I assume your real
> world use case needs bigger and more random keys, right?
>
> Second, you have two race conditions. One, you overwrite the value in
> the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> initialization to avoid overwriting something that another CPU already
> set. The other is that your *value += 1 is non-atomic, so you are losing
> updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
> increment.
>
> Something like this:
>
> value = bpf_map_lookup_elem(&count_map, &key);
> if (!value) {
>     /* BPF_NOEXIST won't override a value that another CPU already set */
>     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
>     value = bpf_map_lookup_elem(&count_map, &key);
> }
> if (!value)
>     return 0;
>
> __sync_fetch_and_add(value, 1);
>
Hi Andrii,
I'm curious whether we should do it as below,

    new_value = *value + 1;
    bpf_map_update_elem(&count_map, &key, &new_value, BPF_EXIST);

in case someone, e.g. bpftool, deletes this element in parallel.
BTW, why don't we have a helper like bpf_map_lookup_elem_value() that
returns the value directly instead of the address of the value?
--
Thanks
Yafang
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14 1:50 ` Yafang Shao
@ 2022-04-14 2:07 ` Wu Zongyong
2022-04-14 3:45 ` Wu Zongyong
1 sibling, 1 reply; 6+ messages in thread
From: Wu Zongyong @ 2022-04-14 2:07 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf
On Wed, Apr 13, 2022 at 12:30:51PM -0700, Andrii Nakryiko wrote:
> On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> <wuzongyong@linux.alibaba.com> wrote:
> >
> > Hi,
> >
> > I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> > the count value from userspace by bpf_map_lookup_elem().
> > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> > to the program, and the count I get is normal which is about 700.
> >
> > The bpf program codes like that:
> >
> > struct qdisc_dequeue_ctx {
> >     __u64 __pad;
> >     __u64 qdisc;
> >     __u64 txq;
> >     int packets;
> >     __u64 skbaddr;
> >     int ifindex;
> >     __u32 handle;
> >     __u32 parent;
> >     unsigned long txq_state;
> > };
> >
> > struct {
> >     __uint(type, BPF_MAP_TYPE_HASH);
> >     __type(key, int);
> >     __type(value, __u32);
> >     __uint(max_entries, 1);
> >     __uint(pinning, LIBBPF_PIN_BY_NAME);
> > } count_map SEC(".maps");
> >
> > SEC("tracepoint/qdisc/qdisc_dequeue")
> > int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> > {
> >     int key = 0;
> >     __u32 *value;
> >     __u32 init = 0;
> >
> >     value = bpf_map_lookup_elem(&count_map, &key);
> >     if (value) {
> >         *value += 1;
> >     } else {
> >         bpf_printk("value reset");
> >         bpf_map_update_elem(&count_map, &key, &init, 0);
> >     }
> >     return 0;
> > }
> >
> > Any suggestion is appreciated!
> >
>
> First, why do you use HASH map for single-key map? ARRAY would make
> more sense for keys that are small integers. But I assume your real
> world use case needs bigger and more random keys, right?
>
Yes, this is just a simple test.
>
> Second, you have two race conditions. One, you overwrite the value in
> the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> initialization to avoid overwriting something that another CPU already
> set. The other is that your *value += 1 is non-atomic, so you are losing
> updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
> increment.
__sync_fetch_and_add does solve my problem. Thanks!
I tried to use a spinlock to prevent the race conditions, but it seems
that spinlocks cannot be used in tracepoint programs.
>
> Something like this:
>
> value = bpf_map_lookup_elem(&count_map, &key);
> if (!value) {
>     /* BPF_NOEXIST won't override a value that another CPU already set */
>     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
>     value = bpf_map_lookup_elem(&count_map, &key);
> }
> if (!value)
>     return 0;
>
> __sync_fetch_and_add(value, 1);
>
>
> > Thanks,
> > Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
2022-04-14 2:07 ` Wu Zongyong
@ 2022-04-14 3:45 ` Wu Zongyong
2022-04-14 21:34 ` Andrii Nakryiko
0 siblings, 1 reply; 6+ messages in thread
From: Wu Zongyong @ 2022-04-14 3:45 UTC (permalink / raw)
To: Andrii Nakryiko; +Cc: bpf
On Thu, Apr 14, 2022 at 10:07:09AM +0800, Wu Zongyong wrote:
> On Wed, Apr 13, 2022 at 12:30:51PM -0700, Andrii Nakryiko wrote:
> > On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> > <wuzongyong@linux.alibaba.com> wrote:
> > >
> > > Hi,
> > >
> > > I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> > > the count value from userspace by bpf_map_lookup_elem().
> > > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > > I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> > > to the program, and the count I get is normal which is about 700.
> > >
> > > The bpf program codes like that:
> > >
> > > struct qdisc_dequeue_ctx {
> > >     __u64 __pad;
> > >     __u64 qdisc;
> > >     __u64 txq;
> > >     int packets;
> > >     __u64 skbaddr;
> > >     int ifindex;
> > >     __u32 handle;
> > >     __u32 parent;
> > >     unsigned long txq_state;
> > > };
> > >
> > > struct {
> > >     __uint(type, BPF_MAP_TYPE_HASH);
> > >     __type(key, int);
> > >     __type(value, __u32);
> > >     __uint(max_entries, 1);
> > >     __uint(pinning, LIBBPF_PIN_BY_NAME);
> > > } count_map SEC(".maps");
> > >
> > > SEC("tracepoint/qdisc/qdisc_dequeue")
> > > int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> > > {
> > >     int key = 0;
> > >     __u32 *value;
> > >     __u32 init = 0;
> > >
> > >     value = bpf_map_lookup_elem(&count_map, &key);
> > >     if (value) {
> > >         *value += 1;
> > >     } else {
> > >         bpf_printk("value reset");
> > >         bpf_map_update_elem(&count_map, &key, &init, 0);
> > >     }
> > >     return 0;
> > > }
> > >
> > > Any suggestion is appreciated!
> > >
> >
> > First, why do you use HASH map for single-key map? ARRAY would make
> > more sense for keys that are small integers. But I assume your real
> > world use case needs bigger and more random keys, right?
> >
> Yes, this just is a simple test.
>
> >
> > Second, you have two race conditions. One, you overwrite the value in
> > the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> > initialization to avoid overwriting something that another CPU already
> > set. The other is that your *value += 1 is non-atomic, so you are losing
> > updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
> > increment.
>
> __sync_fetch_and_add do solve my problem. Thanks!
Oh, sorry!
The count value is about 700 when I have a bpf_printk() in my bpf program
and a background "cat /sys/kernel/debug/tracing/trace_pipe" running.
If I remove the bpf_printk() or don't read the trace_pipe, the count
value is abnormal again, for example about 10.
Following your suggestion, the code is now:
value = bpf_map_lookup_elem(&count_map, &key);
if (!value) {
    bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
    value = bpf_map_lookup_elem(&count_map, &key);
}
if (!value)
    return 0;

bpf_printk("hello"); /* I don't know why this affects the count read from userspace */

__sync_fetch_and_add(value, 1);
>
> I have tried to use spinlock to prevent race conditions, but it seems
> that spinlock cannot be used in tracepoint.
>
> >
> > Something like this:
> >
> > value = bpf_map_lookup_elem(&count_map, &key);
> > if (!value) {
> >     /* BPF_NOEXIST won't override a value that another CPU already set */
> >     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
> >     value = bpf_map_lookup_elem(&count_map, &key);
> > }
> > if (!value)
> >     return 0;
> >
> > __sync_fetch_and_add(value, 1);
> >
> >
> > > Thanks,
> > > Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
2022-04-14 3:45 ` Wu Zongyong
@ 2022-04-14 21:34 ` Andrii Nakryiko
0 siblings, 0 replies; 6+ messages in thread
From: Andrii Nakryiko @ 2022-04-14 21:34 UTC (permalink / raw)
To: Wu Zongyong; +Cc: bpf
On Wed, Apr 13, 2022 at 8:45 PM Wu Zongyong
<wuzongyong@linux.alibaba.com> wrote:
>
> On Thu, Apr 14, 2022 at 10:07:09AM +0800, Wu Zongyong wrote:
> > On Wed, Apr 13, 2022 at 12:30:51PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> > > <wuzongyong@linux.alibaba.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> > > > the count value from userspace by bpf_map_lookup_elem().
> > > > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > > > I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> > > > to the program, and the count I get is normal which is about 700.
> > > >
> > > > The bpf program codes like that:
> > > >
> > > > struct qdisc_dequeue_ctx {
> > > >     __u64 __pad;
> > > >     __u64 qdisc;
> > > >     __u64 txq;
> > > >     int packets;
> > > >     __u64 skbaddr;
> > > >     int ifindex;
> > > >     __u32 handle;
> > > >     __u32 parent;
> > > >     unsigned long txq_state;
> > > > };
> > > >
> > > > struct {
> > > >     __uint(type, BPF_MAP_TYPE_HASH);
> > > >     __type(key, int);
> > > >     __type(value, __u32);
> > > >     __uint(max_entries, 1);
> > > >     __uint(pinning, LIBBPF_PIN_BY_NAME);
> > > > } count_map SEC(".maps");
> > > >
> > > > SEC("tracepoint/qdisc/qdisc_dequeue")
> > > > int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> > > > {
> > > >     int key = 0;
> > > >     __u32 *value;
> > > >     __u32 init = 0;
> > > >
> > > >     value = bpf_map_lookup_elem(&count_map, &key);
> > > >     if (value) {
> > > >         *value += 1;
> > > >     } else {
> > > >         bpf_printk("value reset");
> > > >         bpf_map_update_elem(&count_map, &key, &init, 0);
> > > >     }
> > > >     return 0;
> > > > }
> > > >
> > > > Any suggestion is appreciated!
> > > >
> > >
> > > First, why do you use HASH map for single-key map? ARRAY would make
> > > more sense for keys that are small integers. But I assume your real
> > > world use case needs bigger and more random keys, right?
> > >
> > Yes, this just is a simple test.
> >
> > >
> > > Second, you have two race conditions. One, you overwrite the value in
> > > the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> > > initialization to avoid overwriting something that another CPU already
> > > set. The other is that your *value += 1 is non-atomic, so you are losing
> > > updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
> > > increment.
> >
> > __sync_fetch_and_add do solve my problem. Thanks!
>
> Oh, sorry!
> The count value is about 700 when I do a bpf_printk() in my bpf program
> and with a background command "cat /sys/kernel/debug/tracing/trace_pipe".
>
> If I remove the bpf_printk() or don't read the trace_pipe, the count
> value shows abnormal, for example, about 10.
It's not clear: is it still about 10 even with __sync_fetch_and_add()?

As for why bpf_printk() makes a difference: one reason might be that
bpf_trace_printk() (called from the bpf_printk() macro) takes
trace_printk_lock, which introduces a bit of a synchronization point
and so shrinks the race window. But it might be something else, I don't
know.
>
> As your suggestion, the code now is:
>
> value = bpf_map_lookup_elem(&count_map, &key);
> if (!value) {
> bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
> value = bpf_map_lookup_elem(&count_map, &key);
> }
> if (!value)
> return 0;
>
> bpf_printk("hello"); // I don't know why this affect the count value read from userspace
>
> __sync_fetch_and_add(value, 1);
>
>
> >
> > I have tried to use spinlock to prevent race conditions, but it seems
> > that spinlock cannot be used in tracepoint.
> >
> > >
> > > Something like this:
> > >
> > > value = bpf_map_lookup_elem(&count_map, &key);
> > > if (!value) {
> > >     /* BPF_NOEXIST won't override a value that another CPU already set */
> > >     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
> > >     value = bpf_map_lookup_elem(&count_map, &key);
> > > }
> > > if (!value)
> > >     return 0;
> > >
> > > __sync_fetch_and_add(value, 1);
> > >
> > >
> > > > Thanks,
> > > > Wu Zongyong
end of thread, other threads:[~2022-04-14 21:34 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13 14:06 [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue Wu Zongyong
2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14 1:50 ` Yafang Shao
2022-04-14 2:07 ` Wu Zongyong
2022-04-14 3:45 ` Wu Zongyong
2022-04-14 21:34 ` Andrii Nakryiko