* [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Wu Zongyong <wuzongyong@linux.alibaba.com> @ 2022-04-13 14:06 UTC
To: bpf

Hi,

I tried to count each time the tracepoint qdisc/qdisc_dequeue is hit, then read
the count value from userspace with bpf_map_lookup_elem().
With bpftrace, I can see this tracepoint is hit about 700 times, but the count
I get from the BPF map is below 20, which is weird. When I add a bpf_printk()
to the program, the count I get is normal, about 700.

The BPF program code looks like this:

    struct qdisc_dequeue_ctx {
        __u64 __pad;
        __u64 qdisc;
        __u64 txq;
        int packets;
        __u64 skbaddr;
        int ifindex;
        __u32 handle;
        __u32 parent;
        unsigned long txq_state;
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __type(key, int);
        __type(value, __u32);
        __uint(max_entries, 1);
        __uint(pinning, LIBBPF_PIN_BY_NAME);
    } count_map SEC(".maps");

    SEC("tracepoint/qdisc/qdisc_dequeue")
    int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
    {
        int key = 0;
        __u32 *value;
        __u32 init = 0;

        value = bpf_map_lookup_elem(&count_map, &key);
        if (value) {
            *value += 1;
        } else {
            bpf_printk("value reset");
            bpf_map_update_elem(&count_map, &key, &init, 0);
        }
        return 0;
    }

Any suggestion is appreciated!

Thanks,
Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Andrii Nakryiko <andrii.nakryiko@gmail.com> @ 2022-04-13 19:30 UTC
To: Wu Zongyong; +Cc: bpf

On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
<wuzongyong@linux.alibaba.com> wrote:
>
> I tried to count each time the tracepoint qdisc/qdisc_dequeue is hit, then read
> the count value from userspace with bpf_map_lookup_elem().
> With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> I get from the BPF map is below 20, which is weird. When I add a bpf_printk()
> to the program, the count I get is normal, about 700.
>
> [... BPF program from the first message snipped ...]
>
> Any suggestion is appreciated!
>

First, why do you use a HASH map for a single-key map? An ARRAY would make
more sense for keys that are small integers. But I assume your real-world
use case needs bigger and more random keys, right?

Second, you have two race conditions. One, you overwrite the value in the
map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
initialization to avoid overwriting something that another CPU already set.
The other is that your *value += 1 is non-atomic, so you are losing updates
as well. Use __sync_fetch_and_add(value, 1) for an atomic increment.

Something like this:

    value = bpf_map_lookup_elem(&count_map, &key);
    if (!value) {
        /* BPF_NOEXIST won't allow overriding a value that's already set */
        bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
        value = bpf_map_lookup_elem(&count_map, &key);
    }
    if (!value)
        return 0;

    __sync_fetch_and_add(value, 1);

> Thanks,
> Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Yafang Shao @ 2022-04-14 1:50 UTC
To: Andrii Nakryiko; +Cc: Wu Zongyong, bpf

On Thu, Apr 14, 2022 at 8:25 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> [... original report and analysis snipped ...]
>
> Something like this:
>
>     value = bpf_map_lookup_elem(&count_map, &key);
>     if (!value) {
>         /* BPF_NOEXIST won't allow overriding a value that's already set */
>         bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
>         value = bpf_map_lookup_elem(&count_map, &key);
>     }
>     if (!value)
>         return 0;
>
>     __sync_fetch_and_add(value, 1);
>

Hi Andrii,

I'm curious whether we should instead do it as below,

    new_value = *value + 1;
    bpf_map_update_elem(&count_map, &key, &new_value, BPF_EXIST);

in case someone, e.g. bpftool, deletes this elem in parallel.

BTW, why don't we have a helper like bpf_map_lookup_elem_value() which
returns the value directly instead of the address of the value?

--
Thanks
Yafang
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Wu Zongyong <wuzongyong@linux.alibaba.com> @ 2022-04-14 2:07 UTC
To: Andrii Nakryiko; +Cc: bpf

On Wed, Apr 13, 2022 at 12:30:51PM -0700, Andrii Nakryiko wrote:
>
> [... original report snipped ...]
>
> First, why do you use a HASH map for a single-key map? An ARRAY would make
> more sense for keys that are small integers. But I assume your real-world
> use case needs bigger and more random keys, right?
>
Yes, this is just a simple test.

> Second, you have two race conditions. One, you overwrite the value in the
> map with this bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> initialization to avoid overwriting something that another CPU already set.
> The other is that your *value += 1 is non-atomic, so you are losing updates
> as well. Use __sync_fetch_and_add(value, 1) for an atomic increment.

__sync_fetch_and_add() does solve my problem. Thanks!

I had also tried using a spinlock to prevent the race conditions, but it
seems a spinlock cannot be used in a tracepoint program.

> [... suggested fix snipped ...]
>
> > Thanks,
> > Wu Zongyong
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Wu Zongyong <wuzongyong@linux.alibaba.com> @ 2022-04-14 3:45 UTC
To: Andrii Nakryiko; +Cc: bpf

On Thu, Apr 14, 2022 at 10:07:09AM +0800, Wu Zongyong wrote:
>
> [... original report and analysis snipped ...]
>
> __sync_fetch_and_add() does solve my problem. Thanks!

Oh, sorry!
The count value is about 700 when I do a bpf_printk() in my BPF program
and run "cat /sys/kernel/debug/tracing/trace_pipe" in the background.

If I remove the bpf_printk() or don't read the trace_pipe, the count
value is abnormal, for example, about 10.

As you suggested, the code is now:

    value = bpf_map_lookup_elem(&count_map, &key);
    if (!value) {
        bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
        value = bpf_map_lookup_elem(&count_map, &key);
    }
    if (!value)
        return 0;

    bpf_printk("hello"); /* I don't know why this affects the count value read from userspace */

    __sync_fetch_and_add(value, 1);

> I had also tried using a spinlock to prevent the race conditions, but it
> seems a spinlock cannot be used in a tracepoint program.
>
> [... suggested fix snipped ...]
* Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
From: Andrii Nakryiko <andrii.nakryiko@gmail.com> @ 2022-04-14 21:34 UTC
To: Wu Zongyong; +Cc: bpf

On Wed, Apr 13, 2022 at 8:45 PM Wu Zongyong
<wuzongyong@linux.alibaba.com> wrote:
>
> [... original report and analysis snipped ...]
>
> Oh, sorry!
> The count value is about 700 when I do a bpf_printk() in my BPF program
> and run "cat /sys/kernel/debug/tracing/trace_pipe" in the background.
>
> If I remove the bpf_printk() or don't read the trace_pipe, the count
> value is abnormal, for example, about 10.

It's not clear: is it still about 10 even with __sync_fetch_and_add()?

As for why bpf_printk() makes a difference: one reason might be that
bpf_trace_printk() (called from the bpf_printk() macro) takes
trace_printk_lock, which introduces a bit of a synchronization point and
narrows this race window. But it might be something else; I don't know.

> As you suggested, the code is now:
>
>     value = bpf_map_lookup_elem(&count_map, &key);
>     if (!value) {
>         bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
>         value = bpf_map_lookup_elem(&count_map, &key);
>     }
>     if (!value)
>         return 0;
>
>     bpf_printk("hello"); /* I don't know why this affects the count value read from userspace */
>
>     __sync_fetch_and_add(value, 1);
>
> [... rest snipped ...]
end of thread, other threads: [~2022-04-14 21:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13 14:06 [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue Wu Zongyong
2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14  1:50   ` Yafang Shao
2022-04-14  2:07   ` Wu Zongyong
2022-04-14  3:45     ` Wu Zongyong
2022-04-14 21:34       ` Andrii Nakryiko