All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Wu Zongyong <wuzongyong@linux.alibaba.com>, bpf <bpf@vger.kernel.org>
Subject: Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
Date: Thu, 14 Apr 2022 09:50:10 +0800	[thread overview]
Message-ID: <CALOAHbAvikKyrSTm6J9YSP0-7ifkxku2q2FeUsZMyxYdDdCqpg@mail.gmail.com> (raw)
In-Reply-To: <CAEf4BzYvBHwsFrp52ZqhP=H1WDdpEeovJcgctv2nioAvBg6FBw@mail.gmail.com>

On Thu, Apr 14, 2022 at 8:25 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> <wuzongyong@linux.alibaba.com> wrote:
> >
> > Hi,
> >
> > I tried to count when tracepoint qdisc/qdisc_dequeue hit each time, then read
> > the count value from userspace by bpf_map_lookup_elem().
> > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > I get from the bpf map is below 20. It's weird. Then I tried to add a bpf_printk()
> > to the program, and the count I get is normal which is about 700.
> >
> > The bpf program codes like that:
> >
> >         struct qdisc_dequeue_ctx {
> >                 __u64           __pad;
> >                 __u64           qdisc;
> >                 __u64           txq;
> >                 int             packets;
> >                 __u64           skbaddr;
> >                 int             ifindex;
> >                 __u32           handle;
> >                 __u32           parent;
> >                 unsigned long   txq_state;
> >         };
> >
> >         struct {
> >                 __uint(type, BPF_MAP_TYPE_HASH);
> >                 __type(key, int);
> >                 __type(value, __u32);
> >                 __uint(max_entries, 1);
> >                 __uint(pinning, LIBBPF_PIN_BY_NAME);
> >         } count_map SEC(".maps");
> >
> >         SEC("tracepoint/qdisc/qdisc_dequeue")
> >         int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> >         {
> >                 int key = 0;
> >                 __u32 *value;
> >                 __u32 init = 0;
> >
> >                 value = bpf_map_lookup_elem(&count_map, &key);
> >                 if (value) {
> >                         *value += 1;
> >                 } else {
> >                         bpf_printk("value reset");
> >                         bpf_map_update_elem(&count_map, &key, &init, 0);
> >                 }
> >                 return 0;
> >         }
> >
> > Any suggestion is appreciated!
> >
>
> First, why do you use HASH map for single-key map? ARRAY would make
> more sense for keys that are small integers. But I assume your real
> world use case needs bigger and more random keys, right?
>
> Second, you have two race conditions. One, you overwrite the value in
> the map with this bpf_map_update_elem(..., 0). Use BPF_NOEXISTS for
> initialization to avoid overwriting something that another CPU already
> set. Another one is your *value += 1 is non-atomic, so you are loosing
> updates as well. Use __sync_fetch_and_add(value, 1) for atomic
> increment.
>
> Something like this:
>
> value = bpf_map_lookup_elem(&count_map, &key);
> if (!value) {
>     /* BPF_NOEXIST won't allow to override the value that's already set */
>     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXISTS);
>     value = bpf_map_lookup_elem(&count_map, &key);
> }
> if (!value)
>     return 0;
>
> __sync_fetch_and_add(value, 1);
>

Hi Andrii,

I'm curious that if we should do it as below,

new_value = *value + 1;
bpf_map_update_elem(&count_map, &key, &new_value, BPF_EXIST);

in case someone, e.g. bpftool, may delete this elem in parallel.

BTW, why don't we have a helper like bpf_map_lookup_elem_value() which
returns the value directly instead of the addr of the value ?

-- 
Thanks
Yafang

  reply	other threads:[~2022-04-14  1:50 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13 14:06 [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue Wu Zongyong
2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14  1:50   ` Yafang Shao [this message]
2022-04-14  2:07   ` Wu Zongyong
2022-04-14  3:45     ` Wu Zongyong
2022-04-14 21:34       ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALOAHbAvikKyrSTm6J9YSP0-7ifkxku2q2FeUsZMyxYdDdCqpg@mail.gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=wuzongyong@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.