From: Wu Zongyong <wuzongyong@linux.alibaba.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>
Subject: Re: [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue
Date: Thu, 14 Apr 2022 11:45:48 +0800	[thread overview]
Message-ID: <20220414034548.GA27766@L-PF27918B-1352.localdomain> (raw)
In-Reply-To: <20220414020709.GA27635@L-PF27918B-1352.localdomain>

On Thu, Apr 14, 2022 at 10:07:09AM +0800, Wu Zongyong wrote:
> On Wed, Apr 13, 2022 at 12:30:51PM -0700, Andrii Nakryiko wrote:
> > On Wed, Apr 13, 2022 at 12:08 PM Wu Zongyong
> > <wuzongyong@linux.alibaba.com> wrote:
> > >
> > > Hi,
> > >
> > > I tried to count how many times the tracepoint qdisc/qdisc_dequeue is hit,
> > > then read the count from userspace with bpf_map_lookup_elem().
> > > With bpftrace, I can see this tracepoint is hit about 700 times, but the count
> > > I get from the bpf map is below 20, which is weird. Then I added a bpf_printk()
> > > to the program, and the count I get is normal, about 700.
> > >
> > > The bpf program looks like this:
> > >
> > >         struct qdisc_dequeue_ctx {
> > >                 __u64           __pad;
> > >                 __u64           qdisc;
> > >                 __u64           txq;
> > >                 int             packets;
> > >                 __u64           skbaddr;
> > >                 int             ifindex;
> > >                 __u32           handle;
> > >                 __u32           parent;
> > >                 unsigned long   txq_state;
> > >         };
> > >
> > >         struct {
> > >                 __uint(type, BPF_MAP_TYPE_HASH);
> > >                 __type(key, int);
> > >                 __type(value, __u32);
> > >                 __uint(max_entries, 1);
> > >                 __uint(pinning, LIBBPF_PIN_BY_NAME);
> > >         } count_map SEC(".maps");
> > >
> > >         SEC("tracepoint/qdisc/qdisc_dequeue")
> > >         int trace_dequeue(struct qdisc_dequeue_ctx *ctx)
> > >         {
> > >                 int key = 0;
> > >                 __u32 *value;
> > >                 __u32 init = 0;
> > >
> > >                 value = bpf_map_lookup_elem(&count_map, &key);
> > >                 if (value) {
> > >                         *value += 1;
> > >                 } else {
> > >                         bpf_printk("value reset");
> > >                         bpf_map_update_elem(&count_map, &key, &init, 0);
> > >                 }
> > >                 return 0;
> > >         }
> > >
> > > Any suggestion is appreciated!
> > >
> > 
> > First, why do you use a HASH map for a single-key map? An ARRAY would
> > make more sense for keys that are small integers. But I assume your
> > real-world use case needs bigger and more random keys, right?
> >
> Yes, this is just a simple test.
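(For reference, the ARRAY-based declaration Andrii suggests could look like
this; a sketch only, keeping the value type, max_entries, and pinning of the
original map:)

```c
/* Sketch of the ARRAY variant; everything except the map type is
 * carried over from the original count_map declaration. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);	/* ARRAY keys are indices 0..max_entries-1 */
	__type(value, __u32);
	__uint(max_entries, 1);
	__uint(pinning, LIBBPF_PIN_BY_NAME);
} count_map SEC(".maps");
```

With an ARRAY map, slot 0 always exists and is zero-initialized, so the lookup
cannot fail and the BPF_NOEXIST initialization dance is unnecessary.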
> 
> > 
> > Second, you have two race conditions. One, you overwrite the value in
> > the map with bpf_map_update_elem(..., 0). Use BPF_NOEXIST for
> > initialization to avoid overwriting something that another CPU already
> > set. The other is that *value += 1 is non-atomic, so you are losing
> > updates as well. Use __sync_fetch_and_add(value, 1) for an atomic
> > increment.
> 
> __sync_fetch_and_add does solve my problem. Thanks!

Oh, sorry!
The count value is about 700 when I have a bpf_printk() in my bpf program
and a background "cat /sys/kernel/debug/tracing/trace_pipe" running.

If I remove the bpf_printk() or don't read the trace_pipe, the count
value is abnormal, for example around 10.

Following your suggestion, the code now is:

	value = bpf_map_lookup_elem(&count_map, &key);
	if (!value) {
		bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
		value = bpf_map_lookup_elem(&count_map, &key);
	}
	if (!value)
		return 0;

	bpf_printk("hello");	/* I don't know why this affects the count value read from userspace */

	__sync_fetch_and_add(value, 1);
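(For completeness, the userspace side that reads the counter might look like
this; a sketch only, assuming LIBBPF_PIN_BY_NAME pinned the map at the default
bpffs path /sys/fs/bpf/count_map:)

```c
/* Sketch: read the pinned counter from userspace via libbpf.
 * Assumes bpffs is mounted at /sys/fs/bpf and the map was pinned
 * there by LIBBPF_PIN_BY_NAME under its map name, count_map. */
#include <bpf/bpf.h>
#include <stdio.h>

int main(void)
{
	int key = 0;
	__u32 value = 0;
	int fd = bpf_obj_get("/sys/fs/bpf/count_map");

	if (fd < 0) {
		perror("bpf_obj_get");
		return 1;
	}
	if (bpf_map_lookup_elem(fd, &key, &value)) {
		perror("bpf_map_lookup_elem");
		return 1;
	}
	printf("count = %u\n", value);
	return 0;
}
```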


> 
> I tried to use a spinlock to prevent the race conditions, but it seems
> that spinlocks cannot be used in tracepoint programs.
> 
> > 
> > Something like this:
> > 
> > value = bpf_map_lookup_elem(&count_map, &key);
> > if (!value) {
> >     /* BPF_NOEXIST won't override a value that's already set */
> >     bpf_map_update_elem(&count_map, &key, &init, BPF_NOEXIST);
> >     value = bpf_map_lookup_elem(&count_map, &key);
> > }
> > if (!value)
> >     return 0;
> > 
> > __sync_fetch_and_add(value, 1);
> > 
> > 
> > > Thanks,
> > > Wu Zongyong

Thread overview: 6+ messages
2022-04-13 14:06 [Question] bpf map value increase unexpectedly with tracepoint qdisc/qdisc_dequeue Wu Zongyong
2022-04-13 19:30 ` Andrii Nakryiko
2022-04-14  1:50   ` Yafang Shao
2022-04-14  2:07   ` Wu Zongyong
2022-04-14  3:45     ` Wu Zongyong [this message]
2022-04-14 21:34       ` Andrii Nakryiko
