* Tracing NVMe Driver with BPF missing events
@ 2022-05-18 21:31 John Mazzie
  2022-05-21  0:10 ` Andrii Nakryiko
  0 siblings, 1 reply; 7+ messages in thread
From: John Mazzie @ 2022-05-18 21:31 UTC (permalink / raw)
  To: bpf, John Mazzie (jmazzie)

My group at Micron is using BPF and loves the tracing capabilities it
provides. We are mainly focused on the storage subsystem, and BPF has
been really helpful in understanding how the storage subsystem
interacts with our drives while running applications.

In the process of developing a tool that uses BPF to trace the nvme
driver, we ran into an issue with missing events. I wanted to check
whether this is a bug or limitation I'm hitting, or expected behavior
under heavy tracing. We are trying to trace two tracepoints
(nvme_setup_cmd and nvme_complete_rq) that fire around 1M times a
second.
We noticed that if we trace just one of the two, we see all the
expected events, but if we trace both at the same time, nvme_complete_rq
misses events. I am using two different percpu_hash maps to count the
events: one for setup and another for complete. My expectation was
that tracing these events would affect performance somewhat, but not
miss events. Ultimately the tool would be used to trace NVMe latencies
at the driver level, by device and process.
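Roughly, the BPF side of the counting looks like this (a trimmed-down
sketch with illustrative map and function names, not our exact tool;
the device filtering mentioned below is omitted):

```c
// counts.bpf.c -- sketch: one PERCPU_HASH counter map per tracepoint.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} setup_counts SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
    __uint(max_entries, 1);
    __type(key, u32);
    __type(value, u64);
} complete_counts SEC(".maps");

static __always_inline void bump(void *map)
{
    u32 key = 0;
    u64 one = 1, *val;

    val = bpf_map_lookup_elem(map, &key);
    if (val)
        (*val)++;               /* per-CPU slot, so no cross-CPU races */
    else
        bpf_map_update_elem(map, &key, &one, BPF_ANY);
}

SEC("tracepoint/nvme/nvme_setup_cmd")
int handle_setup(void *ctx)
{
    /* device filter would go here */
    bump(&setup_counts);
    return 0;
}

SEC("tracepoint/nvme/nvme_complete_rq")
int handle_complete(void *ctx)
{
    bump(&complete_counts);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```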

My tool was developed using libbpf v0.7, and I've tested it on Rocky
Linux 8.5 (kernel 4.18.0), Ubuntu 20.04 (kernel 5.4), and Fedora 36
(kernel 5.17.6), with the same results.

Thanks,
John Mazzie
Principal Storage Solutions Engineer
Micron Technology, Inc.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Tracing NVMe Driver with BPF missing events
  2022-05-18 21:31 Tracing NVMe Driver with BPF missing events John Mazzie
@ 2022-05-21  0:10 ` Andrii Nakryiko
  2022-05-21 16:52   ` John Mazzie
  0 siblings, 1 reply; 7+ messages in thread
From: Andrii Nakryiko @ 2022-05-21  0:10 UTC (permalink / raw)
  To: John Mazzie; +Cc: bpf, John Mazzie (jmazzie)

On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
>
> My group at Micron is using BPF and love the tracing capabilities it
> provides. We are mainly focused on the storage subsystem and BPF has
> been really helpful in understanding how the storage subsystem
> interacts with our drives while running applications.
>
> In the process of developing a tool using BPF to trace the nvme
> driver, we ran into an issue with some missing events. I wanted to
> check to see if this is possibly a bug/limitation that I'm hitting or
> if it's expected behavior with heavy tracing. We are trying to trace 2
> trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> second.
> We noticed if we just trace one of the two, we see all the expected
> events, but if we trace both at the same time, the nvme_complete_rq

kprobe programs have per-CPU reentrancy protection. That is, if some
BPF kprobe/tracepoint program is running and something happens that
would trigger another BPF program (e.g., the running program calls a
kernel function that has another BPF program attached to it, or
preemption happens and another BPF program is supposed to run), then
that nested BPF program invocation is skipped.

This might be what happens in your case.

> misses events. I am using two different percpu_hash maps to count both
> events. One for setup and another for complete. My expectation was
> that tracing these events would affect performance, somewhat, but not
> miss events. Ultimately the tool would be used to trace nvme latencies
> at the driver level by device and process.
>
> My tool was developed using libbpf v0.7, and I've tested on Rocky
> Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> (Kernel 5.17.6) with the same results.
>
> Thanks,
> John Mazzie
> Principal Storage Solutions Engineer
> Micron Technology, Inc.


* Re: Tracing NVMe Driver with BPF missing events
  2022-05-21  0:10 ` Andrii Nakryiko
@ 2022-05-21 16:52   ` John Mazzie
  2022-05-24 16:12     ` John Mazzie
  0 siblings, 1 reply; 7+ messages in thread
From: John Mazzie @ 2022-05-21 16:52 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, John Mazzie (jmazzie)

In this case, is a "BPF program" the individual handler for a
tracepoint, or, in my context, is it my compiled program that traces
both tracepoints? We aren't running any other BPF tracing during these
tests besides my program counting these two tracepoints.

In my program I have two handlers: one for
tracepoint:nvme:nvme_setup_cmd and another for
tracepoint:nvme:nvme_complete_rq. I've created a PERCPU_HASH map for
each handler (a unique map for each) that keeps track of each time the
handler is invoked. The only thing each handler does is increment the
count value in its map, though I do filter by device on each
tracepoint. If I comment out the nvme_setup_cmd code, nvme_complete_rq
does get the correct count.

The user side of my program just prints the values from each of these
maps at a 10-second interval.
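The user side is essentially this (a sketch; the pinned-map paths and
names are placeholders for illustration, and per-CPU lookups return one
value slot per possible CPU that gets summed):

```c
/* reader.c -- sketch: sum per-CPU counter values every 10 seconds. */
#include <bpf/libbpf.h>
#include <bpf/bpf.h>
#include <stdio.h>
#include <unistd.h>

static unsigned long long sum_percpu(int map_fd, int ncpus)
{
    unsigned long long vals[ncpus], total = 0;
    unsigned int key = 0;

    /* For per-CPU maps, lookup fills one u64 per possible CPU. */
    if (bpf_map_lookup_elem(map_fd, &key, vals))
        return 0;                       /* key not populated yet */
    for (int i = 0; i < ncpus; i++)
        total += vals[i];
    return total;
}

int main(void)
{
    int ncpus = libbpf_num_possible_cpus();
    /* Assumes the loader pinned the maps under bpffs (paths made up). */
    int setup_fd = bpf_obj_get("/sys/fs/bpf/setup_counts");
    int complete_fd = bpf_obj_get("/sys/fs/bpf/complete_counts");

    if (setup_fd < 0 || complete_fd < 0)
        return 1;
    for (;;) {
        sleep(10);
        printf("setup=%llu complete=%llu\n",
               sum_percpu(setup_fd, ncpus),
               sum_percpu(complete_fd, ncpus));
    }
}
```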

I can share my code to make this easier; is it preferred that I upload
it to GitHub and share the link in this thread?

I agree that your suggestion could be my issue, but I just want to
make sure we're on the same page, since I'm less familiar with the
internals of BPF.

Thanks,
John

On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
> >
> > My group at Micron is using BPF and love the tracing capabilities it
> > provides. We are mainly focused on the storage subsystem and BPF has
> > been really helpful in understanding how the storage subsystem
> > interacts with our drives while running applications.
> >
> > In the process of developing a tool using BPF to trace the nvme
> > driver, we ran into an issue with some missing events. I wanted to
> > check to see if this is possibly a bug/limitation that I'm hitting or
> > if it's expected behavior with heavy tracing. We are trying to trace 2
> > trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> > second.
> > We noticed if we just trace one of the two, we see all the expected
> > events, but if we trace both at the same time, the nvme_complete_rq
>
> kprobe programs have per-CPU reentrancy protection. That is, if some
> BPF kprobe/tracepoint program is running and something happens (e.g.,
> BPF program calls some kernel function that has another BPF program
> attached to it, or preemption happens and another BPF program is
> supposed to run) that would trigger another BPF program, then that
> nested BPF program invocation will be skipped.
>
> This might be what happens in your case.
>
> > misses events. I am using two different percpu_hash maps to count both
> > events. One for setup and another for complete. My expectation was
> > that tracing these events would affect performance, somewhat, but not
> > miss events. Ultimately the tool would be used to trace nvme latencies
> > at the driver level by device and process.
> >
> > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > (Kernel 5.17.6) with the same results.
> >
> > Thanks,
> > John Mazzie
> > Principal Storage Solutions Engineer
> > Micron Technology, Inc.


* Re: Tracing NVMe Driver with BPF missing events
  2022-05-21 16:52   ` John Mazzie
@ 2022-05-24 16:12     ` John Mazzie
  2022-05-24 23:39       ` Andrii Nakryiko
  0 siblings, 1 reply; 7+ messages in thread
From: John Mazzie @ 2022-05-24 16:12 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, John Mazzie (jmazzie)

After thinking about this more, maybe it has more to do with the
interrupt handler for nvme_complete_rq.

In this situation, the handler for nvme_setup_cmd could sometimes be
interrupted to handle nvme_complete_rq, and then the nvme_complete_rq
handler wouldn't run, because the nvme_setup_cmd handler has not
completed.

Is this understanding correct? I'm assuming there is no real
workaround for this situation, so we may just need to accept some
missed events.

John

On Sat, May 21, 2022 at 11:52 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
>
> In this case is a BPF program the individual handler of a tracepoint,
> or in my context, a BPF program my compiled program that traces both
> tracepoints? We aren't running any other BPF tracing during these
> tests besides my program counting these 2 tracepoints.
>
> In my program I have 2 handlers, one for
> tracepoint:nvme:nvme_setup_cmd and another for
> tracepoint:nvme:nvme_complete_rq. I've created a PERCPU_HASH map for
> each handler (unique map for each) to use that keeps track of each
> time the handler is invoked. The only thing that handler is doing in
> each case is incrementing the count value in the map. Though I do
> filter by device on each tracepoint. If I comment out the
> nvme_setup_cmd code the nvme_complete_rq does get the correct count.
>
> The user side of my program just prints the values for each of these
> maps on a 10 second increment.
>
> I can share my code to make this easier, is it preferred that I upload
> my code to github and share the link in this thread?
>
> I agree that your suggestion could be my issue, but I just want to
> make sure we're on the same page since I'm less familiar with the
> internals of BPF.
>
> Thanks,
> John
>
> On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > >
> > > My group at Micron is using BPF and love the tracing capabilities it
> > > provides. We are mainly focused on the storage subsystem and BPF has
> > > been really helpful in understanding how the storage subsystem
> > > interacts with our drives while running applications.
> > >
> > > In the process of developing a tool using BPF to trace the nvme
> > > driver, we ran into an issue with some missing events. I wanted to
> > > check to see if this is possibly a bug/limitation that I'm hitting or
> > > if it's expected behavior with heavy tracing. We are trying to trace 2
> > > trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> > > second.
> > > We noticed if we just trace one of the two, we see all the expected
> > > events, but if we trace both at the same time, the nvme_complete_rq
> >
> > kprobe programs have per-CPU reentrancy protection. That is, if some
> > BPF kprobe/tracepoint program is running and something happens (e.g.,
> > BPF program calls some kernel function that has another BPF program
> > attached to it, or preemption happens and another BPF program is
> > supposed to run) that would trigger another BPF program, then that
> > nested BPF program invocation will be skipped.
> >
> > This might be what happens in your case.
> >
> > > misses events. I am using two different percpu_hash maps to count both
> > > events. One for setup and another for complete. My expectation was
> > > that tracing these events would affect performance, somewhat, but not
> > > miss events. Ultimately the tool would be used to trace nvme latencies
> > > at the driver level by device and process.
> > >
> > > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > > (Kernel 5.17.6) with the same results.
> > >
> > > Thanks,
> > > John Mazzie
> > > Principal Storage Solutions Engineer
> > > Micron Technology, Inc.


* Re: Tracing NVMe Driver with BPF missing events
  2022-05-24 16:12     ` John Mazzie
@ 2022-05-24 23:39       ` Andrii Nakryiko
  2022-06-03  1:52         ` John Mazzie
  0 siblings, 1 reply; 7+ messages in thread
From: Andrii Nakryiko @ 2022-05-24 23:39 UTC (permalink / raw)
  To: John Mazzie; +Cc: bpf, John Mazzie (jmazzie)

On Tue, May 24, 2022 at 9:12 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
>
> After thinking about this more, maybe it's more to do with the
> interrupt handler for nvme_complete_rq.
>
> In this situation, sometimes the handler for nvme_setup_cmd could be
> interrupted to handle nvme_complete_rq and in this situation the
> nvme_complete_rq handler wouldn't run. because the nvme_setup_cmd
> handler is not complete.
>
> Is this understanding correct?

Yes.

> I'm assuming there is no real
> workaround for this situation, so we may just need to accept some
> missed events.

Try using fentry/fexit programs instead. They use a different
reentrancy protection, which is at a per-program level.
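For illustration, the same counters attached as fentry programs might
look like this (a sketch, not tested against your setup; fentry/fexit
require BTF, i.e. a kernel built with CONFIG_DEBUG_INFO_BTF, and the
nvme_setup_cmd signature shown is an assumption that can differ across
kernel versions):

```c
// Sketch: counters as fentry programs. Each program has its own
// reentrancy guard, so one firing while the other runs on the same
// CPU is no longer suppressed.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* one per-CPU counter slot per program (slot 0 = setup, 1 = complete) */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 2);
    __type(key, u32);
    __type(value, u64);
} counts SEC(".maps");

static __always_inline void bump(u32 slot)
{
    u64 *val = bpf_map_lookup_elem(&counts, &slot);

    if (val)
        (*val)++;
}

SEC("fentry/nvme_setup_cmd")
int BPF_PROG(setup_entry, struct nvme_ns *ns, struct request *req)
{
    bump(0);
    return 0;
}

SEC("fentry/nvme_complete_rq")
int BPF_PROG(complete_entry, struct request *req)
{
    bump(1);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```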

>
> John
>
> On Sat, May 21, 2022 at 11:52 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
> >
> > In this case is a BPF program the individual handler of a tracepoint,
> > or in my context, a BPF program my compiled program that traces both
> > tracepoints? We aren't running any other BPF tracing during these
> > tests besides my program counting these 2 tracepoints.
> >
> > In my program I have 2 handlers, one for
> > tracepoint:nvme:nvme_setup_cmd and another for
> > tracepoint:nvme:nvme_complete_rq. I've created a PERCPU_HASH map for
> > each handler (unique map for each) to use that keeps track of each
> > time the handler is invoked. The only thing that handler is doing in
> > each case is incrementing the count value in the map. Though I do
> > filter by device on each tracepoint. If I comment out the
> > nvme_setup_cmd code the nvme_complete_rq does get the correct count.
> >
> > The user side of my program just prints the values for each of these
> > maps on a 10 second increment.
> >
> > I can share my code to make this easier, is it preferred that I upload
> > my code to github and share the link in this thread?
> >
> > I agree that your suggestion could be my issue, but I just want to
> > make sure we're on the same page since I'm less familiar with the
> > internals of BPF.
> >
> > Thanks,
> > John
> >
> > On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > > >
> > > > My group at Micron is using BPF and love the tracing capabilities it
> > > > provides. We are mainly focused on the storage subsystem and BPF has
> > > > been really helpful in understanding how the storage subsystem
> > > > interacts with our drives while running applications.
> > > >
> > > > In the process of developing a tool using BPF to trace the nvme
> > > > driver, we ran into an issue with some missing events. I wanted to
> > > > check to see if this is possibly a bug/limitation that I'm hitting or
> > > > if it's expected behavior with heavy tracing. We are trying to trace 2
> > > > trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> > > > second.
> > > > We noticed if we just trace one of the two, we see all the expected
> > > > events, but if we trace both at the same time, the nvme_complete_rq
> > >
> > > kprobe programs have per-CPU reentrancy protection. That is, if some
> > > BPF kprobe/tracepoint program is running and something happens (e.g.,
> > > BPF program calls some kernel function that has another BPF program
> > > attached to it, or preemption happens and another BPF program is
> > > supposed to run) that would trigger another BPF program, then that
> > > nested BPF program invocation will be skipped.
> > >
> > > This might be what happens in your case.
> > >
> > > > misses events. I am using two different percpu_hash maps to count both
> > > > events. One for setup and another for complete. My expectation was
> > > > that tracing these events would affect performance, somewhat, but not
> > > > miss events. Ultimately the tool would be used to trace nvme latencies
> > > > at the driver level by device and process.
> > > >
> > > > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > > > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > > > (Kernel 5.17.6) with the same results.
> > > >
> > > > Thanks,
> > > > John Mazzie
> > > > Principal Storage Solutions Engineer
> > > > Micron Technology, Inc.


* Re: Tracing NVMe Driver with BPF missing events
  2022-05-24 23:39       ` Andrii Nakryiko
@ 2022-06-03  1:52         ` John Mazzie
  2022-06-03  3:01           ` Andrii Nakryiko
  0 siblings, 1 reply; 7+ messages in thread
From: John Mazzie @ 2022-06-03  1:52 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, John Mazzie (jmazzie)

Thanks for the help. fentry/fexit seems to be working; I now get every
event when tracing both.

I do have one question about how fentry/fexit work with regard to
function parameters. fexit can access the function parameters in
addition to the return value. Let's say the parameters are pointers
whose values change between entry and exit of the probed function. Are
the parameters accessed on entry or on exit in fexit? My assumption
would be exit, so that I could access the modified values. Is that
correct? The data I'm pulling looks like it might just be the entry
(not-yet-configured) values.

On Tue, May 24, 2022 at 6:40 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, May 24, 2022 at 9:12 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
> >
> > After thinking about this more, maybe it's more to do with the
> > interrupt handler for nvme_complete_rq.
> >
> > In this situation, sometimes the handler for nvme_setup_cmd could be
> > interrupted to handle nvme_complete_rq and in this situation the
> > nvme_complete_rq handler wouldn't run. because the nvme_setup_cmd
> > handler is not complete.
> >
> > Is this understanding correct?
>
> Yes.
>
> > I'm assuming there is no real
> > workaround for this situation, so we may just need to accept some
> > missed events.
>
> Try using fentry/fexit programs instead. They use different reentrancy
> protection which is at a per-program level.
>
> >
> > John
> >
> > On Sat, May 21, 2022 at 11:52 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > >
> > > In this case is a BPF program the individual handler of a tracepoint,
> > > or in my context, a BPF program my compiled program that traces both
> > > tracepoints? We aren't running any other BPF tracing during these
> > > tests besides my program counting these 2 tracepoints.
> > >
> > > In my program I have 2 handlers, one for
> > > tracepoint:nvme:nvme_setup_cmd and another for
> > > tracepoint:nvme:nvme_complete_rq. I've created a PERCPU_HASH map for
> > > each handler (unique map for each) to use that keeps track of each
> > > time the handler is invoked. The only thing that handler is doing in
> > > each case is incrementing the count value in the map. Though I do
> > > filter by device on each tracepoint. If I comment out the
> > > nvme_setup_cmd code the nvme_complete_rq does get the correct count.
> > >
> > > The user side of my program just prints the values for each of these
> > > maps on a 10 second increment.
> > >
> > > I can share my code to make this easier, is it preferred that I upload
> > > my code to github and share the link in this thread?
> > >
> > > I agree that your suggestion could be my issue, but I just want to
> > > make sure we're on the same page since I'm less familiar with the
> > > internals of BPF.
> > >
> > > Thanks,
> > > John
> > >
> > > On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > > > >
> > > > > My group at Micron is using BPF and love the tracing capabilities it
> > > > > provides. We are mainly focused on the storage subsystem and BPF has
> > > > > been really helpful in understanding how the storage subsystem
> > > > > interacts with our drives while running applications.
> > > > >
> > > > > In the process of developing a tool using BPF to trace the nvme
> > > > > driver, we ran into an issue with some missing events. I wanted to
> > > > > check to see if this is possibly a bug/limitation that I'm hitting or
> > > > > if it's expected behavior with heavy tracing. We are trying to trace 2
> > > > > trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> > > > > second.
> > > > > We noticed if we just trace one of the two, we see all the expected
> > > > > events, but if we trace both at the same time, the nvme_complete_rq
> > > >
> > > > kprobe programs have per-CPU reentrancy protection. That is, if some
> > > > BPF kprobe/tracepoint program is running and something happens (e.g.,
> > > > BPF program calls some kernel function that has another BPF program
> > > > attached to it, or preemption happens and another BPF program is
> > > > supposed to run) that would trigger another BPF program, then that
> > > > nested BPF program invocation will be skipped.
> > > >
> > > > This might be what happens in your case.
> > > >
> > > > > misses events. I am using two different percpu_hash maps to count both
> > > > > events. One for setup and another for complete. My expectation was
> > > > > that tracing these events would affect performance, somewhat, but not
> > > > > miss events. Ultimately the tool would be used to trace nvme latencies
> > > > > at the driver level by device and process.
> > > > >
> > > > > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > > > > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > > > > (Kernel 5.17.6) with the same results.
> > > > >
> > > > > Thanks,
> > > > > John Mazzie
> > > > > Principal Storage Solutions Engineer
> > > > > Micron Technology, Inc.


* Re: Tracing NVMe Driver with BPF missing events
  2022-06-03  1:52         ` John Mazzie
@ 2022-06-03  3:01           ` Andrii Nakryiko
  0 siblings, 0 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2022-06-03  3:01 UTC (permalink / raw)
  To: John Mazzie; +Cc: bpf, John Mazzie (jmazzie)

On Thu, Jun 2, 2022 at 6:52 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
>
> Thanks for the help. fentry/fexit seem to be working to get every
> event when tracing both events.
>
> I do have one question about how fentry/fexit work in regards to the
> function parameters. fexit can access the function parameters in
> addition to the return value. Let's say the parameters are pointers
> whose value changes between entry and exit on the probed function. Are
> the parameters being accessed on entry or exit in fexit. My assumption
> would be exit, so I could access the modified values. Is that correct?
> The data I'm pulling appears like it might just be the entry
> (non-configured) values.

For fexit programs, the values of the registers corresponding to the
input arguments are stored before the function call. If the function
updates those registers, the updates won't be reflected.
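To sketch the practical implication (illustrative code, not from the
thread; the nvme_setup_cmd signature and the fields read are my
assumptions): the argument *pointers* seen in fexit are entry-time
snapshots, but memory reached through them is read at exit time, so
data the function wrote through a pointer should be visible.

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

SEC("fexit/nvme_setup_cmd")
int BPF_PROG(setup_exit, struct nvme_ns *ns, struct request *req,
             blk_status_t ret)
{
    /* ns/req hold the entry-time register values; if the function
     * overwrote its argument registers internally, that isn't seen
     * here.  Memory pointed to by req, however, is read now, so
     * fields the function filled in show their post-call state. */
    unsigned int tag = BPF_CORE_READ(req, tag);

    bpf_printk("nvme_setup_cmd ret=%d tag=%u", ret, tag);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```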


>
> On Tue, May 24, 2022 at 6:40 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, May 24, 2022 at 9:12 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > >
> > > After thinking about this more, maybe it's more to do with the
> > > interrupt handler for nvme_complete_rq.
> > >
> > > In this situation, sometimes the handler for nvme_setup_cmd could be
> > > interrupted to handle nvme_complete_rq and in this situation the
> > > nvme_complete_rq handler wouldn't run. because the nvme_setup_cmd
> > > handler is not complete.
> > >
> > > Is this understanding correct?
> >
> > Yes.
> >
> > > I'm assuming there is no real
> > > workaround for this situation, so we may just need to accept some
> > > missed events.
> >
> > Try using fentry/fexit programs instead. They use different reentrancy
> > protection which is at a per-program level.
> >
> > >
> > > John
> > >
> > > On Sat, May 21, 2022 at 11:52 AM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > > >
> > > > In this case is a BPF program the individual handler of a tracepoint,
> > > > or in my context, a BPF program my compiled program that traces both
> > > > tracepoints? We aren't running any other BPF tracing during these
> > > > tests besides my program counting these 2 tracepoints.
> > > >
> > > > In my program I have 2 handlers, one for
> > > > tracepoint:nvme:nvme_setup_cmd and another for
> > > > tracepoint:nvme:nvme_complete_rq. I've created a PERCPU_HASH map for
> > > > each handler (unique map for each) to use that keeps track of each
> > > > time the handler is invoked. The only thing that handler is doing in
> > > > each case is incrementing the count value in the map. Though I do
> > > > filter by device on each tracepoint. If I comment out the
> > > > nvme_setup_cmd code the nvme_complete_rq does get the correct count.
> > > >
> > > > The user side of my program just prints the values for each of these
> > > > maps on a 10 second increment.
> > > >
> > > > I can share my code to make this easier, is it preferred that I upload
> > > > my code to github and share the link in this thread?
> > > >
> > > > I agree that your suggestion could be my issue, but I just want to
> > > > make sure we're on the same page since I'm less familiar with the
> > > > internals of BPF.
> > > >
> > > > Thanks,
> > > > John
> > > >
> > > > On Fri, May 20, 2022 at 7:10 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Wed, May 18, 2022 at 2:35 PM John Mazzie <john.p.mazzie@gmail.com> wrote:
> > > > > >
> > > > > > My group at Micron is using BPF and love the tracing capabilities it
> > > > > > provides. We are mainly focused on the storage subsystem and BPF has
> > > > > > been really helpful in understanding how the storage subsystem
> > > > > > interacts with our drives while running applications.
> > > > > >
> > > > > > In the process of developing a tool using BPF to trace the nvme
> > > > > > driver, we ran into an issue with some missing events. I wanted to
> > > > > > check to see if this is possibly a bug/limitation that I'm hitting or
> > > > > > if it's expected behavior with heavy tracing. We are trying to trace 2
> > > > > > trace points (nvme_setup_cmd and nvme_complete_rq) around 1M times a
> > > > > > second.
> > > > > > We noticed if we just trace one of the two, we see all the expected
> > > > > > events, but if we trace both at the same time, the nvme_complete_rq
> > > > >
> > > > > kprobe programs have per-CPU reentrancy protection. That is, if some
> > > > > BPF kprobe/tracepoint program is running and something happens (e.g.,
> > > > > BPF program calls some kernel function that has another BPF program
> > > > > attached to it, or preemption happens and another BPF program is
> > > > > supposed to run) that would trigger another BPF program, then that
> > > > > nested BPF program invocation will be skipped.
> > > > >
> > > > > This might be what happens in your case.
> > > > >
> > > > > > misses events. I am using two different percpu_hash maps to count both
> > > > > > events. One for setup and another for complete. My expectation was
> > > > > > that tracing these events would affect performance, somewhat, but not
> > > > > > miss events. Ultimately the tool would be used to trace nvme latencies
> > > > > > at the driver level by device and process.
> > > > > >
> > > > > > My tool was developed using libbpf v0.7, and I've tested on Rocky
> > > > > > Linux 8.5 (Kernel 4.18.0), Ubuntu 20.04 (Kernel 5.4) and Fedora 36
> > > > > > (Kernel 5.17.6) with the same results.
> > > > > >
> > > > > > Thanks,
> > > > > > John Mazzie
> > > > > > Principal Storage Solutions Engineer
> > > > > > Micron Technology, Inc.

