From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jonthan Haslam <jonathan.haslam@gmail.com>,
	linux-trace-kernel@vger.kernel.org, andrii@kernel.org,
	bpf@vger.kernel.org, rostedt@goodmis.org,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] uprobes: reduce contention on uprobes_tree access
Date: Sat, 30 Mar 2024 09:36:31 +0900	[thread overview]
Message-ID: <20240330093631.72273967ba818cb16aeb58b6@kernel.org> (raw)
In-Reply-To: <CAEf4BzbSvMa2+hdTifMKTsNiOL6X=P7eor4LpPKfHM=Y9-71fw@mail.gmail.com>

On Fri, 29 Mar 2024 10:33:57 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Wed, Mar 27, 2024 at 5:45 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Mar 27, 2024 at 5:18 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > On Wed, 27 Mar 2024 17:06:01 +0000
> > > Jonthan Haslam <jonathan.haslam@gmail.com> wrote:
> > >
> > > > > > Masami,
> > > > > >
> > > > > > Given the discussion around per-cpu rw semaphore and need for
> > > > > > (internal) batched attachment API for uprobes, do you think you can
> > > > > > apply this patch as is for now? We can then gain initial improvements
> > > > > > in scalability that are also easy to backport, and Jonathan will work
> > > > > > on a more complete solution based on per-cpu RW semaphore, as
> > > > > > suggested by Ingo.
> > > > >
> > > > > Yeah, it is interesting to use per-cpu rw semaphore on uprobe.
> > > > > I would like to wait for the next version.
> > > >
> > > > My initial tests show a nice improvement over the RW spinlocks but a
> > > > significant regression in acquiring a write lock. I've got a few days
> > > > vacation over Easter but I'll aim to get some more formalised results out
> > > > to the thread toward the end of next week.
> > >
> > > As long as the write lock is only taken on the cold path, I think you can
> > > choose the per-cpu RW semaphore. Since it does not busy-wait, the impact on
> > > overall system performance will be small.
> >
> > No, Masami, unfortunately it's not that simple. In BPF we have BPF
> > multi-uprobe, which can be used to attach to thousands of user
> > functions. It currently creates one uprobe at a time, as we don't
> > really have a batched API. If each uprobe registration now takes
> > a (relatively) long time, then multiplied by the number of user
> > functions to attach to, it will be a horrible regression in
> > attachment/detachment performance.

Ah, got it. So attachment/detachment performance needs to be taken into account.
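Why the write side matters can be illustrated with a toy per-CPU reader/writer lock (a "big-reader" lock). This is a minimal userspace sketch under stated assumptions, not the kernel's percpu_rw_semaphore: NR_SLOTS stands in for the CPU count, and struct br_lock and the br_* functions are hypothetical names. Readers only touch their own slot's counter, so the read side scales; the writer has to visit every slot and wait for it to drain. The kernel's percpu_rw_semaphore writer additionally goes through rcu_sync, which can wait for an RCU grace period, so its write side is costlier still.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: number of per-"CPU" reader slots. */
#define NR_SLOTS 64

struct br_lock {
	atomic_int readers[NR_SLOTS]; /* one reader count per slot */
	atomic_bool writer;           /* true while a writer holds or wants the lock */
};

static void br_init(struct br_lock *l)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_init(&l->readers[i], 0);
	atomic_init(&l->writer, false);
}

/* Read side: in the common case only this slot's counter is touched. */
static void br_read_lock(struct br_lock *l, int slot)
{
	for (;;) {
		atomic_fetch_add(&l->readers[slot], 1);
		/* seq_cst: either we see the writer flag or the writer sees our count. */
		if (!atomic_load(&l->writer))
			return;
		/* A writer is active: back out and wait until it is done. */
		atomic_fetch_sub(&l->readers[slot], 1);
		while (atomic_load(&l->writer))
			sched_yield();
	}
}

static void br_read_unlock(struct br_lock *l, int slot)
{
	int prev = atomic_fetch_sub(&l->readers[slot], 1);

	assert(prev > 0);
	(void)prev;
}

/* Write side: exclude other writers, then wait for every slot to drain. */
static void br_write_lock(struct br_lock *l)
{
	bool expected = false;

	while (!atomic_compare_exchange_weak(&l->writer, &expected, true)) {
		expected = false;
		sched_yield();
	}
	for (int i = 0; i < NR_SLOTS; i++)
		while (atomic_load(&l->readers[i]) != 0)
			sched_yield();
}

static void br_write_unlock(struct br_lock *l)
{
	atomic_store(&l->writer, false);
}
```

Paying this O(NR_SLOTS) drain once per registration is the cost that multiplies with the number of functions a BPF multi-uprobe attaches to.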

> >
> > So when we switch to per-CPU rw semaphore, we'll need to provide an
> > internal batch uprobe attach/detach API to make sure that attaching to
> > multiple uprobes is still fast.

Yeah, we need an interface such as register_uprobes(...).
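A batched interface would amortize the write-side cost by taking the write lock once for a whole set of probes. The sketch below assumes a userspace model: pthread_rwlock_t stands in for uprobes_treelock, a sorted array stands in for the rbtree, and struct probe_key, register_probes_batch() and find_probe() are hypothetical names, not the eventual kernel API. Lookups take the lock for reading, matching the read_lock()/write_lock() split in the patch quoted below.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an (inode, offset) key; not a kernel type. */
struct probe_key {
	unsigned long inode;
	long long offset;
};

#define MAX_PROBES 1024

static struct probe_key tree[MAX_PROBES]; /* sorted array in place of uprobes_tree */
static size_t nr_probes;
static pthread_rwlock_t tree_lock = PTHREAD_RWLOCK_INITIALIZER;

static int key_cmp(const struct probe_key *a, const struct probe_key *b)
{
	if (a->inode != b->inode)
		return a->inode < b->inode ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Caller holds tree_lock for writing. Returns 0 if inserted or already present. */
static int __insert_probe(const struct probe_key *k)
{
	size_t i = 0;

	while (i < nr_probes && key_cmp(&tree[i], k) < 0)
		i++;
	if (i < nr_probes && key_cmp(&tree[i], k) == 0)
		return 0; /* already registered */
	if (nr_probes == MAX_PROBES)
		return -1;
	memmove(&tree[i + 1], &tree[i], (nr_probes - i) * sizeof(tree[0]));
	tree[i] = *k;
	nr_probes++;
	return 0;
}

/* Hypothetical batch API: one write-lock acquisition for n probes. */
static int register_probes_batch(const struct probe_key *keys, size_t n)
{
	int err = 0;

	pthread_rwlock_wrlock(&tree_lock);
	for (size_t i = 0; i < n && !err; i++)
		err = __insert_probe(&keys[i]);
	pthread_rwlock_unlock(&tree_lock);
	return err;
}

/* Lookup path: readers share the lock, as find_uprobe() does after the patch. */
static int find_probe(unsigned long inode, long long offset)
{
	struct probe_key k = { inode, offset };
	int found = 0;

	pthread_rwlock_rdlock(&tree_lock);
	for (size_t i = 0; i < nr_probes && !found; i++)
		found = key_cmp(&tree[i], &k) == 0;
	pthread_rwlock_unlock(&tree_lock);
	return found;
}
```

On a mid-batch failure this sketch leaves the earlier entries inserted; a real register_uprobes(...) would have to decide whether a batch is all-or-nothing.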

> >
> > Which is why I was asking to land this patch as is, as it relieves the
> > scalability pains in production and is easy to backport to old
> > kernels. And then we can work on batched APIs and switch to per-CPU rw
> > semaphore.

OK, then I'll push this to for-next for now.
Please share if you have a good idea for a batch interface that can be
backported. I guess it will involve userspace changes too.

Thank you!

> >
> > So I hope you can reconsider and accept the improvements in this patch,
> > while Jonathan keeps working on an even better final solution.
> > Thanks!
> >
> > > I look forward to your formalized results :)
> > >
> 
> BTW, as part of BPF selftests, we have a multi-attach test for uprobes
> and USDTs, reporting attach/detach timings:
> $ sudo ./test_progs -v -t uprobe_multi_test/bench
> bpf_testmod.ko is already unloaded.
> Loading bpf_testmod.ko...
> Successfully loaded bpf_testmod.ko.
> test_bench_attach_uprobe:PASS:uprobe_multi_bench__open_and_load 0 nsec
> test_bench_attach_uprobe:PASS:uprobe_multi_bench__attach 0 nsec
> test_bench_attach_uprobe:PASS:uprobes_count 0 nsec
> test_bench_attach_uprobe: attached in   0.120s
> test_bench_attach_uprobe: detached in   0.092s
> #400/5   uprobe_multi_test/bench_uprobe:OK
> test_bench_attach_usdt:PASS:uprobe_multi__open 0 nsec
> test_bench_attach_usdt:PASS:bpf_program__attach_usdt 0 nsec
> test_bench_attach_usdt:PASS:usdt_count 0 nsec
> test_bench_attach_usdt: attached in   0.124s
> test_bench_attach_usdt: detached in   0.064s
> #400/6   uprobe_multi_test/bench_usdt:OK
> #400     uprobe_multi_test:OK
> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
> Successfully unloaded bpf_testmod.ko.
> 
> So it should be easy for Jonathan to validate his changes with this.
> 
> > > Thank you,
> > >
> > > >
> > > > Jon.
> > > >
> > > > >
> > > > > Thank you,
> > > > >
> > > > > >
> > > > > > >
> > > > > > > BTW, how did you measure the overhead? I think spinlock overhead
> > > > > > > will depend on how much lock contention happens.
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > > > >
> > > > > > > > [0] https://docs.kernel.org/locking/spinlocks.html
> > > > > > > >
> > > > > > > > Signed-off-by: Jonathan Haslam <jonathan.haslam@gmail.com>
> > > > > > > > ---
> > > > > > > >  kernel/events/uprobes.c | 22 +++++++++++-----------
> > > > > > > >  1 file changed, 11 insertions(+), 11 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > > > > > > > index 929e98c62965..42bf9b6e8bc0 100644
> > > > > > > > --- a/kernel/events/uprobes.c
> > > > > > > > +++ b/kernel/events/uprobes.c
> > > > > > > > @@ -39,7 +39,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
> > > > > > > >   */
> > > > > > > >  #define no_uprobe_events()   RB_EMPTY_ROOT(&uprobes_tree)
> > > > > > > >
> > > > > > > > -static DEFINE_SPINLOCK(uprobes_treelock);    /* serialize rbtree access */
> > > > > > > > +static DEFINE_RWLOCK(uprobes_treelock);      /* serialize rbtree access */
> > > > > > > >
> > > > > > > >  #define UPROBES_HASH_SZ      13
> > > > > > > >  /* serialize uprobe->pending_list */
> > > > > > > > @@ -669,9 +669,9 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
> > > > > > > >  {
> > > > > > > >       struct uprobe *uprobe;
> > > > > > > >
> > > > > > > > -     spin_lock(&uprobes_treelock);
> > > > > > > > +     read_lock(&uprobes_treelock);
> > > > > > > >       uprobe = __find_uprobe(inode, offset);
> > > > > > > > -     spin_unlock(&uprobes_treelock);
> > > > > > > > +     read_unlock(&uprobes_treelock);
> > > > > > > >
> > > > > > > >       return uprobe;
> > > > > > > >  }
> > > > > > > > @@ -701,9 +701,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
> > > > > > > >  {
> > > > > > > >       struct uprobe *u;
> > > > > > > >
> > > > > > > > -     spin_lock(&uprobes_treelock);
> > > > > > > > +     write_lock(&uprobes_treelock);
> > > > > > > >       u = __insert_uprobe(uprobe);
> > > > > > > > -     spin_unlock(&uprobes_treelock);
> > > > > > > > +     write_unlock(&uprobes_treelock);
> > > > > > > >
> > > > > > > >       return u;
> > > > > > > >  }
> > > > > > > > @@ -935,9 +935,9 @@ static void delete_uprobe(struct uprobe *uprobe)
> > > > > > > >       if (WARN_ON(!uprobe_is_active(uprobe)))
> > > > > > > >               return;
> > > > > > > >
> > > > > > > > -     spin_lock(&uprobes_treelock);
> > > > > > > > +     write_lock(&uprobes_treelock);
> > > > > > > >       rb_erase(&uprobe->rb_node, &uprobes_tree);
> > > > > > > > -     spin_unlock(&uprobes_treelock);
> > > > > > > > +     write_unlock(&uprobes_treelock);
> > > > > > > >       RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
> > > > > > > >       put_uprobe(uprobe);
> > > > > > > >  }
> > > > > > > > @@ -1298,7 +1298,7 @@ static void build_probe_list(struct inode *inode,
> > > > > > > >       min = vaddr_to_offset(vma, start);
> > > > > > > >       max = min + (end - start) - 1;
> > > > > > > >
> > > > > > > > -     spin_lock(&uprobes_treelock);
> > > > > > > > +     read_lock(&uprobes_treelock);
> > > > > > > >       n = find_node_in_range(inode, min, max);
> > > > > > > >       if (n) {
> > > > > > > >               for (t = n; t; t = rb_prev(t)) {
> > > > > > > > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct inode *inode,
> > > > > > > >                       get_uprobe(u);
> > > > > > > >               }
> > > > > > > >       }
> > > > > > > > -     spin_unlock(&uprobes_treelock);
> > > > > > > > +     read_unlock(&uprobes_treelock);
> > > > > > > >  }
> > > > > > > >
> > > > > > > >  /* @vma contains reference counter, not the probed instruction. */
> > > > > > > > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long e
> > > > > > > >       min = vaddr_to_offset(vma, start);
> > > > > > > >       max = min + (end - start) - 1;
> > > > > > > >
> > > > > > > > -     spin_lock(&uprobes_treelock);
> > > > > > > > +     read_lock(&uprobes_treelock);
> > > > > > > >       n = find_node_in_range(inode, min, max);
> > > > > > > > -     spin_unlock(&uprobes_treelock);
> > > > > > > > +     read_unlock(&uprobes_treelock);
> > > > > > > >
> > > > > > > >       return !!n;
> > > > > > > >  }
> > > > > > > > --
> > > > > > > > 2.43.0
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > > >
> > > > >
> > > > > --
> > > > > Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > >
> > >
> > > --
> > > Masami Hiramatsu (Google) <mhiramat@kernel.org>
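On the question of how the overhead was measured: lock overhead depends on contention, which a small userspace microbenchmark can show. This sketch assumes pthread_spinlock_t and pthread_rwlock_t as stand-ins for the kernel's spinlock_t and rwlock_t, and a single shared variable in place of the rbtree lookup; NR_THREADS, LOOKUPS_PER_THREAD and run_bench() are arbitrary, hypothetical choices. Its timings are not a substitute for in-kernel measurements.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Arbitrary benchmark parameters. */
#define NR_THREADS 4
#define LOOKUPS_PER_THREAD 100000

static pthread_spinlock_t spin;
static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static volatile long shared_value = 42; /* stands in for the rbtree */
static int use_rwlock;                  /* selects the lock for this run */

static void *reader(void *arg)
{
	long local_hits = 0;

	for (int i = 0; i < LOOKUPS_PER_THREAD; i++) {
		if (use_rwlock)
			pthread_rwlock_rdlock(&rw);
		else
			pthread_spin_lock(&spin);
		if (shared_value == 42) /* the "lookup" done under the lock */
			local_hits++;
		if (use_rwlock)
			pthread_rwlock_unlock(&rw);
		else
			pthread_spin_unlock(&spin);
	}
	/* One store per thread keeps the hot loop off the shared hits[] line. */
	*(long *)arg = local_hits;
	return NULL;
}

/* Runs NR_THREADS concurrent readers; returns elapsed seconds. */
static double run_bench(int rwlock, long *total_hits)
{
	pthread_t tids[NR_THREADS];
	long hits[NR_THREADS] = { 0 };
	struct timespec t0, t1;

	use_rwlock = rwlock;
	pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < NR_THREADS; i++)
		pthread_create(&tids[i], NULL, reader, &hits[i]);
	*total_hits = 0;
	for (int i = 0; i < NR_THREADS; i++) {
		pthread_join(tids[i], NULL);
		*total_hits += hits[i];
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	pthread_spin_destroy(&spin);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing run_bench(0, ...) against run_bench(1, ...) while varying NR_THREADS shows how the gap grows with contention. Even with the rwlock, every reader still atomically updates a shared reader count inside the lock, so its cache line keeps bouncing between CPUs under heavy read traffic; that remaining cost is one motivation for the per-CPU RW semaphore discussed in this thread.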


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


Thread overview: 25+ messages
2024-03-21 14:57 [PATCH] uprobes: reduce contention on uprobes_tree access Jonathan Haslam
2024-03-21 16:11 ` Andrii Nakryiko
2024-03-24  3:28 ` Ingo Molnar
2024-03-25 19:12   ` Jonthan Haslam
2024-03-25 23:14     ` Andrii Nakryiko
2024-03-26 11:55       ` Jonthan Haslam
2024-03-25  3:03 ` Masami Hiramatsu
2024-03-25 19:04   ` Jonthan Haslam
2024-03-26 23:42     ` Masami Hiramatsu
2024-03-26 16:01   ` Andrii Nakryiko
2024-03-26 23:42     ` Masami Hiramatsu
2024-03-27 17:06       ` Jonthan Haslam
2024-03-28  0:18         ` Masami Hiramatsu
2024-03-28  0:45           ` Andrii Nakryiko
2024-03-29 17:33             ` Andrii Nakryiko
2024-03-30  0:36               ` Masami Hiramatsu [this message]
2024-03-30  5:26                 ` Andrii Nakryiko
2024-04-10 10:38                 ` Jonthan Haslam
2024-04-10 23:21                   ` Masami Hiramatsu
2024-04-11  8:41                     ` Jonthan Haslam
2024-04-18 11:10                     ` Jonthan Haslam
2024-04-19  0:43                       ` Masami Hiramatsu
2024-04-03 11:05           ` Jonthan Haslam
2024-04-03 17:50             ` Andrii Nakryiko
2024-04-04 10:45               ` Jonthan Haslam
