From: Vlastimil Babka <vbabka@suse.cz>
To: Axel Rasmussen <axelrasmussen@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Michel Lespinasse <walken@google.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Laurent Dufour <ldufour@linux.ibm.com>,
Jann Horn <jannh@google.com>,
Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Yafang Shao <laoar.shao@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 2/2] mmap_lock: add tracepoints around lock acquisition
Date: Tue, 20 Oct 2020 16:50:30 +0200
Message-ID: <1b9238b7-17f2-6c1e-b37e-cf65424f504b@suse.cz>
In-Reply-To: <20201009220524.485102-3-axelrasmussen@google.com>
On 10/10/20 12:05 AM, Axel Rasmussen wrote:
> The goal of these tracepoints is to be able to debug lock contention
> issues. This lock is acquired on most (all?) mmap / munmap / page fault
> operations, so a multi-threaded process which does a lot of these can
> experience significant contention.
>
> We trace just before we start acquisition, when the acquisition returns
> (whether it succeeded or not), and when the lock is released (or
> downgraded). The events are broken out by lock type (read / write).
>
> The events are also broken out by memcg path. For container-based
> workloads, users often think of several processes in a memcg as a single
> logical "task", so collecting statistics at this level is useful.
>
> The end goal is to get latency information. This isn't directly included
> in the trace events. Instead, users are expected to compute the time
> between "start locking" and "acquire returned", using e.g. synthetic
> events or BPF. The benefit we get from this is simpler code.
>
> Because we use tracepoint_enabled() to decide whether or not to trace,
> this patch has effectively no overhead unless tracepoints are enabled at
> runtime. If tracepoints are enabled, there is a performance impact, but
> how much depends on exactly what e.g. the BPF program does.
>
> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Yeah, I agree with this approach, which follows the page ref one.
...
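For reference, the latency computation that the commit message leaves to
userspace (synthetic events or BPF) amounts to pairing each "start locking"
with the next "acquire returned" on the same thread. Below is a minimal
userspace sketch of that pairing. The record layout, the enum names and
MAX_TID are made up for illustration and are not the kernel's trace format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded trace record; not the kernel's on-disk layout. */
enum mmap_lock_event { EV_START_LOCKING, EV_ACQUIRE_RETURNED, EV_RELEASED };

struct trace_record {
	uint64_t ts_ns;			/* event timestamp */
	int tid;			/* thread that emitted the event */
	enum mmap_lock_event ev;
};

#define MAX_TID 64			/* illustrative bound on thread ids */

/*
 * Sum (acquire_returned - start_locking) over all records, pairing the
 * events per thread. Records are assumed to be sorted by timestamp.
 */
static uint64_t total_acquire_latency_ns(const struct trace_record *recs,
					 size_t n)
{
	uint64_t start[MAX_TID] = {0};
	bool pending[MAX_TID] = {false};
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		int tid = recs[i].tid;

		if (tid < 0 || tid >= MAX_TID)
			continue;	/* ignore threads outside the model */

		if (recs[i].ev == EV_START_LOCKING) {
			start[tid] = recs[i].ts_ns;
			pending[tid] = true;
		} else if (recs[i].ev == EV_ACQUIRE_RETURNED && pending[tid]) {
			total += recs[i].ts_ns - start[tid];
			pending[tid] = false;
		}
	}
	return total;
}
```

Release events are skipped here because only the wait for acquisition is being
measured.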
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> new file mode 100644
> index 000000000000..b849287bd12a
> --- /dev/null
> +++ b/mm/mmap_lock.c
> @@ -0,0 +1,87 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/mmap_lock.h>
> +
> +#include <linux/mm.h>
> +#include <linux/cgroup.h>
> +#include <linux/memcontrol.h>
> +#include <linux/mmap_lock.h>
> +#include <linux/percpu.h>
> +#include <linux/smp.h>
> +#include <linux/trace_events.h>
> +
> +/*
> + * We have to export these, as drivers use mmap_lock, and our inline functions
> + * in the header check if the tracepoint is enabled. They can't be GPL, as e.g.
> + * the nvidia driver is an existing caller of this code.
I don't think this argument works in the kernel community. I would just remove
this comment.
> + */
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_start_locking);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_acquire_returned);
> +EXPORT_SYMBOL(__tracepoint_mmap_lock_released);
You can use EXPORT_TRACEPOINT_SYMBOL() here.
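To illustrate: as far as I know, at the time of this patch
EXPORT_TRACEPOINT_SYMBOL(name) is a thin wrapper that pastes the
__tracepoint_ prefix onto the name and hands it to EXPORT_SYMBOL(). The
userspace sketch below only demonstrates that expansion. EXPORT_SYMBOL() is
replaced by a stand-in that yields the symbol name as a string, and
exported_symbol_name() is a made-up helper.

```c
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/*
 * Stand-in (assumption): the real EXPORT_SYMBOL() emits a symbol table
 * entry. Here it only produces the symbol's name as a string so that the
 * expansion can be inspected.
 */
#define EXPORT_SYMBOL(sym) STRINGIFY(sym)

/* Mirrors the shape of the kernel macro: prefix the name, then export. */
#define EXPORT_TRACEPOINT_SYMBOL(name) EXPORT_SYMBOL(__tracepoint_##name)

/*
 * In mm/mmap_lock.c the three exports would then read:
 *
 *	EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking);
 *	EXPORT_TRACEPOINT_SYMBOL(mmap_lock_acquire_returned);
 *	EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released);
 */
static const char *exported_symbol_name(void)
{
	return EXPORT_TRACEPOINT_SYMBOL(mmap_lock_start_locking);
}
```

The wrapper keeps the non-GPL export that the patch wants, and it follows any
additional per-tracepoint symbols the macro may export in other kernel
versions.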
> +#ifdef CONFIG_MEMCG
> +
> +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> +
> +/*
> + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> + * pointer to it. If the path cannot be determined, the buffer will contain the
> + * empty string.
> + *
> + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> + * disabled by the caller before calling us, and re-enabled only after the
> + * caller is done with the pointer.
> + */
> +static const char *get_mm_memcg_path(struct mm_struct *mm)
> +{
> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> +
> +	if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> +		char *buf = this_cpu_ptr(trace_memcg_path);
> +
> +		cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> +		return buf;
> +	}
> +	return "";
> +}
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	do { \
> +		if (trace_mmap_lock_##type##_enabled()) { \
Is this check really needed? We are only called from the functions inlined in
the .h file when tracepoint_enabled() was already true, so this seems
redundant.
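A minimal userspace model of the double check, to make the point concrete. The
flag, the counters and all function names are made up. A plain bool stands in
for the static key behind tracepoint_enabled(), and the model ignores the
window in which a tracepoint could be disabled between the two tests.

```c
#include <stdbool.h>

static bool tracepoint_on;	/* stand-in for the static key state */
static int enabled_checks;	/* how many times the flag was tested */
static int events_emitted;	/* how many events were "traced" */

static bool tracepoint_enabled_model(void)
{
	enabled_checks++;
	return tracepoint_on;
}

/* Out-of-line slow path as in the patch: tests the flag again. */
static void do_trace_checked(void)
{
	if (tracepoint_enabled_model())
		events_emitted++;
}

/* Out-of-line slow path without the inner test. */
static void do_trace_unchecked(void)
{
	events_emitted++;
}

/* Inline fast path from the header: only calls out when enabled. */
static inline void lock_fast_path(bool recheck)
{
	if (tracepoint_enabled_model()) {
		if (recheck)
			do_trace_checked();
		else
			do_trace_unchecked();
	}
}
```

With the flag set, lock_fast_path(true) tests it twice per event while
lock_fast_path(false) tests it once, and both emit the same single event.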
> +			get_cpu(); \
> +			trace_mmap_lock_##type(mm, get_mm_memcg_path(mm), \
> +					       ##__VA_ARGS__); \
> +			put_cpu(); \
> +		} \
> +	} while (0)
> +
> +#else /* !CONFIG_MEMCG */
> +
> +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...) \
> +	trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> +
> +#endif /* CONFIG_MEMCG */
> +
> +/*
> + * Trace calls must be in a separate file, as otherwise there's a circular
> + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> + */
> +
> +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(start_locking, mm, write, true);
Seems wasteful to have an always-true success field here. Yeah, not reusing the
same event class for all three tracepoints means more code, but for tracing
efficiency it's worth it, IMHO.
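Roughly what I have in mind, as a userspace sketch rather than actual
TRACE_EVENT() definitions: start_locking and released share a layout with no
success field, and only acquire_returned carries one. The struct and function
names are made up and the real TP_STRUCT__entry layout differs. How many bytes
this saves per record depends on ring buffer alignment, which the sketch does
not model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout for start_locking and released: no success field. */
struct lock_event {
	const char *memcg_path;
	bool write;
};

/* Hypothetical layout for acquire_returned: the only event that can fail. */
struct acquire_event {
	const char *memcg_path;
	bool write;
	bool success;
};

/* Format an event that has no success field. */
static int format_lock_event(char *buf, size_t len, const char *name,
			     const struct lock_event *e)
{
	return snprintf(buf, len, "%s: memcg_path=%s write=%s", name,
			e->memcg_path, e->write ? "true" : "false");
}

/* Format acquire_returned, including whether the acquisition succeeded. */
static int format_acquire_event(char *buf, size_t len,
				const struct acquire_event *e)
{
	return snprintf(buf, len,
			"mmap_lock_acquire_returned: memcg_path=%s write=%s success=%s",
			e->memcg_path, e->write ? "true" : "false",
			e->success ? "true" : "false");
}
```

The cost is a second layout and a second formatter, which is the "more code"
mentioned above.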
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> +
> +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> +					   bool success)
> +{
> +	TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> +
> +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> +{
> +	TRACE_MMAP_LOCK_EVENT(released, mm, write, true);
Ditto.
> +}
> +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>