From: Axel Rasmussen <axelrasmussen@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Michel Lespinasse <walken@google.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Jann Horn <jannh@google.com>,
	Chinwen Chang <chinwen.chang@mediatek.com>,
	Davidlohr Bueso <dbueso@suse.de>,
	David Rientjes <rientjes@google.com>,
	Yafang Shao <laoar.shao@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH v4 1/1] mmap_lock: add tracepoints around lock acquisition
Date: Fri, 23 Oct 2020 10:38:20 -0700
Message-ID: <CAJHvVcjzZgsvdzciR5v_wkgf3M7aD_vNGv3TXrf5Z5K6SLprSA@mail.gmail.com>
In-Reply-To: <fa6b9d13-0ef5-4d5d-bda3-657300028e23@suse.cz>

On Fri, Oct 23, 2020 at 7:00 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 10/20/20 8:47 PM, Axel Rasmussen wrote:
> > The goal of these tracepoints is to be able to debug lock contention
> > issues. This lock is acquired on most (all?) mmap / munmap / page fault
> > operations, so a multi-threaded process which does a lot of these can
> > experience significant contention.
> >
> > We trace just before we start acquisition, when the acquisition returns
> > (whether it succeeded or not), and when the lock is released (or
> > downgraded). The events are broken out by lock type (read / write).
> >
> > The events are also broken out by memcg path. For container-based
> > workloads, users often think of several processes in a memcg as a single
> > logical "task", so collecting statistics at this level is useful.
> >
> > The end goal is to get latency information. This isn't directly included
> > in the trace events. Instead, users are expected to compute the time
> > between "start locking" and "acquire returned", using e.g. synthetic
> > events or BPF. The benefit we get from this is simpler code.
> >
> > Because we use tracepoint_enabled() to decide whether or not to trace,
> > this patch has effectively no overhead unless tracepoints are enabled at
> > runtime. If tracepoints are enabled, there is a performance impact, but
> > how much depends on exactly what e.g. the BPF program does.
> >
> > Reviewed-by: Michel Lespinasse <walken@google.com>
> > Acked-by: Yafang Shao <laoar.shao@gmail.com>
> > Acked-by: David Rientjes <rientjes@google.com>
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>
> All seem fine to me, except I started to wonder..
>
> > +
> > +#ifdef CONFIG_MEMCG
> > +
> > +DEFINE_PER_CPU(char[MAX_FILTER_STR_VAL], trace_memcg_path);
> > +
> > +/*
> > + * Write the given mm_struct's memcg path to a percpu buffer, and return a
> > + * pointer to it. If the path cannot be determined, the buffer will contain the
> > + * empty string.
> > + *
> > + * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
> > + * disabled by the caller before calling us, and re-enabled only after the
> > + * caller is done with the pointer.
>
> Is this enough? What if we fill the buffer and then an interrupt comes and the
> handler calls here again? We overwrite the buffer and potentially report a wrong
> cgroup after the execution resumes?
> If nothing worse can happen (are interrupts disabled while the ftrace code is
> copying from the buffer?), then it's probably ok?

I think you're right: get_cpu()/put_cpu() only disable preemption,
not interrupts.
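
To make the window concrete, here is a rough user-space model of the
problem, not kernel code. Everything in it is a stand-in I made up for
illustration: path_buf plays the role of one CPU's trace_memcg_path
buffer, pending_irq is a hypothetical hook that fires where an interrupt
could arrive, and trace_event() models "fill the buffer, then let the
trace code copy it".

```c
#include <stdio.h>
#include <string.h>

#define BUF_LEN 64

/* Stand-in for one CPU's trace_memcg_path buffer (hypothetical model). */
static char path_buf[BUF_LEN];

/* Hypothetical hook: if set, runs at the point an interrupt could fire. */
static void (*pending_irq)(void);

/* Models get_mm_memcg_path(): write the path into the shared buffer. */
static const char *fill_path(const char *path)
{
	snprintf(path_buf, sizeof(path_buf), "%s", path);
	return path_buf;
}

/* Models one trace call: fill the buffer, then copy it into the record. */
static void trace_event(const char *path, char *out, size_t out_len)
{
	const char *p = fill_path(path);

	/* Window between fill and copy: a simulated interrupt may run here. */
	if (pending_irq) {
		void (*irq)(void) = pending_irq;

		pending_irq = NULL;
		irq();
	}
	snprintf(out, out_len, "%s", p);
}

static char irq_record[BUF_LEN];

/* The handler traces too, reusing the same buffer on the same "CPU". */
static void irq_handler(void)
{
	trace_event("/irq-cgroup", irq_record, sizeof(irq_record));
}

/* Returns what the interrupted task-level trace call ends up recording. */
static const char *run_clobber_demo(void)
{
	static char task_record[BUF_LEN];

	pending_irq = irq_handler;
	trace_event("/task-cgroup", task_record, sizeof(task_record));
	return task_record;
}
```

In this model the task-level event ends up recording the handler's
path, because the handler rewrote the buffer between the fill and the
copy. Disabling preemption does nothing to close that window.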

I'm fairly sure this code can be called in interrupt context, so I
don't think we can use a lock to prevent this situation. Say we
acquire the lock, an interrupt arrives on the same CPU, and the handler
tries to acquire the same lock: the handler can't sleep, and the
original holder can't run until the handler returns, so we're stuck.

I don't think we can kmalloc here (instead of using a percpu buffer)
either, since I would guess that kmalloc may itself acquire mmap_lock?

Is adding local_irq_save()/local_irq_restore() in addition to
get_cpu()/put_cpu() sufficient?
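
As a rough user-space sketch of what I mean (a model only, not the
kernel implementation): sim_irq_save() and sim_irq_restore() are
hypothetical stand-ins for local_irq_save()/local_irq_restore(),
path_buf stands in for one CPU's trace_memcg_path buffer, and
pending_irq is a made-up hook for a simulated interrupt. The interrupt
is deferred while masked and delivered on restore.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BUF_LEN 64

/* Stand-in for one CPU's trace_memcg_path buffer (hypothetical model). */
static char path_buf[BUF_LEN];
static bool irqs_masked;
static void (*pending_irq)(void);

/* Deliver a pending simulated interrupt unless interrupts are masked. */
static void maybe_deliver_irq(void)
{
	if (pending_irq && !irqs_masked) {
		void (*irq)(void) = pending_irq;

		pending_irq = NULL;
		irq();
	}
}

/* Hypothetical stand-in for local_irq_save(): mask, return old state. */
static bool sim_irq_save(void)
{
	bool was_masked = irqs_masked;

	irqs_masked = true;
	return was_masked;
}

/* Hypothetical stand-in for local_irq_restore(). */
static void sim_irq_restore(bool was_masked)
{
	irqs_masked = was_masked;
	maybe_deliver_irq();	/* a deferred interrupt runs once unmasked */
}

/* Models one trace call with the fill and the copy both done masked. */
static void trace_event_masked(const char *path, char *out, size_t out_len)
{
	bool flags = sim_irq_save();

	snprintf(path_buf, sizeof(path_buf), "%s", path);
	maybe_deliver_irq();	/* no-op here: interrupts are masked */
	snprintf(out, out_len, "%s", path_buf);

	sim_irq_restore(flags);
}

static char irq_record[BUF_LEN];
static char task_record[BUF_LEN];

/* The handler traces too, but only runs after the task-level copy. */
static void irq_handler(void)
{
	trace_event_masked("/irq-cgroup", irq_record, sizeof(irq_record));
}

/* Returns what the task-level trace call records with masking in place. */
static const char *run_masked_demo(void)
{
	pending_irq = irq_handler;
	trace_event_masked("/task-cgroup", task_record, sizeof(task_record));
	return task_record;
}
```

In this model each event records its own path, since the handler only
runs after the task-level copy has finished. Note that
local_irq_save() does not mask NMIs, so this only covers maskable
interrupts.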

>
> > + */
> > +static const char *get_mm_memcg_path(struct mm_struct *mm)
> > +{
> > +     struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> > +
> > +     if (memcg != NULL && likely(memcg->css.cgroup != NULL)) {
> > +             char *buf = this_cpu_ptr(trace_memcg_path);
> > +
> > +             cgroup_path(memcg->css.cgroup, buf, MAX_FILTER_STR_VAL);
> > +             return buf;
> > +     }
> > +     return "";
> > +}
> > +
> > +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                   \
> > +     do {                                                                   \
> > +             get_cpu();                                                     \
> > +             trace_mmap_lock_##type(mm, get_mm_memcg_path(mm),              \
> > +                                    ##__VA_ARGS__);                         \
> > +             put_cpu();                                                     \
> > +     } while (0)
> > +
> > +#else /* !CONFIG_MEMCG */
> > +
> > +#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                   \
> > +     trace_mmap_lock_##type(mm, "", ##__VA_ARGS__)
> > +
> > +#endif /* CONFIG_MEMCG */
> > +
> > +/*
> > + * Trace calls must be in a separate file, as otherwise there's a circular
> > + * dependency between linux/mmap_lock.h and trace/events/mmap_lock.h.
> > + */
> > +
> > +void __mmap_lock_do_trace_start_locking(struct mm_struct *mm, bool write)
> > +{
> > +     TRACE_MMAP_LOCK_EVENT(start_locking, mm, write);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_start_locking);
> > +
> > +void __mmap_lock_do_trace_acquire_returned(struct mm_struct *mm, bool write,
> > +                                        bool success)
> > +{
> > +     TRACE_MMAP_LOCK_EVENT(acquire_returned, mm, write, success);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_acquire_returned);
> > +
> > +void __mmap_lock_do_trace_released(struct mm_struct *mm, bool write)
> > +{
> > +     TRACE_MMAP_LOCK_EVENT(released, mm, write);
> > +}
> > +EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >
>

