All of lore.kernel.org
 help / color / mirror / Atom feed
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Beau Belgrave <beaub@linux.microsoft.com>
Cc: rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	dcook@linux.microsoft.com, alanau@linux.microsoft.com,
	brauner@kernel.org, akpm@linux-foundation.org,
	ebiederm@xmission.com, keescook@chromium.org, tglx@linutronix.de,
	linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v8 00/11] tracing/user_events: Remote write ABI
Date: Fri, 24 Mar 2023 08:05:43 +0800	[thread overview]
Message-ID: <20230324080543.b977c6fa769d52eedb7b428e@kernel.org> (raw)
In-Reply-To: <20230221211143.574-1-beaub@linux.microsoft.com>

Hi Beau,

Sorry for replying so late.
I reviewed the series and I think it looks good to me.
This direction is good from the user/kernel interaction viewpoint.
I have just a couple of comments, and reply to the each mail.

Thank you!

On Tue, 21 Feb 2023 13:11:32 -0800
Beau Belgrave <beaub@linux.microsoft.com> wrote:

> As part of the discussions for user_events aligned with user space
> tracers, it was determined that user programs should register a aligned
> value to set or clear a bit when an event becomes enabled. Currently a
> shared page is being used that requires mmap(). Remove the shared page
> implementation and move to a user registered address implementation.
> 
> In this new model during the event registration from user programs 3 new
> values are specified. The first is the address to update when the event
> is either enabled or disabled. The second is the bit to set/clear to
> reflect the event being enabled. The third is the size of the value at
> the specified address.
> 
> This allows for a local 32/64-bit value in user programs to support
> both kernel and user tracers. As an example, setting bit 31 for kernel
> tracers when the event becomes enabled allows for user tracers to use
> the other bits for ref counts or other flags. The kernel side updates
> the bit atomically, user programs need to also update these values
> atomically.
> 
> User provided addresses must be aligned on a natural boundary, this
> allows for single page checking and prevents odd behaviors such as a
> enable value straddling 2 pages instead of a single page.
> 
> When page faults are encountered they are done asyncly via a workqueue.
> If the page faults back in, the write update is attempted again. If the
> page cannot fault-in, then we log and wait until the next time the event
> is enabled/disabled. This is to prevent possible infinite loops resulting
> from bad user processes unmapping or changing protection values after
> registering the address.
> 
> Change history
> 
> V8:
> Rebase to 6.2-rc8.
> 
> V7:
> Rebase to 6.2-rc4.
> 
> Added flags to register ioctl, validates it's 0 for now. Future patches
> will enable other types of formats/options as needed.
> 
> V6:
> Rebase to 6.2-rc2.
> 
> Fixed small typos, code style.
> 
> Changed from synchronize_rcu() to queue_rcu_work() to allow an rcu
> delay asyncly when mm is being removed and in an appropriate context
> for mmdrop().
> 
> V5:
> GFP_NOWAIT is still needed in user_event_enabler_dup(), due to rcu lock.
> 
> V4:
> Rebase to 6.1-rc7.
> 
> Moved user_events_fork() out of task signal lock and dropped use of
> GFP_NOWAIT. All allocations are now GFP_KERNEL or GFP_KERNEL_ACCOUNT.
> 
> Added boot parameter user_events_max= to limit global events.
> 
> Added sysctl value kernel.user_events_max to limit global events.
> 
> Added cgroup tracking of memory allocated for events.
> 
> V3:
> Rebase to 6.1-rc6.
> 
> Removed RFC tag on series.
> 
> Updated documentation to reflect ABI changes.
> 
> Added self-test for ABI specific clone/fork cases.
> 
> Moved user_event_mm removal into do_exit() to ensure RSS task accounting
> is done properly in async fault paths. Also lets us remove the delayed
> mmdrop(), saving memory in each user_event_mm struct.
> 
> Fixed timing window where task exits, but write could be in-progress.
> During exit we now take mmap_write_lock to ensure we drain writes.
> 
> V2:
> Rebase to 6.1-rc5.
> 
> Added various comments based on feedback.
> 
> Added enable_size to register struct, allows 32/64 bit addresses
> as long as the enable_bit fits and the address is naturally aligned.
> 
> Changed user_event_enabler_write to accept a new flag indicating if a
> fault fixup should be done or not. This allows user_event_enabler_create
> to return back failures to the user ioctl reg call and retry to fault
> in data.
> 
> Added tracking fork/exec/exit of tasks to have the user_event_mm lifetime
> tied more to the task than the file. This came with extra requirements
> around when you can lock, such as softirq cases, as well as a RCU
> pattern to ensure fork/exec/exit take minimal lock times.
> 
> Changed enablers to use a single word-aligned value for saving the bit
> to set and any flags, such as faulting asyncly or being freed. This was
> required to ensure atomic bit set/test for fork cases where taking the
> event_mutex is not a good scalability decision.
> 
> Added unregister IOCTL, since file lifetime no longer limits the enable
> time for any events (the mm does).
> 
> Updated sample code to reflect the new remote write based ABI.
> 
> Updated self-test code to reflect the new remote write based ABI.
> 
> Beau Belgrave (11):
>   tracing/user_events: Split header into uapi and kernel
>   tracing/user_events: Track fork/exec/exit for mm lifetime
>   tracing/user_events: Use remote writes for event enablement
>   tracing/user_events: Fixup enable faults asyncly
>   tracing/user_events: Add ioctl for disabling addresses
>   tracing/user_events: Update self-tests to write ABI
>   tracing/user_events: Add ABI self-test
>   tracing/user_events: Use write ABI in example
>   tracing/user_events: Update documentation for ABI
>   tracing/user_events: Charge event allocs to cgroups
>   tracing/user_events: Limit global user_event count
> 
>  Documentation/trace/user_events.rst           | 177 ++--
>  fs/exec.c                                     |   2 +
>  include/linux/sched.h                         |   5 +
>  include/linux/user_events.h                   | 101 +-
>  include/uapi/linux/user_events.h              |  81 ++
>  kernel/exit.c                                 |   2 +
>  kernel/fork.c                                 |   2 +
>  kernel/trace/Kconfig                          |   5 +-
>  kernel/trace/trace_events_user.c              | 863 +++++++++++++++---
>  samples/user_events/example.c                 |  47 +-
>  tools/testing/selftests/user_events/Makefile  |   2 +-
>  .../testing/selftests/user_events/abi_test.c  | 226 +++++
>  .../testing/selftests/user_events/dyn_test.c  |   2 +-
>  .../selftests/user_events/ftrace_test.c       | 162 ++--
>  .../testing/selftests/user_events/perf_test.c |  39 +-
>  15 files changed, 1317 insertions(+), 399 deletions(-)
>  create mode 100644 include/uapi/linux/user_events.h
>  create mode 100644 tools/testing/selftests/user_events/abi_test.c
> 
> 
> base-commit: ceaa837f96adb69c0df0397937cd74991d5d821a
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

      parent reply	other threads:[~2023-03-24  0:05 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-21 21:11 [PATCH v8 00/11] tracing/user_events: Remote write ABI Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 01/11] tracing/user_events: Split header into uapi and kernel Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 02/11] tracing/user_events: Track fork/exec/exit for mm lifetime Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 03/11] tracing/user_events: Use remote writes for event enablement Beau Belgrave
2023-03-28 20:59   ` Steven Rostedt
2023-02-21 21:11 ` [PATCH v8 04/11] tracing/user_events: Fixup enable faults asyncly Beau Belgrave
2023-03-28 21:20   ` Steven Rostedt
2023-03-28 21:42     ` Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 05/11] tracing/user_events: Add ioctl for disabling addresses Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 06/11] tracing/user_events: Update self-tests to write ABI Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 07/11] tracing/user_events: Add ABI self-test Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 08/11] tracing/user_events: Use write ABI in example Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 09/11] tracing/user_events: Update documentation for ABI Beau Belgrave
2023-03-24  0:06   ` Masami Hiramatsu
2023-03-24 16:47     ` Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 10/11] tracing/user_events: Charge event allocs to cgroups Beau Belgrave
2023-02-21 21:11 ` [PATCH v8 11/11] tracing/user_events: Limit global user_event count Beau Belgrave
2023-03-24  0:18   ` Masami Hiramatsu
2023-03-24  8:54     ` Vlastimil Babka
2023-03-24 16:43       ` Beau Belgrave
2023-03-24 17:06         ` Steven Rostedt
2023-03-26 15:22           ` Masami Hiramatsu
2023-03-24  0:05 ` Masami Hiramatsu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230324080543.b977c6fa769d52eedb7b428e@kernel.org \
    --to=mhiramat@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=alanau@linux.microsoft.com \
    --cc=beaub@linux.microsoft.com \
    --cc=brauner@kernel.org \
    --cc=dcook@linux.microsoft.com \
    --cc=ebiederm@xmission.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.