From: ebiederm@xmission.com (Eric W. Biederman)
To: Mike Christie <michael.christie@oracle.com>
Cc: geert@linux-m68k.org, vverma@digitalocean.com, hdanton@sina.com,
	hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com,
	mst@redhat.com, sgarzare@redhat.com,
	virtualization@lists.linux-foundation.org,
	christian.brauner@ubuntu.com, axboe@kernel.dk,
	linux-kernel@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH V6 06/10] fork: add helpers to clone a process for kernel use
Date: Fri, 17 Dec 2021 12:53:06 -0600
Message-ID: <87o85fcbcd.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <20211129194707.5863-7-michael.christie@oracle.com> (Mike Christie's message of "Mon, 29 Nov 2021 13:47:03 -0600")

Mike Christie <michael.christie@oracle.com> writes:

> The vhost layer is creating kthreads to execute IO and management
> operations. These threads need to share a mm with a userspace thread,
> inherit cgroups, and we would like to have the thread accounted for
> under the userspace thread's rlimit nproc value so a user can't overwhelm
> the system with threads when creating VMs.
>
> We have helpers for cgroups and mm but not for the rlimit nproc and in
> the future we will probably want helpers for things like namespaces. For
> those two items and to allow future sharing/inheritance, this patch adds
> two helpers, user_worker_create and user_worker_start that allow callers
> to create threads that copy or inherit the caller's attributes like mm,
> cgroups, namespaces, etc, and are accounted for under the caller's rlimit
> nproc value, much as if the caller had called clone() in userspace. However,
> instead of returning to userspace the thread is usable in the kernel for
> modules like vhost or layers like io_uring.

If you are making this a general API it would be good to wrap the called
function the way kthread_create does so that the code in the function
can just return and let the wrapper call do_exit for it, especially if
you are going to have modular users.

There is a lot of deep magic in what happens if a thread created with
kernel_thread returns.  It makes sense to expose that magic to the 1 or
2 callers that use kernel_thread directly.  It does not make sense to
expose it to anything higher up, and in creating a nice API you are
doing that.
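
A minimal userspace sketch of the wrapper idea described above, assuming a
trampoline that owns the exit call so that the worker function can simply
return. Every name here (model_do_exit, model_worker_args,
model_worker_trampoline, model_worker_prepare, model_sum_to) is hypothetical
and is not part of the patch or of the kernel; model_do_exit only records
the code so that the flow can be observed outside the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's exit primitive: it records the
 * exit code instead of terminating the calling thread. */
static int model_exit_code = -1;

static void model_do_exit(int code)
{
	model_exit_code = code;
}

/* The caller's function and argument, bundled for the trampoline. */
struct model_worker_args {
	int (*fn)(void *);
	void *arg;
};

/* Entry point handed to the thread-creation primitive in place of fn.
 * It runs fn and performs the exit call itself, so fn never needs to
 * know how a thread is supposed to terminate. */
static int model_worker_trampoline(void *data)
{
	struct model_worker_args *wa = data;
	int (*fn)(void *) = wa->fn;
	void *arg = wa->arg;
	int ret;

	free(wa);
	ret = fn(arg);
	model_do_exit(ret);
	return 0;
}

/* Allocate the bundle that the trampoline later consumes and frees. */
static struct model_worker_args *model_worker_prepare(int (*fn)(void *),
						       void *arg)
{
	struct model_worker_args *wa;

	assert(fn != NULL);
	wa = malloc(sizeof(*wa));
	if (wa) {
		wa->fn = fn;
		wa->arg = arg;
	}
	return wa;
}

/* Example worker body: it computes 1 + 2 + ... + n and just returns. */
static int model_sum_to(void *arg)
{
	int n = *(int *)arg;
	int sum = 0;

	for (int i = 1; i <= n; i++)
		sum += i;
	return sum;
}
```

In this model a caller passes the result of
model_worker_prepare(model_sum_to, &n) to model_worker_trampoline(), and the
value returned by model_sum_to reaches model_do_exit without the worker
calling it directly.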

Currently I have just removed all of the modular users of do_exit
and am in the process of removing do_exit itself, so I am a little more
sensitive to this than I would ordinarily be.  But I think my comment
stands even without the changes of mine that you conflict with.

Eric


> [added flag validation code from Christian Brauner's SIG_IGN patch]
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/sched/task.h |  5 +++
>  kernel/fork.c              | 72 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+)
>
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index f8a658700075..ecb21c0d95ce 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -95,6 +95,11 @@ struct mm_struct *copy_init_mm(void);
>  extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
>  extern long kernel_wait4(pid_t, int __user *, int, struct rusage *);
>  int kernel_wait(pid_t pid, int *stat);
> +struct task_struct *user_worker_create(int (*fn)(void *), void *arg, int node,
> +				       unsigned long clone_flags,
> +				       u32 worker_flags);
> +__printf(2, 3)
> +void user_worker_start(struct task_struct *tsk, const char namefmt[], ...);
>  
>  extern void free_task(struct task_struct *tsk);
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index c9152596a285..e72239ae1e08 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2543,6 +2543,78 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>  	return copy_process(NULL, 0, node, &args);
>  }
>  
> +static bool user_worker_flags_valid(struct kernel_clone_args *kargs)
> +{
> +	/* Verify that no unknown flags are passed along. */
> +	if (kargs->worker_flags & ~(USER_WORKER_IO | USER_WORKER |
> +				    USER_WORKER_NO_FILES | USER_WORKER_SIG_IGN))
> +		return false;
> +
> +	/*
> +	 * If we're ignoring all signals don't allow sharing struct sighand and
> +	 * don't bother clearing signal handlers.
> +	 */
> +	if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) &&
> +	    (kargs->worker_flags & USER_WORKER_SIG_IGN))
> +		return false;
> +
> +	return true;
> +}
> +
> +/**
> + * user_worker_create - create a copy of a process to be used by the kernel
> + * @fn: function to run in the new task
> + * @arg: data to be passed to fn
> + * @node: numa node to allocate task from
> + * @clone_flags: CLONE flags
> + * @worker_flags: USER_WORKER flags
> + *
> + * This returns a created task, or an error pointer. The returned task is
> + * inactive, and the caller must fire it up through user_worker_start(). If
> + * this is a PF_IO_WORKER, all signals but KILL and STOP are blocked.
> + */
> +struct task_struct *user_worker_create(int (*fn)(void *), void *arg, int node,
> +				       unsigned long clone_flags,
> +				       u32 worker_flags)
> +{
> +	struct kernel_clone_args args = {
> +		.flags		= ((lower_32_bits(clone_flags) | CLONE_VM |
> +				   CLONE_UNTRACED) & ~CSIGNAL),
> +		.exit_signal	= (lower_32_bits(clone_flags) & CSIGNAL),
> +		.stack		= (unsigned long)fn,
> +		.stack_size	= (unsigned long)arg,
> +		.worker_flags	= USER_WORKER | worker_flags,
> +	};
> +
> +	if (!user_worker_flags_valid(&args))
> +		return ERR_PTR(-EINVAL);
> +
> +	return copy_process(NULL, 0, node, &args);
> +}
> +EXPORT_SYMBOL_GPL(user_worker_create);
> +
> +/**
> + * user_worker_start - Start a task created with user_worker_create
> + * @tsk: task to wake up
> + * @namefmt: printf-style format string for the thread name
> + * @...: arguments for @namefmt
> + */
> +void user_worker_start(struct task_struct *tsk, const char namefmt[], ...)
> +{
> +	char name[TASK_COMM_LEN];
> +	va_list args;
> +
> +	WARN_ON(!(tsk->flags & PF_USER_WORKER));
> +
> +	va_start(args, namefmt);
> +	vsnprintf(name, sizeof(name), namefmt, args);
> +	set_task_comm(tsk, name);
> +	va_end(args);
> +
> +	wake_up_new_task(tsk);
> +}
> +EXPORT_SYMBOL_GPL(user_worker_start);
> +
>  /*
>   *  Ok, this is the main fork-routine.
>   *
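
For readers following the flag handling in the quoted hunk, here is a small
userspace model of how user_worker_create() splits clone_flags and how
user_worker_flags_valid() rejects combinations. The CLONE_* and CSIGNAL
values are the ones from include/uapi/linux/sched.h. The USER_WORKER* bit
values are assumptions invented for this sketch (the series defines the real
ones in another patch), and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from include/uapi/linux/sched.h. */
#define CSIGNAL			0x000000ffULL
#define CLONE_VM		0x00000100ULL
#define CLONE_SIGHAND		0x00000800ULL
#define CLONE_UNTRACED		0x00800000ULL
#define CLONE_CLEAR_SIGHAND	0x100000000ULL

/* ASSUMPTION: these bit positions are made up for illustration only. */
#define USER_WORKER		(1u << 0)
#define USER_WORKER_IO		(1u << 1)
#define USER_WORKER_NO_FILES	(1u << 2)
#define USER_WORKER_SIG_IGN	(1u << 3)

/* Reduced model of the kernel_clone_args fields used by the hunk. */
struct model_clone_args {
	uint64_t flags;
	int exit_signal;
	uint32_t worker_flags;
};

/* Mirrors the initializer in user_worker_create(): the low byte of
 * clone_flags becomes the exit signal, the rest become clone flags. */
static struct model_clone_args model_build_args(unsigned long clone_flags,
						 uint32_t worker_flags)
{
	uint64_t low = (uint32_t)clone_flags;	/* lower_32_bits() */
	struct model_clone_args args = {
		.flags		= (low | CLONE_VM | CLONE_UNTRACED) & ~CSIGNAL,
		.exit_signal	= (int)(low & CSIGNAL),
		.worker_flags	= USER_WORKER | worker_flags,
	};

	return args;
}

/* Mirrors user_worker_flags_valid() from the hunk. */
static bool model_flags_valid(const struct model_clone_args *kargs)
{
	/* Reject worker flags outside the known set. */
	if (kargs->worker_flags & ~(USER_WORKER_IO | USER_WORKER |
				    USER_WORKER_NO_FILES | USER_WORKER_SIG_IGN))
		return false;

	/* Ignoring all signals excludes sharing or clearing handlers. */
	if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) &&
	    (kargs->worker_flags & USER_WORKER_SIG_IGN))
		return false;

	return true;
}

/* Small self-check exercising one accepted and one rejected request. */
static void model_demo(void)
{
	struct model_clone_args ok = model_build_args(CLONE_SIGHAND | 17, 0);
	struct model_clone_args bad =
		model_build_args(CLONE_SIGHAND, USER_WORKER_SIG_IGN);

	assert(ok.exit_signal == 17);
	assert(model_flags_valid(&ok));
	assert(!model_flags_valid(&bad));
}
```

Calling model_demo() runs the checks. The value 17 is SIGCHLD on most
architectures and is used here only as an example exit signal.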
