All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@ubuntu.com>
To: Mike Christie <michael.christie@oracle.com>
Cc: stefanha@redhat.com, jasowang@redhat.com, mst@redhat.com,
	sgarzare@redhat.com, virtualization@lists.linux-foundation.org,
	axboe@kernel.dk, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/8] fork: add option to not clone or dup files
Date: Fri, 17 Sep 2021 10:54:55 +0200	[thread overview]
Message-ID: <20210917085455.xggfayhxdq6eejz4@wittgenstein> (raw)
In-Reply-To: <20210916212051.6918-4-michael.christie@oracle.com>

On Thu, Sep 16, 2021 at 04:20:46PM -0500, Mike Christie wrote:
> Each vhost device gets a thread that is used to perform IO and management
> operations. Instead of a thread that is accessing a device, the thread is
> part of the device, so when it calls kernel_copy_process we can't dup or
> clone the parent's (Qemu thread that does the VHOST_SET_OWNER ioctl)
> files/FDS because it would do an extra increment on ourself.
> 
> Later, when we do:
> 
> Qemu process exits:
> 	do_exit -> exit_files -> put_files_struct -> close_files
> 
> we would leak the device's resources because of that extra refcount
> on the fd or file_struct.
> 
> This patch adds a no_files option so these worker threads can prevent
> taking an extra refcount on themselves.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  include/linux/sched/task.h |  3 ++-
>  kernel/fork.c              | 14 +++++++++++---
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index c55f1eb69d41..d0b0872f56cc 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -32,6 +32,7 @@ struct kernel_clone_args {
>  	size_t set_tid_size;
>  	int cgroup;
>  	int io_thread;
> +	int no_files;
>  	struct cgroup *cgrp;
>  	struct css_set *cset;
>  };
> @@ -86,7 +87,7 @@ extern pid_t kernel_clone(struct kernel_clone_args *kargs);
>  struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
>  struct task_struct *kernel_copy_process(int (*fn)(void *), void *arg, int node,
>  					unsigned long clone_flags,
> -					int io_thread);
> +					int io_thread, int no_files);
>  struct task_struct *fork_idle(int);
>  struct mm_struct *copy_init_mm(void);
>  extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index cec7b6011beb..a0468e30b27e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1532,7 +1532,8 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
>  	return 0;
>  }
>  
> -static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
> +static int copy_files(unsigned long clone_flags, struct task_struct *tsk,
> +		      int no_files)
>  {
>  	struct files_struct *oldf, *newf;
>  	int error = 0;
> @@ -1544,6 +1545,11 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
>  	if (!oldf)
>  		goto out;
>  
> +	if (no_files) {
> +		tsk->files = NULL;
> +		goto out;
> +	}
> +
>  	if (clone_flags & CLONE_FILES) {
>  		atomic_inc(&oldf->count);
>  		goto out;
> @@ -2179,7 +2185,7 @@ static __latent_entropy struct task_struct *copy_process(
>  	retval = copy_semundo(clone_flags, p);
>  	if (retval)
>  		goto bad_fork_cleanup_security;
> -	retval = copy_files(clone_flags, p);
> +	retval = copy_files(clone_flags, p, args->no_files);
>  	if (retval)
>  		goto bad_fork_cleanup_semundo;
>  	retval = copy_fs(clone_flags, p);
> @@ -2539,6 +2545,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>   * @node: numa node to allocate task from
>   * @clone_flags: CLONE flags
>   * @io_thread: 1 if this will be a PF_IO_WORKER else 0.
> + * @no_files: Do not duplicate or copy the parent's open files.
>   *
>   * This returns a created task, or an error pointer. The returned task is
>   * inactive, and the caller must fire it up through wake_up_new_task(p). If
> @@ -2546,7 +2553,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>   */
>  struct task_struct *kernel_copy_process(int (*fn)(void *), void *arg, int node,
>  					unsigned long clone_flags,
> -					int io_thread)
> +					int io_thread, int no_files)

I think that the addition of no_files together with io_thread might be a
sign to introduce a set of kernel internal clone flags in
kernel_clone_args.

  reply	other threads:[~2021-09-17  8:55 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-16 21:20 [PATCH 0/8] Use copy_process/create_io_thread in vhost layer Mike Christie
2021-09-16 21:20 ` Mike Christie
2021-09-16 21:20 ` [PATCH 1/8] fork: add helper to clone a process Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-17  6:00   ` Christoph Hellwig
2021-09-17  6:00     ` Christoph Hellwig
2021-09-17  7:44     ` Christian Brauner
2021-09-17  8:01       ` Christoph Hellwig
2021-09-17  8:01         ` Christoph Hellwig
2021-09-17  8:43         ` Christian Brauner
2021-09-17  8:48   ` Christian Brauner
2021-09-16 21:20 ` [PATCH 2/8] signal: Export ignore_signals Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-16 21:20 ` [PATCH 3/8] fork: add option to not clone or dup files Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-17  8:54   ` Christian Brauner [this message]
2021-09-16 21:20 ` [PATCH 4/8] fork: move PF_IO_WORKER's kernel frame setup to new flag Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-16 21:20 ` [PATCH 5/8] io_uring: switch to kernel_copy_process Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-16 21:20 ` [PATCH 6/8] vhost: move worker thread fields to new struct Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-16 21:20 ` [PATCH 7/8] vhost: use kernel_copy_process to check RLIMITs and inherit cgroups Mike Christie
2021-09-16 21:20   ` Mike Christie
2021-09-19  8:24   ` Hillf Danton
2021-09-20 20:47   ` kernel test robot
2021-09-20 20:47     ` kernel test robot
2021-09-20 20:47     ` kernel test robot
2021-09-16 21:20 ` [PATCH 8/8] vhost: remove cgroup code Mike Christie
2021-09-16 21:20   ` Mike Christie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210917085455.xggfayhxdq6eejz4@wittgenstein \
    --to=christian.brauner@ubuntu.com \
    --cc=axboe@kernel.dk \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michael.christie@oracle.com \
    --cc=mst@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.