All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: geert@linux-m68k.org, vverma@digitalocean.com, hdanton@sina.com,
	hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com,
	mst@redhat.com, sgarzare@redhat.com,
	virtualization@lists.linux-foundation.org,
	christian.brauner@ubuntu.com, axboe@kernel.dk,
	linux-kernel@vger.kernel.org
Subject: [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer
Date: Thu,  7 Oct 2021 16:44:40 -0500	[thread overview]
Message-ID: <20211007214448.6282-1-michael.christie@oracle.com> (raw)

The following patches apply over Linus's, Jens's or mst's trees, but were
made over linux-next because patch 6:

io_uring: switch to kernel_worker

has merge conflicts with Jens Axboe's for-next branch and Paul Moore's
selinux tree's next branch.

This is V4 of the patchset. It should handle all the review comments
posted in V1 - V3. If I missed a comment, please let me know.

This patchset allows the vhost layer to do a copy_process on the thread
that does the VHOST_SET_OWNER ioctl like how io_uring does a copy_process
against its userspace app (Jens, the patches make create_io_thread more
generic so that's why you are cc'd). This allows the vhost layer's worker
threads to inherit cgroups, namespaces, address space, etc and this worker
thread will also be accounted for against that owner/parent process's
RLIMIT_NPROC limit.

If you are not familiar with qemu and vhost here is more detailed
problem description:

Qemu will create vhost devices in the kernel which perform network, SCSI,
etc IO and management operations from worker threads created by the
kthread API. Because the kthread API does a copy_process on the kthreadd
thread, the vhost layer has to use kthread_use_mm to access the Qemu
thread's memory and cgroup_attach_task_all to add itself to the Qemu
thread's cgroups.

The problem with this approach is that we then have to add new functions/
args/functionality for every thing we want to inherit. I started doing
that here:

https://lkml.org/lkml/2021/6/23/1233

for the RLIMIT_NPROC check, but it seems it might be easier to just
inherit everything from the beginning, becuase I'd need to do something
like that patch several times. For example, the current approach does not
support cgroups v2 so commands like virsh emulatorpin do not work. The
qemu process can go over its RLIMIT_NPROC. And for future vhost interfaces
where we export the vhost thread pid we will want the namespace info.

V4:
- Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
- Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
  added the new kernel worker functions.
- Fixed extra "i" issue.
- Added PF_USER_WORKER flag and added check that kernel_worker_start users
  had that flag set. Also dropped patches that passed worker flags to
  copy_thread and replaced with PF_USER_WORKER check.
V3:
- Add parentheses in p->flag and work_flags check in copy_thread.
- Fix check in arm/arm64 which was doing the reverse of other archs
  where it did likely(!flags) instead of unlikely(flags).
V2:
- Rename kernel_copy_process to kernel_worker.
- Instead of exporting functions, make kernel_worker() a proper
  function/API that does common work for the caller.
- Instead of adding new fields to kernel_clone_args for each option
  make it flag based similar to CLONE_*.
- Drop unused completion struct in vhost.
- Fix compile warnings by merging vhost cgroup cleanup patch and
  vhost conversion patch.




WARNING: multiple messages have this Message-ID (diff)
From: Mike Christie <michael.christie@oracle.com>
To: geert@linux-m68k.org, vverma@digitalocean.com, hdanton@sina.com,
	hch@infradead.org, stefanha@redhat.com, jasowang@redhat.com,
	mst@redhat.com, sgarzare@redhat.com,
	virtualization@lists.linux-foundation.org,
	christian.brauner@ubuntu.com, axboe@kernel.dk,
	linux-kernel@vger.kernel.org
Subject: [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer
Date: Thu,  7 Oct 2021 16:44:40 -0500	[thread overview]
Message-ID: <20211007214448.6282-1-michael.christie@oracle.com> (raw)

The following patches apply over Linus's, Jens's or mst's trees, but were
made over linux-next because patch 6:

io_uring: switch to kernel_worker

has merge conflicts with Jens Axboe's for-next branch and Paul Moore's
selinux tree's next branch.

This is V4 of the patchset. It should handle all the review comments
posted in V1 - V3. If I missed a comment, please let me know.

This patchset allows the vhost layer to do a copy_process on the thread
that does the VHOST_SET_OWNER ioctl like how io_uring does a copy_process
against its userspace app (Jens, the patches make create_io_thread more
generic so that's why you are cc'd). This allows the vhost layer's worker
threads to inherit cgroups, namespaces, address space, etc and this worker
thread will also be accounted for against that owner/parent process's
RLIMIT_NPROC limit.

If you are not familiar with qemu and vhost here is more detailed
problem description:

Qemu will create vhost devices in the kernel which perform network, SCSI,
etc IO and management operations from worker threads created by the
kthread API. Because the kthread API does a copy_process on the kthreadd
thread, the vhost layer has to use kthread_use_mm to access the Qemu
thread's memory and cgroup_attach_task_all to add itself to the Qemu
thread's cgroups.

The problem with this approach is that we then have to add new functions/
args/functionality for every thing we want to inherit. I started doing
that here:

https://lkml.org/lkml/2021/6/23/1233

for the RLIMIT_NPROC check, but it seems it might be easier to just
inherit everything from the beginning, becuase I'd need to do something
like that patch several times. For example, the current approach does not
support cgroups v2 so commands like virsh emulatorpin do not work. The
qemu process can go over its RLIMIT_NPROC. And for future vhost interfaces
where we export the vhost thread pid we will want the namespace info.

V4:
- Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
- Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
  added the new kernel worker functions.
- Fixed extra "i" issue.
- Added PF_USER_WORKER flag and added check that kernel_worker_start users
  had that flag set. Also dropped patches that passed worker flags to
  copy_thread and replaced with PF_USER_WORKER check.
V3:
- Add parentheses in p->flag and work_flags check in copy_thread.
- Fix check in arm/arm64 which was doing the reverse of other archs
  where it did likely(!flags) instead of unlikely(flags).
V2:
- Rename kernel_copy_process to kernel_worker.
- Instead of exporting functions, make kernel_worker() a proper
  function/API that does common work for the caller.
- Instead of adding new fields to kernel_clone_args for each option
  make it flag based similar to CLONE_*.
- Drop unused completion struct in vhost.
- Fix compile warnings by merging vhost cgroup cleanup patch and
  vhost conversion patch.



_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

             reply	other threads:[~2021-10-07 21:45 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-07 21:44 Mike Christie [this message]
2021-10-07 21:44 ` [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer Mike Christie
2021-10-07 21:44 ` [PATCH V4 1/8] fork: Make IO worker options flag based Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-08  8:21   ` Geert Uytterhoeven
2021-10-08  8:21     ` Geert Uytterhoeven
2021-10-22  9:55   ` Michael S. Tsirkin
2021-10-22  9:55     ` Michael S. Tsirkin
2021-10-07 21:44 ` [PATCH V4 3/8] fork: add option to not clone or dup files Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 4/8] fork: Add KERNEL_WORKER flag to ignore signals Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 5/8] fork: add helper to clone a process Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 6/8] io_uring: switch to kernel_worker Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-08  6:42   ` kernel test robot
2021-10-08  6:42     ` kernel test robot
2021-10-08  6:42     ` kernel test robot
2021-10-08  7:10   ` kernel test robot
2021-10-08  7:10     ` kernel test robot
2021-10-07 21:44 ` [PATCH V4 7/8] vhost: move worker thread fields to new struct Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 8/8] vhost: use kernel_worker to check RLIMITs and inherit v2 cgroups Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-12  6:43 ` [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer Christoph Hellwig
2021-10-12  6:43   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211007214448.6282-1-michael.christie@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=axboe@kernel.dk \
    --cc=christian.brauner@ubuntu.com \
    --cc=geert@linux-m68k.org \
    --cc=hch@infradead.org \
    --cc=hdanton@sina.com \
    --cc=jasowang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=vverma@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.