* [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer
From: Mike Christie @ 2021-10-07 21:44 UTC
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

The following patches apply over Linus's, Jens's or mst's trees, but were
made over linux-next because patch 6:

io_uring: switch to kernel_worker

has merge conflicts with Jens Axboe's for-next branch and Paul Moore's
selinux tree's next branch.

This is V4 of the patchset. It should handle all the review comments
posted in V1 - V3. If I missed a comment, please let me know.

This patchset allows the vhost layer to do a copy_process on the thread
that does the VHOST_SET_OWNER ioctl, like how io_uring does a copy_process
against its userspace app (Jens, the patches make create_io_thread more
generic so that's why you are cc'd). This allows the vhost layer's worker
threads to inherit cgroups, namespaces, address space, etc., and each worker
thread is also accounted against the owner/parent process's RLIMIT_NPROC
limit.

If you are not familiar with qemu and vhost, here is a more detailed
problem description:

Qemu will create vhost devices in the kernel which perform network, SCSI,
etc IO and management operations from worker threads created by the
kthread API. Because the kthread API does a copy_process on the kthreadd
thread, the vhost layer has to use kthread_use_mm to access the Qemu
thread's memory and cgroup_attach_task_all to add itself to the Qemu
thread's cgroups.

The problem with this approach is that we then have to add new functions/
args/functionality for everything we want to inherit. I started doing
that here:

https://lkml.org/lkml/2021/6/23/1233

for the RLIMIT_NPROC check, but it seems it might be easier to just
inherit everything from the beginning, because I'd need to do something
like that patch several times. For example, the current approach does not
support cgroups v2, so commands like virsh emulatorpin do not work, and the
qemu process can go over its RLIMIT_NPROC. For future vhost interfaces
where we export the vhost thread pid, we will also want the namespace info.
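
For anyone who wants to check which limit the workers would be counted
against, here is a minimal userspace sketch that reads the calling
process's RLIMIT_NPROC with getrlimit(2). It is only an illustration and
not part of the patchset; the helper names nproc_soft_limit() and
print_nproc_limit() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Fetch the calling process's soft RLIMIT_NPROC.
 * Returns 0 on success and stores the limit in *soft, or -1 on failure.
 * (Hypothetical helper, for illustration only.)
 */
int nproc_soft_limit(rlim_t *soft)
{
	struct rlimit rl;

	assert(soft != NULL);
	if (getrlimit(RLIMIT_NPROC, &rl) != 0)
		return -1;
	*soft = rl.rlim_cur;
	return 0;
}

/* Print the limit; RLIM_INFINITY means no cap is configured. */
void print_nproc_limit(void)
{
	rlim_t soft;

	if (nproc_soft_limit(&soft) != 0) {
		perror("getrlimit");
		return;
	}
	if (soft == RLIM_INFINITY)
		printf("RLIMIT_NPROC: unlimited\n");
	else
		printf("RLIMIT_NPROC: %llu\n", (unsigned long long)soft);
}
```

Calling print_nproc_limit() from the process that issues VHOST_SET_OWNER
would show the limit that, with this patchset, the vhost workers are
expected to be accounted against.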

V4:
- Dropped the NO_SIG patch and replaced it with Christian's SIG_IGN patch.
- Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
  added the new kernel worker functions.
- Fixed extra "i" issue.
- Added PF_USER_WORKER flag and added check that kernel_worker_start users
  had that flag set. Also dropped patches that passed worker flags to
  copy_thread and replaced them with a PF_USER_WORKER check.
V3:
- Add parentheses in p->flags and work_flags check in copy_thread.
- Fix check in arm/arm64 which was doing the reverse of other archs
  where it did likely(!flags) instead of unlikely(flags).
V2:
- Rename kernel_copy_process to kernel_worker.
- Instead of exporting functions, make kernel_worker() a proper
  function/API that does common work for the caller.
- Instead of adding new fields to kernel_clone_args for each option
  make it flag based similar to CLONE_*.
- Drop unused completion struct in vhost.
- Fix compile warnings by merging vhost cgroup cleanup patch and
  vhost conversion patch.
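
To illustrate the V3 items about the copy_thread checks: most
architectures test for the rare kernel-frame case with unlikely(), while
arm/arm64 test for the common user-frame case with likely() and a
negation. The sketch below is a userspace model of the two styles, not
kernel code. The DEMO_PF_* values and function names are invented for the
example, and likely()/unlikely() are reimplemented with the compiler's
__builtin_expect.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's branch-prediction hints. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Illustrative bit values only; the real PF_* values live in sched.h. */
#define DEMO_PF_KTHREAD	0x1u
#define DEMO_PF_WORKER	0x2u

/* Style used by most architectures: test for the rare case. */
bool is_kernel_frame(unsigned int flags)
{
	if (unlikely(flags & (DEMO_PF_KTHREAD | DEMO_PF_WORKER)))
		return true;
	return false;
}

/*
 * Style used by arm/arm64: test for the common case, negated.
 * The inner parentheses matter because '!' binds tighter than '&'.
 */
bool is_user_frame(unsigned int flags)
{
	if (likely(!(flags & (DEMO_PF_KTHREAD | DEMO_PF_WORKER))))
		return true;
	return false;
}
```

For any flags value the two helpers return opposite answers, which is the
property the arm/arm64 hunks need to preserve.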




* [PATCH V4 1/8] fork: Make IO worker options flag based
From: Mike Christie @ 2021-10-07 21:44 UTC
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

This patchset adds a couple of new options to kernel_clone_args for
IO-thread-like users. Instead of adding a new field to kernel_clone_args
for each option, this patch moves us to a flags-based approach, starting by
converting io_thread.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
---
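The flags-based approach described in the commit message can be modelled
in userspace roughly as follows. This is a sketch under assumptions, not
the kernel code: struct demo_clone_args is a trimmed stand-in for
kernel_clone_args, demo_is_io_worker() is a made-up helper, and BIT() is
redefined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n)			(1u << (n))

/* Mirrors the flag added by this patch. */
#define KERN_WORKER_IO		BIT(0)

/*
 * Trimmed, hypothetical model of kernel_clone_args: one flags word
 * replaces a growing list of per-option int fields such as io_thread.
 */
struct demo_clone_args {
	uint64_t flags;
	uint32_t worker_flags;
};

/* Equivalent of the old "if (args->io_thread)" test. */
bool demo_is_io_worker(const struct demo_clone_args *args)
{
	return (args->worker_flags & KERN_WORKER_IO) != 0;
}
```

A later option then only needs a new BIT(n) definition rather than another
struct member.
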
 include/linux/sched/task.h | 4 +++-
 kernel/fork.c              | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ef02be869cf2..48417c735438 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -18,8 +18,11 @@ struct css_set;
 /* All the bits taken by the old clone syscall. */
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
 
+#define KERN_WORKER_IO		BIT(0)
+
 struct kernel_clone_args {
 	u64 flags;
+	u32 worker_flags;
 	int __user *pidfd;
 	int __user *child_tid;
 	int __user *parent_tid;
@@ -31,7 +34,6 @@ struct kernel_clone_args {
 	/* Number of elements in *set_tid */
 	size_t set_tid_size;
 	int cgroup;
-	int io_thread;
 	struct cgroup *cgrp;
 	struct css_set *cset;
 };
diff --git a/kernel/fork.c b/kernel/fork.c
index 38681ad44c76..3988106e9609 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2026,7 +2026,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p = dup_task_struct(current, node);
 	if (!p)
 		goto fork_out;
-	if (args->io_thread) {
+	if (args->worker_flags & KERN_WORKER_IO) {
 		/*
 		 * Mark us an IO worker, and block any signal that isn't
 		 * fatal or STOP
@@ -2526,7 +2526,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
 		.stack		= (unsigned long)fn,
 		.stack_size	= (unsigned long)arg,
-		.io_thread	= 1,
+		.worker_flags	= KERN_WORKER_IO,
 	};
 
 	return copy_process(NULL, 0, node, &args);
-- 
2.25.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag
From: Mike Christie @ 2021-10-07 21:44 UTC
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

The vhost worker threads need the same frame setup as io_uring's worker
threads, but handle signals differently and do not need the same
scheduling behavior. This patch separates the frame setup parts of
PF_IO_WORKER into a new PF flag, PF_USER_WORKER.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
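A rough userspace model of the split, for readers skimming the per-arch
hunks: worker_flags bits are translated into PF_* bits at fork time, and
copy_thread then keys the frame setup off PF_USER_WORKER only. The flag
values are copied from the hunks in this patch; demo_task_flags() and
demo_wants_kernel_frame() are hypothetical helpers, and signal handling
and PF_KTHREAD are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Values copied from the hunks in this patch. */
#define KERN_WORKER_IO		BIT(0)
#define KERN_WORKER_USER	BIT(1)
#define PF_USER_WORKER		0x00000008u
#define PF_IO_WORKER		0x00000010u

/*
 * Hypothetical helper modelling the flag translation in copy_process().
 * Signal blocking and the rest of the IO worker setup are omitted.
 */
uint32_t demo_task_flags(uint32_t worker_flags)
{
	uint32_t pf = 0;

	if (worker_flags & KERN_WORKER_IO)
		pf |= PF_IO_WORKER;
	if (worker_flags & KERN_WORKER_USER)
		pf |= PF_USER_WORKER;
	return pf;
}

/*
 * Nonzero if copy_thread() should build a kernel-style frame.
 * PF_KTHREAD is left out of this model.
 */
int demo_wants_kernel_frame(uint32_t pf)
{
	return (pf & PF_USER_WORKER) != 0;
}
```

In this model a task created with only KERN_WORKER_IO would no longer get
the kernel-style frame, which is why create_io_thread now passes
KERN_WORKER_IO | KERN_WORKER_USER.
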
 arch/alpha/kernel/process.c      | 2 +-
 arch/arc/kernel/process.c        | 2 +-
 arch/arm/kernel/process.c        | 2 +-
 arch/arm64/kernel/process.c      | 2 +-
 arch/csky/kernel/process.c       | 2 +-
 arch/h8300/kernel/process.c      | 2 +-
 arch/hexagon/kernel/process.c    | 2 +-
 arch/ia64/kernel/process.c       | 2 +-
 arch/m68k/kernel/process.c       | 2 +-
 arch/microblaze/kernel/process.c | 2 +-
 arch/mips/kernel/process.c       | 2 +-
 arch/nds32/kernel/process.c      | 2 +-
 arch/nios2/kernel/process.c      | 2 +-
 arch/openrisc/kernel/process.c   | 2 +-
 arch/parisc/kernel/process.c     | 2 +-
 arch/powerpc/kernel/process.c    | 2 +-
 arch/riscv/kernel/process.c      | 2 +-
 arch/s390/kernel/process.c       | 2 +-
 arch/sh/kernel/process_32.c      | 2 +-
 arch/sparc/kernel/process_32.c   | 2 +-
 arch/sparc/kernel/process_64.c   | 2 +-
 arch/um/kernel/process.c         | 2 +-
 arch/x86/kernel/process.c        | 2 +-
 arch/xtensa/kernel/process.c     | 2 +-
 include/linux/sched.h            | 1 +
 include/linux/sched/task.h       | 1 +
 kernel/fork.c                    | 4 +++-
 27 files changed, 29 insertions(+), 25 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index a5123ea426ce..e350fff2ea14 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -249,7 +249,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	childti->pcb.ksp = (unsigned long) childstack;
 	childti->pcb.flags = 1;	/* set FEN, clear everything else */
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(childstack, 0,
 			sizeof(struct switch_stack) + sizeof(struct pt_regs));
diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
index 3793876f42d9..c3f4952cce17 100644
--- a/arch/arc/kernel/process.c
+++ b/arch/arc/kernel/process.c
@@ -191,7 +191,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	childksp[0] = 0;			/* fp */
 	childksp[1] = (unsigned long)ret_from_fork; /* blink */
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(c_regs, 0, sizeof(struct pt_regs));
 
 		c_callee->r13 = kthread_arg;
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 0e2d3051741e..449c9db3942a 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -247,7 +247,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	thread->cpu_domain = get_domain();
 #endif
 
-	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
+	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
 		*childregs = *current_pt_regs();
 		childregs->ARM_r0 = 0;
 		if (stack_start)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 40adb8cdbf5a..e2fe88a3ae90 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -333,7 +333,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 
 	ptrauth_thread_init_kernel(p);
 
-	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
+	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
 		*childregs = *current_pt_regs();
 		childregs->regs[0] = 0;
 
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index 3d0ca22cd0e2..509f2bfe4ace 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -49,7 +49,7 @@ int copy_thread(unsigned long clone_flags,
 	/* setup thread.sp for switch_to !!! */
 	p->thread.sp = (unsigned long)childstack;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childstack->r15 = (unsigned long) ret_from_kernel_thread;
 		childstack->r10 = kthread_arg;
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 2ac27e4248a4..11baf058b6c5 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 
 	childregs = (struct pt_regs *) (THREAD_SIZE + task_stack_page(p)) - 1;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->retpc = (unsigned long) ret_from_kernel_thread;
 		childregs->er4 = topstk; /* arg */
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index 6a6835fb4242..f17573b66303 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -73,7 +73,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 						    sizeof(*ss));
 	ss->lr = (unsigned long)ret_from_fork;
 	p->thread.switch_sp = ss;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		/* r24 <- fn, r25 <- arg */
 		ss->r24 = usp;
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index e56d63f4abf9..4a58daa56af4 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -338,7 +338,7 @@ copy_thread(unsigned long clone_flags, unsigned long user_stack_base,
 
 	ia64_drop_fpu(p);	/* don't pick up stale state from a CPU's fph */
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		if (unlikely(!user_stack_base)) {
 			/* fork_idle() called us */
 			return 0;
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index 1ab692b952cd..e7474a118410 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	 */
 	p->thread.fc = USER_DATA;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(frame, 0, sizeof(struct fork_frame));
 		frame->regs.sr = PS_S;
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 62aa237180b6..5b543be324d4 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -59,7 +59,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct pt_regs *childregs = task_pt_regs(p);
 	struct thread_info *ti = task_thread_info(p);
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* if we're creating a new kernel thread then just zeroing all
 		 * the registers. That's OK for a brand new thread.*/
 		memset(childregs, 0, sizeof(struct pt_regs));
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 95aa86fa6077..d9ca11dd544f 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -120,7 +120,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	/*  Put the stack after the struct pt_regs.  */
 	childksp = (unsigned long) childregs;
 	p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		unsigned long status = p->thread.cp0_status;
 		memset(childregs, 0, sizeof(struct pt_regs));
diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index 391895b54d13..2dba51d1889c 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -156,7 +156,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 
 	memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		/* kernel thread fn */
 		p->thread.cpu_context.r6 = stack_start;
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 9ff37ba2bb60..ce6ad177da15 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -109,7 +109,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct switch_stack *childstack =
 		((struct switch_stack *)childregs) - 1;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childstack, 0,
 			sizeof(struct switch_stack) + sizeof(struct pt_regs));
 
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index b0698d9ce14f..d1d189c16676 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -172,7 +172,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	sp -= sizeof(struct pt_regs);
 	kregs = (struct pt_regs *)sp;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(kregs, 0, sizeof(struct pt_regs));
 		kregs->gpr[20] = usp; /* fn, kernel thread */
 		kregs->gpr[22] = arg;
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 38ec4ae81239..257bec7e67d4 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -197,7 +197,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
 	extern void * const ret_from_kernel_thread;
 	extern void * const child_return;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(cregs, 0, sizeof(struct pt_regs));
 		if (!usp) /* idle thread */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 50436b52c213..817847723bff 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1700,7 +1700,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	/* Copy registers */
 	sp -= sizeof(struct pt_regs);
 	childregs = (struct pt_regs *) sp;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gpr[1] = sp + sizeof(struct pt_regs);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..8deeb94eb51e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -125,7 +125,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct pt_regs *childregs = task_pt_regs(p);
 
 	/* p->thread holds context to be restored by __switch_to() */
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* Kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gp = gp_in_global;
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 350e94d0cac2..f596843ab55c 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -130,7 +130,7 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
 	frame->sf.gprs[9] = (unsigned long)frame;
 
 	/* Store access registers to kernel stack of new process. */
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(&frame->childregs, 0, sizeof(struct pt_regs));
 		frame->childregs.psw.mask = PSW_KERNEL_BITS | PSW_MASK_DAT |
diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
index 717de05c81f4..e74906f53c3e 100644
--- a/arch/sh/kernel/process_32.c
+++ b/arch/sh/kernel/process_32.c
@@ -114,7 +114,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 
 	childregs = task_pt_regs(p);
 	p->thread.sp = (unsigned long) childregs;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		p->thread.pc = (unsigned long) ret_from_kernel_thread;
 		childregs->regs[4] = arg;
diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index bbbe0cfef746..978e0bc10ad4 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -296,7 +296,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	ti->ksp = (unsigned long) new_stack;
 	p->thread.kregs = childregs;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		extern int nwindows;
 		unsigned long psr;
 		memset(new_stack, 0, STACKFRAME_SZ + TRACEREG_SZ);
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index d1cc410d2f64..1c45cd5089f4 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -594,7 +594,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 				       sizeof(struct sparc_stackf));
 	t->fpsaved[0] = 0;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(child_trap_frame, 0, child_stack_sz);
 		__thread_flag_byte_ptr(t)[TI_FLAG_BYTE_CWP] = 
 			(current_pt_regs()->tstate + 1) & TSTATE_CWP;
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 457a38db368b..2bc3141cbf01 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct * p, unsigned long tls)
 {
 	void (*handler)(void);
-	int kthread = current->flags & (PF_KTHREAD | PF_IO_WORKER);
+	int kthread = current->flags & (PF_KTHREAD | PF_USER_WORKER);
 	int ret = 0;
 
 	p->thread = (struct thread_struct) INIT_THREAD;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..d88be9dd5dfd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -178,7 +178,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
-	if (unlikely(p->flags & PF_IO_WORKER)) {
+	if (unlikely(p->flags & PF_USER_WORKER)) {
 		/*
 		 * An IO thread is a user space thread, but it doesn't
 		 * return to ret_after_fork().
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 060165340612..61ad0bfbd7ea 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -217,7 +217,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp_thread_fn,
 
 	p->thread.sp = (unsigned long)childregs;
 
-	if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (!(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		struct pt_regs *regs = current_pt_regs();
 		unsigned long usp = usp_thread_fn ?
 			usp_thread_fn : regs->areg[1];
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c1a927ddec64..b1027e916be4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1665,6 +1665,7 @@ extern struct pid *cad_pid;
 #define PF_VCPU			0x00000001	/* I'm a virtual CPU */
 #define PF_IDLE			0x00000002	/* I am an IDLE thread */
 #define PF_EXITING		0x00000004	/* Getting shut down */
+#define PF_USER_WORKER		0x00000008	/* Kernel thread cloned from userspace thread */
 #define PF_IO_WORKER		0x00000010	/* Task is an IO worker */
 #define PF_WQ_WORKER		0x00000020	/* I'm a workqueue worker */
 #define PF_FORKNOEXEC		0x00000040	/* Forked but didn't exec */
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 48417c735438..53599a99d7e0 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -19,6 +19,7 @@ struct css_set;
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
 
 #define KERN_WORKER_IO		BIT(0)
+#define KERN_WORKER_USER	BIT(1)
 
 struct kernel_clone_args {
 	u64 flags;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3988106e9609..4f780424de46 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2035,6 +2035,8 @@ static __latent_entropy struct task_struct *copy_process(
 		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
 	}
 
+	if (args->worker_flags & KERN_WORKER_USER)
+		p->flags |= PF_USER_WORKER;
 	/*
 	 * This _must_ happen before we call free_task(), i.e. before we jump
 	 * to any of the bad_fork_* labels. This is to avoid freeing
@@ -2526,7 +2528,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
 		.stack		= (unsigned long)fn,
 		.stack_size	= (unsigned long)arg,
-		.worker_flags	= KERN_WORKER_IO,
+		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
 	};
 
 	return copy_process(NULL, 0, node, &args);
-- 
2.25.1

 	/* setup thread.sp for switch_to !!! */
 	p->thread.sp = (unsigned long)childstack;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childstack->r15 = (unsigned long) ret_from_kernel_thread;
 		childstack->r10 = kthread_arg;
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 2ac27e4248a4..11baf058b6c5 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 
 	childregs = (struct pt_regs *) (THREAD_SIZE + task_stack_page(p)) - 1;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->retpc = (unsigned long) ret_from_kernel_thread;
 		childregs->er4 = topstk; /* arg */
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index 6a6835fb4242..f17573b66303 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -73,7 +73,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 						    sizeof(*ss));
 	ss->lr = (unsigned long)ret_from_fork;
 	p->thread.switch_sp = ss;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		/* r24 <- fn, r25 <- arg */
 		ss->r24 = usp;
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index e56d63f4abf9..4a58daa56af4 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -338,7 +338,7 @@ copy_thread(unsigned long clone_flags, unsigned long user_stack_base,
 
 	ia64_drop_fpu(p);	/* don't pick up stale state from a CPU's fph */
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		if (unlikely(!user_stack_base)) {
 			/* fork_idle() called us */
 			return 0;
diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index 1ab692b952cd..e7474a118410 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	 */
 	p->thread.fc = USER_DATA;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(frame, 0, sizeof(struct fork_frame));
 		frame->regs.sr = PS_S;
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 62aa237180b6..5b543be324d4 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -59,7 +59,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct pt_regs *childregs = task_pt_regs(p);
 	struct thread_info *ti = task_thread_info(p);
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* if we're creating a new kernel thread then just zeroing all
 		 * the registers. That's OK for a brand new thread.*/
 		memset(childregs, 0, sizeof(struct pt_regs));
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 95aa86fa6077..d9ca11dd544f 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -120,7 +120,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	/*  Put the stack after the struct pt_regs.  */
 	childksp = (unsigned long) childregs;
 	p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		unsigned long status = p->thread.cp0_status;
 		memset(childregs, 0, sizeof(struct pt_regs));
diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index 391895b54d13..2dba51d1889c 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -156,7 +156,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 
 	memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		/* kernel thread fn */
 		p->thread.cpu_context.r6 = stack_start;
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 9ff37ba2bb60..ce6ad177da15 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -109,7 +109,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct switch_stack *childstack =
 		((struct switch_stack *)childregs) - 1;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childstack, 0,
 			sizeof(struct switch_stack) + sizeof(struct pt_regs));
 
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index b0698d9ce14f..d1d189c16676 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -172,7 +172,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	sp -= sizeof(struct pt_regs);
 	kregs = (struct pt_regs *)sp;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(kregs, 0, sizeof(struct pt_regs));
 		kregs->gpr[20] = usp; /* fn, kernel thread */
 		kregs->gpr[22] = arg;
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index 38ec4ae81239..257bec7e67d4 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -197,7 +197,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
 	extern void * const ret_from_kernel_thread;
 	extern void * const child_return;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(cregs, 0, sizeof(struct pt_regs));
 		if (!usp) /* idle thread */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 50436b52c213..817847723bff 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1700,7 +1700,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 	/* Copy registers */
 	sp -= sizeof(struct pt_regs);
 	childregs = (struct pt_regs *) sp;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gpr[1] = sp + sizeof(struct pt_regs);
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 03ac3aa611f5..8deeb94eb51e 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -125,7 +125,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 	struct pt_regs *childregs = task_pt_regs(p);
 
 	/* p->thread holds context to be restored by __switch_to() */
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* Kernel thread */
 		memset(childregs, 0, sizeof(struct pt_regs));
 		childregs->gp = gp_in_global;
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 350e94d0cac2..f596843ab55c 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -130,7 +130,7 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
 	frame->sf.gprs[9] = (unsigned long)frame;
 
 	/* Store access registers to kernel stack of new process. */
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		/* kernel thread */
 		memset(&frame->childregs, 0, sizeof(struct pt_regs));
 		frame->childregs.psw.mask = PSW_KERNEL_BITS | PSW_MASK_DAT |
diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
index 717de05c81f4..e74906f53c3e 100644
--- a/arch/sh/kernel/process_32.c
+++ b/arch/sh/kernel/process_32.c
@@ -114,7 +114,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
 
 	childregs = task_pt_regs(p);
 	p->thread.sp = (unsigned long) childregs;
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(childregs, 0, sizeof(struct pt_regs));
 		p->thread.pc = (unsigned long) ret_from_kernel_thread;
 		childregs->regs[4] = arg;
diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index bbbe0cfef746..978e0bc10ad4 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -296,7 +296,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	ti->ksp = (unsigned long) new_stack;
 	p->thread.kregs = childregs;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		extern int nwindows;
 		unsigned long psr;
 		memset(new_stack, 0, STACKFRAME_SZ + TRACEREG_SZ);
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index d1cc410d2f64..1c45cd5089f4 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -594,7 +594,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 				       sizeof(struct sparc_stackf));
 	t->fpsaved[0] = 0;
 
-	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		memset(child_trap_frame, 0, child_stack_sz);
 		__thread_flag_byte_ptr(t)[TI_FLAG_BYTE_CWP] = 
 			(current_pt_regs()->tstate + 1) & TSTATE_CWP;
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 457a38db368b..2bc3141cbf01 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct * p, unsigned long tls)
 {
 	void (*handler)(void);
-	int kthread = current->flags & (PF_KTHREAD | PF_IO_WORKER);
+	int kthread = current->flags & (PF_KTHREAD | PF_USER_WORKER);
 	int ret = 0;
 
 	p->thread = (struct thread_struct) INIT_THREAD;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..d88be9dd5dfd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -178,7 +178,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
-	if (unlikely(p->flags & PF_IO_WORKER)) {
+	if (unlikely(p->flags & PF_USER_WORKER)) {
 		/*
 		 * An IO thread is a user space thread, but it doesn't
 		 * return to ret_after_fork().
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 060165340612..61ad0bfbd7ea 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -217,7 +217,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp_thread_fn,
 
 	p->thread.sp = (unsigned long)childregs;
 
-	if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
+	if (!(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
 		struct pt_regs *regs = current_pt_regs();
 		unsigned long usp = usp_thread_fn ?
 			usp_thread_fn : regs->areg[1];
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c1a927ddec64..b1027e916be4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1665,6 +1665,7 @@ extern struct pid *cad_pid;
 #define PF_VCPU			0x00000001	/* I'm a virtual CPU */
 #define PF_IDLE			0x00000002	/* I am an IDLE thread */
 #define PF_EXITING		0x00000004	/* Getting shut down */
+#define PF_USER_WORKER		0x00000008	/* Kernel thread cloned from userspace thread */
 #define PF_IO_WORKER		0x00000010	/* Task is an IO worker */
 #define PF_WQ_WORKER		0x00000020	/* I'm a workqueue worker */
 #define PF_FORKNOEXEC		0x00000040	/* Forked but didn't exec */
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 48417c735438..53599a99d7e0 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -19,6 +19,7 @@ struct css_set;
 #define CLONE_LEGACY_FLAGS 0xffffffffULL
 
 #define KERN_WORKER_IO		BIT(0)
+#define KERN_WORKER_USER	BIT(1)
 
 struct kernel_clone_args {
 	u64 flags;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3988106e9609..4f780424de46 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2035,6 +2035,8 @@ static __latent_entropy struct task_struct *copy_process(
 		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
 	}
 
+	if (args->worker_flags & KERN_WORKER_USER)
+		p->flags |= PF_USER_WORKER;
 	/*
 	 * This _must_ happen before we call free_task(), i.e. before we jump
 	 * to any of the bad_fork_* labels. This is to avoid freeing
@@ -2526,7 +2528,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
 		.stack		= (unsigned long)fn,
 		.stack_size	= (unsigned long)arg,
-		.worker_flags	= KERN_WORKER_IO,
+		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
 	};
 
 	return copy_process(NULL, 0, node, &args);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 3/8] fork: add option to not clone or dup files
@ 2021-10-07 21:44   ` Mike Christie
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel
  Cc: Mike Christie

Each vhost device gets a thread that is used to perform IO and management
operations. Instead of a thread that is accessing a device, the thread is
part of the device, so when it calls the kernel_worker() function added in
patch 5 of this series we can't dup or clone the parent's files/FDs because
that would take an extra reference on ourselves.

Later, when the Qemu process exits:

        do_exit -> exit_files -> put_files_struct -> close_files

we would leak the device's resources because of that extra refcount
on the fd or files_struct.

This patch adds a no_files option so these worker threads can avoid
taking an extra refcount on themselves.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
---
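Editorial sketch, not part of the patch: a user-space model of the refcount
behavior the no_files option is meant to avoid. The struct and function
names are hypothetical, only the sharing (CLONE_FILES) and no_files paths
are modeled, and the dup_fd() path is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CLONE_FILES 0x00000400UL	/* value from the clone(2) uapi header */

/* Hypothetical stand-ins for files_struct and task_struct. */
struct model_files { int count; };
struct model_task  { struct model_files *files; };

/*
 * Models the control flow of copy_files() after this patch. Returns 0 on
 * success, or -ENOSYS for the dup_fd() path, which is not modeled.
 */
static int model_copy_files(unsigned long clone_flags,
			    struct model_files *oldf,
			    struct model_task *child, bool no_files)
{
	assert(child != NULL);

	/* The child starts out as a copy of the parent's task struct. */
	child->files = oldf;
	if (!oldf)
		return 0;

	if (no_files) {
		/* New in this patch: drop the table, take no reference. */
		child->files = NULL;
		return 0;
	}

	if (clone_flags & CLONE_FILES) {
		/* Sharing the table pins it with an extra reference. */
		oldf->count++;
		return 0;
	}

	return -ENOSYS;	/* dup_fd() path intentionally not modeled */
}
```

With no_files set, the parent's count is unchanged, so the table and the
device file it holds can be released when the owning process exits.
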
 include/linux/sched/task.h |  1 +
 kernel/fork.c              | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 53599a99d7e0..1153f9e5d10e 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -20,6 +20,7 @@ struct css_set;
 
 #define KERN_WORKER_IO		BIT(0)
 #define KERN_WORKER_USER	BIT(1)
+#define KERN_WORKER_NO_FILES	BIT(2)
 
 struct kernel_clone_args {
 	u64 flags;
diff --git a/kernel/fork.c b/kernel/fork.c
index 4f780424de46..3161edac1236 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1532,7 +1532,8 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
-static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_files(unsigned long clone_flags, struct task_struct *tsk,
+		      int no_files)
 {
 	struct files_struct *oldf, *newf;
 	int error = 0;
@@ -1544,6 +1545,11 @@ static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
 	if (!oldf)
 		goto out;
 
+	if (no_files) {
+		tsk->files = NULL;
+		goto out;
+	}
+
 	if (clone_flags & CLONE_FILES) {
 		atomic_inc(&oldf->count);
 		goto out;
@@ -2181,7 +2187,8 @@ static __latent_entropy struct task_struct *copy_process(
 	retval = copy_semundo(clone_flags, p);
 	if (retval)
 		goto bad_fork_cleanup_security;
-	retval = copy_files(clone_flags, p);
+	retval = copy_files(clone_flags, p,
+			    args->worker_flags & KERN_WORKER_NO_FILES);
 	if (retval)
 		goto bad_fork_cleanup_semundo;
 	retval = copy_fs(clone_flags, p);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 4/8] fork: Add KERNEL_WORKER flag to ignore signals
@ 2021-10-07 21:44   ` Mike Christie
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel
  Cc: Mike Christie

From: Christian Brauner <christian.brauner@ubuntu.com>

Since this is mirroring kthread's sig ignore api introduced in

commit 10ab825bdef8 ("change kernel threads to ignore signals instead of
blocking them")

this patch adds an option flag, KERN_WORKER_SIG_IGN, handled in
copy_process() after copy_sighand() and copy_signal() to ease the
transition from kthreads to this new api.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
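Editorial sketch, not part of the patch: a user-space model of what the new
flag does to the child. It assumes ignore_signals() behaves as in mainline
kernel/signal.c (every handler set to SIG_IGN, then pending signals
flushed). The struct, the 64-signal table size and the function names are
hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NSIG 64	/* assumption: 64 signals, as on most architectures */
#define KERN_WORKER_SIG_IGN (1u << 3)	/* BIT(3), from this patch */

typedef void (*model_handler_t)(int);

/* Hypothetical stand-in for the child's sighand and pending set. */
struct model_sighand {
	model_handler_t action[MODEL_NSIG];
	uint64_t pending;	/* one bit per queued signal */
};

/* Models ignore_signals(): ignore everything, then drop what is queued. */
static void model_ignore_signals(struct model_sighand *sh)
{
	assert(sh != NULL);

	for (int i = 0; i < MODEL_NSIG; i++)
		sh->action[i] = SIG_IGN;
	sh->pending = 0;	/* stands in for flush_signals() */
}

/* Models the check added to copy_process() by this patch. */
static void model_apply_worker_flags(struct model_sighand *sh,
				     uint32_t worker_flags)
{
	if (worker_flags & KERN_WORKER_SIG_IGN)
		model_ignore_signals(sh);
}
```

Without the flag the child keeps whatever copy_sighand() gave it. With the
flag it starts like a kthread, with every signal ignored.
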
 include/linux/sched/task.h | 1 +
 kernel/fork.c              | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 1153f9e5d10e..5f3928fe0544 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -21,6 +21,7 @@ struct css_set;
 #define KERN_WORKER_IO		BIT(0)
 #define KERN_WORKER_USER	BIT(1)
 #define KERN_WORKER_NO_FILES	BIT(2)
+#define KERN_WORKER_SIG_IGN	BIT(3)
 
 struct kernel_clone_args {
 	u64 flags;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3161edac1236..07f9e410fb64 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2213,6 +2213,9 @@ static __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_cleanup_io;
 
+	if (args->worker_flags & KERN_WORKER_SIG_IGN)
+		ignore_signals(p);
+
 	stackleak_task_init(p);
 
 	if (pid != &init_struct_pid) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 5/8] fork: add helper to clone a process
@ 2021-10-07 21:44   ` Mike Christie
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel
  Cc: Mike Christie

The vhost layer has requirements similar to io_uring's: its worker
threads need to access the userspace thread's memory, should inherit the
parent's cgroups and namespaces, and must be checked against the parent's
RLIMITs. Right now, the vhost layer uses the kthread API, which has
kthread_use_mm for memory access, and those threads can use
cgroup_attach_task_all for v1 cgroups, but there are no helpers for the
other items.

This adds a helper to clone a process so we can inherit everything we
want in one call. It's a more generic version of create_io_thread which
will be used by the vhost layer and io_uring in later patches in this set.

[added flag validation code from Christian Brauner's patch]
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
---
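Editorial sketch, not part of the patch: a user-space model of how
kernel_worker() builds its clone arguments and of the checks in
kernel_worker_flags_valid(). The struct and function names are
hypothetical. The CLONE_* values are the ones from the clone(2) uapi
header, and the KERN_WORKER_* bits follow the earlier patches in this
series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSIGNAL             0x000000ffULL
#define CLONE_VM            0x00000100ULL
#define CLONE_SIGHAND       0x00000800ULL
#define CLONE_UNTRACED      0x00800000ULL
#define CLONE_CLEAR_SIGHAND 0x100000000ULL

#define KERN_WORKER_IO       (1u << 0)
#define KERN_WORKER_USER     (1u << 1)
#define KERN_WORKER_NO_FILES (1u << 2)
#define KERN_WORKER_SIG_IGN  (1u << 3)

/* Hypothetical subset of struct kernel_clone_args. */
struct model_clone_args {
	uint64_t flags;
	int exit_signal;
	uint32_t worker_flags;
};

/* Models kernel_worker_flags_valid(). */
static bool model_worker_flags_valid(const struct model_clone_args *a)
{
	/* Reject worker flags this series does not define. */
	if (a->worker_flags & ~(KERN_WORKER_IO | KERN_WORKER_USER |
				KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN))
		return false;

	/* Ignoring all signals conflicts with sharing or clearing handlers. */
	if ((a->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) &&
	    (a->worker_flags & KERN_WORKER_SIG_IGN))
		return false;

	return true;
}

/* Models the argument setup in kernel_worker(); 0 or -EINVAL. */
static int model_kernel_worker_args(unsigned long clone_flags,
				    uint32_t worker_flags,
				    struct model_clone_args *out)
{
	uint64_t low = (uint32_t)clone_flags;	/* lower_32_bits() */

	assert(out != NULL);

	out->flags = (low | CLONE_VM | CLONE_UNTRACED) & ~CSIGNAL;
	out->exit_signal = (int)(low & CSIGNAL);
	out->worker_flags = KERN_WORKER_USER | worker_flags;

	return model_worker_flags_valid(out) ? 0 : -EINVAL;
}
```

In the model, as in the patch, CLONE_VM, CLONE_UNTRACED and KERN_WORKER_USER
are always set for the caller, and the low byte of clone_flags becomes the
exit signal.
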
 include/linux/sched/task.h |  4 +++
 kernel/fork.c              | 71 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 5f3928fe0544..2bfc0629c868 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -89,6 +89,10 @@ extern void exit_itimers(struct signal_struct *);
 
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
 struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
+struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node,
+				  unsigned long clone_flags, u32 worker_flags);
+__printf(2, 3)
+void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...);
 struct task_struct *fork_idle(int);
 struct mm_struct *copy_init_mm(void);
 extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
diff --git a/kernel/fork.c b/kernel/fork.c
index 07f9e410fb64..b04e61a965e2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2544,6 +2544,77 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 	return copy_process(NULL, 0, node, &args);
 }
 
+static bool kernel_worker_flags_valid(struct kernel_clone_args *kargs)
+{
+	/* Verify that no unknown flags are passed along. */
+	if (kargs->worker_flags & ~(KERN_WORKER_IO | KERN_WORKER_USER |
+				    KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN))
+		return false;
+
+	/*
+	 * If we're ignoring all signals don't allow sharing struct sighand and
+	 * don't bother clearing signal handlers.
+	 */
+	if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) &&
+	    (kargs->worker_flags & KERN_WORKER_SIG_IGN))
+		return false;
+
+	return true;
+}
+
+/**
+ * kernel_worker - create a copy of a process to be used by the kernel
+ * @fn: function for the new task to run
+ * @arg: data to be passed to fn
+ * @node: numa node to allocate task from
+ * @clone_flags: CLONE flags
+ * @worker_flags: KERN_WORKER flags
+ *
+ * This returns a created task, or an error pointer. The returned task is
+ * inactive, and the caller must fire it up through kernel_worker_start(). If
+ * this is a PF_IO_WORKER, all signals but KILL and STOP are blocked.
+ */
+struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node,
+				  unsigned long clone_flags, u32 worker_flags)
+{
+	struct kernel_clone_args args = {
+		.flags		= ((lower_32_bits(clone_flags) | CLONE_VM |
+				   CLONE_UNTRACED) & ~CSIGNAL),
+		.exit_signal	= (lower_32_bits(clone_flags) & CSIGNAL),
+		.stack		= (unsigned long)fn,
+		.stack_size	= (unsigned long)arg,
+		.worker_flags	= KERN_WORKER_USER | worker_flags,
+	};
+
+	if (!kernel_worker_flags_valid(&args))
+		return ERR_PTR(-EINVAL);
+
+	return copy_process(NULL, 0, node, &args);
+}
+EXPORT_SYMBOL_GPL(kernel_worker);
+
+/**
+ * kernel_worker_start - Start a task created with kernel_worker
+ * @tsk: task to wake up
+ * @namefmt: printf-style format string for the thread name
+ * @...: arguments for @namefmt
+ */
+void kernel_worker_start(struct task_struct *tsk, const char namefmt[], ...)
+{
+	char name[TASK_COMM_LEN];
+	va_list args;
+
+	WARN_ON(!(tsk->flags & PF_USER_WORKER));
+
+	va_start(args, namefmt);
+	vsnprintf(name, sizeof(name), namefmt, args);
+	set_task_comm(tsk, name);
+	va_end(args);
+
+	wake_up_new_task(tsk);
+}
+EXPORT_SYMBOL_GPL(kernel_worker_start);
+
 /*
  *  Ok, this is the main fork-routine.
  *
-- 
2.25.1
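
For readers who want to experiment with the argument handling outside
the kernel, the way kernel_worker() splits clone_flags and then validates
the result can be sketched in userspace C. This is only a model under
stated assumptions: the DEMO_WORKER_* bit values are hypothetical (the
real KERN_WORKER_* definitions are not part of this patch), the demo_*
names are invented for illustration, -1 stands in for -EINVAL, and the
clone flag values are copied from the uapi <linux/sched.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clone flag values as found in the Linux uapi header <linux/sched.h>. */
#define DEMO_CSIGNAL			0x000000ffULL
#define DEMO_CLONE_VM			0x00000100ULL
#define DEMO_CLONE_SIGHAND		0x00000800ULL
#define DEMO_CLONE_UNTRACED		0x00800000ULL
#define DEMO_CLONE_CLEAR_SIGHAND	0x100000000ULL

/* Hypothetical bit values, chosen only for this sketch. */
#define DEMO_WORKER_IO		(1u << 0)
#define DEMO_WORKER_USER	(1u << 1)
#define DEMO_WORKER_NO_FILES	(1u << 2)
#define DEMO_WORKER_SIG_IGN	(1u << 3)
#define DEMO_WORKER_ALL		(DEMO_WORKER_IO | DEMO_WORKER_USER | \
				 DEMO_WORKER_NO_FILES | DEMO_WORKER_SIG_IGN)

/* Cut-down stand-in for struct kernel_clone_args. */
struct demo_clone_args {
	uint64_t flags;
	int exit_signal;
	uint32_t worker_flags;
};

static bool demo_worker_flags_valid(const struct demo_clone_args *a)
{
	/* Reject worker flags outside the known set. */
	if (a->worker_flags & ~DEMO_WORKER_ALL)
		return false;

	/* Ignoring all signals excludes sharing or clearing handlers. */
	if ((a->flags & (DEMO_CLONE_SIGHAND | DEMO_CLONE_CLEAR_SIGHAND)) &&
	    (a->worker_flags & DEMO_WORKER_SIG_IGN))
		return false;

	return true;
}

/* Fills *out and returns 0, or returns -1 if the combination is invalid. */
static int demo_build_args(uint64_t clone_flags, uint32_t worker_flags,
			   struct demo_clone_args *out)
{
	uint64_t low = clone_flags & 0xffffffffULL;	/* lower_32_bits() */

	/* The low byte carries the exit signal, the rest are CLONE_* bits. */
	out->flags = (low | DEMO_CLONE_VM | DEMO_CLONE_UNTRACED) & ~DEMO_CSIGNAL;
	out->exit_signal = (int)(low & DEMO_CSIGNAL);
	out->worker_flags = DEMO_WORKER_USER | worker_flags;

	return demo_worker_flags_valid(out) ? 0 : -1;
}
```

Calling demo_build_args(DEMO_CLONE_SIGHAND, DEMO_WORKER_SIG_IGN, &args)
is rejected in this model, which mirrors why the vhost caller later in the
series passes only CLONE_FS together with KERN_WORKER_SIG_IGN.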


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 6/8] io_uring: switch to kernel_worker
  2021-10-07 21:44 ` Mike Christie
@ 2021-10-07 21:44   ` Mike Christie
  -1 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

Convert io_uring and io-wq to use kernel_worker.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
---

Note: To avoid patch application conflicts, this patch was made over
linux-next, which includes Jens Axboe's block tree's for-next branch and
Paul Moore's selinux tree's next branch, because:

commit: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit
support to io_uring")

from Paul's tree had conflicts.

 fs/io-wq.c                 | 15 ++++++++-------
 fs/io_uring.c              | 11 +++++------
 include/linux/sched/task.h |  1 -
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 962952951126..32b140754496 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -70,6 +70,9 @@ struct io_worker {
 
 #define IO_WQ_NR_HASH_BUCKETS	(1u << IO_WQ_HASH_ORDER)
 
+#define IO_WQ_CLONE_FLAGS	(CLONE_FS | CLONE_FILES | CLONE_SIGHAND | \
+				 CLONE_THREAD | CLONE_IO)
+
 struct io_wqe_acct {
 	unsigned nr_workers;
 	unsigned max_workers;
@@ -550,13 +553,9 @@ static int io_wqe_worker(void *data)
 	struct io_wqe *wqe = worker->wqe;
 	struct io_wq *wq = wqe->wq;
 	bool last_timeout = false;
-	char buf[TASK_COMM_LEN];
 
 	worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING);
 
-	snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid);
-	set_task_comm(current, buf);
-
 	audit_alloc_kernel(current);
 
 	while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
@@ -654,7 +653,7 @@ static void io_init_new_worker(struct io_wqe *wqe, struct io_worker *worker,
 	list_add_tail_rcu(&worker->all_list, &wqe->all_list);
 	worker->flags |= IO_WORKER_F_FREE;
 	raw_spin_unlock(&wqe->lock);
-	wake_up_new_task(tsk);
+	kernel_worker_start(tsk, "iou-wrk-%d", wqe->wq->task->pid);
 }
 
 static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
@@ -684,7 +683,8 @@ static void create_worker_cont(struct callback_head *cb)
 	worker = container_of(cb, struct io_worker, create_work);
 	clear_bit_unlock(0, &worker->create_state);
 	wqe = worker->wqe;
-	tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
+	tsk = kernel_worker(io_wqe_worker, worker, wqe->node,
+			    IO_WQ_CLONE_FLAGS, KERN_WORKER_IO);
 	if (!IS_ERR(tsk)) {
 		io_init_new_worker(wqe, worker, tsk);
 		io_worker_release(worker);
@@ -754,7 +754,8 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 	if (index == IO_WQ_ACCT_BOUND)
 		worker->flags |= IO_WORKER_F_BOUND;
 
-	tsk = create_io_thread(io_wqe_worker, worker, wqe->node);
+	tsk = kernel_worker(io_wqe_worker, worker, wqe->node, IO_WQ_CLONE_FLAGS,
+			    KERN_WORKER_IO);
 	if (!IS_ERR(tsk)) {
 		io_init_new_worker(wqe, worker, tsk);
 	} else if (!io_should_retry_thread(PTR_ERR(tsk))) {
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 488aa14da287..5e21ae6d4ff4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7440,12 +7440,8 @@ static int io_sq_thread(void *data)
 	struct io_sq_data *sqd = data;
 	struct io_ring_ctx *ctx;
 	unsigned long timeout = 0;
-	char buf[TASK_COMM_LEN];
 	DEFINE_WAIT(wait);
 
-	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
-	set_task_comm(current, buf);
-
 	if (sqd->sq_cpu != -1)
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
 	else
@@ -8669,6 +8665,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		fdput(f);
 	}
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
+		unsigned long flags = CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
+					CLONE_THREAD | CLONE_IO;
 		struct task_struct *tsk;
 		struct io_sq_data *sqd;
 		bool attached;
@@ -8710,7 +8708,8 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 
 		sqd->task_pid = current->pid;
 		sqd->task_tgid = current->tgid;
-		tsk = create_io_thread(io_sq_thread, sqd, NUMA_NO_NODE);
+		tsk = kernel_worker(io_sq_thread, sqd, NUMA_NO_NODE,
+				    flags, KERN_WORKER_IO);
 		if (IS_ERR(tsk)) {
 			ret = PTR_ERR(tsk);
 			goto err_sqpoll;
@@ -8718,7 +8717,7 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 
 		sqd->thread = tsk;
 		ret = io_uring_alloc_task_context(tsk, ctx);
-		wake_up_new_task(tsk);
+		kernel_worker_start(tsk, "iou-sqp-%d", sqd->task_pid);
 		if (ret)
 			goto err;
 	} else if (p->flags & IORING_SETUP_SQ_AFF) {
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 2bfc0629c868..c8ab90e28f11 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -88,7 +88,6 @@ extern void exit_files(struct task_struct *);
 extern void exit_itimers(struct signal_struct *);
 
 extern pid_t kernel_clone(struct kernel_clone_args *kargs);
-struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node);
 struct task_struct *kernel_worker(int (*fn)(void *), void *arg, int node,
 				  unsigned long clone_flags, u32 worker_flags);
 __printf(2, 3)
-- 
2.25.1
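
The naming that moves from the thread functions into
kernel_worker_start() relies on formatting into a TASK_COMM_LEN buffer,
so long names are silently truncated. A rough userspace illustration,
assuming a 16-byte buffer (the TASK_COMM_LEN value in current kernels)
and a hypothetical demo_format_comm() helper that is not part of the
patch:

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_TASK_COMM_LEN 16	/* assumed to match TASK_COMM_LEN */

/*
 * Format a thread name the way kernel_worker_start() does before it calls
 * set_task_comm(): vsnprintf() into a fixed buffer, always NUL-terminated.
 * The format attribute (GCC/Clang) plays the role of __printf(2, 3).
 */
__attribute__((format(printf, 2, 3)))
static void demo_format_comm(char out[DEMO_TASK_COMM_LEN], const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(out, DEMO_TASK_COMM_LEN, fmt, ap);
	va_end(ap);
}
```

With this sketch, "iou-wrk-%d" and pid 1234 give "iou-wrk-1234", while a
nine-digit pid is cut to the first 15 characters plus the terminator.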

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 7/8] vhost: move worker thread fields to new struct
  2021-10-07 21:44 ` Mike Christie
@ 2021-10-07 21:44   ` Mike Christie
  -1 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

This is just a prep patch. It moves the worker-related fields to a new
vhost_worker struct and moves the code around to create some helpers that
will be used in the next patches.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vhost.c | 98 ++++++++++++++++++++++++++++---------------
 drivers/vhost/vhost.h | 11 +++--
 2 files changed, 72 insertions(+), 37 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 59edb5a1ffe2..c9a1f706989c 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -263,8 +263,8 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 		 * sure it was not in the list.
 		 * test_and_set_bit() implies a memory barrier.
 		 */
-		llist_add(&work->node, &dev->work_list);
-		wake_up_process(dev->worker);
+		llist_add(&work->node, &dev->worker->work_list);
+		wake_up_process(dev->worker->task);
 	}
 }
 EXPORT_SYMBOL_GPL(vhost_work_queue);
@@ -272,7 +272,7 @@ EXPORT_SYMBOL_GPL(vhost_work_queue);
 /* A lockless hint for busy polling code to exit the loop */
 bool vhost_has_work(struct vhost_dev *dev)
 {
-	return !llist_empty(&dev->work_list);
+	return dev->worker && !llist_empty(&dev->worker->work_list);
 }
 EXPORT_SYMBOL_GPL(vhost_has_work);
 
@@ -343,7 +343,8 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 
 static int vhost_worker(void *data)
 {
-	struct vhost_dev *dev = data;
+	struct vhost_worker *worker = data;
+	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
 
@@ -358,7 +359,7 @@ static int vhost_worker(void *data)
 			break;
 		}
 
-		node = llist_del_all(&dev->work_list);
+		node = llist_del_all(&worker->work_list);
 		if (!node)
 			schedule();
 
@@ -368,7 +369,7 @@ static int vhost_worker(void *data)
 		llist_for_each_entry_safe(work, work_next, node, node) {
 			clear_bit(VHOST_WORK_QUEUED, &work->flags);
 			__set_current_state(TASK_RUNNING);
-			kcov_remote_start_common(dev->kcov_handle);
+			kcov_remote_start_common(worker->kcov_handle);
 			work->fn(work);
 			kcov_remote_stop();
 			if (need_resched())
@@ -487,7 +488,6 @@ void vhost_dev_init(struct vhost_dev *dev,
 	dev->byte_weight = byte_weight;
 	dev->use_worker = use_worker;
 	dev->msg_handler = msg_handler;
-	init_llist_head(&dev->work_list);
 	init_waitqueue_head(&dev->wait);
 	INIT_LIST_HEAD(&dev->read_list);
 	INIT_LIST_HEAD(&dev->pending_list);
@@ -579,10 +579,60 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static void vhost_worker_free(struct vhost_dev *dev)
+{
+	struct vhost_worker *worker = dev->worker;
+
+	if (!worker)
+		return;
+
+	dev->worker = NULL;
+	WARN_ON(!llist_empty(&worker->work_list));
+	kthread_stop(worker->task);
+	kfree(worker);
+}
+
+static int vhost_worker_create(struct vhost_dev *dev)
+{
+	struct vhost_worker *worker;
+	struct task_struct *task;
+	int ret;
+
+	worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
+	if (!worker)
+		return -ENOMEM;
+
+	dev->worker = worker;
+	worker->dev = dev;
+	worker->kcov_handle = kcov_common_handle();
+	init_llist_head(&worker->work_list);
+
+	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
+	if (IS_ERR(task)) {
+		ret = PTR_ERR(task);
+		goto free_worker;
+	}
+
+	worker->task = task;
+	wake_up_process(task); /* avoid contributing to loadavg */
+
+	ret = vhost_attach_cgroups(dev);
+	if (ret)
+		goto stop_worker;
+
+	return 0;
+
+stop_worker:
+	kthread_stop(worker->task);
+free_worker:
+	kfree(worker);
+	dev->worker = NULL;
+	return ret;
+}
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-	struct task_struct *worker;
 	int err;
 
 	/* Is there an owner already? */
@@ -593,36 +643,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 
 	vhost_attach_mm(dev);
 
-	dev->kcov_handle = kcov_common_handle();
 	if (dev->use_worker) {
-		worker = kthread_create(vhost_worker, dev,
-					"vhost-%d", current->pid);
-		if (IS_ERR(worker)) {
-			err = PTR_ERR(worker);
-			goto err_worker;
-		}
-
-		dev->worker = worker;
-		wake_up_process(worker); /* avoid contributing to loadavg */
-
-		err = vhost_attach_cgroups(dev);
+		err = vhost_worker_create(dev);
 		if (err)
-			goto err_cgroup;
+			goto err_worker;
 	}
 
 	err = vhost_dev_alloc_iovecs(dev);
 	if (err)
-		goto err_cgroup;
+		goto err_iovecs;
 
 	return 0;
-err_cgroup:
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-	}
+err_iovecs:
+	vhost_worker_free(dev);
 err_worker:
 	vhost_detach_mm(dev);
-	dev->kcov_handle = 0;
 err_mm:
 	return err;
 }
@@ -712,12 +747,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
-	WARN_ON(!llist_empty(&dev->work_list));
-	if (dev->worker) {
-		kthread_stop(dev->worker);
-		dev->worker = NULL;
-		dev->kcov_handle = 0;
-	}
+	vhost_worker_free(dev);
 	vhost_detach_mm(dev);
 }
 EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 638bb640d6b4..102ce25e4e13 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,6 +25,13 @@ struct vhost_work {
 	unsigned long		flags;
 };
 
+struct vhost_worker {
+	struct task_struct	*task;
+	struct llist_head	work_list;
+	struct vhost_dev	*dev;
+	u64			kcov_handle;
+};
+
 /* Poll a file (eventfd or socket) */
 /* Note: there's nothing vhost specific about this structure. */
 struct vhost_poll {
@@ -148,8 +155,7 @@ struct vhost_dev {
 	struct vhost_virtqueue **vqs;
 	int nvqs;
 	struct eventfd_ctx *log_ctx;
-	struct llist_head work_list;
-	struct task_struct *worker;
+	struct vhost_worker *worker;
 	struct vhost_iotlb *umem;
 	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
@@ -159,7 +165,6 @@ struct vhost_dev {
 	int iov_limit;
 	int weight;
 	int byte_weight;
-	u64 kcov_handle;
 	bool use_worker;
 	int (*msg_handler)(struct vhost_dev *dev,
 			   struct vhost_iotlb_msg *msg);
-- 
2.25.1
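
The work_list that moves into struct vhost_worker is a lock-less list:
producers push with llist_add() and the worker takes the whole batch with
llist_del_all(). The following is a userspace approximation using C11
atomics, not the kernel's llist implementation; all demo_* names are
invented for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
};

struct demo_head {
	_Atomic(struct demo_node *) first;
};

static void demo_init(struct demo_head *h)
{
	atomic_init(&h->first, NULL);
}

/* Push one node; returns true if the list was empty before the push. */
static bool demo_add(struct demo_node *n, struct demo_head *h)
{
	struct demo_node *first =
		atomic_load_explicit(&h->first, memory_order_relaxed);

	do {
		/* On CAS failure 'first' is reloaded, so relink and retry. */
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &first, n,
							memory_order_release,
							memory_order_relaxed));
	return first == NULL;
}

/* Detach the whole batch at once; the caller then walks it privately. */
static struct demo_node *demo_del_all(struct demo_head *h)
{
	return atomic_exchange_explicit(&h->first, (struct demo_node *)NULL,
					memory_order_acquire);
}

/* Lock-less hint, comparable to the llist_empty() check in vhost_has_work(). */
static bool demo_empty(struct demo_head *h)
{
	return atomic_load_explicit(&h->first, memory_order_acquire) == NULL;
}
```

In this model the detached batch comes back newest-first, and an empty
list simply yields NULL, which is the case where the worker loop calls
schedule().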

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 8/8] vhost: use kernel_worker to check RLIMITs and inherit v2 cgroups
  2021-10-07 21:44 ` Mike Christie
@ 2021-10-07 21:44   ` Mike Christie
  -1 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

For vhost workers we use the kthread API, which inherits its values from
and checks against the kthreadd thread. This results in cgroups v2 not
working and the wrong RLIMITs being checked. This patch has us use the
kernel_worker function, which will inherit its values/checks from the
thread that owns the device.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vhost.c | 65 +++++++++++++++----------------------------
 drivers/vhost/vhost.h |  7 ++++-
 2 files changed, 28 insertions(+), 44 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c9a1f706989c..9aa04fcdf210 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -22,7 +22,6 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/kthread.h>
-#include <linux/cgroup.h>
 #include <linux/module.h>
 #include <linux/sort.h>
 #include <linux/sched/mm.h>
@@ -344,17 +343,14 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_worker *worker = data;
-	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
 
-	kthread_use_mm(dev->mm);
-
 	for (;;) {
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		if (kthread_should_stop()) {
+		if (test_bit(VHOST_WORKER_FLAG_STOP, &worker->flags)) {
 			__set_current_state(TASK_RUNNING);
 			break;
 		}
@@ -376,8 +372,9 @@ static int vhost_worker(void *data)
 				schedule();
 		}
 	}
-	kthread_unuse_mm(dev->mm);
-	return 0;
+
+	complete(worker->exit_done);
+	do_exit(0);
 }
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
@@ -517,31 +514,6 @@ long vhost_dev_check_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_check_owner);
 
-struct vhost_attach_cgroups_struct {
-	struct vhost_work work;
-	struct task_struct *owner;
-	int ret;
-};
-
-static void vhost_attach_cgroups_work(struct vhost_work *work)
-{
-	struct vhost_attach_cgroups_struct *s;
-
-	s = container_of(work, struct vhost_attach_cgroups_struct, work);
-	s->ret = cgroup_attach_task_all(s->owner, current);
-}
-
-static int vhost_attach_cgroups(struct vhost_dev *dev)
-{
-	struct vhost_attach_cgroups_struct attach;
-
-	attach.owner = current;
-	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
-	vhost_work_queue(dev, &attach.work);
-	vhost_work_dev_flush(dev);
-	return attach.ret;
-}
-
 /* Caller should have device mutex */
 bool vhost_dev_has_owner(struct vhost_dev *dev)
 {
@@ -579,6 +551,16 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static void vhost_worker_stop(struct vhost_worker *worker)
+{
+	DECLARE_COMPLETION_ONSTACK(exit_done);
+
+	worker->exit_done = &exit_done;
+	set_bit(VHOST_WORKER_FLAG_STOP, &worker->flags);
+	wake_up_process(worker->task);
+	wait_for_completion(worker->exit_done);
+}
+
 static void vhost_worker_free(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker = dev->worker;
@@ -588,7 +570,7 @@ static void vhost_worker_free(struct vhost_dev *dev)
 
 	dev->worker = NULL;
 	WARN_ON(!llist_empty(&worker->work_list));
-	kthread_stop(worker->task);
+	vhost_worker_stop(worker);
 	kfree(worker);
 }
 
@@ -603,27 +585,24 @@ static int vhost_worker_create(struct vhost_dev *dev)
 		return -ENOMEM;
 
 	dev->worker = worker;
-	worker->dev = dev;
 	worker->kcov_handle = kcov_common_handle();
 	init_llist_head(&worker->work_list);
 
-	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
+	/*
+	 * vhost used to use the kthread API which ignores all signals by
+	 * default and the drivers expect this behavior.
+	 */
+	task = kernel_worker(vhost_worker, worker, NUMA_NO_NODE, CLONE_FS,
+			     KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN);
 	if (IS_ERR(task)) {
 		ret = PTR_ERR(task);
 		goto free_worker;
 	}
 
 	worker->task = task;
-	wake_up_process(task); /* avoid contributing to loadavg */
-
-	ret = vhost_attach_cgroups(dev);
-	if (ret)
-		goto stop_worker;
-
+	kernel_worker_start(task, "vhost-%d", current->pid);
 	return 0;
 
-stop_worker:
-	kthread_stop(worker->task);
 free_worker:
 	kfree(worker);
 	dev->worker = NULL;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 102ce25e4e13..09748694cb66 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,11 +25,16 @@ struct vhost_work {
 	unsigned long		flags;
 };
 
+enum {
+	VHOST_WORKER_FLAG_STOP,
+};
+
 struct vhost_worker {
 	struct task_struct	*task;
+	struct completion	*exit_done;
 	struct llist_head	work_list;
-	struct vhost_dev	*dev;
 	u64			kcov_handle;
+	unsigned long		flags;
 };
 
 /* Poll a file (eventfd or socket) */
-- 
2.25.1
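
Since the worker is no longer a kthread, kthread_stop() is replaced by a
stop flag plus a completion that the worker signals on its way out. The
handshake can be modeled in userspace with pthreads. This is a sketch
only: the demo_* names are hypothetical, a mutex and condition variable
stand in for set_bit()/wake_up_process(), and the pthread_join() call has
no counterpart in the patch (it just keeps the on-stack completion alive
until the thread is gone). Build with -pthread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal stand-in for the kernel's struct completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void demo_wait_for_completion(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct demo_worker {
	pthread_t task;
	pthread_mutex_t lock;		/* protects stop and exit_done */
	pthread_cond_t wake;
	bool stop;
	bool exited;
	struct demo_completion *exit_done;
};

static void *demo_worker_fn(void *data)
{
	struct demo_worker *w = data;
	struct demo_completion *exit_done;

	/* Sleep until asked to stop; a real worker would also run work. */
	pthread_mutex_lock(&w->lock);
	while (!w->stop)
		pthread_cond_wait(&w->wake, &w->lock);
	exit_done = w->exit_done;
	pthread_mutex_unlock(&w->lock);

	w->exited = true;
	demo_complete(exit_done);	/* tell the stopper we are done */
	return NULL;
}

static int demo_worker_start(struct demo_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->stop = false;
	w->exited = false;
	w->exit_done = NULL;
	return pthread_create(&w->task, NULL, demo_worker_fn, w);
}

static void demo_worker_stop(struct demo_worker *w)
{
	struct demo_completion exit_done = {
		PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
	};

	/* Publish the completion before setting the stop flag. */
	pthread_mutex_lock(&w->lock);
	w->exit_done = &exit_done;
	w->stop = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);

	demo_wait_for_completion(&exit_done);
	pthread_join(w->task, NULL);	/* keep exit_done valid until exit */
}
```

The ordering matters in both the model and the patch: the completion
pointer is set before the stop flag, so a worker that observes the flag
always finds a valid completion to signal.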

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH V4 8/8] vhost: use kernel_worker to check RLIMITs and inherit v2 cgroups
@ 2021-10-07 21:44   ` Mike Christie
  0 siblings, 0 replies; 29+ messages in thread
From: Mike Christie @ 2021-10-07 21:44 UTC (permalink / raw)
  To: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel
  Cc: Mike Christie

vhost workers are created with the kthread API, which inherits its values
from, and performs its checks against, the kthreadd thread. As a result,
cgroups v2 does not work and the wrong RLIMITs are checked. This patch
switches vhost to the kernel_worker function, which inherits its values
from, and performs its checks against, the thread that owns the device.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/vhost.c | 65 +++++++++++++++----------------------------
 drivers/vhost/vhost.h |  7 ++++-
 2 files changed, 28 insertions(+), 44 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c9a1f706989c..9aa04fcdf210 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -22,7 +22,6 @@
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/kthread.h>
-#include <linux/cgroup.h>
 #include <linux/module.h>
 #include <linux/sort.h>
 #include <linux/sched/mm.h>
@@ -344,17 +343,14 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 static int vhost_worker(void *data)
 {
 	struct vhost_worker *worker = data;
-	struct vhost_dev *dev = worker->dev;
 	struct vhost_work *work, *work_next;
 	struct llist_node *node;
 
-	kthread_use_mm(dev->mm);
-
 	for (;;) {
 		/* mb paired w/ kthread_stop */
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		if (kthread_should_stop()) {
+		if (test_bit(VHOST_WORKER_FLAG_STOP, &worker->flags)) {
 			__set_current_state(TASK_RUNNING);
 			break;
 		}
@@ -376,8 +372,9 @@ static int vhost_worker(void *data)
 				schedule();
 		}
 	}
-	kthread_unuse_mm(dev->mm);
-	return 0;
+
+	complete(worker->exit_done);
+	do_exit(0);
 }
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
@@ -517,31 +514,6 @@ long vhost_dev_check_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_check_owner);
 
-struct vhost_attach_cgroups_struct {
-	struct vhost_work work;
-	struct task_struct *owner;
-	int ret;
-};
-
-static void vhost_attach_cgroups_work(struct vhost_work *work)
-{
-	struct vhost_attach_cgroups_struct *s;
-
-	s = container_of(work, struct vhost_attach_cgroups_struct, work);
-	s->ret = cgroup_attach_task_all(s->owner, current);
-}
-
-static int vhost_attach_cgroups(struct vhost_dev *dev)
-{
-	struct vhost_attach_cgroups_struct attach;
-
-	attach.owner = current;
-	vhost_work_init(&attach.work, vhost_attach_cgroups_work);
-	vhost_work_queue(dev, &attach.work);
-	vhost_work_dev_flush(dev);
-	return attach.ret;
-}
-
 /* Caller should have device mutex */
 bool vhost_dev_has_owner(struct vhost_dev *dev)
 {
@@ -579,6 +551,16 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static void vhost_worker_stop(struct vhost_worker *worker)
+{
+	DECLARE_COMPLETION_ONSTACK(exit_done);
+
+	worker->exit_done = &exit_done;
+	set_bit(VHOST_WORKER_FLAG_STOP, &worker->flags);
+	wake_up_process(worker->task);
+	wait_for_completion(worker->exit_done);
+}
+
 static void vhost_worker_free(struct vhost_dev *dev)
 {
 	struct vhost_worker *worker = dev->worker;
@@ -588,7 +570,7 @@ static void vhost_worker_free(struct vhost_dev *dev)
 
 	dev->worker = NULL;
 	WARN_ON(!llist_empty(&worker->work_list));
-	kthread_stop(worker->task);
+	vhost_worker_stop(worker);
 	kfree(worker);
 }
 
@@ -603,27 +585,24 @@ static int vhost_worker_create(struct vhost_dev *dev)
 		return -ENOMEM;
 
 	dev->worker = worker;
-	worker->dev = dev;
 	worker->kcov_handle = kcov_common_handle();
 	init_llist_head(&worker->work_list);
 
-	task = kthread_create(vhost_worker, worker, "vhost-%d", current->pid);
+	/*
+	 * vhost used to use the kthread API which ignores all signals by
+	 * default and the drivers expect this behavior.
+	 */
+	task = kernel_worker(vhost_worker, worker, NUMA_NO_NODE, CLONE_FS,
+			     KERN_WORKER_NO_FILES | KERN_WORKER_SIG_IGN);
 	if (IS_ERR(task)) {
 		ret = PTR_ERR(task);
 		goto free_worker;
 	}
 
 	worker->task = task;
-	wake_up_process(task); /* avoid contributing to loadavg */
-
-	ret = vhost_attach_cgroups(dev);
-	if (ret)
-		goto stop_worker;
-
+	kernel_worker_start(task, "vhost-%d", current->pid);
 	return 0;
 
-stop_worker:
-	kthread_stop(worker->task);
 free_worker:
 	kfree(worker);
 	dev->worker = NULL;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 102ce25e4e13..09748694cb66 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -25,11 +25,16 @@ struct vhost_work {
 	unsigned long		flags;
 };
 
+enum {
+	VHOST_WORKER_FLAG_STOP,
+};
+
 struct vhost_worker {
 	struct task_struct	*task;
+	struct completion	*exit_done;
 	struct llist_head	work_list;
-	struct vhost_dev	*dev;
 	u64			kcov_handle;
+	unsigned long		flags;
 };
 
 /* Poll a file (eventfd or socket) */
-- 
2.25.1
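
The stop handshake that replaces kthread_stop() in the patch above
(set a stop flag, wake the worker, wait for the worker to signal that it
has exited) can be sketched in userspace. This is only an analogy and not
the kernel code: pthreads stand in for kernel tasks, a mutex/condvar pair
stands in for wake_up_process() and struct completion, and every demo_*
name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct vhost_worker. */
struct demo_worker {
	pthread_t	thread;
	pthread_mutex_t	lock;
	pthread_cond_t	wake;		/* plays the role of wake_up_process() */
	pthread_cond_t	exit_done;	/* plays the role of the completion */
	bool		stop;		/* plays the role of VHOST_WORKER_FLAG_STOP */
	bool		exited;
};

/* Worker loop: sleep until woken, leave once the stop flag is seen. */
static void *demo_worker_fn(void *data)
{
	struct demo_worker *w = data;

	pthread_mutex_lock(&w->lock);
	while (!w->stop)
		pthread_cond_wait(&w->wake, &w->lock);

	/* Tell the stopper we are done, like complete(worker->exit_done). */
	w->exited = true;
	pthread_cond_signal(&w->exit_done);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Returns 0 on success or a pthread error number. */
static int demo_worker_create(struct demo_worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	pthread_cond_init(&w->exit_done, NULL);
	w->stop = false;
	w->exited = false;
	return pthread_create(&w->thread, NULL, demo_worker_fn, w);
}

/* Mirrors the order used in vhost_worker_stop(): flag, wake, wait. */
static void demo_worker_stop(struct demo_worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->wake);
	while (!w->exited)
		pthread_cond_wait(&w->exit_done, &w->lock);
	pthread_mutex_unlock(&w->lock);

	/* Userspace only: reap the thread before its state is reused. */
	pthread_join(w->thread, NULL);
}
```

A caller would run demo_worker_create(&w) and later demo_worker_stop(&w).
The sketch keeps the wait state inside the worker structure, which the
stopper owns until the join returns; the patch instead points
worker->exit_done at an on-stack completion.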


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 6/8] io_uring: switch to kernel_worker
  2021-10-07 21:44   ` Mike Christie
  (?)
@ 2021-10-08  6:42     ` kernel test robot
  -1 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-10-08  6:42 UTC (permalink / raw)
  To: Mike Christie, geert, vverma, hdanton, hch, stefanha, jasowang,
	mst, sgarzare, virtualization, christian.brauner
  Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5385 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20211007]
[cannot apply to mst-vhost/linux-next vgupta-arc/for-next arm64/for-next/core uclinux-h8/h8300-next geert-m68k/for-next openrisc/for-next deller-parisc/for-next powerpc/next s390/features linus/master v5.15-rc4 v5.15-rc3 v5.15-rc2 v5.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
base:    f8dc23b3dc0cc5b32dfd0c446e59377736d073a7
config: hexagon-randconfig-r045-20211007 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project b1a45c62f03ecbeb4544b0c65a01ee4586235a61)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
        git checkout 19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/fork.c:161:13: warning: no previous prototype for function 'arch_release_task_struct' [-Wmissing-prototypes]
   void __weak arch_release_task_struct(struct task_struct *tsk)
               ^
   kernel/fork.c:161:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __weak arch_release_task_struct(struct task_struct *tsk)
   ^
   static 
   kernel/fork.c:814:20: warning: no previous prototype for function 'arch_task_cache_init' [-Wmissing-prototypes]
   void __init __weak arch_task_cache_init(void) { }
                      ^
   kernel/fork.c:814:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __init __weak arch_task_cache_init(void) { }
   ^
   static 
   kernel/fork.c:909:12: warning: no previous prototype for function 'arch_dup_task_struct' [-Wmissing-prototypes]
   int __weak arch_dup_task_struct(struct task_struct *dst,
              ^
   kernel/fork.c:909:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __weak arch_dup_task_struct(struct task_struct *dst,
   ^
   static 
>> kernel/fork.c:2581:21: warning: no previous prototype for function 'create_io_thread' [-Wmissing-prototypes]
   struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
                       ^
   kernel/fork.c:2581:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
   ^
   static 
   4 warnings generated.


vim +/create_io_thread +2581 kernel/fork.c

13585fa0668c72 Nadav Amit    2019-04-25  2574  
cc440e8738e5c8 Jens Axboe    2021-03-04  2575  /*
cc440e8738e5c8 Jens Axboe    2021-03-04  2576   * This is like kernel_clone(), but shaved down and tailored to just
cc440e8738e5c8 Jens Axboe    2021-03-04  2577   * creating io_uring workers. It returns a created task, or an error pointer.
cc440e8738e5c8 Jens Axboe    2021-03-04  2578   * The returned task is inactive, and the caller must fire it up through
cc440e8738e5c8 Jens Axboe    2021-03-04  2579   * wake_up_new_task(p). All signals are blocked in the created task.
cc440e8738e5c8 Jens Axboe    2021-03-04  2580   */
cc440e8738e5c8 Jens Axboe    2021-03-04 @2581  struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
cc440e8738e5c8 Jens Axboe    2021-03-04  2582  {
cc440e8738e5c8 Jens Axboe    2021-03-04  2583  	unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
cc440e8738e5c8 Jens Axboe    2021-03-04  2584  				CLONE_IO;
cc440e8738e5c8 Jens Axboe    2021-03-04  2585  	struct kernel_clone_args args = {
cc440e8738e5c8 Jens Axboe    2021-03-04  2586  		.flags		= ((lower_32_bits(flags) | CLONE_VM |
cc440e8738e5c8 Jens Axboe    2021-03-04  2587  				    CLONE_UNTRACED) & ~CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2588  		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2589  		.stack		= (unsigned long)fn,
cc440e8738e5c8 Jens Axboe    2021-03-04  2590  		.stack_size	= (unsigned long)arg,
3f1f508b402889 Mike Christie 2021-10-07  2591  		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
cc440e8738e5c8 Jens Axboe    2021-03-04  2592  	};
cc440e8738e5c8 Jens Axboe    2021-03-04  2593  
b16b3855d89fba Jens Axboe    2021-03-26  2594  	return copy_process(NULL, 0, node, &args);
cc440e8738e5c8 Jens Axboe    2021-03-04  2595  }
cc440e8738e5c8 Jens Axboe    2021-03-04  2596  
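
For readers unfamiliar with -Wmissing-prototypes: the warning fires when a
non-static function is defined without a prior declaration in scope. The
sketch below shows the two usual remedies in a generic, standalone form.
It is not a proposed fix for kernel/fork.c, and the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remedy 1: a prototype that is visible before the definition. In a real
 * tree this would normally live in a shared header included by both the
 * defining file and its callers.
 */
int demo_run_callback(int (*fn)(void *), void *arg);

/*
 * Remedy 2: a helper used only in this translation unit is marked
 * 'static', so no separate prototype is needed.
 */
static int demo_incr(void *arg)
{
	int *value = arg;

	return ++(*value);
}

/* Defined after its prototype, so the warning is not emitted. */
int demo_run_callback(int (*fn)(void *), void *arg)
{
	if (fn == NULL)
		return -1;
	return fn(arg);
}
```

With int x = 41, demo_run_callback(demo_incr, &x) returns 42, and passing a
NULL callback returns -1.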

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25386 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 6/8] io_uring: switch to kernel_worker
  2021-10-07 21:44   ` Mike Christie
@ 2021-10-08  7:10     ` kernel test robot
  -1 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-10-08  7:10 UTC (permalink / raw)
  To: Mike Christie, geert, vverma, hdanton, hch, stefanha, jasowang,
	mst, sgarzare, virtualization, christian.brauner
  Cc: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5336 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20211007]
[cannot apply to mst-vhost/linux-next vgupta-arc/for-next arm64/for-next/core uclinux-h8/h8300-next geert-m68k/for-next openrisc/for-next deller-parisc/for-next powerpc/next s390/features linus/master v5.15-rc4 v5.15-rc3 v5.15-rc2 v5.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
base:    f8dc23b3dc0cc5b32dfd0c446e59377736d073a7
config: arc-randconfig-r043-20211007 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
        git checkout 19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/fork.c:161:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
     161 | void __weak arch_release_task_struct(struct task_struct *tsk)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:814:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
     814 | void __init __weak arch_task_cache_init(void) { }
         |                    ^~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:909:12: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes]
     909 | int __weak arch_dup_task_struct(struct task_struct *dst,
         |            ^~~~~~~~~~~~~~~~~~~~
>> kernel/fork.c:2581:21: warning: no previous prototype for 'create_io_thread' [-Wmissing-prototypes]
    2581 | struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
         |                     ^~~~~~~~~~~~~~~~
   In file included from include/linux/perf_event.h:25,
                    from include/linux/trace_events.h:10,
                    from include/trace/syscall.h:7,
                    from include/linux/syscalls.h:87,
                    from kernel/fork.c:54:
   arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned int arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                           ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~


vim +/create_io_thread +2581 kernel/fork.c

13585fa0668c72 Nadav Amit    2019-04-25  2574  
cc440e8738e5c8 Jens Axboe    2021-03-04  2575  /*
cc440e8738e5c8 Jens Axboe    2021-03-04  2576   * This is like kernel_clone(), but shaved down and tailored to just
cc440e8738e5c8 Jens Axboe    2021-03-04  2577   * creating io_uring workers. It returns a created task, or an error pointer.
cc440e8738e5c8 Jens Axboe    2021-03-04  2578   * The returned task is inactive, and the caller must fire it up through
cc440e8738e5c8 Jens Axboe    2021-03-04  2579   * wake_up_new_task(p). All signals are blocked in the created task.
cc440e8738e5c8 Jens Axboe    2021-03-04  2580   */
cc440e8738e5c8 Jens Axboe    2021-03-04 @2581  struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
cc440e8738e5c8 Jens Axboe    2021-03-04  2582  {
cc440e8738e5c8 Jens Axboe    2021-03-04  2583  	unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
cc440e8738e5c8 Jens Axboe    2021-03-04  2584  				CLONE_IO;
cc440e8738e5c8 Jens Axboe    2021-03-04  2585  	struct kernel_clone_args args = {
cc440e8738e5c8 Jens Axboe    2021-03-04  2586  		.flags		= ((lower_32_bits(flags) | CLONE_VM |
cc440e8738e5c8 Jens Axboe    2021-03-04  2587  				    CLONE_UNTRACED) & ~CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2588  		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2589  		.stack		= (unsigned long)fn,
cc440e8738e5c8 Jens Axboe    2021-03-04  2590  		.stack_size	= (unsigned long)arg,
3f1f508b402889 Mike Christie 2021-10-07  2591  		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
cc440e8738e5c8 Jens Axboe    2021-03-04  2592  	};
cc440e8738e5c8 Jens Axboe    2021-03-04  2593  
b16b3855d89fba Jens Axboe    2021-03-26  2594  	return copy_process(NULL, 0, node, &args);
cc440e8738e5c8 Jens Axboe    2021-03-04  2595  }
cc440e8738e5c8 Jens Axboe    2021-03-04  2596  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30309 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 6/8] io_uring: switch to kernel_worker
@ 2021-10-08  7:10     ` kernel test robot
  0 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-10-08  7:10 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5421 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20211007]
[cannot apply to mst-vhost/linux-next vgupta-arc/for-next arm64/for-next/core uclinux-h8/h8300-next geert-m68k/for-next openrisc/for-next deller-parisc/for-next powerpc/next s390/features linus/master v5.15-rc4 v5.15-rc3 v5.15-rc2 v5.15-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
base:    f8dc23b3dc0cc5b32dfd0c446e59377736d073a7
config: arc-randconfig-r043-20211007 (attached as .config)
compiler: arc-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/Use-copy_process-create_io_thread-in-vhost-layer/20211008-093610
        git checkout 19238ec927cb55bbd6fd6bdf64bac6a99f457b8c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arc 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/fork.c:161:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes]
     161 | void __weak arch_release_task_struct(struct task_struct *tsk)
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:814:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes]
     814 | void __init __weak arch_task_cache_init(void) { }
         |                    ^~~~~~~~~~~~~~~~~~~~
   kernel/fork.c:909:12: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes]
     909 | int __weak arch_dup_task_struct(struct task_struct *dst,
         |            ^~~~~~~~~~~~~~~~~~~~
>> kernel/fork.c:2581:21: warning: no previous prototype for 'create_io_thread' [-Wmissing-prototypes]
    2581 | struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
         |                     ^~~~~~~~~~~~~~~~
   In file included from include/linux/perf_event.h:25,
                    from include/linux/trace_events.h:10,
                    from include/trace/syscall.h:7,
                    from include/linux/syscalls.h:87,
                    from kernel/fork.c:54:
   arch/arc/include/asm/perf_event.h:126:27: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned int arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                           ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~
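
For readers unfamiliar with -Wmissing-prototypes: GCC emits it when a
non-static function is defined without a previously visible prototype.
The usual remedies are to declare the function in a header that the
defining file includes, or to mark the function static if it is only used
in that file. The sketch below illustrates both remedies in isolation. All
demo_* names are hypothetical and are not part of the kernel or of this
series, and the report above does not say which remedy applies to
create_io_thread.

```c
#include <stddef.h>

/*
 * Hypothetical header section. In a real tree this declaration would live
 * in a shared header that the defining .c file includes.
 */
int demo_create_worker(int (*fn)(void *), void *arg);

/* Definition: the compiler has already seen a matching prototype above,
 * so -Wmissing-prototypes stays quiet for this function. */
int demo_create_worker(int (*fn)(void *), void *arg)
{
	if (fn == NULL)
		return -1;
	return fn(arg);
}

/* File-local helpers are marked static, which also avoids the warning. */
static int demo_add_one(void *arg)
{
	return *(int *)arg + 1;
}
```

Building this with "gcc -Wmissing-prototypes -c" should produce no warning;
removing the standalone declaration of demo_create_worker should bring the
warning back.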


vim +/create_io_thread +2581 kernel/fork.c

13585fa0668c72 Nadav Amit    2019-04-25  2574  
cc440e8738e5c8 Jens Axboe    2021-03-04  2575  /*
cc440e8738e5c8 Jens Axboe    2021-03-04  2576   * This is like kernel_clone(), but shaved down and tailored to just
cc440e8738e5c8 Jens Axboe    2021-03-04  2577   * creating io_uring workers. It returns a created task, or an error pointer.
cc440e8738e5c8 Jens Axboe    2021-03-04  2578   * The returned task is inactive, and the caller must fire it up through
cc440e8738e5c8 Jens Axboe    2021-03-04  2579   * wake_up_new_task(p). All signals are blocked in the created task.
cc440e8738e5c8 Jens Axboe    2021-03-04  2580   */
cc440e8738e5c8 Jens Axboe    2021-03-04 @2581  struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
cc440e8738e5c8 Jens Axboe    2021-03-04  2582  {
cc440e8738e5c8 Jens Axboe    2021-03-04  2583  	unsigned long flags = CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|
cc440e8738e5c8 Jens Axboe    2021-03-04  2584  				CLONE_IO;
cc440e8738e5c8 Jens Axboe    2021-03-04  2585  	struct kernel_clone_args args = {
cc440e8738e5c8 Jens Axboe    2021-03-04  2586  		.flags		= ((lower_32_bits(flags) | CLONE_VM |
cc440e8738e5c8 Jens Axboe    2021-03-04  2587  				    CLONE_UNTRACED) & ~CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2588  		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
cc440e8738e5c8 Jens Axboe    2021-03-04  2589  		.stack		= (unsigned long)fn,
cc440e8738e5c8 Jens Axboe    2021-03-04  2590  		.stack_size	= (unsigned long)arg,
3f1f508b402889 Mike Christie 2021-10-07  2591  		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
cc440e8738e5c8 Jens Axboe    2021-03-04  2592  	};
cc440e8738e5c8 Jens Axboe    2021-03-04  2593  
b16b3855d89fba Jens Axboe    2021-03-26  2594  	return copy_process(NULL, 0, node, &args);
cc440e8738e5c8 Jens Axboe    2021-03-04  2595  }
cc440e8738e5c8 Jens Axboe    2021-03-04  2596  
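
The listing above does two things that are easy to miss: it masks the
CSIGNAL bits out of the clone flags, and it carries the thread function and
its argument in the .stack and .stack_size fields. The following is a
user-space sketch of that arithmetic only, not kernel code. The DEMO_*
constants mirror the values in the kernel's uapi <linux/sched.h>; the
struct, typedef and helper names are hypothetical.

```c
#include <stdint.h>

/* Values mirror the kernel uapi clone flags; DEMO_ prefix avoids clashes. */
#define DEMO_CSIGNAL        UINT64_C(0x000000ff)
#define DEMO_CLONE_VM       UINT64_C(0x00000100)
#define DEMO_CLONE_FS       UINT64_C(0x00000200)
#define DEMO_CLONE_FILES    UINT64_C(0x00000400)
#define DEMO_CLONE_SIGHAND  UINT64_C(0x00000800)
#define DEMO_CLONE_THREAD   UINT64_C(0x00010000)
#define DEMO_CLONE_UNTRACED UINT64_C(0x00800000)
#define DEMO_CLONE_IO       UINT64_C(0x80000000)

typedef int (*demo_fn_t)(void *);

/* Hypothetical stand-in for the fields of struct kernel_clone_args used here. */
struct demo_clone_args {
	uint64_t flags;
	int exit_signal;
	uintptr_t stack;      /* carries the thread function */
	uintptr_t stack_size; /* carries the thread argument */
};

static uint64_t demo_lower_32_bits(uint64_t v)
{
	return v & UINT64_C(0xffffffff);
}

/* Mirrors the initializer in the listing above. */
static struct demo_clone_args demo_io_thread_args(demo_fn_t fn, void *arg)
{
	uint64_t flags = DEMO_CLONE_FS | DEMO_CLONE_FILES | DEMO_CLONE_SIGHAND |
			 DEMO_CLONE_THREAD | DEMO_CLONE_IO;
	struct demo_clone_args args = {
		.flags = (demo_lower_32_bits(flags) | DEMO_CLONE_VM |
			  DEMO_CLONE_UNTRACED) & ~DEMO_CSIGNAL,
		.exit_signal = (int)(demo_lower_32_bits(flags) & DEMO_CSIGNAL),
		/* Pointer-to-integer casts are implementation-defined in ISO C. */
		.stack = (uintptr_t)fn,
		.stack_size = (uintptr_t)arg,
	};
	return args;
}

/* Recover fn and arg from the two fields and call fn(arg). */
static int demo_run(const struct demo_clone_args *args)
{
	demo_fn_t fn = (demo_fn_t)args->stack;
	return fn((void *)args->stack_size);
}

static int demo_double(void *arg)
{
	return 2 * *(int *)arg;
}
```

Because none of the flags passed in has a bit set in the low byte,
exit_signal comes out as 0 in this sketch.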

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 30309 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag
  2021-10-07 21:44   ` Mike Christie
@ 2021-10-08  8:21     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 29+ messages in thread
From: Geert Uytterhoeven @ 2021-10-08  8:21 UTC (permalink / raw)
  To: Mike Christie
  Cc: vverma, hdanton, Christoph Hellwig, Stefan Hajnoczi, Jason Wang,
	Michael S. Tsirkin, Stefano Garzarella, virtualization,
	Christian Brauner, Jens Axboe, Linux Kernel Mailing List

On Thu, Oct 7, 2021 at 11:45 PM Mike Christie
<michael.christie@oracle.com> wrote:
> The vhost worker threads need the same frame setup as io_uring's worker
> threads, but handle signals differently and do not need the same
> scheduling behavior. This patch separates the frame setup parts of
> PF_IO_WORKER into a new PF flag PF_USER_WORKER.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

>  arch/m68k/kernel/process.c       | 2 +-

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer
  2021-10-07 21:44 ` Mike Christie
@ 2021-10-12  6:43   ` Christoph Hellwig
  -1 siblings, 0 replies; 29+ messages in thread
From: Christoph Hellwig @ 2021-10-12  6:43 UTC (permalink / raw)
  To: Mike Christie
  Cc: geert, vverma, hdanton, hch, stefanha, jasowang, mst, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

The whole series looks good to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag
  2021-10-07 21:44   ` Mike Christie
@ 2021-10-22  9:55     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  9:55 UTC (permalink / raw)
  To: Mike Christie
  Cc: geert, vverma, hdanton, hch, stefanha, jasowang, sgarzare,
	virtualization, christian.brauner, axboe, linux-kernel

On Thu, Oct 07, 2021 at 04:44:42PM -0500, Mike Christie wrote:
> The vhost worker threads need the same frame setup as io_uring's worker
> threads, but handle signals differently and do not need the same
> scheduling behavior. This patch separates the frame setup parts of
> PF_IO_WORKER into a new PF flag PF_USER_WORKER.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  arch/alpha/kernel/process.c      | 2 +-
>  arch/arc/kernel/process.c        | 2 +-
>  arch/arm/kernel/process.c        | 2 +-
>  arch/arm64/kernel/process.c      | 2 +-
>  arch/csky/kernel/process.c       | 2 +-
>  arch/h8300/kernel/process.c      | 2 +-
>  arch/hexagon/kernel/process.c    | 2 +-
>  arch/ia64/kernel/process.c       | 2 +-
>  arch/m68k/kernel/process.c       | 2 +-
>  arch/microblaze/kernel/process.c | 2 +-
>  arch/mips/kernel/process.c       | 2 +-
>  arch/nds32/kernel/process.c      | 2 +-
>  arch/nios2/kernel/process.c      | 2 +-
>  arch/openrisc/kernel/process.c   | 2 +-
>  arch/parisc/kernel/process.c     | 2 +-
>  arch/powerpc/kernel/process.c    | 2 +-
>  arch/riscv/kernel/process.c      | 2 +-
>  arch/s390/kernel/process.c       | 2 +-
>  arch/sh/kernel/process_32.c      | 2 +-
>  arch/sparc/kernel/process_32.c   | 2 +-
>  arch/sparc/kernel/process_64.c   | 2 +-
>  arch/um/kernel/process.c         | 2 +-
>  arch/x86/kernel/process.c        | 2 +-
>  arch/xtensa/kernel/process.c     | 2 +-
>  include/linux/sched.h            | 1 +
>  include/linux/sched/task.h       | 1 +
>  kernel/fork.c                    | 4 +++-
>  27 files changed, 29 insertions(+), 25 deletions(-)


For something that's touching include/linux/sched.h
and all arches at once, this has not been CC'd widely enough.


> diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> index a5123ea426ce..e350fff2ea14 100644
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -249,7 +249,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	childti->pcb.ksp = (unsigned long) childstack;
>  	childti->pcb.flags = 1;	/* set FEN, clear everything else */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(childstack, 0,
>  			sizeof(struct switch_stack) + sizeof(struct pt_regs));
> diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
> index 3793876f42d9..c3f4952cce17 100644
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -191,7 +191,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	childksp[0] = 0;			/* fp */
>  	childksp[1] = (unsigned long)ret_from_fork; /* blink */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(c_regs, 0, sizeof(struct pt_regs));
>  
>  		c_callee->r13 = kthread_arg;
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 0e2d3051741e..449c9db3942a 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -247,7 +247,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	thread->cpu_domain = get_domain();
>  #endif
>  
> -	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
> +	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
>  		*childregs = *current_pt_regs();
>  		childregs->ARM_r0 = 0;
>  		if (stack_start)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 40adb8cdbf5a..e2fe88a3ae90 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -333,7 +333,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  
>  	ptrauth_thread_init_kernel(p);
>  
> -	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
> +	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
>  		*childregs = *current_pt_regs();
>  		childregs->regs[0] = 0;
>  
> diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
> index 3d0ca22cd0e2..509f2bfe4ace 100644
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -49,7 +49,7 @@ int copy_thread(unsigned long clone_flags,
>  	/* setup thread.sp for switch_to !!! */
>  	p->thread.sp = (unsigned long)childstack;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childstack->r15 = (unsigned long) ret_from_kernel_thread;
>  		childstack->r10 = kthread_arg;
> diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
> index 2ac27e4248a4..11baf058b6c5 100644
> --- a/arch/h8300/kernel/process.c
> +++ b/arch/h8300/kernel/process.c
> @@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  
>  	childregs = (struct pt_regs *) (THREAD_SIZE + task_stack_page(p)) - 1;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->retpc = (unsigned long) ret_from_kernel_thread;
>  		childregs->er4 = topstk; /* arg */
> diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
> index 6a6835fb4242..f17573b66303 100644
> --- a/arch/hexagon/kernel/process.c
> +++ b/arch/hexagon/kernel/process.c
> @@ -73,7 +73,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  						    sizeof(*ss));
>  	ss->lr = (unsigned long)ret_from_fork;
>  	p->thread.switch_sp = ss;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		/* r24 <- fn, r25 <- arg */
>  		ss->r24 = usp;
> diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
> index e56d63f4abf9..4a58daa56af4 100644
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -338,7 +338,7 @@ copy_thread(unsigned long clone_flags, unsigned long user_stack_base,
>  
>  	ia64_drop_fpu(p);	/* don't pick up stale state from a CPU's fph */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		if (unlikely(!user_stack_base)) {
>  			/* fork_idle() called us */
>  			return 0;
> diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
> index 1ab692b952cd..e7474a118410 100644
> --- a/arch/m68k/kernel/process.c
> +++ b/arch/m68k/kernel/process.c
> @@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	 */
>  	p->thread.fc = USER_DATA;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(frame, 0, sizeof(struct fork_frame));
>  		frame->regs.sr = PS_S;
> diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
> index 62aa237180b6..5b543be324d4 100644
> --- a/arch/microblaze/kernel/process.c
> +++ b/arch/microblaze/kernel/process.c
> @@ -59,7 +59,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct pt_regs *childregs = task_pt_regs(p);
>  	struct thread_info *ti = task_thread_info(p);
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* if we're creating a new kernel thread then just zeroing all
>  		 * the registers. That's OK for a brand new thread.*/
>  		memset(childregs, 0, sizeof(struct pt_regs));
> diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
> index 95aa86fa6077..d9ca11dd544f 100644
> --- a/arch/mips/kernel/process.c
> +++ b/arch/mips/kernel/process.c
> @@ -120,7 +120,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	/*  Put the stack after the struct pt_regs.  */
>  	childksp = (unsigned long) childregs;
>  	p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		unsigned long status = p->thread.cp0_status;
>  		memset(childregs, 0, sizeof(struct pt_regs));
> diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
> index 391895b54d13..2dba51d1889c 100644
> --- a/arch/nds32/kernel/process.c
> +++ b/arch/nds32/kernel/process.c
> @@ -156,7 +156,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  
>  	memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		/* kernel thread fn */
>  		p->thread.cpu_context.r6 = stack_start;
> diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
> index 9ff37ba2bb60..ce6ad177da15 100644
> --- a/arch/nios2/kernel/process.c
> +++ b/arch/nios2/kernel/process.c
> @@ -109,7 +109,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct switch_stack *childstack =
>  		((struct switch_stack *)childregs) - 1;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childstack, 0,
>  			sizeof(struct switch_stack) + sizeof(struct pt_regs));
>  
> diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
> index b0698d9ce14f..d1d189c16676 100644
> --- a/arch/openrisc/kernel/process.c
> +++ b/arch/openrisc/kernel/process.c
> @@ -172,7 +172,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	sp -= sizeof(struct pt_regs);
>  	kregs = (struct pt_regs *)sp;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(kregs, 0, sizeof(struct pt_regs));
>  		kregs->gpr[20] = usp; /* fn, kernel thread */
>  		kregs->gpr[22] = arg;
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index 38ec4ae81239..257bec7e67d4 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -197,7 +197,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
>  	extern void * const ret_from_kernel_thread;
>  	extern void * const child_return;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(cregs, 0, sizeof(struct pt_regs));
>  		if (!usp) /* idle thread */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 50436b52c213..817847723bff 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1700,7 +1700,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	/* Copy registers */
>  	sp -= sizeof(struct pt_regs);
>  	childregs = (struct pt_regs *) sp;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->gpr[1] = sp + sizeof(struct pt_regs);
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 03ac3aa611f5..8deeb94eb51e 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -125,7 +125,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct pt_regs *childregs = task_pt_regs(p);
>  
>  	/* p->thread holds context to be restored by __switch_to() */
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* Kernel thread */
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->gp = gp_in_global;
> diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
> index 350e94d0cac2..f596843ab55c 100644
> --- a/arch/s390/kernel/process.c
> +++ b/arch/s390/kernel/process.c
> @@ -130,7 +130,7 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
>  	frame->sf.gprs[9] = (unsigned long)frame;
>  
>  	/* Store access registers to kernel stack of new process. */
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(&frame->childregs, 0, sizeof(struct pt_regs));
>  		frame->childregs.psw.mask = PSW_KERNEL_BITS | PSW_MASK_DAT |
> diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
> index 717de05c81f4..e74906f53c3e 100644
> --- a/arch/sh/kernel/process_32.c
> +++ b/arch/sh/kernel/process_32.c
> @@ -114,7 +114,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  
>  	childregs = task_pt_regs(p);
>  	p->thread.sp = (unsigned long) childregs;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		p->thread.pc = (unsigned long) ret_from_kernel_thread;
>  		childregs->regs[4] = arg;
> diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
> index bbbe0cfef746..978e0bc10ad4 100644
> --- a/arch/sparc/kernel/process_32.c
> +++ b/arch/sparc/kernel/process_32.c
> @@ -296,7 +296,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	ti->ksp = (unsigned long) new_stack;
>  	p->thread.kregs = childregs;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		extern int nwindows;
>  		unsigned long psr;
>  		memset(new_stack, 0, STACKFRAME_SZ + TRACEREG_SZ);
> diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
> index d1cc410d2f64..1c45cd5089f4 100644
> --- a/arch/sparc/kernel/process_64.c
> +++ b/arch/sparc/kernel/process_64.c
> @@ -594,7 +594,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  				       sizeof(struct sparc_stackf));
>  	t->fpsaved[0] = 0;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(child_trap_frame, 0, child_stack_sz);
>  		__thread_flag_byte_ptr(t)[TI_FLAG_BYTE_CWP] = 
>  			(current_pt_regs()->tstate + 1) & TSTATE_CWP;
> diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
> index 457a38db368b..2bc3141cbf01 100644
> --- a/arch/um/kernel/process.c
> +++ b/arch/um/kernel/process.c
> @@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
>  		unsigned long arg, struct task_struct * p, unsigned long tls)
>  {
>  	void (*handler)(void);
> -	int kthread = current->flags & (PF_KTHREAD | PF_IO_WORKER);
> +	int kthread = current->flags & (PF_KTHREAD | PF_USER_WORKER);
>  	int ret = 0;
>  
>  	p->thread = (struct thread_struct) INIT_THREAD;
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 1d9463e3096b..d88be9dd5dfd 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -178,7 +178,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	task_user_gs(p) = get_user_gs(current_pt_regs());
>  #endif
>  
> -	if (unlikely(p->flags & PF_IO_WORKER)) {
> +	if (unlikely(p->flags & PF_USER_WORKER)) {
>  		/*
>  		 * An IO thread is a user space thread, but it doesn't
>  		 * return to ret_after_fork().
> diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
> index 060165340612..61ad0bfbd7ea 100644
> --- a/arch/xtensa/kernel/process.c
> +++ b/arch/xtensa/kernel/process.c
> @@ -217,7 +217,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp_thread_fn,
>  
>  	p->thread.sp = (unsigned long)childregs;
>  
> -	if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (!(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		struct pt_regs *regs = current_pt_regs();
>  		unsigned long usp = usp_thread_fn ?
>  			usp_thread_fn : regs->areg[1];
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c1a927ddec64..b1027e916be4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1665,6 +1665,7 @@ extern struct pid *cad_pid;
>  #define PF_VCPU			0x00000001	/* I'm a virtual CPU */
>  #define PF_IDLE			0x00000002	/* I am an IDLE thread */
>  #define PF_EXITING		0x00000004	/* Getting shut down */
> +#define PF_USER_WORKER		0x00000008	/* Kernel thread cloned from userspace thread */
>  #define PF_IO_WORKER		0x00000010	/* Task is an IO worker */
>  #define PF_WQ_WORKER		0x00000020	/* I'm a workqueue worker */
>  #define PF_FORKNOEXEC		0x00000040	/* Forked but didn't exec */
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index 48417c735438..53599a99d7e0 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -19,6 +19,7 @@ struct css_set;
>  #define CLONE_LEGACY_FLAGS 0xffffffffULL
>  
>  #define KERN_WORKER_IO		BIT(0)
> +#define KERN_WORKER_USER	BIT(1)
>  
>  struct kernel_clone_args {
>  	u64 flags;
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3988106e9609..4f780424de46 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2035,6 +2035,8 @@ static __latent_entropy struct task_struct *copy_process(
>  		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>  	}
>  
> +	if (args->worker_flags & KERN_WORKER_USER)
> +		p->flags |= PF_USER_WORKER;
>  	/*
>  	 * This _must_ happen before we call free_task(), i.e. before we jump
>  	 * to any of the bad_fork_* labels. This is to avoid freeing
> @@ -2526,7 +2528,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>  		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
>  		.stack		= (unsigned long)fn,
>  		.stack_size	= (unsigned long)arg,
> -		.worker_flags	= KERN_WORKER_IO,
> +		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
>  	};
>  
>  	return copy_process(NULL, 0, node, &args);
> -- 
> 2.25.1
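
The effect of the quoted patch can be summarized as a small user-space
model: copy_process() translates KERN_WORKER_USER into PF_USER_WORKER, and
most of the copy_thread() hunks then build a kernel-style frame when either
PF_KTHREAD or PF_USER_WORKER is set (the x86 hunk tests PF_USER_WORKER
alone). This is a sketch, not kernel code. The PF_USER_WORKER, PF_IO_WORKER
and KERN_WORKER_* values are taken from the hunks above; PF_KTHREAD's value
is taken from mainline include/linux/sched.h, and the mapping from
KERN_WORKER_IO to PF_IO_WORKER is assumed to come from an earlier patch in
the series that is not shown here. All demo_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PF_USER_WORKER   0x00000008u /* value from the sched.h hunk */
#define DEMO_PF_IO_WORKER     0x00000010u /* value from the sched.h hunk */
#define DEMO_PF_KTHREAD       0x00200000u /* mainline value, not in the patch */

#define DEMO_KERN_WORKER_IO   (1u << 0)   /* BIT(0) in sched/task.h */
#define DEMO_KERN_WORKER_USER (1u << 1)   /* BIT(1) in sched/task.h */

/* Models the worker_flags to p->flags translation done at fork time. */
static uint32_t demo_task_flags(uint32_t worker_flags)
{
	uint32_t pf = 0;

	/* Assumed: handled by an earlier patch in the series. */
	if (worker_flags & DEMO_KERN_WORKER_IO)
		pf |= DEMO_PF_IO_WORKER;
	/* Added by the quoted patch. */
	if (worker_flags & DEMO_KERN_WORKER_USER)
		pf |= DEMO_PF_USER_WORKER;
	return pf;
}

/* Models the test used by most arch copy_thread() hunks after the patch. */
static bool demo_uses_kernel_frame(uint32_t pf)
{
	return (pf & (DEMO_PF_KTHREAD | DEMO_PF_USER_WORKER)) != 0;
}
```

In this model an io_uring worker (both KERN_WORKER_* bits set) and a worker
that sets only KERN_WORKER_USER both get the kernel frame setup, while only
the former carries PF_IO_WORKER.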


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag
@ 2021-10-22  9:55     ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2021-10-22  9:55 UTC (permalink / raw)
  To: Mike Christie
  Cc: axboe, hdanton, linux-kernel, virtualization, hch, vverma, geert,
	stefanha, christian.brauner

On Thu, Oct 07, 2021 at 04:44:42PM -0500, Mike Christie wrote:
> The vhost worker threads need the same frame setup as io_uring's worker
> threads, but handle signals differently and do not need the same
> scheduling behavior. This patch separates the frame setup parts of
> PF_IO_WORKER into a new PF flag PF_USER_WORKER.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  arch/alpha/kernel/process.c      | 2 +-
>  arch/arc/kernel/process.c        | 2 +-
>  arch/arm/kernel/process.c        | 2 +-
>  arch/arm64/kernel/process.c      | 2 +-
>  arch/csky/kernel/process.c       | 2 +-
>  arch/h8300/kernel/process.c      | 2 +-
>  arch/hexagon/kernel/process.c    | 2 +-
>  arch/ia64/kernel/process.c       | 2 +-
>  arch/m68k/kernel/process.c       | 2 +-
>  arch/microblaze/kernel/process.c | 2 +-
>  arch/mips/kernel/process.c       | 2 +-
>  arch/nds32/kernel/process.c      | 2 +-
>  arch/nios2/kernel/process.c      | 2 +-
>  arch/openrisc/kernel/process.c   | 2 +-
>  arch/parisc/kernel/process.c     | 2 +-
>  arch/powerpc/kernel/process.c    | 2 +-
>  arch/riscv/kernel/process.c      | 2 +-
>  arch/s390/kernel/process.c       | 2 +-
>  arch/sh/kernel/process_32.c      | 2 +-
>  arch/sparc/kernel/process_32.c   | 2 +-
>  arch/sparc/kernel/process_64.c   | 2 +-
>  arch/um/kernel/process.c         | 2 +-
>  arch/x86/kernel/process.c        | 2 +-
>  arch/xtensa/kernel/process.c     | 2 +-
>  include/linux/sched.h            | 1 +
>  include/linux/sched/task.h       | 1 +
>  kernel/fork.c                    | 4 +++-
>  27 files changed, 29 insertions(+), 25 deletions(-)


For something that's touching include/linux/sched.h
and all arches at once, this has not been CC'd widely enough.


> diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> index a5123ea426ce..e350fff2ea14 100644
> --- a/arch/alpha/kernel/process.c
> +++ b/arch/alpha/kernel/process.c
> @@ -249,7 +249,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	childti->pcb.ksp = (unsigned long) childstack;
>  	childti->pcb.flags = 1;	/* set FEN, clear everything else */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(childstack, 0,
>  			sizeof(struct switch_stack) + sizeof(struct pt_regs));
> diff --git a/arch/arc/kernel/process.c b/arch/arc/kernel/process.c
> index 3793876f42d9..c3f4952cce17 100644
> --- a/arch/arc/kernel/process.c
> +++ b/arch/arc/kernel/process.c
> @@ -191,7 +191,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	childksp[0] = 0;			/* fp */
>  	childksp[1] = (unsigned long)ret_from_fork; /* blink */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(c_regs, 0, sizeof(struct pt_regs));
>  
>  		c_callee->r13 = kthread_arg;
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 0e2d3051741e..449c9db3942a 100644
> --- a/arch/arm/kernel/process.c
> +++ b/arch/arm/kernel/process.c
> @@ -247,7 +247,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  	thread->cpu_domain = get_domain();
>  #endif
>  
> -	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
> +	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
>  		*childregs = *current_pt_regs();
>  		childregs->ARM_r0 = 0;
>  		if (stack_start)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 40adb8cdbf5a..e2fe88a3ae90 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -333,7 +333,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  
>  	ptrauth_thread_init_kernel(p);
>  
> -	if (likely(!(p->flags & (PF_KTHREAD | PF_IO_WORKER)))) {
> +	if (likely(!(p->flags & (PF_KTHREAD | PF_USER_WORKER)))) {
>  		*childregs = *current_pt_regs();
>  		childregs->regs[0] = 0;
>  
> diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
> index 3d0ca22cd0e2..509f2bfe4ace 100644
> --- a/arch/csky/kernel/process.c
> +++ b/arch/csky/kernel/process.c
> @@ -49,7 +49,7 @@ int copy_thread(unsigned long clone_flags,
>  	/* setup thread.sp for switch_to !!! */
>  	p->thread.sp = (unsigned long)childstack;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childstack->r15 = (unsigned long) ret_from_kernel_thread;
>  		childstack->r10 = kthread_arg;
> diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
> index 2ac27e4248a4..11baf058b6c5 100644
> --- a/arch/h8300/kernel/process.c
> +++ b/arch/h8300/kernel/process.c
> @@ -112,7 +112,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  
>  	childregs = (struct pt_regs *) (THREAD_SIZE + task_stack_page(p)) - 1;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->retpc = (unsigned long) ret_from_kernel_thread;
>  		childregs->er4 = topstk; /* arg */
> diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
> index 6a6835fb4242..f17573b66303 100644
> --- a/arch/hexagon/kernel/process.c
> +++ b/arch/hexagon/kernel/process.c
> @@ -73,7 +73,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  						    sizeof(*ss));
>  	ss->lr = (unsigned long)ret_from_fork;
>  	p->thread.switch_sp = ss;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		/* r24 <- fn, r25 <- arg */
>  		ss->r24 = usp;
> diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
> index e56d63f4abf9..4a58daa56af4 100644
> --- a/arch/ia64/kernel/process.c
> +++ b/arch/ia64/kernel/process.c
> @@ -338,7 +338,7 @@ copy_thread(unsigned long clone_flags, unsigned long user_stack_base,
>  
>  	ia64_drop_fpu(p);	/* don't pick up stale state from a CPU's fph */
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		if (unlikely(!user_stack_base)) {
>  			/* fork_idle() called us */
>  			return 0;
> diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
> index 1ab692b952cd..e7474a118410 100644
> --- a/arch/m68k/kernel/process.c
> +++ b/arch/m68k/kernel/process.c
> @@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	 */
>  	p->thread.fc = USER_DATA;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(frame, 0, sizeof(struct fork_frame));
>  		frame->regs.sr = PS_S;
> diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
> index 62aa237180b6..5b543be324d4 100644
> --- a/arch/microblaze/kernel/process.c
> +++ b/arch/microblaze/kernel/process.c
> @@ -59,7 +59,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct pt_regs *childregs = task_pt_regs(p);
>  	struct thread_info *ti = task_thread_info(p);
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* if we're creating a new kernel thread then just zeroing all
>  		 * the registers. That's OK for a brand new thread.*/
>  		memset(childregs, 0, sizeof(struct pt_regs));
> diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
> index 95aa86fa6077..d9ca11dd544f 100644
> --- a/arch/mips/kernel/process.c
> +++ b/arch/mips/kernel/process.c
> @@ -120,7 +120,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	/*  Put the stack after the struct pt_regs.  */
>  	childksp = (unsigned long) childregs;
>  	p->thread.cp0_status = (read_c0_status() & ~(ST0_CU2|ST0_CU1)) | ST0_KERNEL_CUMASK;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		unsigned long status = p->thread.cp0_status;
>  		memset(childregs, 0, sizeof(struct pt_regs));
> diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
> index 391895b54d13..2dba51d1889c 100644
> --- a/arch/nds32/kernel/process.c
> +++ b/arch/nds32/kernel/process.c
> @@ -156,7 +156,7 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
>  
>  	memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		/* kernel thread fn */
>  		p->thread.cpu_context.r6 = stack_start;
> diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
> index 9ff37ba2bb60..ce6ad177da15 100644
> --- a/arch/nios2/kernel/process.c
> +++ b/arch/nios2/kernel/process.c
> @@ -109,7 +109,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct switch_stack *childstack =
>  		((struct switch_stack *)childregs) - 1;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childstack, 0,
>  			sizeof(struct switch_stack) + sizeof(struct pt_regs));
>  
> diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
> index b0698d9ce14f..d1d189c16676 100644
> --- a/arch/openrisc/kernel/process.c
> +++ b/arch/openrisc/kernel/process.c
> @@ -172,7 +172,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	sp -= sizeof(struct pt_regs);
>  	kregs = (struct pt_regs *)sp;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(kregs, 0, sizeof(struct pt_regs));
>  		kregs->gpr[20] = usp; /* fn, kernel thread */
>  		kregs->gpr[22] = arg;
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index 38ec4ae81239..257bec7e67d4 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -197,7 +197,7 @@ copy_thread(unsigned long clone_flags, unsigned long usp,
>  	extern void * const ret_from_kernel_thread;
>  	extern void * const child_return;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(cregs, 0, sizeof(struct pt_regs));
>  		if (!usp) /* idle thread */
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 50436b52c213..817847723bff 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1700,7 +1700,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>  	/* Copy registers */
>  	sp -= sizeof(struct pt_regs);
>  	childregs = (struct pt_regs *) sp;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->gpr[1] = sp + sizeof(struct pt_regs);
> diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
> index 03ac3aa611f5..8deeb94eb51e 100644
> --- a/arch/riscv/kernel/process.c
> +++ b/arch/riscv/kernel/process.c
> @@ -125,7 +125,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  	struct pt_regs *childregs = task_pt_regs(p);
>  
>  	/* p->thread holds context to be restored by __switch_to() */
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* Kernel thread */
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		childregs->gp = gp_in_global;
> diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
> index 350e94d0cac2..f596843ab55c 100644
> --- a/arch/s390/kernel/process.c
> +++ b/arch/s390/kernel/process.c
> @@ -130,7 +130,7 @@ int copy_thread(unsigned long clone_flags, unsigned long new_stackp,
>  	frame->sf.gprs[9] = (unsigned long)frame;
>  
>  	/* Store access registers to kernel stack of new process. */
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		/* kernel thread */
>  		memset(&frame->childregs, 0, sizeof(struct pt_regs));
>  		frame->childregs.psw.mask = PSW_KERNEL_BITS | PSW_MASK_DAT |
> diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
> index 717de05c81f4..e74906f53c3e 100644
> --- a/arch/sh/kernel/process_32.c
> +++ b/arch/sh/kernel/process_32.c
> @@ -114,7 +114,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp, unsigned long arg,
>  
>  	childregs = task_pt_regs(p);
>  	p->thread.sp = (unsigned long) childregs;
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(childregs, 0, sizeof(struct pt_regs));
>  		p->thread.pc = (unsigned long) ret_from_kernel_thread;
>  		childregs->regs[4] = arg;
> diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
> index bbbe0cfef746..978e0bc10ad4 100644
> --- a/arch/sparc/kernel/process_32.c
> +++ b/arch/sparc/kernel/process_32.c
> @@ -296,7 +296,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	ti->ksp = (unsigned long) new_stack;
>  	p->thread.kregs = childregs;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		extern int nwindows;
>  		unsigned long psr;
>  		memset(new_stack, 0, STACKFRAME_SZ + TRACEREG_SZ);
> diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
> index d1cc410d2f64..1c45cd5089f4 100644
> --- a/arch/sparc/kernel/process_64.c
> +++ b/arch/sparc/kernel/process_64.c
> @@ -594,7 +594,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  				       sizeof(struct sparc_stackf));
>  	t->fpsaved[0] = 0;
>  
> -	if (unlikely(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (unlikely(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		memset(child_trap_frame, 0, child_stack_sz);
>  		__thread_flag_byte_ptr(t)[TI_FLAG_BYTE_CWP] = 
>  			(current_pt_regs()->tstate + 1) & TSTATE_CWP;
> diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
> index 457a38db368b..2bc3141cbf01 100644
> --- a/arch/um/kernel/process.c
> +++ b/arch/um/kernel/process.c
> @@ -157,7 +157,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp,
>  		unsigned long arg, struct task_struct * p, unsigned long tls)
>  {
>  	void (*handler)(void);
> -	int kthread = current->flags & (PF_KTHREAD | PF_IO_WORKER);
> +	int kthread = current->flags & (PF_KTHREAD | PF_USER_WORKER);
>  	int ret = 0;
>  
>  	p->thread = (struct thread_struct) INIT_THREAD;
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 1d9463e3096b..d88be9dd5dfd 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -178,7 +178,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
>  	task_user_gs(p) = get_user_gs(current_pt_regs());
>  #endif
>  
> -	if (unlikely(p->flags & PF_IO_WORKER)) {
> +	if (unlikely(p->flags & PF_USER_WORKER)) {
>  		/*
>  		 * An IO thread is a user space thread, but it doesn't
>  		 * return to ret_after_fork().
> diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
> index 060165340612..61ad0bfbd7ea 100644
> --- a/arch/xtensa/kernel/process.c
> +++ b/arch/xtensa/kernel/process.c
> @@ -217,7 +217,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp_thread_fn,
>  
>  	p->thread.sp = (unsigned long)childregs;
>  
> -	if (!(p->flags & (PF_KTHREAD | PF_IO_WORKER))) {
> +	if (!(p->flags & (PF_KTHREAD | PF_USER_WORKER))) {
>  		struct pt_regs *regs = current_pt_regs();
>  		unsigned long usp = usp_thread_fn ?
>  			usp_thread_fn : regs->areg[1];
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index c1a927ddec64..b1027e916be4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1665,6 +1665,7 @@ extern struct pid *cad_pid;
>  #define PF_VCPU			0x00000001	/* I'm a virtual CPU */
>  #define PF_IDLE			0x00000002	/* I am an IDLE thread */
>  #define PF_EXITING		0x00000004	/* Getting shut down */
> +#define PF_USER_WORKER		0x00000008	/* Kernel thread cloned from userspace thread */
>  #define PF_IO_WORKER		0x00000010	/* Task is an IO worker */
>  #define PF_WQ_WORKER		0x00000020	/* I'm a workqueue worker */
>  #define PF_FORKNOEXEC		0x00000040	/* Forked but didn't exec */
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index 48417c735438..53599a99d7e0 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -19,6 +19,7 @@ struct css_set;
>  #define CLONE_LEGACY_FLAGS 0xffffffffULL
>  
>  #define KERN_WORKER_IO		BIT(0)
> +#define KERN_WORKER_USER	BIT(1)
>  
>  struct kernel_clone_args {
>  	u64 flags;
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3988106e9609..4f780424de46 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2035,6 +2035,8 @@ static __latent_entropy struct task_struct *copy_process(
>  		siginitsetinv(&p->blocked, sigmask(SIGKILL)|sigmask(SIGSTOP));
>  	}
>  
> +	if (args->worker_flags & KERN_WORKER_USER)
> +		p->flags |= PF_USER_WORKER;
>  	/*
>  	 * This _must_ happen before we call free_task(), i.e. before we jump
>  	 * to any of the bad_fork_* labels. This is to avoid freeing
> @@ -2526,7 +2528,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
>  		.exit_signal	= (lower_32_bits(flags) & CSIGNAL),
>  		.stack		= (unsigned long)fn,
>  		.stack_size	= (unsigned long)arg,
> -		.worker_flags	= KERN_WORKER_IO,
> +		.worker_flags	= KERN_WORKER_IO | KERN_WORKER_USER,
>  	};
>  
>  	return copy_process(NULL, 0, node, &args);
> -- 
> 2.25.1


end of thread (~2021-10-22  9:55 UTC)

Thread overview: 29+ messages
2021-10-07 21:44 [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer Mike Christie
2021-10-07 21:44 ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 1/8] fork: Make IO worker options flag based Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 2/8] fork: move PF_IO_WORKER's kernel frame setup to new flag Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-08  8:21   ` Geert Uytterhoeven
2021-10-08  8:21     ` Geert Uytterhoeven
2021-10-22  9:55   ` Michael S. Tsirkin
2021-10-22  9:55     ` Michael S. Tsirkin
2021-10-07 21:44 ` [PATCH V4 3/8] fork: add option to not clone or dup files Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 4/8] fork: Add KERNEL_WORKER flag to ignore signals Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 5/8] fork: add helper to clone a process Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 6/8] io_uring: switch to kernel_worker Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-08  6:42   ` kernel test robot
2021-10-08  6:42     ` kernel test robot
2021-10-08  6:42     ` kernel test robot
2021-10-08  7:10   ` kernel test robot
2021-10-08  7:10     ` kernel test robot
2021-10-07 21:44 ` [PATCH V4 7/8] vhost: move worker thread fields to new struct Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-07 21:44 ` [PATCH V4 8/8] vhost: use kernel_worker to check RLIMITs and inherit v2 cgroups Mike Christie
2021-10-07 21:44   ` Mike Christie
2021-10-12  6:43 ` [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer Christoph Hellwig
2021-10-12  6:43   ` Christoph Hellwig
